Bibliography Management using RSS Technology (BuRST)

Draft 14 May 2005

Editors:
Peter Mika (Department of Computer Science, Vrije Universiteit Amsterdam)

Abstract

BuRST is a lightweight specification for publishing bibliographic information using RSS 1.0 and bibliography-related metadata standards. This specification does not define any additional vocabulary (except for a single element introduced for technical reasons), but rather describes an agreement on how to use existing vocabularies in combination, in order to increase interoperability among software that publishes and consumes bibliographic data in a web-based setting.

Status of this document

Initial version for comment.

Comments are welcome at pmika@cs.vu.nl.


1. Introduction

This document describes the Bibliography Management Using RSS Technology (BuRST) format for publishing bibliographic information using RSS 1.0. In terms of the RSS specification, we define an RSS module for bibliographic items using the SWRC and FOAF ontologies. When necessary, we also make recommendations on the specific usage of RSS and Dublin Core terms in combination with this module.

This specification is intended to be minimal in that beyond these recommendations it does not put any constraints on the use of elements not mentioned herein ("whatever is not forbidden is allowed").

This specification is intended to be compliant with the following versions of the above mentioned specifications:

2. Use case

The envisioned use case of this specification is the sharing of bibliographic data in a distributed (web-based) environment. In this setting, each researcher or organization independently maintains a set of bibliographic items that it wishes to make available to others (publish).

The primary task in such a scenario is the discovery of publications relevant to one's research interests. Retrieving bibliographic information, i.e. information relating to a particular publication, is a secondary task.

It is expected that each researcher or organization would primarily maintain information relating to their own publications, but this is not necessarily the case. Thus several researchers might have information about the same publication. This means that the aggregation (merging) of information from different sources needs to be supported.

3. Technical solution

BuRST addresses the requirements of the above use case by leveraging RDF technology. In particular, BuRST documents are RDF documents using the vocabularies of the RSS 1.0, DC, FOAF, and SWRC ontologies. As all of these ontologies are based on RDF, it is possible to combine information expressed in these ontologies on the level of RDF statements.
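Combining information on the level of RDF statements can be pictured as taking the union of two sets of triples. The following is a minimal sketch of that idea only, not a prescribed implementation: statements are modelled as plain (subject, predicate, object) tuples, the example.org URIs and values are made up, and blank nodes (which a real RDF merge must keep apart) are deliberately ignored.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
# All example.org URIs and literal values are hypothetical.
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"


def merge_graphs(*graphs):
    """Merge graphs by taking the union of their statements.

    Simplification: blank nodes are not handled here.
    """
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


source_a = {
    ("http://example.org/a/items#1", RSS + "title", "An Example Paper"),
    ("http://example.org/a/items#1", DC + "creator", "Agent A"),
}
source_b = {
    ("http://example.org/a/items#1", RSS + "title", "An Example Paper"),
    ("http://example.org/a/items#1", RSS + "link", "http://example.org/paper.pdf"),
}

# The title statement occurs in both sources but is stored only once.
merged = merge_graphs(source_a, source_b)
```

Because identical statements collapse into one, the same fact published by two sources does not appear twice after merging.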

NOTE the FOAF and SWRC ontologies are expressed partly in OWL (the Web Ontology Language of the W3C). The combination of these ontologies with RDF-based ontologies results in ontologies that belong to the OWL Full species.

BuRST documents are intended to be stored in a distributed manner. The use of semantic technology facilitates the efficient discovery and retrieval of publications by reducing ambiguity. The use of semantics also facilitates the merging of bibliographic information from multiple information sources.

In comparison to P2P systems based on semantic technology (such as the Bibster system), the choice of web technology has the advantage of decoupling data and services. This specification only defines a data format that should enable the development of interoperable services. However, it does not specify what these services should be.

3.1 Background

RSS is widely used on the Web for publishing a set of items belonging to a channel. RSS was originally conceived for publishing blog entries (items) posted to a blog (channel), along with simple metadata about the channel and the items (creator, link). However, RSS has since been repurposed for many other applications, including the sharing of news headlines, discussion forums, software announcements, and various bits of proprietary data (http://web.resource.org/rss/1.0/spec). Unfortunately, the development of RSS diverged into a number of incompatible versions, all based on plain XML (except for the RDF-based RSS 1.0 standard referenced here). For an overview of RSS, we recommend reading http://sw.deri.ie/svn/sw/2005/04/blogs_and_the_semantic_web/blogs_and_the_semantic_web.pdf.

Despite incompatibilities, a variety of web services, software tools and APIs are available for reading and aggregating RSS feeds (RSS documents), taking care of the relatively small differences between versions. Nevertheless, for backward (and forward) compatibility, it has been a concern in the development of RSS 1.0 to remain as close to plain XML as possible, and many parsers still interpret RSS 1.0 on the XML level.

Dublin Core (DC) is a general-purpose vocabulary for providing metadata about any kind of web resource. DC is also used in combination with RSS 1.0, as mandated by the RSS 1.0 specification. For more information, please see the website of the Dublin Core Metadata Initiative.

FOAF is a widely used ontology on the Web for describing personal profiles and social networks, i.e. the information that is typically described on a personal home page. FOAF documents are typically linked from a user's homepage and linked together through the RDFS seeAlso relationship. This interlinkage means that FOAF profiles can be discovered in a decentralized manner by crawling. Several blog sites also produce FOAF information. Services for FOAF are typically geared toward visualizing communities.

SWRC is an ontology originating from the Semantic Web community project OntoWeb, which can be used to provide detailed information about publications, with properties equivalent to those available in the BibTeX format. Although very few data sets and services related to SWRC exist, its close mapping to the BibTeX format makes it appealing to use in an academic context. A BibTeX-2-RDF (SWRC) translator has been made available by Michel Klein.

3.2 BuRST documents

A BuRST document is a valid RSS 1.0 document (also known as an RSS feed) where a single channel is defined, consisting of a set of items, each referencing a single publication. Note that Dublin Core is a standard RSS module, i.e. the general use of Dublin Core in combination with RSS is defined by the RSS specification. Here, we only detail the particular interpretation of DC terms applied to channels and items defined in a BuRST document.

We assign no explicit meaning to the rss:channel element. The motivation for publishing a set of items together as a channel may differ, although we expect that in most practical uses the publications on a channel are related by either a shared topic or common authorship. Textual explanation regarding the content of the channel should be given using the rss:title and rss:description elements, while the rss:image element can be used to assign an image to the channel. Per RSS, the value of the rss:link element should point to an HTML rendering of the channel.
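As an illustration of the channel elements discussed above, the following sketch reads the title, link and description of the channel from a small RSS 1.0 document using Python's standard xml.etree.ElementTree module. The sample document, its example.org URIs and the function name read_channel are assumptions made for this example; parsing on the XML level (rather than with an RDF parser) is likewise a simplification.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical channel; all URIs and texts are invented for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/burst/channel">
    <title>Example publications</title>
    <link>http://example.org/publications.html</link>
    <description>Publications of a hypothetical research group</description>
    <items>
      <rdf:Seq/>
    </items>
  </channel>
</rdf:RDF>"""


def read_channel(xml_text):
    """Return the URI and descriptive elements of the first channel found."""
    root = ET.fromstring(xml_text)
    channel = root.find(f"{{{RSS}}}channel")
    return {
        "uri": channel.get(f"{{{RDF}}}about"),
        "title": channel.findtext(f"{{{RSS}}}title"),
        "link": channel.findtext(f"{{{RSS}}}link"),
        "description": channel.findtext(f"{{{RSS}}}description"),
    }


channel_info = read_channel(SAMPLE)
```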

NOTE as opposed to convention, RSS uses lowercase labels for class names. RSS uses some resources as both properties and classes. This is the case, for example, with the rss:image resource, which is used as a class to define images, but also as a property to link those images to a channel.

IMPORTANT the URI of the channel must be unique to the generating agent of the channel. (This prevents different agents from generating channels with the same URI.) We recommend that the URI be chosen from a domain controlled by the agent that generated the document.

In case a processing agent finds evidence that the unique identifier assumption is violated (such as two or more values for the rss:items property), the agent should ignore the contents of the channel.
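One way an aggregator might detect such a violation is sketched below, under stated assumptions: several documents are collected, the rss:items properties are counted per channel URI, and every channel with more than one value is dropped. The function name trusted_channel_uris, the template and the example.org URIs are hypothetical, and counting XML elements is only an approximation of counting RDF property values (for instance, the same document fetched twice would be counted twice).

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical minimal channel document; {uri} is filled in below.
TEMPLATE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="{uri}">
    <title>Example channel</title>
    <items><rdf:Seq/></items>
  </channel>
</rdf:RDF>"""


def trusted_channel_uris(documents):
    """Return the channel URIs that carry at most one rss:items value."""
    items_per_channel = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for channel in root.iter(f"{{{RSS}}}channel"):
            uri = channel.get(f"{{{RDF}}}about")
            items_per_channel[uri] += len(channel.findall(f"{{{RSS}}}items"))
    return {uri for uri, count in items_per_channel.items() if count <= 1}


documents = [
    TEMPLATE.format(uri="http://example.org/a/channel"),
    TEMPLATE.format(uri="http://example.org/shared"),
    TEMPLATE.format(uri="http://example.org/shared"),  # clashing channel URI
]
trusted = trusted_channel_uris(documents)
```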

The DC vocabulary should be used to provide metadata about the channel. It is recommended to use the dc:creator property to identify the generating agent of the rss:channel object and the dc:dateCreated element (alternatively, the more general dc:date element) to identify the date of the creation of the channel.

NOTE Per DC, the recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and includes (among others) dates of the form YYYY-MM-DD.
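A date of the form YYYY-MM-DD can be produced and checked with Python's standard datetime module, as in the sketch below. The function names are made up for this example, and only the complete-date form is handled; the other forms permitted by the W3CDTF profile are outside the scope of this sketch.

```python
from datetime import date


def to_w3cdtf_date(day):
    """Format a date object as YYYY-MM-DD."""
    return day.isoformat()


def is_w3cdtf_date(text):
    """Return True only for strings of the exact form YYYY-MM-DD."""
    try:
        # Round-tripping rejects other spellings some Python versions accept.
        return date.fromisoformat(text).isoformat() == text
    except ValueError:
        return False


created = to_w3cdtf_date(date(2005, 5, 14))
```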

NOTE that the generating agent of the channel and the generating agent of the item(s) may be different. For example, this is the case of a collection of publications of an institution, where the individual descriptions of publications are provided by individual members of the institution. Both of these can be different from the author(s) of the publication(s) described.

The definition of the channel should be followed by the definition of rss:item instances. BuRST processors should ignore items that are referenced in the channel description, but not defined in the document or vice versa.
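The rule above amounts to keeping only the items that are both referenced by the channel and defined in the document. The sketch below shows one possible XML-level reading of it; the function name usable_item_uris and the sample data are assumptions of this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical document: item 2 is referenced but not defined,
# item 3 is defined but not referenced.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/burst/channel">
    <title>Example channel</title>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/burst/items#1"/>
        <rdf:li rdf:resource="http://example.org/burst/items#2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/burst/items#1"><title>One</title></item>
  <item rdf:about="http://example.org/burst/items#3"><title>Three</title></item>
</rdf:RDF>"""


def usable_item_uris(xml_text):
    """Return URIs of items that are both referenced and defined."""
    root = ET.fromstring(xml_text)
    path = f"{{{RSS}}}channel/{{{RSS}}}items/{{{RDF}}}Seq/{{{RDF}}}li"
    referenced = set()
    for entry in root.iterfind(path):
        # Accept the unqualified 'resource' attribute as well.
        uri = entry.get(f"{{{RDF}}}resource") or entry.get("resource")
        if uri:
            referenced.add(uri)
    defined = {item.get(f"{{{RDF}}}about") for item in root.iterfind(f"{{{RSS}}}item")}
    return referenced & defined


usable = usable_item_uris(SAMPLE)
```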

In the context of BuRST, an rss:item instance corresponds to a unique description of a single publication by a single agent. Again, a unique identifier assumption is required to avoid clashes in cases where multiple agents would like to describe the same publication, possibly within the same channel.

IMPORTANT The URI of the item must be unique to the generating agent of the item, to the publication, and to the time of annotation. Uniqueness with respect to the agent and the publication is required to avoid situations where the same or different agents reuse one URI for the same publication, or where different publications are described under one URI by the same or different agents. Again, it is recommended that the URI be in a domain owned by the generating agent of the item. In particular, it must not be equal to the URL of the publication described. Uniqueness with respect to the time of annotation is required to support updates to annotations.

In case a processing agent finds evidence that the unique identifier assumption is violated (such as two distinct values for the dc:creator property), it should ignore the item and it may choose to ignore the entire channel.
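The dc:creator check mentioned above could look roughly as follows. This is a sketch under assumptions: creators are compared as plain strings, the function name items_with_single_creator and the sample agents are invented, and the optional step of discarding the entire channel is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical data: item 1 is described twice, by two different agents.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/burst/items#1">
    <dc:creator>Agent A</dc:creator>
  </item>
  <item rdf:about="http://example.org/burst/items#1">
    <dc:creator>Agent B</dc:creator>
  </item>
  <item rdf:about="http://example.org/burst/items#2">
    <dc:creator>Agent A</dc:creator>
  </item>
</rdf:RDF>"""


def items_with_single_creator(xml_text):
    """Return item URIs that have at most one distinct dc:creator value."""
    root = ET.fromstring(xml_text)
    creators = defaultdict(set)
    for item in root.iterfind(f"{{{RSS}}}item"):
        names = creators[item.get(f"{{{RDF}}}about")]
        for creator in item.iterfind(f"{{{DC}}}creator"):
            names.add((creator.text or "").strip())
    return {uri for uri, names in creators.items() if len(names) <= 1}


accepted = items_with_single_creator(SAMPLE)
```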

When applied to an item, the rss:title element should contain the title of the publication, while the rss:link element should point to the URL of the publication. The recommended use of the rss:description element is to contain an abstract of the publication, when available. The suggested length for item descriptions is between 1 and 500 characters.

NOTE the rss:link property takes a Literal as value

The DC vocabulary should be used to provide metadata about the item in order to support syndication. In this context, all properties should refer to the rss:item (corresponding to a description of a publication), not the publication itself. For example, the dc:creator property should identify the generating agent of the rss:item (NOT the author of the publication) and the dc:dateCreated element (alternatively, the more general dc:date element) should identify the date of the publishing of the item on the channel (NOT the publication date of the publication).

The burst:publication property should be used to relate the rss:item instance to a single instance of the swrc:Publication class. This property is the only property introduced in this specification. The full URI of this property is http://xmlns.com/burst/1.0/publication. The domain of this property is rss:item and the range is swrc:Publication. The burst:publication property is functional, i.e. every rss:item can be associated with at most one publication.
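The sketch below illustrates how an item might be linked to its publication and how a consumer could follow that link. Several details are assumptions of this example rather than statements of this specification: the burst prefix is bound to http://xmlns.com/burst/1.0/ (derived from the property URI given above), the SWRC namespace URI and the swrc:title and swrc:year properties are taken as given, and treating an item with more than one burst:publication value as unusable is only one possible way to handle a violation of the "at most one" constraint.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
BURST = "http://xmlns.com/burst/1.0/"      # derived from the property URI
SWRC = "http://swrc.ontoware.org/ontology#"  # assumed namespace URI

# Hypothetical item; the publication is nested as a blank node.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:burst="http://xmlns.com/burst/1.0/"
         xmlns:swrc="http://swrc.ontoware.org/ontology#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/burst/items#1">
    <title>An Example Paper</title>
    <link>http://example.org/paper.pdf</link>
    <burst:publication>
      <swrc:Publication>
        <swrc:title>An Example Paper</swrc:title>
        <swrc:year>2005</swrc:year>
      </swrc:Publication>
    </burst:publication>
  </item>
</rdf:RDF>"""


def publication_of(item):
    """Return the nested publication element of an item, or None.

    None is returned when the item has no burst:publication value,
    or more than one (an assumption of this sketch).
    """
    links = item.findall(f"{{{BURST}}}publication")
    if len(links) != 1:
        return None
    nested = list(links[0])
    return nested[0] if nested else None


root = ET.fromstring(SAMPLE)
item = root.find(f"{{{RSS}}}item")
publication = publication_of(item)
```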

NOTE Processing agents should ignore other usage of this property.

The SWRC ontology should be used to provide metadata about the publication to be described.

NOTE The URI of the publication instance needs to follow the same conventions as for rss:item, i.e. uniqueness with respect to the generating agent, the publication and the creation date. It is strongly recommended to represent publications as blank nodes (bnodes) to avoid identifier clashes.

We recommend the use of the FOAF vocabulary to provide detailed descriptions of persons related to publications (authors, editors). This is to facilitate the semantic-based merging of bibliographic items.

NOTE For backward and forward compatibility, every RSS 1.0 document should remain as close as possible to the structure of XML-based versions of RSS, in particular the DTD of RSS 0.9. This compatibility is accomplished by the assumption and stipulation that basic RSS parsers, modules, and libraries ignore what they weren't designed to understand. Thus it is not recommended to use the flexibility of RDF to deviate from RSS 0.9.

For example, typed elements should be used instead of rdf:type statements. Use

<item rdf:about='http://purl.org/net/dajobe/'>
...
</item>

instead of

<rdf:Description rdf:about='http://purl.org/net/dajobe/'>
<rdf:type rdf:resource='http://purl.org/rss/1.0/item'/>
...
</rdf:Description>

Similarly, the use of property attributes should be avoided: Use

<item rdf:about='http://www.cs.vu.nl/~pmika/BuRST#2'>
<title>Bootstrapping the FOAF-Web: An Experiment in Social Network Mining</title>
</item>

instead of

<item rdf:about='http://www.cs.vu.nl/~pmika/BuRST#2' title='Bootstrapping the FOAF-Web: An Experiment in Social Network Mining' />
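A publisher could check a document against the two recommendations above before releasing it. The following is a rough sketch of such a check, not a validator: it only looks for rdf:Description elements and for attributes other than rdf:about on channel and item elements, and the function name compatibility_warnings and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ABOUT = f"{{{RDF}}}about"

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns="http://purl.org/rss/1.0/">')

# Hypothetical documents: one follows the recommendations, two do not.
GOOD = HEADER + """
  <item rdf:about="http://example.org/burst/items#1">
    <title>An Example Paper</title>
  </item>
</rdf:RDF>"""

PROPERTY_ATTRIBUTE = HEADER + """
  <item rdf:about="http://example.org/burst/items#1" title="An Example Paper"/>
</rdf:RDF>"""

DESCRIPTION = HEADER + """
  <rdf:Description rdf:about="http://example.org/burst/items#1">
    <rdf:type rdf:resource="http://purl.org/rss/1.0/item"/>
  </rdf:Description>
</rdf:RDF>"""


def compatibility_warnings(xml_text):
    """List constructs that stray from the plain-XML structure of RSS."""
    warnings = []
    for element in ET.fromstring(xml_text).iter():
        if element.tag == f"{{{RDF}}}Description":
            warnings.append("rdf:Description used instead of a typed element")
        elif element.tag in (f"{{{RSS}}}item", f"{{{RSS}}}channel"):
            extra = [name for name in element.attrib if name != ABOUT]
            if extra:
                about = element.get(ABOUT)
                warnings.append(f"property attributes {extra} on {about}")
    return warnings
```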

4. Example

An example BuRST file can also be downloaded (RSS 1.0). You can also look at a pretty-printed (HTML) version. (Pretty-printing seems to work better in Mozilla Firefox than in Internet Explorer.) This file is valid RSS 1.0 according to the following validators:

5. Notes

RSS feeds should be served as application/rss+xml (RSS 1.0 is an RDF format, so it may be served as application/rdf+xml instead).
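How the media type is configured depends on the web server in use. As one illustration, a Python-based server that derives the Content-Type header from the file extension could register the type as sketched below; the '.rss' extension and the function name content_type_for are assumptions of this example.

```python
import mimetypes

# Map the (assumed) '.rss' extension to the recommended media type.
mimetypes.add_type("application/rss+xml", ".rss")


def content_type_for(path):
    """Return the media type to send for a file, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


feed_type = content_type_for("feed.rss")
```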

5.1 Telling People About Your BuRST feed

An important step after publishing a feed is letting your viewers know that it exists; there are a lot of feeds available on the Web now, but it's hard to find them, making it difficult for viewers to utilize them.

Pages that have an associated RSS feed should clearly indicate this to viewers by using a link containing text like 'RSS feed'. For example,

<a type='application/rss+xml' href='feed.rss'>RSS feed for this page</a>

where 'feed.rss' is the URL for the feed. The 'type' attribute tells browsers that this is a link to an RSS feed in a way that they understand.

Additionally, some programs look for a link tag in the <head> section of your HTML. To support this, insert the tag as follows:

<head>
<title>My Page</title>
<link rel='alternate' type='application/rss+xml' href='feed.rss' title='RSS feed for My Page'>
</head>

These links should be placed on the Web page that is most similar to the feed content; this enables people to find them as they browse.
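To show what a program looking for such a link tag might do, the sketch below scans an HTML page for link elements with rel='alternate' and type='application/rss+xml', using Python's standard html.parser module. The class and function names are invented for this example, and real feed readers may apply additional rules.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect the href of every RSS autodiscovery link in a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        is_feed = attributes.get("type") == "application/rss+xml"
        if tag == "link" and "alternate" in rel_values and is_feed:
            self.feeds.append(attributes.get("href"))


def find_feeds(html_text):
    """Return the feed URLs advertised by a page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


PAGE = """<head>
<title>My Page</title>
<link rel='alternate' type='application/rss+xml' href='feed.rss' title='RSS feed for My Page'>
</head>"""

feeds = find_feeds(PAGE)
```

The returned URLs may be relative (as 'feed.rss' is here) and would need to be resolved against the URL of the page before fetching.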

Finally, there are a number of guides and registries for RSS feeds that people can search and browse through, much like the Yahoo directory for Web sites; it's a good idea to register your feed. See "Related Resources" for more information.

Further useful information: http://www.mnot.net/rss/tutorial/#RSS09x

5.3 Services for BuRST

A number of tools and publicly accessible Web services for BuRST have been made available by the Department of Computer Science at the Vrije Universiteit, Amsterdam.