Walk through the components of an RSS 2.0 feed for building your own feeds either by hand or programmatically.
RSS 2.0 is the latest version in the branch of RSS development that passes through the UserLand versions, RSS 0.91 and RSS 0.92. In common with RSS 0.9x, RSS 2.0 avoids the use of RDF. It seems reasonable to conclude that the avoidance of RDF is partly a result of Dave Winer's preference for simplicity and his assumption that RSS feeds transmit information of largely transitory interest. As discussed in Chapter 11 "RDF: The Resource Description Framework" of Beginning RSS and Atom Programming (Wrox, 2005, ISBN: 0-7645-7916-9), developers who see information feeds as conduits for disposable information see little value in using RDF in a feed, because metadata matters less when the content is not meant to last.
The changes in RSS 2.0 from RSS 0.92 are, with one exception, fairly minor. For example, there are a few new elements and a few changes regarding what particular elements should contain. The one substantive change in RSS 2.0 is the use of XML namespaces. Using XML namespaces opens up possibilities for extending RSS 2.0 using modules.
The RSS 2.0 specification is located at http://blogs.law.harvard.edu/tech/rss.
XML Namespaces in RSS 2.0
RSS 2.0 uses the XML namespaces technique specified in the Namespaces in XML recommendation, located at www.w3.org/TR/REC-xml-names/.
Because versions 0.91 and 0.92 of RSS don't use XML namespaces, all the elements associated with those specifications are, inevitably, not in any XML namespace. Therefore, unlike the situation with RSS 1.0 where a namespace URI is defined, in RSS 2.0 all the RSS elements are in no namespace.
The availability of XML namespaces allows RSS 2.0 documents to use elements from other namespaces, provided that an appropriate namespace declaration has been made. For example, you can use the Dublin Core module described in Chapter 10 "RSS 1.0 Modules" of Beginning RSS and Atom Programming.
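To make the distinction concrete, here is a minimal Python sketch that reads a feed mixing RSS 2.0 elements (in no namespace) with a Dublin Core element (in its own namespace). The feed text, titles, and author name are invented for illustration, and the standard library's xml.etree.ElementTree is just one convenient parser, not the only option.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the titles and the author name are made up for this sketch.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Feed</title>
    <link>http://www.example.com/</link>
    <description>A feed that mixes RSS 2.0 and Dublin Core elements.</description>
    <item>
      <title>First post</title>
      <dc:creator>A. N. Author</dc:creator>
    </item>
  </channel>
</rss>"""

# Namespace URI of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

root = ET.fromstring(FEED)
item = root.find("channel/item")

# RSS 2.0 elements are in no namespace, so a bare name matches them.
title = item.find("title").text

# Extension elements must be addressed by namespace URI plus local name.
creator = item.find(f"{{{DC}}}creator").text

print(title, "by", creator)
```

Because the parser keys on the namespace URI rather than the dc prefix, the lookup keeps working if a feed author chooses a different prefix for the same namespace.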
New Elements in RSS 2.0
There are several new elements in RSS 2.0. Each is briefly described in the following list. The use of these new elements is described in more detail in the discussion on RSS 2.0 document structure.
- author: An optional child element of the item element
- comments: An optional child element of the item element
- generator: An optional child element of the channel element
- guid: An optional child element of the item element
- pubDate: An optional child element of the item element
- ttl: An optional child element of the channel element
The RSS 2.0 Document Structure
The RSS 2.0 document structure has many similarities to the structure of RSS 0.91 and RSS 0.92 documents.
The rss Element
The rss element is the document element of an RSS 2.0 document. It has a required version attribute with a value of 2.0. Supposedly, RSS 0.91 and 0.92 documents are legal RSS 2.0 documents but they have a different value for the version attribute. In practice this works, despite the different and theoretically illegal values in the version attribute in RSS 0.91 and 0.92 documents, because many aggregators ignore the value in the version attribute.
If you are writing an RSS 2.0 document, the start tag of the rss element should be written as follows:
<rss version="2.0">
You can also write it using single quote marks:
<rss version='2.0'>
The rss element has a single child element, the channel element. All content of the information feed document is contained in the channel element.
Notice that there is no namespace declaration on the preceding rss start tag. For consistency with RSS 0.91 and 0.92, the elements of RSS 2.0 are in no namespace. This has the advantage of backwards compatibility but does mean there is a risk of naming collisions. In practice, the risk of naming collisions is slight because most non-RSS 2.0 elements likely to be found in an RSS 2.0 document are in a namespace, which allows the aggregator or other user agent to distinguish those elements from RSS 2.0 elements.
More RSS 2.0 Elements: The channel Element
The channel element is the only permitted child element of the rss element. The channel element has no attributes. The remainder of an RSS 2.0 document consists of child elements or descendant elements of the channel element.
The following child elements of the channel element are required in all RSS 2.0 documents.
- title: Contains the name that refers to the information feed. If the information feed refers back to a Web site or blog, the value of the title element is typically the name of that site or blog.
- link: Contains a URL that allows linking to the Web site or blog that's associated with the information feed.
- description: Contains a brief description of the information feed.
A minimalist RSS 2.0 document would therefore look like the following document:
<title>Reflecting on Microsoft</title>
<description>The Reflecting on Microsoft blog discusses
issues relating to specific Microsoft products
as well as the much larger issue of
the competition between the proprietary
and open-source approaches to software
This document is of little value in an aggregator because it contains no item elements. Surprisingly, the item element is optional in RSS 2.0 although, in practice, a typical RSS 2.0 document will have several.
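If you prefer to produce such a document programmatically, the following Python sketch builds the smallest channel described above: an rss element with version 2.0 wrapping a channel that holds only title, link, and description. The function name build_minimal_feed and the example values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_minimal_feed(title, link, description):
    """Return an rss element holding a channel with only the required children."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    for name, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, name).text = value
    return rss


# Placeholder values; substitute the details of your own site or blog.
feed = build_minimal_feed(
    "Example Blog",
    "http://www.example.com/",
    "A placeholder description of the feed.",
)
xml_text = ET.tostring(feed, encoding="unicode")
print(xml_text)
```

Building the tree with a library rather than concatenating strings means characters such as & and < in titles or descriptions are escaped for you.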
The following elements are optional child elements of the channel element. Some elements, which have their own child elements, are discussed further following the list.
- category: This element contains information about the categories of information contained in the information feed. There can be several category elements as child elements of a channel element.
- cloud: This element has several attributes that contain information specifying how a connection can be made to a cloud, allowing subscription to an information feed to be always up to date.
- copyright: This contains copyright information relating to the feed.
- docs: This contains a URL pointing to the RSS 2.0 specification.
- generator: This contains information about the software that was used to produce the information feed.
- image: This contains information so an aggregator can locate an image (in GIF, JPEG, or PNG format) to display in connection with the information feed.
- language: This contains a two-letter language code, with optional extensions. Example values are en and en-us.
- managingEditor: This contains the e-mail address of the contact for queries about editorial content.
- pubDate: This contains the publication date for the feed.
- rating: The PICS (Platform for Internet Content Selection) rating for the channel.
- skipDays: This contains information indicating to an aggregator the days of the week when a feed is not expected to be updated.
- skipHours: This contains information indicating to an aggregator the hours when a feed is not expected to be updated.
- textInput: This displays a text box to allow the users to input information for processing on a server, typically (if the textInput element is present) on the server from which the feed originates. Note that the RSS 2.0 specification spells the element name with a capital I.
- ttl: This contains the "time to live" of the feed: the number of minutes the feed can be cached before the aggregator should check for new content.
- webMaster: This contains the e-mail address of the contact for queries about technical issues relating to the information feed.
Several of the previous elements are shown in the example RSS 2.0 document later in this chapter.
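As an illustration of how an aggregator might act on ttl, skipHours, and skipDays, here is a rough Python sketch. It assumes, following the RSS 2.0 specification as commonly read, that ttl is a number of minutes, that skipHours holds hour children numbered 0 to 23 in GMT, and that skipDays holds day children containing English day names. The function name should_poll and the sample channel are hypothetical, and real aggregators may weigh these hints differently.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical channel carrying the three polling hints.
CHANNEL_XML = """<channel>
  <title>Example Feed</title>
  <link>http://www.example.com/</link>
  <description>Placeholder channel used to illustrate polling hints.</description>
  <ttl>60</ttl>
  <skipHours><hour>0</hour><hour>1</hour></skipHours>
  <skipDays><day>Sunday</day></skipDays>
</channel>"""

# Index matches datetime.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def should_poll(channel, last_fetch, now):
    """Return True if a fetch at `now` (an aware UTC datetime) respects the hints."""
    ttl = channel.findtext("ttl")
    if ttl is not None and now - last_fetch < timedelta(minutes=int(ttl)):
        return False  # the cached copy is still within its time to live
    skip_hours = {int(hour.text) for hour in channel.findall("skipHours/hour")}
    skip_days = {day.text for day in channel.findall("skipDays/day")}
    if now.hour in skip_hours or DAY_NAMES[now.weekday()] in skip_days:
        return False  # the publisher does not expect updates at this time
    return True


channel = ET.fromstring(CHANNEL_XML)
now = datetime(2004, 11, 9, 12, 0, tzinfo=timezone.utc)  # a Tuesday, noon GMT
print(should_poll(channel, now - timedelta(hours=2), now))
```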
The image Element
The image element specifies an image that can be displayed along with the channel in an aggregator or other user agent. The image element has the following required child elements:
- link: The value of this element is a URL representing the feed or Web site.
- title: This describes the image. If the feed is being rendered as HTML, the content of the title element may be used as the value of the alt attribute of the img element in HTML/XHTML.
- url: The content of this element is a URL that specifies the location from which the image can be retrieved.
The link element, a child of the image element, is required, although it seems simply to duplicate the content of the link element child of the channel element. The RSS 2.0 specification is not clear about the consequences should these two link elements contain different URLs.
There are three optional child elements of the image element:
- description: This contains a short description of the image. The specification suggests that it be used in the title attribute of the link in the corresponding HTML.
- height: This contains the height of the image in pixels.
- width: This contains the width of the image in pixels.
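The mapping from the image element to HTML described above can be sketched in Python as follows. The sample image element and the function name image_to_html are invented for illustration, and the sketch assumes the three required children (url, title, link) are present.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical image element; all URLs and text are placeholders.
IMAGE_XML = """<image>
  <url>http://www.example.com/logo.png</url>
  <title>Example Feed</title>
  <link>http://www.example.com/</link>
  <description>Visit the Example Feed site</description>
  <width>88</width>
  <height>31</height>
</image>"""


def format_attrs(attrs):
    """Serialize (name, value) pairs as escaped HTML attributes."""
    return " ".join(f'{name}="{escape(value)}"' for name, value in attrs)


def image_to_html(image):
    """Render an RSS image element as a linked HTML img element."""
    # title becomes the alt text; url is the image location.
    img_attrs = [("src", image.findtext("url")), ("alt", image.findtext("title"))]
    for name in ("width", "height"):
        value = image.findtext(name)
        if value is not None:  # both dimensions are optional
            img_attrs.append((name, value))
    # link is the target of the anchor; description, if present, is its title.
    a_attrs = [("href", image.findtext("link"))]
    description = image.findtext("description")
    if description is not None:
        a_attrs.append(("title", description))
    return f"<a {format_attrs(a_attrs)}><img {format_attrs(img_attrs)}></a>"


html_snippet = image_to_html(ET.fromstring(IMAGE_XML))
print(html_snippet)
```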
The cloud Element
The cloud element is a child element of the channel element. The attributes of the cloud element are used to specify a Web service that implements the rssCloud interface. A useful way to think of a cloud is as a Web application that acts as a central coordinator for subscriptions to an information feed. Instead of an aggregator polling a server at specified intervals (often hourly), the cloud (the coordinator) informs subscribed users when a change has taken place.
A cloud element would appear similar to the following markup:
<cloud domain="rpc.sys.com" port="80" path="/RPC2"
registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />
The textInput Element
The textInput element allows a user to enter text to be sent to a server-side process, such as a CGI script. Some people question the appropriateness of the textInput element, arguing that such functionality belongs more appropriately inside an individual Web page. The textInput element has four required child elements:
- description: This contains a short description of the text input area.
- link: This contains a URL which specifies a server-side process, for example a CGI script, to which the text entered by the user is sent.
- name: This contains a name for the text in the text input area.
- title: This contains the label for the submit button associated with the text input functionality.
The item Element
The item element may occur any number of times in an RSS 2.0 information feed document. Its child elements are described in the following list. The specification is unclear about whether or not these child elements are required. In practice, you can use whichever child elements you want and omit those you don't. There are theoretically some situations in which you could be in conflict with the wording of the RSS 2.0 specification, but this won't arise with real-world items that have a title and at least some content.
- author: This contains an e-mail address for a person with responsibility for authoring the content of the item.
- category: An item element can have multiple category element children. The content of the category element is information about a category into which the content of the item may be assigned. Each category element has an optional domain attribute, the value of which may specify a taxonomy to which the content of the item belongs. For example, in an item about XML the domain might be "markup languages."
- comments: This contains a URL of a Web page where a user can enter comments about the item.
- description: This contains a summary of the item or, in the case of items with a relatively small amount of text, might contain the full text of the item.
- enclosure: This contains information specifying a media object associated with the item. This is an empty element with three attributes. The url attribute contains a URL from which the media object can be retrieved. The length attribute specifies the size of the serialized object in bytes. The type attribute specifies the media type of the object.
- guid: This contains a value that uniquely identifies the item. The RSS 2.0 specification does not specify rules intended to achieve uniqueness. One typical approach is to use a URL from which the item can be retrieved. In that situation the guid element is likely to have an isPermaLink attribute with a value of true.
- link: This contains a URL that can be used to retrieve the full text of the item. When the item contains its full text in the description element, the link element is optional; otherwise, it is required.
- pubDate: This contains information about when the item was published. It includes both date and time components.
- source: This contains information about the channel (perhaps on another site) that the item originally came from. It has a url attribute that contains the URL for the source information feed. The content of the source element is, typically, the title of the feed.
- title: This contains a title for the item.
An example item element is shown in the following example RSS 2.0 document.
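For feeds generated by a program, an item can be assembled along the following lines. This Python sketch uses the item's link as its guid (with isPermaLink set to true) and formats pubDate with the standard library's email.utils.format_datetime, which emits the RFC 822 style date that RSS 2.0 feeds conventionally use. The function name build_item and the example values are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(title, link, description, published):
    """Build an item whose guid is its permanent link.

    `published` must be an aware datetime in UTC so it can be written as GMT.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    guid = ET.SubElement(item, "guid", {"isPermaLink": "true"})
    guid.text = link
    ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return item


# Placeholder values for illustration only.
item = build_item(
    "An example post",
    "http://www.example.com/posts/1",
    "A short summary of the post.",
    datetime(2004, 11, 9, 12, 1, 11, tzinfo=timezone.utc),
)
print(ET.tostring(item, encoding="unicode"))
```

Reusing the link as the guid is only one approach; any string that stays unique and stable for the item would serve, in which case isPermaLink should be false.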
An Example RSS 2.0 Document
Having looked at the individual parts of the document structure of an RSS 2.0 document, you can now take a look at a sample RSS 2.0 document that happens to contain my first author blog post on Wrox.com.
<?xml version="1.0" ?>
<title>Wrox P2P Blogs - Andrew Watt</title>
<description>Wrox.com P2P Community Blogs</description>
<copyright>Copyright (c) 2000-2004 by
John Wiley & Sons, Inc. or related companies.
All rights reserved.</copyright>
<title>Wrox P2P Blogs - Andrew Watt</title>
<title>Firefox 1.0 is available</title>
<description>Firefox 1.0 is available now for download from <a
</a>.<br /><br />It downloaded quickly
for me, although that could change as the servers
get busier, and it installed smoothly. <br />
<br />If you haven't already spotted the new
functionality to add a live RSS or Atom feed to your Firefox
bookmarks using the button at the extreme bottom
right of the Firefox window give it a go....</description>
<pubDate>Tue, 9 Nov 2004 12:01:11 GMT</pubDate>
The example document contains only one item and it does not use all of the many optional elements that the RSS 2.0 specification allows. Hopefully, it will give you an impression of what a simple RSS 2.0 document is like.
RSS 2.0 Extensions
The RSS 2.0 specification does not say much about extensions. All extension elements must be in a namespace (all RSS 2.0 elements are in no namespace). It is not clearly specified that extension elements can be inserted anywhere in an RSS 2.0 information feed document, but this seems to be the most likely meaning of the specification.
The blogChannel RSS Module
In the month following the release of the RSS 2.0 specification, Dave Winer issued a document relating to the blogChannel RSS module. The document is located at http://backend.userland.com/blogChannelModule. It is intended to relate to the context of a blog which has an associated information feed.
A typical namespace declaration for the blogChannel namespace is:
xmlns:blogChannel="http://backend.userland.com/blogChannelModule"
OPML, Outline Processor Markup Language, is an XML language that can express the structure of an outline. In the context of information feeds, an OPML document is often used to contain a list of information feeds to which a blogger subscribes, the so-called blogroll.
The following elements are in the blogChannel module:
- blink: This contains a URL that links to a blog that the author of the information feed wants to promote in some way.
- blogRoll: This contains a URL that specifies the location of an OPML (Outline Processor Markup Language) file containing the blogroll for the information feed.
- changes: This contains a URL that specifies the location of a changes.xml file. The idea behind the changes file is that bandwidth use may be reduced.
- mySubscriptions: This contains a URL that specifies the location of an OPML file containing the subscriptions of the blog author.
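As a sketch of how a module's elements sit alongside core RSS 2.0 elements, the following Python code adds blogRoll and blink to a channel. It assumes the blogChannel namespace URI is the same as the module document's URL given earlier; the feed details and the OPML location are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI matches the blogChannel module document's URL.
BLOGCHANNEL = "http://backend.userland.com/blogChannelModule"

# Ask ElementTree to serialize this namespace with the conventional prefix.
ET.register_namespace("blogChannel", BLOGCHANNEL)

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Blog"
ET.SubElement(channel, "link").text = "http://www.example.com/"
ET.SubElement(channel, "description").text = "A placeholder blog feed."

# Extension elements are namespace-qualified; core elements are not.
ET.SubElement(channel, f"{{{BLOGCHANNEL}}}blogRoll").text = "http://www.example.com/blogroll.opml"
ET.SubElement(channel, f"{{{BLOGCHANNEL}}}blink").text = "http://www.example.org/"

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```

An aggregator that does not support the module can skip the prefixed elements and still read the core channel information.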
Whether or not an extension module is supported can vary from one aggregator tool to another. A quasi-official list of RSS 2.0 extensions exists; examples include Danny Ayers's Simple Semantic Resolution module and Joe Gregorio's Comment API.
This article is adapted from Beginning RSS and Atom Programming by Andrew Watt (Wrox, 2005, ISBN: 0-7645-7916-9), from Chapter 12, "RSS 2.0: Really Simple Syndication."
Copyright 2007 by WROX. All rights reserved. Reproduced here by permission of the publisher.