How to Make a Semantic Web Browser
Two important architectural choices underlie the success of the Web: numerous, independently operated servers speak a common protocol, and a single type of client, the Web browser, provides point-and-click access to the content and services on these decentralized servers. However, because HTML marries content and presentation into a single representation, end users are often stuck with inappropriate choices made by the Web site designer about how to work with and view the content. RDF metadata on the Semantic Web does not have this limitation: users can gain direct access to the underlying information and control for themselves how it is presented. This principle forms the basis for our Semantic Web browser, an end-user application that automatically locates metadata and assembles point-and-click interfaces from a combination of relevant information, ontological specifications, and presentation knowledge, all described in RDF and retrieved dynamically from the Semantic Web. With such a tool, naïve users can begin to discover, explore, and utilize Semantic Web data and services. Because data and services are accessed directly through a standalone client and not through a central point of access (e.g., a portal), new content and services can be consumed as soon as they become available. In this way, by remaining faithful to the decentralized nature of the Web, we take advantage of an important sociological force that encourages the production of new Semantic Web content.
The World Wide Web revolutionized the Internet by providing a number of mutually reinforcing capabilities. HTTP offered a simple standard by which information could be fetched from any Web server. HTML provided a uniform syntax in which publishers could present information that would be rendered in human-readable form in a Web browser. And URLs gave a way for any Web page to refer to any other Web page, regardless of its location. Taken together, these capabilities meant that a lay user could seamlessly browse the entire space of Web information, viewing information without concern for location and using a simple point-and-click interface to navigate from any Web page to others that it referenced.
Though substantial, the information access capability engendered by the Web has its limitations. Through its use of HTML and HTTP servers, the Web demands the production of content already formatted for presentation in a particular human-readable fashion. Implicit is the idea that a publisher will be able to figure out the right way to present its information to end users. It should be clear, however, that the information consumer will often have the best sense of what is important in the fetched information and how best to make use of it. Every Web browser offers its user some limited ability to override presentation characteristics such as the font and font size. Stronger evidence of the need for clients to control the presentation of information can be seen in the development of HTTP content negotiation standards, in which the client describes its capabilities and hopes that the server will deliver something that can be presented reasonably, as well as in the Web Accessibility Initiative’s attempt to develop guidelines for crafting presentations of Web pages so that they can be used by people with disabilities. Finally, efforts such as NewsIsFree show that means as extreme as screen scraping are employed in order to enable Web site content to be viewed in alternate ways (in this case, as RSS news feeds in news tickers, news alert tools, etc.).
The Semantic Web offers a particularly extreme example of differently-abled clients: nonhumans. In the Semantic Web vision, autonomous agents will be able to pull information from the Web and manipulate it on behalf of their users. HTML is clearly a terrible data presentation language for such applications: its visual markup hides the semantic content that agents actually care about. This problem has motivated the development of RDF, a language for describing semantic information in a machine-readable form without the distraction of presentation markup.
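To make the contrast with HTML concrete, the following is a minimal Python sketch, not the browser's actual implementation. Facts are held as subject-predicate-object tuples, a plain stand-in for RDF triples, and two client-side views render the same data in different ways. The `example.org` URI, the property names, and the functions `describe`, `render_card`, and `render_table_row` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical data: (subject, predicate, object) tuples standing in for RDF
# triples. The URI and property names are made up for illustration.
TRIPLES = [
    ("http://example.org/paper/1", "title", "How to Make a Semantic Web Browser"),
    ("http://example.org/paper/1", "topic", "Semantic Web"),
    ("http://example.org/paper/1", "topic", "RDF"),
]


def describe(subject, triples):
    """Collect every property of `subject` into {predicate: [objects]}."""
    properties = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


def render_card(properties):
    """One client-chosen view: a multi-line, human-readable card."""
    lines = []
    for predicate in sorted(properties):
        lines.append(f"{predicate}: {', '.join(properties[predicate])}")
    return "\n".join(lines)


def render_table_row(properties, columns):
    """Another view of the same data: a single tab-separated row."""
    return "\t".join("; ".join(properties.get(c, [])) for c in columns)


if __name__ == "__main__":
    paper = describe("http://example.org/paper/1", TRIPLES)
    print(render_card(paper))
    print(render_table_row(paper, ["title", "topic"]))
```

Because the data carries no presentation markup, the choice between the two views (or no view at all, in the case of an autonomous agent that only calls `describe`) rests with the client and not with the publisher.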