The Semantic Web

The Semantic Web (a.k.a. Web 3.0) is the next generation of Web technologies, designed to let machines process Web information more efficiently. Currently, much of the information on the Web does not carry semantics, which means that machines are unable to understand the meaning of either your query or the data.

In the Semantic Web, information is encoded using RDF (Resource Description Framework), a language that expresses semantics through statements. It requires that information be represented as a set of statements, each formed of three parts: a subject, a predicate and an object. An example of an RDF statement is:

<http://www.vu.nl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/resource/University>

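With this statement, we say that the VU (the subject, identified by the first URI) is an instance of the class University, which is represented by the last URI. The middle URI is the predicate, here the standard RDF "type" relation.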
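To make the three-part structure concrete, here is a minimal Python sketch that splits the statement above into subject, predicate and object. It is an illustration only: the names Triple and parse_uri_triple are made up for this example, and the pattern handles only statements whose three parts are all URIs in angle brackets, not the full RDF syntax (literals, blank nodes, and so on).

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for this sketch)."""
    subject: str
    predicate: str
    obj: str


# Matches three URIs in angle brackets, with an optional trailing " ."
# Assumption: URI-only statements; literals and blank nodes are not handled.
_TRIPLE_RE = re.compile(r'^\s*<([^<>]*)>\s+<([^<>]*)>\s+<([^<>]*)>\s*\.?\s*$')

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_uri_triple(line: str) -> Triple:
    """Split one URI-only statement into its three parts."""
    match = _TRIPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a URI-only statement: {line!r}")
    return Triple(*match.groups())


statement = parse_uri_triple(
    "<http://www.vu.nl> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/resource/University>"
)

# The predicate tells us how the subject relates to the object.
if statement.predicate == RDF_TYPE:
    print(f"{statement.subject} is an instance of {statement.obj}")
```

For real data, a dedicated RDF library would be a better choice than a hand-written pattern like this one.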

Without going into more detail, you can think of the Semantic Web as a very large graph, where information from various sources is connected by “meaningful” links. For example, information that comes from historical archives can be linked with geographic information, so that machines can enrich the results of a query with related information.
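The idea of following links across sources can be sketched in a few lines of Python. Everything in this example is hypothetical: the example.org URIs, the two tiny data sets, and the helper names build_index and enrich are invented for illustration. It only shows the principle that a shared URI (here, the one for Amsterdam) joins two independent sources into one graph.

```python
from collections import defaultdict

# Hypothetical statements from a historical archive.
archive = {
    ("http://example.org/archive/doc42",
     "http://example.org/vocab/mentionsPlace",
     "http://example.org/geo/Amsterdam"),
}

# Hypothetical statements from a geographic data set.
geo = {
    ("http://example.org/geo/Amsterdam",
     "http://example.org/vocab/locatedIn",
     "http://example.org/geo/Netherlands"),
}


def build_index(*sources):
    """Merge several sets of statements into one subject -> links index."""
    index = defaultdict(list)
    for source in sources:
        for subject, predicate, obj in source:
            index[subject].append((predicate, obj))
    return index


def enrich(start, index, max_hops=2):
    """Collect the statements reachable from start within max_hops links."""
    found, frontier, seen = [], [start], {start}
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for predicate, obj in index.get(node, []):
                found.append((node, predicate, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return found


index = build_index(archive, geo)
results = enrich("http://example.org/archive/doc42", index)
for statement in results:
    print(statement)
```

Starting from the archive document, the second hop reaches a statement that exists only in the geographic source, which is the kind of enrichment described above.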

In the image above you can see a visual representation of the Semantic Web as it is in 2010. As you can see, data from many different websites is linked together. The size of the Semantic Web is growing at a very fast rate. Many institutions, such as government agencies, and companies, such as Yahoo! and Google, are encouraging the adoption of these technologies. For example, consider the new Facebook feature called Open Graph, which allows external websites to be linked on Facebook: here, RDF is used to express the content of the webpage.

Currently, the size of the Semantic Web is estimated to be on the order of tens of billions of statements. Clearly, we cannot process such a large amount of data with sequential programs; instead, we must research efficient parallel programming models that can deal with very large inputs like this.
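As a rough illustration of what a parallel programming model can look like, the sketch below counts how often each predicate occurs by splitting the statements into chunks, counting each chunk independently, and merging the partial counts. This is one common split-and-merge pattern chosen for the example, not a description of any specific system; the function names and the toy data are hypothetical.

```python
from collections import Counter
from functools import reduce


def count_predicates(chunk):
    """Map step: count predicates within one chunk of statements."""
    return Counter(predicate for _, predicate, _ in chunk)


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def split(statements, n_chunks):
    """Deal the statements into n_chunks roughly equal lists."""
    return [statements[i::n_chunks] for i in range(n_chunks)]


def predicate_counts(statements, n_chunks=4, map_fn=map):
    """Count predicates chunk by chunk.

    map_fn defaults to the built-in (sequential) map; any function with
    the same calling convention can be substituted to run chunks in parallel.
    """
    partials = map_fn(count_predicates, split(statements, n_chunks))
    return reduce(merge_counts, partials, Counter())


# Toy input with hypothetical short names instead of full URIs.
statements = [
    ("vu", "type", "University"),
    ("uva", "type", "University"),
    ("vu", "locatedIn", "Amsterdam"),
]
counts = predicate_counts(statements)
print(counts)
```

Because each chunk is counted without looking at the others, the map step can be handed to several workers, for instance by passing the map method of a concurrent.futures.ProcessPoolExecutor as map_fn. At the scale of billions of statements the chunks would have to be spread over many machines rather than held in a single list.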