10 Years of Semantic Web:

does it work in theory?

Keynote at ISWC 2011,

Frank van Harmelen

Original PowerPoint source without the script is here.

A video recording of the talk (including synchronised slides) is at VideoLectures.net.

Duck & Birdie

Apology for wrong title

Jeff Naughton slide

Philosophical confession

Telescope slide

I do not believe that scientific knowledge is just a mental or social construction, and that our scientific laws have only relative and subjective value.

Laws about the Information Universe

Distorted Mirror slide

Physics slide

Alchemy slide

So, the central question that I will boldly (and perhaps rather foolishly) tackle in the rest of this talk is this one:

Question slide

Did a decade of Semantic Web work help to discover any Computing Science laws?

What have we built over the past 10 years

So let's first take a look at what we actually built in the past decade.

We can characterise what we have built over the past 10 years in 3 parts:

Babel Towers slide

  1. We built a whole lot of vocabularies (including the languages to represent them, the tools to construct and deploy them, etc.)

Naming slide

  2. We built a whole lot of URIs to name lots of things in the world: in fact, many billions of URIs

Neural Network slide

  3. We connected all of these in a very large network

Engineer slide

But all of these have been mostly treated as one very large engineering exercise.

And it's obvious that as engineers we have succeeded.

Now, remember the goal of this talk is:

10 years experiment

So what I'm going to do now is to treat the past 10 years of SemWeb engineering as one giant experiment:

So take that as a giant experiment and ask the question:

If we were to build the Semantic Web again, surely some things would end up looking different, but are there things that would end up looking the same, simply because they have to be that way?

for example

So, let's see if we can discover any such laws: stable patterns that we would rediscover by necessity every time we ran the experiment.

Now, fortunately, we don't have to start from scratch. Some well known laws of Computer Science already can be seen to apply to our 10 year experiment as well. I'll give you two examples:

Zipf law

We know from our 10 year long experiment that our datasets also obey Zipf's law, and this has been well documented in a number of empirical studies.

It's important to realise that knowing Zipf's law helps us deal with the phenomenon, both in the cases where it's a blessing (so we can exploit it) and in the cases where it's a curse (so that we can try to avoid it).

That's why it is worth trying to discover these laws.
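One way to check whether a dataset follows Zipf's law is to rank items by frequency and look at the slope of log(frequency) against log(rank): a slope near -1 is the classic Zipf signature. The following is only a minimal sketch of that check. The function name `zipf_slope` and the toy predicate counts are invented for illustration and do not come from the empirical studies mentioned above.

```python
import math
from collections import Counter


def zipf_slope(items):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes at least two distinct items, otherwise the slope is undefined.
    """
    freqs = sorted(Counter(items).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy data (made up): the predicate of rank r occurs 1200 // r times,
# which is an almost perfect 1/r distribution.
toy_predicates = [f"p{r}" for r in range(1, 21) for _ in range(1200 // r)]
slope = zipf_slope(toy_predicates)
print(f"estimated log-log slope: {slope:.2f}")
```

On real data the fit would be noisier, and a straight-line fit on a log-log plot is only a rough indicator, not a rigorous test.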

Here's a second well known law from Computer Science:

Use vs Re-use

Another known law also applies:

OK, so now I'll start proposing some "laws" that originate from our own field, and from our own 10 year experiment:

Factual knowledge is a graph

The dominant life-form in our information space is the graph.

Now this may sound obvious to us in this community, but stating that factual knowledge is a graph is not obvious at all.

For example, if you asked this question of a DB person, they'd say: factual knowledge is a table. And a logician would say: knowledge is a set of sentences.

I know that you can convert one form into the other,

but that's a bit beside the point: just because all our programming languages are Turing complete doesn't mean that there aren't very real and important differences between them.

So in the same way, graphs, tables and sets of sentences are all really different representations, even though the theoretical transformations exist.

So let's switch to a less controversial law:
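To make the point concrete, here is a small sketch of the same three facts held as a graph, as a table and as a set of sentences. All names and data (`alice`, `bob`, `as_table`, `as_sentences`, `neighbours`) are hypothetical, and the table conversion assumes each subject has at most one value per predicate, which is exactly the kind of assumption the graph form does not need.

```python
# The same toy facts as a graph: a set of (subject, predicate, object) edges.
triples = {
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
}


def neighbours(triples, node):
    """Graph view: follow outgoing edges from a node."""
    return {o for s, _, o in triples if s == node}


def as_table(triples):
    """Table view: one row per subject, one column per predicate.

    Assumes at most one value per (subject, predicate); missing cells are None.
    """
    columns = sorted({p for _, p, _ in triples})
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, dict.fromkeys(columns))[p] = o
    return columns, rows


def as_sentences(triples):
    """Logic view: one ground atom per edge."""
    return sorted(f"{p}({s}, {o})" for s, p, o in triples)


columns, rows = as_table(triples)
sentences = as_sentences(triples)
print(neighbours(triples, "alice"))
print(columns, rows)
print(sentences)
```

The conversions are mechanical, but each form makes different things easy: the table needs a fixed column set and fills gaps with empty cells, while the graph simply has no edge there.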

Terminological knowledge is a hierarchy

And this observed repeated invention makes this a much stronger law.

So to say: this experiment has already been rerun many times in the history of computer science, and this has proven to be a stable finding.

So now I've talked about both factual and hierarchical knowledge. But how do these two types of knowledge compare?

Terminological knowledge is much smaller than the factual knowledge

or alternatively, in a picture:

Small hierarchy, big graph

And again, this may sound obvious to all of us in this audience, but really it wasn't all that obvious before we started the 10 year experiment. And in fact, it sharply contrasts with a long history of knowledge representation.

And with this law, we can even venture to go beyond just a qualitative law, and put some quantitative numbers on it.

Jacopo numbers

Here are some numbers obtained by Jacopo Urbani, a PhD student in our lab (some of you will have seen these figures in his presentation yesterday, in the session on reasoners):

Notice that we are using an interesting measure of "size" here: we're not just counting triples, but somehow measuring the complexity of these triples by seeing how expensive it is to do deduction over them.

And we observe that the graph is 1-2 orders of magnitude "larger" than the schema.

So, if we revisit the diagram I sketched before:

Small hierarchy, big graph

then the size of the hierarchy (although already small) is actually still vastly overstated. If we are to believe the numbers on the previous slide, the real size of the terminological knowledge with respect to the size of the factual knowledge is like this:

Now the black dot representing terminological knowledge is 2 orders of magnitude smaller than the size of the factual graph.

To put this in a slogan:

Apparently, the power of represented knowledge comes from representing a very small set of general rules that are true about the world in general,

together with a huge body of rather trivial assertions that describe things as they happen to be in the current world (even though they could easily have been different).

And again, understanding this law helps us to design our distributed reasoners. It is the justification that when building parallel reasoners, many of us just take the small schema and simply replicate it across all the machines: it's small enough that we can afford to do this.
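The replication idea can be sketched in a few lines. This is an illustrative toy, not the design of any actual reasoner: the names (`subclass_closure`, `infer_types`, `partitioned_inference`) and the data are assumptions, the "workers" are simulated sequentially, and only one rule is covered (propagating types up the subclass hierarchy). That rule needs just one instance triple plus the schema, which is why each partition can be processed on its own.

```python
def subclass_closure(schema):
    """Map every class to the set of all its transitive superclasses."""
    supers = {}
    for sub, sup in schema:
        supers.setdefault(sub, set()).add(sup)
        supers.setdefault(sup, set())
    changed = True
    while changed:
        changed = False
        for ups in supers.values():
            extra = set().union(*(supers[u] for u in ups)) - ups
            if extra:
                ups |= extra
                changed = True
    return supers


def infer_types(instance_triples, supers):
    """Derive (x, type, D) for every (x, type, C) where C is a subclass of D."""
    derived = set()
    for s, p, o in instance_triples:
        if p == "type":
            derived |= {(s, "type", sup) for sup in supers.get(o, ())}
    return derived


def partitioned_inference(instance_triples, schema, n_workers):
    supers = subclass_closure(schema)      # small: computed once
    triples = sorted(instance_triples)
    partitions = [triples[i::n_workers] for i in range(n_workers)]
    # Each simulated worker gets its own copy of the schema and one partition.
    results = [infer_types(part, dict(supers)) for part in partitions]
    return set().union(*results)


schema = [("Student", "Person"), ("Person", "Agent")]
instances = {
    ("alice", "type", "Student"),
    ("bob", "type", "Person"),
    ("alice", "knows", "bob"),
}
central = infer_types(instances, subclass_closure(schema))
distributed = partitioned_inference(instances, schema, n_workers=3)
print(distributed == central)
```

Rules that join two instance triples would not partition this easily, so the sketch shows the favourable case only.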

We've already seen that the factual knowledge is very large but very simple. We can ask ourselves how simple or complex the terminological knowledge is.

Terminological knowledge is of low complexity

When we go around with our data telescope, and we try to observe what real ontologies look like when they are out there in the world, what do we see?

Telescope with OWL

We see a very wide spread of expressivity in ontologies, all the way from undecidable OWL Full to very simple RDF hierarchies. But this spread is very uneven: there are very many lightweight ontologies, and very few heavyweight ones.

This is of course well captured by Jim Hendler's timeless phrase:

And combining this law with the previous one, we can now see that his "little semantics" means both low expressivity and low volume.

We could also phrase this as "the unreasonable effectiveness of low-expressive KR"

And there is another way in which this law is true:

If the world of information were worst-case, we wouldn't have been able to deal with it, but apparently the laws of information make the world such that we can deal with the practical cases.

So: for highly expressive KR we could say that it works better in practice than in theory.

The next law has of course been staring us in the face ever since we started this work on the semantic web (and it has been staring database people in the face for quite a bit longer):

Heterogeneity is unavoidable

It's for a good reason of course that I chose a Tower of Babel to symbolise our vocabularies:

Tower of Babel slide

A crucial insight that perhaps distinguishes the work in this community from many earlier pieces of work is that instead of fighting heterogeneity, we have seen that it's inevitable anyway, and that we might as well live with it.

And actually, I would claim that the fact that we have embraced this law (instead of fighting it) has enabled the enormous growth of the Web of Data.

Compared to many previous attempts, which tried to impose a single ontology, the approach of letting a thousand ontologies blossom has been a key factor in the growth of our datasets.

But of course, embracing heterogeneity is nice when you are publishing data, but it's not so nice when you are consuming data. So heterogeneity is not only an opportunity, it's also a problem. And the question is: can we solve that problem?

Heterogeneity is solvable

I'll argue that yes, heterogeneity is solvable (but maybe not in the way that our community likes to hear).

We can see what's going on by looking at the Linked Data cloud.

LOD cloud

But that's not actually the structure of the Linked Data cloud.

Instead, the Linked Data cloud looks like this:

circular cluster map

linear cluster map


For the next law, we must remember that we are not only a "semantic" web community, but also a semantic "web" community. So let's look at distribution:

speed decreases with distribution, centralisation is necessary

The original dream of this community has sometimes been formulated as turning the Web into a database.

earth globe slide

But unfortunately, observations from our 10 year experiment tell us rather the opposite:

Indeed, the distributed model for data-publishing is a key factor that has enabled the growth of the Web and indeed of the Web of Data, but for data-consumption, physical centralisation works surprisingly well.

And this is not just us finding this out.

The Web is not a database, and I don't think it ever will be.

So if all this massive data has to be in one central place to process it, how are we going to cope? Well, the good news from the Information Universe is that

speed increases with parallelisation

at least for our types of data. I'll show you how well this works.

Jacopo graph 1

This was the performance of triple stores on forward inferencing, somewhere in 2009.

Jacopo graph 2

and this is how much parallelisation improved the performance. So apparently, the types of knowledge and data that we deal with are very suitable for parallelisation.

And it's interesting to see that the previous laws actually help us to make this possible: the combination of

(which were my proposed laws 1-4) makes the design of our parallel reasoners possible.

So, that brings me to the final law

knowledge is layered

Unlike the other laws, this law does not yet come so much from our own observations in this field. But other fields tell us that knowledge is like a set of Russian dolls:

Russian dolls

with one doll nested inside the other.

From fields like

we know that statements of knowledge need not only refer to the world, but that they may refer to other bits of knowledge, creating a multi-layered structure.

The examples are plenty: we may say that a fact in the world is true, and then we can say

Now curiously enough, there is lots and lots of demand in our community for this kind of layered representation, but our representation languages serve this need very poorly. Reification can be seen as a failed experiment to obtain such layering, and now people are abusing named graphs because there is nothing better.

So, being more aware of this law would have helped us to create better representation languages sooner.
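The difference between genuine layering and reification can be sketched as follows. In the nested form a whole statement sits in the subject position of another statement, like one doll inside the next. In the flat form the inner statement is first taken apart with the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). Everything else here is a made-up illustration: the names `alice`, `carol`, `dave`, the predicates `assertedBy` and `doubtedBy`, and the helpers `depth` and `reify`.

```python
def depth(term):
    """Nesting depth: 0 for a plain name, 1 for a triple over plain names, and so on."""
    if isinstance(term, tuple):
        return 1 + max(depth(part) for part in term)
    return 0


def reify(triple, stmt_id):
    """Flatten one triple into the four triples of RDF reification."""
    s, p, o = triple
    return {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    }


# Layered form: statements about statements, nested like Russian dolls.
fact = ("alice", "knows", "bob")
claim = (fact, "assertedBy", "carol")
doubt = (claim, "doubtedBy", "dave")

# Flat form of just the first layer: four bookkeeping triples plus the claim.
flat_claim = reify(fact, "_:s1") | {("_:s1", "assertedBy", "carol")}

print(depth(fact), depth(claim), depth(doubt))
print(len(flat_claim))
```

Each extra layer in the flat form needs another round of bookkeeping triples and another statement identifier, which gives some feeling for why the encoding is awkward in practice.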

So, we're reaching the end of the talk, final slide in sight:

Final slide

and I'll end with the same slide that I started with:

My hope for this talk is that - many of you might disagree with some of my proposed "laws" - and some of you may even disagree with all of them

And this has very concrete impact on how we organise our community:

Of course we won't really redo the last 10 years of our experiment, but when you do your research and write your papers, try to think about what the repeatable patterns, these laws, are, and try to separate the incidental choices you make from the fundamental patterns you are uncovering.