Original PowerPoint source without the script is here.
A video recording of the talk (including synchronised slides) is at VideoLectures.net.
Apology for wrong title
Jeff Naughton (a leader in the DB field) recently gave a talk at ICDE which was all about how we have organised the scientific process, and he used the following health warning at the start of his talk. I've freely borrowed his slide to give you the same health warning:
But notice the flawed piece of logic in the final bullet:
if you don't give the keynote, you might well still be a washed-up has-been. In which case I thought I might as well give the keynote anyway.
talking about generic laws is certainly a pretentious thing to do (especially in computer science), so you have been warned.
My view on science is that of a "realist".
Quote: philosophical realism is the belief that our reality is ontologically independent of our conceptual schemes, linguistic practices, beliefs, etc.
a realist believes in a world out there, existing independently of us.
So, I'm not a constructivist.
Constructivists maintain that scientific knowledge is constructed by scientists and not discovered from the world; they claim that the concepts of science are mental constructs.
I do not believe that scientific knowledge is just a mental or social construction, or that our scientific laws have only relative and subjective value.
I believe that the information universe has its own structures and properties, and that there are laws that govern these structures & properties.
I believe we can discover these laws (just like we can discover physics laws).
of course, our understanding of these laws at any particular time during our scientific progress will be somewhat coloured by our perceptions and by our social and mental constructs.
what we perceive to be the universe may well be coloured by how we observe it, and in general it is hard to distinguish the "real" laws about the external universe from cognitive artifacts and observational bias.
but that doesn't imply that all laws are only fictions of our culturally biased imaginations.
and it will be a long time before we can write the beautiful sets of concise equations about the information universe:
we cannot yet hope for such beautifully mathematised laws, stated in a language so concise that they fit in a very compact space.
the early natural scientists spent much of their time developing their experimental apparatus,
and this is now recognised as having led to the more mature sciences of chemistry and physics that we now know.
So, the central question that I will boldly (and perhaps rather foolishly) tackle in the rest of this talk is this one:
Did a decade of Semantic Web work help to discover any Computing Science laws?
So let's first take a look at what we actually built in the past decade.
We can characterise what we have built over the past 10 years in 3 parts:
But all of these have mostly been treated as one very large engineering exercise.
And it's obvious that as engineers we have succeeded.
Now, remember the goal of this talk is:
So what I'm going to do now is to treat the past 10 years of SemWeb engineering as one giant experiment:
So take that as a giant experiment and ask the question:
If we were to build the Semantic Web again, surely some things would end up looking different, but are there things that would end up looking the same, simply because they have to be that way?
for example:
languages full of angle brackets. If you reran the experiment, this would surely come out differently, because it's just an accidental choice. That feature isn't governed by any "law in the Information Universe" (or at least not one that I can imagine).
but other features of what we've built would turn out in essentially the same way,
So, let's see if we can discover any such laws: stable patterns that we would rediscover by necessity every time we reran the experiment.
Now, fortunately, we don't have to start from scratch. Some well-known laws of Computer Science can already be seen to apply to our 10 year experiment as well. I'll give you two examples:
We know from our 10 year long experiment that our datasets also obey Zipf's law, and this has been well documented in a number of empirical studies.
It's important to realise that knowing Zipf's law helps us deal with the phenomenon, both in the cases where it's a blessing (so we can exploit it) and in the cases where it's a curse (so that we can try to avoid it).
that's why it is worth trying to discover these laws.
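To make this kind of observation concrete, here is a minimal sketch (my own, not taken from the talk or from any of the cited studies) of how one might check a dataset for Zipf-like behaviour: rank the terms by frequency and fit a straight line in log-log space. The predicate counts are invented for illustration.

```python
# A rough sketch: test whether term frequencies follow a Zipf-like
# (power-law) distribution by fitting a line in log-log space.
# A fitted exponent near -1 is the classic Zipf signature.
import numpy as np
from collections import Counter

# Hypothetical predicate counts from a crawled dataset (invented numbers).
predicate_counts = Counter({
    "rdf:type": 120_000, "rdfs:label": 45_000, "foaf:knows": 16_000,
    "dc:title": 6_500, "skos:broader": 2_400, "ex:shoeSize": 90,
})

freqs = np.array(sorted(predicate_counts.values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

# Least-squares fit of log(frequency) against log(rank).
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"fitted exponent: {slope:.2f} (Zipf's law predicts roughly -1)")
```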
Here's a second well-known law from Computer Science, which also applies to our experiment:
(of course don't take the linear form literally)
lesson from ontologies
OK, so now I'll start proposing some "laws" that originate from our own field, and from our own 10 year experiment:
the dominant life-form in our information space is the graph.
Now this may sound obvious to us in this community, but stating that factual knowledge is a graph is not obvious at all.
For example, if you asked a DB person this question, they'd say: factual knowledge is a table. And a logician would say: knowledge is a set of sentences.
I know that you can convert one form into the other,
but that's a bit beside the point: just because all our programming languages are Turing complete doesn't mean that there aren't very real and important differences between them.
In the same way, graphs, tables and sets of sentences are really different representations, even though transformations between them exist in theory.
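To illustrate the difference, here is the same fact written down in the three habits of representation just mentioned (this is my own toy example, not from the talk; all names are made up):

```python
# The same fact -- "Amsterdam is the capital of the Netherlands" -- in three guises.

# 1. As a graph: a labelled edge between two nodes (an RDF-style triple).
graph_edge = ("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands")

# 2. As a table: a row in a relation with a fixed schema (the DB view).
capital_of_table = [
    {"city": "Amsterdam", "country": "Netherlands"},
]

# 3. As a logical sentence: a ground atom in a theory (the logician's view).
sentence = "capitalOf(amsterdam, netherlands)"

print(graph_edge)
print(capital_of_table[0])
print(sentence)
```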
So let's switch to a less controversial law:
And this observed repeated invention makes this a much stronger law.
That is to say: this experiment has already been rerun many times in the history of computer science, and this has proven to be a stable finding.
So now I've talked about both factual and hierarchical knowledge. But how do these two types of knowledge compare?
or alternatively, in a picture:
And again, this may sound obvious to all of us in this audience, but really it wasn't all that obvious before we started the 10 year experiment. And in fact, it sharply contrasts with a long history of knowledge representation
traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
And with this law, we can even venture to go beyond just a qualitative law, and put some quantitative numbers on it.
Here are some numbers obtained by Jacopo Urbani, a PhD student in our lab (and some of you will have seen these figures in his presentation yesterday, in the session on reasoners):
notice that this uses an interesting measure of "size": we're not just counting triples, but measuring the complexity of these triples by seeing how expensive it is to do deduction over them.
And we observe that the graph is 1-2 orders of magnitude "larger" than the schema.
So, if we revisit the diagram I sketched before:
then the size of the hierarchy (although already small) is actually still vastly overstated. If we are to believe the numbers on the previous slide, the real size of the terminological knowledge with respect to the size of the factual knowledge is like this:
Now the black dot representing terminological knowledge is 2 orders of magnitude smaller than the size of the factual graph.
To put this in a slogan:
Apparently, the power of represented knowledge comes from representing a very small set of general rules that are true about the world in general,
together with a huge body of rather trivial assertions that describe things as they happen to be in the current world (even though they could easily have been different).
And again, understanding this law helps us to design our distributed reasoners. It is the justification that when building parallel reasoners, many of us just take the small schema and simply replicate it across all the machines: it's small enough that we can afford to do this.
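As an illustration of that design pattern, here is a minimal sketch of my own (not Urbani's or any actual system): the tiny schema is replicated to every worker, only the instance data is partitioned, and each worker applies an RDFS-style subclass rule locally, with no communication needed.

```python
# Sketch: replicate the small schema, partition the large instance data,
# and let each worker derive rdf:type triples via rdfs:subClassOf locally.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, tiny schema: subclass edges (child, parent).
SCHEMA = {("ex:Student", "ex:Person"), ("ex:Person", "ex:Agent")}

# Hypothetical instance triples: (subject, "rdf:type", class).
DATA = [("ex:alice", "rdf:type", "ex:Student"),
        ("ex:bob", "rdf:type", "ex:Person"),
        ("ex:carol", "rdf:type", "ex:Student")]

def superclasses(cls):
    """Transitively close rdfs:subClassOf over the replicated schema."""
    result, frontier = set(), {cls}
    while frontier:
        nxt = {p for (c, p) in SCHEMA if c in frontier} - result
        result |= nxt
        frontier = nxt
    return result

def reason(partition):
    """Apply the subclass rule to one data partition, using only the local schema copy."""
    derived = []
    for (s, p, o) in partition:
        if p == "rdf:type":
            derived += [(s, "rdf:type", sup) for sup in superclasses(o)]
    return derived

# Partition the instance data (here: one partition per worker/"machine").
partitions = [DATA[0::2], DATA[1::2]]
with ThreadPoolExecutor() as pool:
    inferred = [t for part in pool.map(reason, partitions) for t in part]
print(inferred)
```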
We've already seen that the factual knowledge is very large but very simple. We can ask ourselves how simple or complex the terminological knowledge is.
When we go around with our data telescope, and we try to observe what real ontologies look like when they are out there in the world, what do we see?
We see a very wide spread of expressivity in ontologies, all the way from undecidable OWL Full to very simple RDF hierarchies. But this spread is very uneven: there are very many lightweight ontologies, and very few heavyweight ones.
This is of course well captured by Jim Hendler's timeless phrase:
And combining both this law and the previous law, we can now see that his "little semantics" means both: low expressivity and low volume
We could also phrase this as "the unreasonable effectiveness of low-expressive KR"
And there is another way in which this law is true:
And some of these languages have very scary worst-case complexity bounds.
But when writing ontologies in these expressive languages, we often find that the reasoners for these languages perform quite well.
In other words: the information universe is apparently structured in such a way that the double-exponential worst-case complexity bounds don't hit us in practice.
If the world of information were worst-case, we wouldn't have been able to deal with it, but apparently the laws of information make the world such that we can deal with the practical cases.
So: for highly expressive KR we could say that it works better in practice than in theory.
The next law has of course been staring us in the face ever since we started this work on the semantic web (and it has been staring database people in the face for quite a bit longer):
It's for a good reason, of course, that I chose a Tower of Babel to symbolise our vocabularies:
A crucial insight that perhaps distinguishes the work in this community from many earlier pieces of work is that instead of fighting heterogeneity, we have seen that it's inevitable anyway, and that we might as well live with it.
And actually, I would claim that the fact that we have embraced this law (instead of fighting it) has enabled the enormous growth of the Web of Data.
Compared to many previous attempts, which tried to impose a single ontology, the approach of letting a thousand ontologies blossom has been a key factor in the growth of our datasets.
But of course, embracing heterogeneity is nice when you are publishing data, but not so nice when you are consuming data. So heterogeneity is not only an opportunity, it's also a problem. And the question is: can we solve that problem?
I'll argue that yes, heterogeneity is solvable, but maybe not in the way that our community likes to hear.
We can see what's going on by looking at the Linked Data cloud.
but actually the picture is also somewhat misleading.
It (no doubt unintentionally) suggests an evenly spread out cloud of lots of colourful datasets.
But that's not actually the structure of the Linked Data cloud.
Instead, the Linked Data cloud looks like this:
it shows a heavily clustered structure.
And here's the same picture, now with the clusters made explicit (many links within the clusters, and few links between the clusters).
And how did these clusters come about? Not so much by technical design, but mostly by a combination of social, economic and cultural processes:
Why is SNOMED so important in the medical domain? Partly because it was the first to be around
etc.
For the next law, we must remember that we are not only a *semantic* web community, but also a semantic *web* community. So let's look at distribution:
The original dream of this community has sometimes been formulated as turning the Web into a database.
But unfortunately, observations from our 10 year experiment tell us rather the opposite:
Indeed, the distributed model for data-publishing is a key factor that has enabled the growth of the Web and indeed of the Web of Data, but for data-consumption, physical centralisation works surprisingly well.
And we are not the only ones to have found this out: think of Wikipedia, etc.
So, you might think that centralisation would become a bottleneck. Wrong: distribution is the bottleneck.
The Web is not a database, and I don't think it ever will be.
So if all this massive data has to be in one central place to process it, how are we going to cope? Well, the good news from the Information Universe is that parallelisation works surprisingly well,
at least for our types of data. I'll show you how well this works.
This was the performance of triple stores on forward inferencing, somewhere in 2009.
and this is how much parallelisation improved the performance. So apparently, the types of knowledge and data that we deal with are very suitable for parallelisation.
And it's interesting to see that the previous laws actually help us to make this possible: the combination of graph-shaped factual knowledge, hierarchical terminologies, terminologies that are orders of magnitude smaller than the facts, and low expressivity
(which were my proposed laws 1-4) makes the design of our parallel reasoners possible.
So, that brings me to the final law
Contrary to the other laws, this law does not yet come so much from our own observations in this field. But other fields tell us that knowledge is like a set of Russian dolls:
with one doll nested inside the other.
From several other fields we know that statements of knowledge need not only refer to the world, but that they may also refer to other bits of knowledge, creating a multi-layered structure.
The examples are plenty: we may say that a fact in the world is true, and then we can say who asserted that fact, when, and how much we trust it.
Now curiously enough, there is lots and lots of demand in our community for this kind of layered representation, but our representation languages serve this need very poorly. Reification can be seen as a failed experiment to obtain such layering, and now people are abusing named graphs because there is nothing better (a small sketch of both work-arounds follows below).
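To make the two work-arounds concrete, here is a small sketch in my own ad-hoc notation (plain tuples rather than any real RDF API) of saying who asserted a fact, once via RDF reification and once via a named graph:

```python
# Recording who asserted ex:amsterdam ex:capitalOf ex:netherlands.
FACT = ("ex:amsterdam", "ex:capitalOf", "ex:netherlands")

# 1. RDF reification: the single fact explodes into four bookkeeping triples
#    about an explicit statement resource, which is part of why it failed.
reified = [
    ("ex:stmt1", "rdf:type",      "rdf:Statement"),
    ("ex:stmt1", "rdf:subject",   FACT[0]),
    ("ex:stmt1", "rdf:predicate", FACT[1]),
    ("ex:stmt1", "rdf:object",    FACT[2]),
    ("ex:stmt1", "ex:assertedBy", "ex:wikipedia"),
]

# 2. Named graphs: the fact is stored as a quad, and the meta-statement is
#    attached to the graph name rather than to the triple itself.
quads = [
    FACT + ("ex:graph1",),
    ("ex:graph1", "ex:assertedBy", "ex:wikipedia", "ex:metadata"),
]

print(len(reified), "triples via reification vs", len(quads), "quads via named graphs")
```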
So, being more aware of this law would have helped us to create better representation languages sooner.
So, we're reaching the end of the talk, final slide in sight:
and I'll end with the same slide that I started with:
My hope for this talk is that many of you will disagree with some of my proposed "laws", and that some of you may even disagree with all of them.
And this has very concrete impact on how we organise our community:
Of course we won't really redo the last 10 years of our experiment, but when you do your research and write your papers, try to think about what the repeatable patterns, these laws, are, and try to separate the incidental choices you make from the fundamental patterns you are uncovering.