Information retrieval, according to [Baeza-Yates and Ribeiro-Neto (1999)], deals with the representation, storage, organisation of, and access to information items.
To see what is involved, imagine that we have a (user) query like:
find me the pages containg information on ...
Then the goal of the information retrieval system is
to retrieve information that is useful or relevant to
the user, in other words: information that
satisfies the user's information need.
Given an information repository,
which may consist of web pages but also multimedia objects,
the information retrieval system must extract syntactic
and semantic information from these (information) items
and use this to match the user's information need.
Effective information retrieval is determined
by, on the one hand, the user task
and, on the other hand, the logical view
of the documents or media objects that constitute
the information repository.
As user tasks, we may distinguish between retrieval
(by query) and browsing (by navigation).
To obtain the relevant information in retrieval we generally
apply filtering, which may also be regarded
as a ranking based on the attributes considered most
The logical view of text documents generally amounts
to a set of index terms characterizing theb document.
To find relevant index terms, we may apply operations
to the document, such as the elimination of stop words
or text stemming.
As you may easily see, full text provides the most
complete logical view, whereas a small set of categories
provides the most concise logical view.
Generally, the user task will determine whether semantic richness
or efficiency of search will be considered as more
important when deciding on the obvious tradeoffs involved.
information retrieval models
In [Baeza-Yates and Ribeiro-Neto (1999)], a great variety of information retrieval
models is described.
For your understanding, an information retrieval model
makes explicit how index terms are represented
and how the index terms characterizing an information
item are matched with a query.
When we limit ourselves to the classic models for
search and filtering, we may distinguish between:
information retrieval models
- boolean or set-theoretic models
- vector or algebraic models
- probabilistic models
Boolean models typically allow for yes/no answers only.
The have a set-theoretic basis, and include models
based on fuzzy logic, which allow for somewhat more refined
Vector models use algebraic operations on vectors
of attribute terms to determine possible matches.
The attributes that make up a vector must in principle
Attributes may be given a weight, or even be ignored.
Much research has been done on how to find an optimal selection
of attributes for a given information repository.
Probabilistic models include general inference networks,
and belief networks based on Bayesan logic.
Although it is somewhat premature to compare these
models with respect to their effectiveness in
actual information etrieval tasks, there is,
according to [Baeza-Yates and Ribeiro-Neto (1999)], a general consensus that
vector models will outperform the probabilistic
models on general collections of text documents.
How they will perform for arbitrary collections of
multimedia objects might be an altogether different question!
Nevertheless, in the sections to follow we will
focus primarily on generalized vector representations
of multimedia objects.
So, let's conclude with listing the advantages of
- attribute term weighting scheme improves performance
- partial matching strategy allows retrieval of approximate material
- metric distance allows for sorting according to degree of similarity
Reading the following sections, you will come to understand
how to adopt an attribute weighting scheme,
how to apply partial matching and how to define a suitable
So, let me finish with posing a research issue:
How can you improve a particular information retrieval
model or matching scheme by using a suitable method
of knowledge representation and reasoning?
To give you a point of departure, look
at the logic-based multimedia information retrieval
system proposed in [Fuhr et al. (1998)].
An image may tell you more than 1000 words.
Well, whether images are indeed a more powerful medium
of expression is an issue I'd rather leave aside.
The problem how to get information out of an image,
or more generally how to query image databases is,
in the context of our Amsterdam Drugport
operation more relevant.
There are two issues here
These issues are quite distinct, although descriptive information
may be used to establish similarity.
- obtaining descriptive information
- establishing similarity
When we want to find, for example, all images
that contain a person with say sunglasses, we need to
have of the images in our database that includes this information
one way or another.
One way would be to annotate all images with (meta) information
and describe the objects in the picture to some degree of detail.
More challenging would be to extract image content
by image analysis, and produce the description
- objects in image
- shape descriptor -- shape/region of object
- property description -- cells in image
According to [Subrahmanian (1998)],
content-based description of images
involves the identification of objects, as well as an indication
of where these objects are located in the image, by using a
shape descriptor and possibly
property descriptors indicating the pictorial
properties of a particular region of the object
Shape and property descriptors may take a form
as indicated below.
- bounding box -- (XLB,XUB,YLB,YUB)
As an example of applying these descriptors.
shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50 (rectangle)
property descriptor: pixel(14,7): R=5; G=1; B=3
Now, instead of taking raw pixels as the unit of analysis,
we may subdivide an image in a grid of cells
and establish properties of cells,
by some suitable algorithm.
As an example, we can define a property that indicates
whether a particular cell if black or white.
- image grid: cells of equal size
- cell property: (Name, Value, Method)
The actual algorithm used to establish such a property
might be a matter of choice.
So, in the example it is given as an explicit parameter.
From here to automatic content description is, admittedly,
still a long way.
We will indicate some research directions at the end
of this section.
We need not necessarily know what an image (or segment of it)
depicts to establish whether there are other images that
contain that same thing, or something similar to it.
We may, following [Subrahmanian (1998)], formulate the problem of similarity-based
retrieval as follows:
How do we determine whether the content of a segment
(of a segmented image) is similar to another image (or set
Think of, for example, the problem of finding all photos
that match a particular face.
According to [Subrahmanian (1998)], there are two solutions:
- metric approach -- distance between two image objects
- transformation approach -- relative to specification
As we will see later, the transformation approach in
some way subsumes the metric approach, since we can formulate
a distance measure for the transformation approach as well.
What does it mean when we say, the distance between
two images is less than the distance between this
image and that one.
What we want to express is that the first two
images (or faces) are more alike, or maybe even identical.
Abstractly, something is a distance measure
if it satisfies certain criteria.
distance is distance measure if:
d(x,y) = d(y,x)
d(x,y) d(x,z) + d(z,y)
d(x,x) = 0
For your intuition, it is enough when you limit
yourself to what you are familiar with,
that is measuring distance in ordinary (Euclidian) space.
Now, in measuring the distance between two
images, or segments of images,
we may go back to the level of pixels,
and establish a distance metric on pixel properties,
by comparing all properties pixel-wise and establishing
Leaving the details for your further research, it
is not hard to see that even if the absolute value
of a distance has no meaning, relative distances do.
So, when an image contains a face with dark sunglasses,
it will be closer to (an image of) a face with
dark sunglasses than a face without sunglasses,
other things being equal.
It is also not hard to see that a pixel-wise
approach is, computationally, quite complex.
An object is considered as
- objects with pixel properties
- object contains w x h (n+2)-tuples
a set of points in k-dimensional space for k = n + 2
In other words, to establish similarity between two images
(that is, calculate the distance)
requires n+2 times the number of pixels comparisons.
Obviously,we can do better than that by restricting
ourselves to a pre-defined set of properties or features.
For example, one of the features could indicate
whether or not it was a face with dark sunglasses.
So, instead of calculating the distance by
establishing color differences of between regions
of the images where sunglasses may be found,
we may limit ourselves to considering a binary value,
yes or no, to see whether the face has sunglasses.
Once we have determined a suitable set of features
that allow us to establish similarity between images,
we no longer need to store the images themselves,
and can build an index based on feature vectors only,
that is the combined value on the selected properties.
Feature vectors and extensive comparison are not exclusive,
and may be combined to get more precise results.
Whatever way we choose, when we present an image
we may search in our image database
and present all those objects that fall within
a suitable similarity range,
that is the images (or segments of images)
that are close enough according to the
distance metric we have chosen.
- maps object into s-dimensional space
Instead of measuring the distance between two
images (objects) directly,
we can take one image and start modifying that
until it exactly
equals the target image.
In other words, as phrased in [Subrahmanian (1998)],
the principle underlying the transformation approach
Given two objects o1 and o2,
the level of dissimilarity is proportional
to the (minimum) cost of transforming object o1
into object o2 or vice versa
Now, this principle might be applied to any representation
of an object or image, including feature vectors.
Yet, on the level of images, we may think of the
-- translation, rotation, scaling
Moreover, we can attach a cost to each of these
operations and calculate the cost of
a transformation sequence TSby summing the costs
of the individual operations.
Based on the cost function we can define a distance metric,
which we call for obvious reasons the edit distance,
to establish similarity between objects.
An obvious advantage of the edit distance
over the pixel-wise distance metric is thatwe may
have a rich choice of transformation operators
that we can attach (user-defined) cost to at will.
- user-defined similarity -- choice of transformation operators
- user-defined cost-function
For example, we could define low costs for normalization operations,
such as scaling and rotation,
and attach more weight tooperations that modify color values
or add shapes.
For face recognition, for example,
we could attribute low cost to adding sunglasses
but high cost to changing the sex.
To support the transformation approach
at the image level,
our image database needs to include suitable operations.
See [Subrahmanian (1998)].
We might even think of storing images,
not as a collection of pixels,
but as a sequence of operations
on any one of a given set of base images.
This is not such a strange idea
as it may seem.
information about faces we may take a base
collection of prototype faces
and define an individual face
by selecting a suitable prototype
and a limited number of operations or additional
example(s) -- match of the day
The images in this section present
a match of the day, which is
is part of the project split rpresentation by the Dutch media
artist Geert Mul.
As explain in the email sending the images, about once a week,
Television images are recorded at random from satellite television and compared with each other. Some 1000.000.000 (one billion) equations are done every day.
The split representation project
uses the image analyses and image composition software
which was developed by Geert Mul (concept)
and Carlo Preize (programming ∓ software design).
research directions -- multimedia repositories
What would be the proper format to store multimedia
In other words, what is the shape multimedia repositories
Some of the issues involved are discussed in chapter
, which deals with
information system architectures.
With respect to image repositories, we may rephrase the question into
what support must an image repository provide, minimally,
to allow for efficient access and search?.
In [Subrahmanian (1998)], we find the following answer:
- storage -- unsegmented images
- description -- limited set of features
- index -- feature-based index
- retrieval -- distance between feature vectors
And, indeed, this seems to be what most image
Note that the actual encoding is not of importance.
The same type of information can be encoded using
either XML, relational tables or object databases.
What is of importance is the functionality that is
offered to the user, in terms of storage and retrieval
as well as presentation facilities.
What is the relation between presentation facilities
and the functionality of multimedia repositories?
Consider the following mission statement,
which is taken from my research and projects page.
Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...
Obviously, the underlying multimedia repository
must provide adequate retrieval facilities
and must also be able to deliver the desired objects
in a format suitable for the representation and
possibly incorporation in such an environment.
Actually, at this stage, I have only some vague ideas
about how to make this vision come through.
Look, however, at chapter
and appendix [platform]
for some initial ideas.
Even in the presence of audiovisual media,
text will remain an important vehicel for human
In this section, we will look at the issues that
arise in querying a text or document database.
First we will characterize more precisely what
we mean by effective search,
and then we will study techniques to realize effective
search for document databases.
Basically, answering a query to a document database
comes down to string matching.
However, some problems may occur such as synonymy and polysemy.
- document database + string matching
- synonymy -- topic T does not occur literally in document D
- polysemy -- some words may have many meanings
As an example, church and house of prayer
have more or less the same meaning.
So documents about churches and cathedrals
should be returned when you ask for information about
'houses of prayer'.
As an exampleof polysemy, think of the word drum,
which has quite a different meaning when taken from a musical
perspective than from a transport logistics perspective.
precision and recall
Suppose that, when you pose a query,
everything that is in the database is returned.
You would probably not be satisfied,
although every relevant document will be included,
that is for sure.
On the other hand, when nothing is returned,
at least you cannot complain about non-relevant
documents that are returned, or can you?
In [Subrahmanian (1998)], the notions of precision and
recall are proposed to measure the effectiveness
of search over a document database.
In general, precision and recall can be defined as follows.
- precision -- how many answers are correct
- recall -- how many of the right documents are returned
For your intuition,
just imagine that you have a database of documents.
With full knowledge of the database you can delineate
a set of documentsthatare of relevance to a particular query.
Also, you can delineate a set that will be returned
by some given search algorithm.
Then, precision is the intersection of the two sets
in relation to whatthe search algorithm returns,
and recall that same intersection in relation to what is
In pseudo-formulas, we can express this
precision and recall
precision = ( returned and relevant ) / returned
recall = ( returned and relevant ) / relevant
Now, as indicated in the beginning,
it is not to difficult to get either perfect recall
(by returning all documents)
or perfect precision (by returning almost nothing).
- return all documents: perfect recall, low precision
- return 'nothing': 'perfect' precision, low recall
But these must be considered anomalies
(that is, sick cases),
and so the problem is to find an algorithm
that performs optimally with respect to both
precision and recall.
For the total database we can extend these measures
by taking the averages of precision and recall
for all topics that the database may be queried about.
Can these measures only be applied
to document databases?
Of course not,
these are general measures that can be applied
to search over any media type!
A frequency table is an example
of a way to improve search.
Frequency tables, as discussed in [Subrahmanian (1998)],
are useful for documents only.
Let's look at an example first.
Basically, what a frequency table does is, as the name implies,
give a frequency count for particular words or phrases
for a number of documents.
In effect, a complete document database may be summarized
in a frequency table.
In other words, the frequency table may be considered as an
index to facilitate the search for similar documents.
To find a similar document, we can simply make
a word frequency count for the query,
and compare that with the colums in the table.
As with images, we can apply a simpledistance metric to find the
nearest (matching) documents.
(In effect, we may take the square root for the
sum of the squared differences between the entries
in the frequence count as our distance measure.)
The complexity of this algorithm may be characterized
compare term frequencies per document -- O(M*N)
where M is the number of terms and N is the number
Since both M and N can become very large we need
to make an effort to reduce the size of
the frequency table.
We can, for example, introduce a stop list
to prevent irrelevant words to enter the table,
and we may restrict ourselves to
including word stems only,
to bring back multiple entries to
one canonical form. With some additional effort
we could even deal with synonymy and polysemy
by introducing, respectively equivalence classes,
and alternatives (although we then need a suitable
way for ambiguation).
By the way, did you notice that frequency tables
may be regarded as feature vectors for documents?
- stop list -- irrelevant words
- word stems -- reduce different words to relevant part
research directions -- user-oriented measures
Even though the reductions proposed may result in
limiting the size of the frequency tables,
we may still be faced with frequency tables
of considerable size.
One way to reduce the size further, as discussed
in [Subrahmanian (1998)], is to apply latent sematic indexing
which comes down to clustering the document database,
and limiting ourselves to the most relevant words only,
where relevance is determined by the ratio of occurrence
over the total number of words.
In effect, the less the word occurs, the more discriminating
it might be.
Alternatively,the choice of what words are
considered relevant may be determined by
taking into account the area of application
or the interest of a particular group of users.
Observe that, when evaluating a particular information
retrieval system, the notions of precision and recall
as introduced before are rather system-oriented measures,
based on the assumption of a user-independent notion of
However, as stated in [Baeza-Yates and Ribeiro-Neto (1999)],
different users might have a different interpretation
on which document is relevant.
In [Baeza-Yates and Ribeiro-Neto (1999)], some user-oriented measures are briefly discussed,
that to some extent cope with this problem.
- coverage ratio -- fraction of known documents
- novelty ratio -- fraction of new (relevant) documents
- relative recall -- fraction of expected documents
- recall effort -- fraction of examined documents
Consider a reference collection,
an example information request
and a retrieval strategy to be evaluated.
Then the coverage ratio
may be defined as the fraction of the documents
known to be relevant, or more precisely the number
of (known) relevant documents retrieved divided by the
total number of documents known to be relevant by the user.
The novelty ratio may then be defined as the
fraction of the documents retrieved which were not known
to be relevant by the user, or more precisely
the number of relevent documents that were not known
by the user divided by the total number of relevant documents
The relative recall is obtained by dividing
the number of relevant documents found by the number
of relevant documents the user expected to be found.
Finally, recall effortmay be characterized as
the ratio of the number of relevant documents
expected and the total number of documents that
has to be examined to retrieve these documents.
Notice that these measures all have a clearly 'subjective'
element, in that, although they may be
generalized to a particular group of users,
they will very likely not generalize to all
groups of users.
In effect, this may lead to different retrieval
strategies for different categories of users,
taking into account levelof expertise and familiarity
with the information repository.
development(s) -- unconvenient truth(s)
Since the 1970's, Dutch universities have enormously grown in size,
due to the ever larger number of students that aim at having university
As departments become bigger, however, staff members no longer know eachother
personally. The impersonal and anonymous atmosphere is increasingly an
issue of concern for the management, and various initiatives have been taken,
including collective trips into nature, as well as cultural events, not to much
avail for that matter.
An additional problem is that more and more members of the staff
come from different countries and cultures, often only with a temporal contract and residence permit.
Yet, during heir stay, they also have the desire to communicate and learn about
the other staff members and their culture(s).
As you can see, it is not only the climate or science itself that provides us
with unconvenient truths. Life at the university, not only in Second Life, apparently suffers
from the changing times.
In [Eliens & Vyas (2007)], we wrote: in september 2006, the idea came up to use a large screen display in one of the public
spaces in our department, to present, one way or another, the 'liveliness' of the work place,
and to look for ways that staff members might communicate directly or indirectly with eachother
through this display.
Observing that communications often took place during casual encounters at the coffee machine or printer,
we decided that monitoring the interactions at such places might give a clue about the liveliness
of the work place.
In addition, we noted that the door and one of the walls in the room where the coffee machine
stood, was used by staff members to display personal items, such as birth announcement cards
or sport trophees.
In that same room, mostly during lunch time, staff members also gathered to play cards.
Taking these observations together, we decided to develop a system, nicknamed PANORAMA,
to present these ongoing activities and interactions on a large display, which was to
be positioned in the coffee room.
The name of our system is derived from the famous
in The Hague, which gives a view on (even in that time nostalgic rendering of) Scheveningen.
However, it was explicitly not our intention to give an in any sense realistic/naturalistic
remdering of the work place, but rather, inspired by artistic interpretations of
panoramic applications as presented in [Grau (2003)], to
find a more art-ful way of visualizing the social structure and dynamics of the work place.
|(a) context||(b) self-reflection|
At this stage, about one year later, we have a running prototype (implemented in DirectX),
for which we did perform a preliminary field study, see the figure above, as well as a first user evaluation, [Panorama],
and also we have experimented with a light-weight web-based variant, allowing
access from the desktop, [Si & Eliens (2007)].
Our primary focus, however, we stated in [Eliens & Vyas (2007)], was to establish
the relation between interaction aesthetics and game play, for which PANORAMA served as a vehicle.
When we think of
media as an extension of our senses, as we remarked in chapter 1 following [Zielinski (2006)],
reformulate the question of interaction aesthetics as the
problem of clarifying the aesthetics of media rich interactive applications.
However, what do we mean exactly by the phrase aesthetics?
The Internet Enceclopedia of Philosophy
discusses under the heading of aesthetics topics such as
- intentions -- motives of the artist
- expression -- where form takes over
- representation -- the relation of art to reality
As we wrote in [Saw (1971)], these topics obviously do not cover what we want,
so we took a call for contributions to the
aesthetics of interaction as a good chance
to dust of our old books, and rekindle our interest
in this long forgotten branch of philosophy, aesthetics.
It may come as a shock to realize how many perspectives
apply to the notion of aesthetics.
First of all, we may take an analytical approach, as we do in section 2, to see in what ways
the phrase aesthetics is used, and derive its meaning from its usage in a variety of contexts.
However, we find it more worthwhile to delve into the history of thought
and clarify the meaning of aesthetics from an epistemological point of view,
following [Kant (1781)], as an abstract a priori form of awareness, which is in later
phenomenological thinking augmented with a notion of self-consciousness. See section 12.4.
In this line of thinking we also encounter the distinction between aesthetic awareness
and aesthetic judgement, the dialectic relationship of which becomes evident
in for example the occurrence of aestheticism in avant-garde art, [Burger (1981)].
When writing [Saw (1971)], we came along a report of how the Belgium curator
Jan Hoet organized the Documenta IX, a famous yearly art event in Germany,
and we were struck by the phrase art and the public sharing accomodation, [Documenta],
which in a way that we have yet to clarify expresses some of our intuition
we have with respect to the role the new interactive systems may play in our lives.
What can we hope to achieve when taking a more philosophical look at interaction aesthetics?
Not so much, at first sight.
According to [Körner (1973)], aesthetic theory ... will not be able to provide
aesthetic guidance even to the extent to which moral theory can give moral guidance.
The reason is that aesthetic experience and creation defy conceptualization, or in other
words they defy the identification, classification and evaluation of aesthetic objects
by means of non-aesthetic attributes.
However, as [Körner (1973)] observes, in a paradoxical way aesthetic experience not
only defies but also invites conceptualization, and therefore it seems worthwhile
to gain a better understanding in what underlies the experience and creation of (aesthetic)
If we can not change the world to become a better place, we might perhaps be concerned
with making it a more beautiful place ...
projects & further reading
As a project, you may implement simple
image analysis algorithms that, for example, extract
a color histogram, or detect the presence of
a horizon-like edge.
You may further explore
scenarios for information retrieval in the
cultural heritage domain.
and compare this with other
applications of multimedia information retrieval,
for example monitoring in hospitals.
For further reading I suggest to make yourself
familiar with common techniques
in information retrieval as described in [Baeza-Yates and Ribeiro-Neto (1999)],
and perhaps devote some time to studying image analisis, [Gonzales and Wintz (1987)].
- artworks -- ..., Miro, Dali, photographed from
Kunstsammlung Nordrhein-Westfalen, see artwork 2.
- left Miro from [Kunst], right: Karel Appel
- match of the day (1) -- Geert Mul
- match of the day (2) -- Geert Mul
- match of the day (3) -- Geert Mul
- mario ware -- taken from gammo/veronica.
- baten kaitos -- eternal ways and the lost ocean, taken from gammo/veronica.
- PANORAMA -- screenshots from field test.
- signs -- people, [ van Rooijen (2003)], p. 252, 253.
The art opening this chapter belongs to
the tradition of 20th century art.
It is playful, experimental,
with strong existential implications,
and it shows an amazing variety of styles.
The examples of match of the day by Geert Mul
serve to illustrate the interplay between
technology and art,
and may also start you to think about what
Some illustrations from games are added to show the
difference in styles.
You may not copy or print any of this material without explicit permission of the author or the publisher.
In case of other copyright issues, contact the author.