Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.
To see what is involved, imagine that we have a (user) query like:
find me the pages containing information on ...
Then the goal of the information retrieval system is
to retrieve information that is useful or relevant to
the user, in other words: information that
satisfies the user's information need.
Given an information repository,
which may consist of web pages but also multimedia objects,
the information retrieval system must extract syntactic
and semantic information from these (information) items
and use this to match the user's information need.
Effective information retrieval is determined
by, on the one hand, the user task
and, on the other hand, the logical view
of the documents or media objects that constitute
the information repository.
As user tasks, we may distinguish between retrieval
(by query) and browsing (by navigation).
To obtain the relevant information in retrieval we generally
apply filtering, which may also be regarded
as a ranking based on the attributes considered most relevant.
The logical view of text documents generally amounts
to a set of index terms characterizing the document.
To find relevant index terms, we may apply operations
to the document, such as the elimination of stop words
or text stemming.
As you may easily see, full text provides the most
complete logical view, whereas a small set of categories
provides the most concise logical view.
Generally, the user task will determine whether semantic richness
or efficiency of search will be considered as more
important when deciding on the obvious tradeoffs involved.
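To make the notion of a logical view more concrete, the operations mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the suffix list and the function names crude_stem and index_terms are hypothetical, and the stemmer is a naive suffix stripper, not an established stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "is", "are", "for"}

# Hypothetical suffix list for a crude stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> set[str]:
    """Reduce a text to a set of index terms: tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(token) for token in tokens if token not in STOP_WORDS}


terms = index_terms("Indexing the documents and ranking the results")
print(sorted(terms))
```

The resulting set is a far more concise logical view than the full text, at the cost of losing word order and inflection.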
information retrieval models
In [IR], a great variety of information retrieval
models is described.
For your understanding, an information retrieval model
makes explicit how index terms are represented
and how the index terms characterizing an information
item are matched with a query.
When we limit ourselves to the classic models for
search and filtering, we may distinguish between:
- boolean or set-theoretic models
- vector or algebraic models
- probabilistic models
Boolean models typically allow for yes/no answers only.
They have a set-theoretic basis, and include models
based on fuzzy logic, which allow for somewhat more refined answers.
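As an illustration of the set-theoretic basis, a boolean query can be evaluated with plain set operations on an inverted index. The sketch below assumes a hypothetical toy collection; the names DOCS, build_inverted_index and INDEX are introduced for this example only.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": {"multimedia", "retrieval", "video"},
    "d2": {"text", "retrieval", "index"},
    "d3": {"multimedia", "index", "audio"},
}


def build_inverted_index(docs):
    """Map each index term to the set of documents that contain it."""
    inverted = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


INDEX = build_inverted_index(DOCS)
ALL_DOCS = set(DOCS)

# Boolean operators become set operations on the document sets.
both = INDEX["multimedia"] & INDEX["retrieval"]   # multimedia AND retrieval
either = INDEX["video"] | INDEX["audio"]          # video OR audio
not_multimedia = ALL_DOCS - INDEX["multimedia"]   # NOT multimedia
```

Note that every document is either in the answer set or not; there is no notion of one document matching better than another.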
Vector models use algebraic operations on vectors
of attribute terms to determine possible matches.
The attributes that make up a vector must in principle be orthogonal.
Attributes may be given a weight, or even be ignored.
Much research has been done on how to find an optimal selection
of attributes for a given information repository.
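To give an impression of what weighting attributes may look like, the sketch below assigns tf-idf weights to index terms. The choice of tf-idf is an assumption made for the sake of the example, as are the toy collection and the names idf, weight_vector, VOCABULARY and VECTORS.

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is already reduced to index terms.
DOCS = {
    "d1": ["multimedia", "retrieval", "retrieval", "video"],
    "d2": ["text", "retrieval", "index"],
    "d3": ["multimedia", "index", "audio"],
}


def idf(term, docs):
    """Inverse document frequency: log(N / number of documents with the term)."""
    containing = sum(1 for terms in docs.values() if term in terms)
    return math.log(len(docs) / containing) if containing else 0.0


def weight_vector(terms, docs, vocabulary):
    """Return one tf-idf weight per vocabulary term, in vocabulary order."""
    counts = Counter(terms)
    return [counts[term] * idf(term, docs) for term in vocabulary]


# One vector position per attribute (index term) in the collection.
VOCABULARY = sorted({term for terms in DOCS.values() for term in terms})
VECTORS = {
    doc_id: weight_vector(terms, DOCS, VOCABULARY)
    for doc_id, terms in DOCS.items()
}
```

With this scheme a term that occurs in every document gets weight zero, which amounts to ignoring that attribute, whereas a rare term such as video weighs more heavily than a common one.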
Probabilistic models include general inference networks,
and belief networks based on Bayesian logic.
Although it is somewhat premature to compare these
models with respect to their effectiveness in
actual information retrieval tasks, there is,
according to [IR], a general consensus that
vector models will outperform the probabilistic
models on general collections of text documents.
How they will perform for arbitrary collections of
multimedia objects might be an altogether different question!
Nevertheless, in the sections to follow we will
focus primarily on generalized vector representations
of multimedia objects.
So, let's conclude with listing the advantages of vector models:
- attribute term weighting scheme improves performance
- partial matching strategy allows retrieval of approximate material
- metric distance allows for sorting according to degree of similarity
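The three points listed above can be illustrated with a small ranking sketch. It assumes hypothetical weighted attribute vectors and uses cosine similarity as the measure of closeness, which is a common choice but an assumption here; strictly speaking it is a similarity measure and not a metric distance. The names DOCS, QUERY, cosine_similarity and rank are introduced for this example only.

```python
import math

# Hypothetical weighted vectors over the attributes (video, audio, text, image).
DOCS = {
    "d1": [0.9, 0.1, 0.0, 0.4],
    "d2": [0.0, 0.0, 1.0, 0.0],
    "d3": [0.5, 0.5, 0.0, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (similarity, document id) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    return sorted(scored, reverse=True)


QUERY = [1.0, 0.0, 0.0, 1.0]  # the query asks for video and image
RANKING = rank(QUERY, DOCS)
for score, doc_id in RANKING:
    print(f"{doc_id}: {score:.3f}")
```

Document d3 matches the query only partially (it has video but no image), yet it is still retrieved, ranked below d1 and above d2, which shares no attribute with the query.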
Reading the following sections, you will come to understand
how to adopt an attribute weighting scheme,
how to apply partial matching and how to define a suitable distance metric.
So, let me finish with posing a research issue:
How can you improve a particular information retrieval
model or matching scheme by using a suitable method
of knowledge representation and reasoning?
To give you a point of departure, look
at the logic-based multimedia information retrieval
system proposed in [Dolores].