topical media & game development



An image may tell you more than a thousand words. Well, whether images are indeed a more powerful medium of expression is an issue I'd rather leave aside. More relevant, in the context of our Amsterdam Drugport operation, is the problem of how to get information out of an image, or more generally, how to query image databases. There are two issues here:

image query

These issues are quite distinct, although descriptive information may be used to establish similarity.

descriptive information

When we want to find, for example, all images that contain a person with, say, sunglasses, we need to have a description of the images in our database that includes this information in one way or another. One way would be to annotate all images with (meta) information and describe the objects in the picture to some degree of detail. More challenging would be to extract image content by image analysis, and produce the description (semi) automatically.

content-based description

According to  [MMDBMS], content-based description of images involves the identification of objects, as well as an indication of where these objects are located in the image, by using a shape descriptor and possibly property descriptors indicating the pictorial properties of a particular region of the object or image.

Shape and property descriptors may take a form as indicated below.



As an example of applying these descriptors:


  shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50   (rectangle)  
  property descriptor: pixel(14,7): R=5; G=1; B=3 
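A minimal sketch of how such descriptors might be represented, using plain Python dictionaries (the names and the `inside` helper are illustrative choices, not taken from [MMDBMS]):

```python
# Hypothetical encoding of the shape and property descriptors above.
# A shape descriptor gives a bounding rectangle via lower/upper bounds on x and y.
shape = {"XLB": 10, "XUB": 60, "YLB": 3, "YUB": 50}

# A property descriptor records pictorial properties of a single pixel.
prop = {"pixel": (14, 7), "R": 5, "G": 1, "B": 3}

def inside(shape, x, y):
    """Check whether a pixel lies inside the rectangle given by the shape descriptor."""
    return (shape["XLB"] <= x <= shape["XUB"]
            and shape["YLB"] <= y <= shape["YUB"])
```

With this encoding, we can check that the example pixel (14,7) indeed lies inside the example rectangle.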
Now, instead of taking raw pixels as the unit of analysis, we may subdivide an image into a grid of cells and establish properties of cells, by some suitable algorithm.


As an example, we can define a property that indicates whether a particular cell is black or white.


  property: (bwcolor,{b,w},bwalgo) 
The actual algorithm used to establish such a property may be a matter of choice, so in the example it is given as an explicit parameter.
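One possible such algorithm, sketched here under the assumption that a cell is a nested list of grayscale values and that classification is done by thresholding the average intensity (the threshold value is an illustrative choice):

```python
def bwalgo(cell, threshold=128):
    """Classify a grid cell as black ('b') or white ('w') by its average intensity."""
    pixels = [v for row in cell for v in row]
    avg = sum(pixels) / len(pixels)
    return 'w' if avg >= threshold else 'b'

# The property triple (name, value domain, algorithm) from the text:
bwcolor = ("bwcolor", {'b', 'w'}, bwalgo)
```

Passing the algorithm as the third component keeps the property definition independent of any particular classification method.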

From here to automatic content description is, admittedly, still a long way. We will indicate some research directions at the end of this section.



similarity-based retrieval

We need not necessarily know what an image (or segment of it) depicts to establish whether there are other images that contain that same thing, or something similar to it. We may, following  [MMDBMS], formulate the problem of similarity-based retrieval as follows:

similarity-based retrieval

How do we determine whether the content of a segment (of a segmented image) is similar to another image (or set of images)?

Think of, for example, the problem of finding all photos that match a particular face.

According to  [MMDBMS], there are two solutions:


  • metric approach -- distance between two image objects
  • transformation approach -- relative to specification
As we will see later, the transformation approach in some way subsumes the metric approach, since we can formulate a distance measure for the transformation approach as well.

metric approach

What does it mean when we say that the distance between two images is less than the distance between this image and that one? What we want to express is that the first two images (or faces) are more alike, or maybe even identical.

Abstractly, something is a distance measure if it satisfies certain criteria.

metric approach

distance d : X x X -> [0,1] is a distance measure if:

           d(x,y) = d(y,x)
  	 d(x,y) <= d(x,z) + d(z,y)
  	 d(x,x) = 0
For your intuition, it is enough to limit yourself to what you are familiar with, that is, measuring distance in ordinary (Euclidean) space.
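The three axioms above can be checked mechanically on a finite sample of points. A sketch, using ordinary Euclidean distance (note that Euclidean distance is not bounded by 1; a real image metric would be normalized):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def is_metric_on(d, points, eps=1e-9):
    """Check the three distance-measure axioms from the text on a sample of points."""
    for x in points:
        if abs(d(x, x)) > eps:                         # d(x,x) = 0
            return False
        for y in points:
            if abs(d(x, y) - d(y, x)) > eps:           # d(x,y) = d(y,x)
                return False
            for z in points:
                if d(x, y) > d(x, z) + d(z, y) + eps:  # triangle inequality
                    return False
    return True
```

Such a check cannot prove the axioms in general, but it is a useful sanity test for a candidate distance function.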

Now, in measuring the distance between two images, or segments of images, we may go back to the level of pixels, and establish a distance metric on pixel properties, by comparing all properties pixel-wise and establishing a distance.

pixel properties

  • objects with pixel properties p_1,...,p_n
  • pixels: (x,y,v_1,...,v_n)
  • object contains w x h (n+2)-tuples
Leaving the details for your further research, it is not hard to see that even if the absolute value of a distance has no meaning, relative distances do. So, when an image contains a face with dark sunglasses, it will be closer to (an image of) a face with dark sunglasses than to a face without sunglasses, other things being equal. It is also not hard to see that a pixel-wise approach is, computationally, quite complex. An object is considered as


a set of points in k-dimensional space for k = n + 2

In other words, to establish similarity between two images (that is, calculate the distance) requires w x h comparisons of (n+2)-tuples.
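A pixel-wise distance along these lines might be sketched as follows, assuming two images of equal dimensions whose pixels each carry a tuple of property values (the normalized sum of absolute differences is an illustrative choice of metric):

```python
def pixelwise_distance(img1, img2, max_val=255):
    """Compare two images property-by-property at every pixel.

    Each image is a list of rows; each pixel is a tuple (v_1, ..., v_n)
    of property values in [0, max_val]. Returns a value in [0, 1].
    """
    total, count = 0.0, 0
    for row1, row2 in zip(img1, img2):
        for p1, p2 in zip(row1, row2):
            for v1, v2 in zip(p1, p2):
                total += abs(v1 - v2) / max_val
                count += 1
    return total / count if count else 0.0
```

The triple loop makes the computational cost visible: every property of every pixel is touched once.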

feature extraction

Obviously, we can do better than that by restricting ourselves to a pre-defined set of properties or features.

feature extraction

  • maps object into s-dimensional space
For example, one of the features could indicate whether or not it was a face with dark sunglasses. So, instead of calculating the distance by establishing color differences between the regions of the images where sunglasses may be found, we may limit ourselves to considering a binary value, yes or no, to see whether the face has sunglasses.

Once we have determined a suitable set of features that allow us to establish similarity between images, we no longer need to store the images themselves, and can build an index based on feature vectors only, that is, the combined values of the selected properties.

Feature vectors and extensive comparison are not exclusive, and may be combined to get more precise results. Whatever way we choose, when we present an image we may search in our image database and present all those objects that fall within a suitable similarity range, that is the images (or segments of images) that are close enough according to the distance metric we have chosen.
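Given a distance on feature vectors, retrieval within a similarity range might be sketched as follows; the index layout, feature choice, and names are illustrative assumptions:

```python
import math

def feature_distance(f1, f2):
    """Euclidean distance between two feature vectors in s-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def range_query(index, query_vector, max_distance):
    """Return all image ids whose feature vector falls within the similarity range."""
    return [image_id for image_id, vec in index.items()
            if feature_distance(query_vector, vec) <= max_distance]

# Hypothetical index: a binary sunglasses feature and a brightness value per image.
index = {"face1": (1, 0.80), "face2": (0, 0.70), "face3": (1, 0.75)}
```

A query with the feature vector of a face wearing sunglasses then returns only the faces whose vectors are close enough, without ever touching the stored pixels.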



transformation approach

Instead of measuring the distance between two images (objects) directly, we can take one image and start modifying it until it exactly equals the target image. In other words, as phrased in  [MMDBMS], the principle underlying the transformation approach is:

transformation approach

Given two objects o1 and o2, the level of dissimilarity is proportional to the (minimum) cost of transforming object o1 into object o2 or vice versa

Now, this principle might be applied to any representation of an object or image, including feature vectors. Yet, on the level of images, we may think of the following operations:

transformation operators

    to_1,...,to_r  -- translation, rotation, scaling
Moreover, we can attach a cost to each of these operations and calculate the cost of a transformation sequence TS by summing the costs of the individual operations. Based on the cost function we can define a distance metric, which we call, for obvious reasons, the edit distance, to establish similarity between objects.


  • cost(TS) = Σ_{i=1}^{r} cost(to_i)


  • d(o,o') = min { cost(TS) | TS in TSeq(o,o') }
An obvious advantage of the edit distance over the pixel-wise distance metric is that we may have a rich choice of transformation operators, to which we can attach (user-defined) costs at will.


  • user-defined similarity -- choice of transformation operators
  • user-defined cost-function

For example, we could define low costs for normalization operations, such as scaling and rotation, and attach more weight to operations that modify color values or add shapes. For face recognition, for example, we could attribute a low cost to adding sunglasses but a high cost to changing the sex.
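The cost function and edit distance above can be sketched for a finite set of candidate transformation sequences; the operator names and cost values below are illustrative, not prescribed by [MMDBMS]:

```python
# Illustrative per-operation costs: normalization is cheap, content changes are expensive.
op_cost = {
    "scale": 0.1,
    "rotate": 0.1,
    "translate": 0.1,
    "add_sunglasses": 0.3,
    "change_sex": 5.0,
}

def cost(ts):
    """cost(TS): sum of the costs of the individual operations in the sequence."""
    return sum(op_cost[op] for op in ts)

def edit_distance(tseqs):
    """d(o, o'): minimum cost over the given transformation sequences from o to o'."""
    return min(cost(ts) for ts in tseqs)
```

In practice the hard part is enumerating (or searching) the space of transformation sequences; once candidate sequences are available, the distance is just a minimum over their costs.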

To support the transformation approach at the image level, our image database needs to include suitable operations. See  [MMDBMS].


   segment(image-id, predicate)
   edit(image-id, edit-op)
We might even think of storing images, not as a collection of pixels, but as a sequence of operations on any one of a given set of base images. This is not such a strange idea as it may seem. For example, to store information about faces we may take a base collection of prototype faces and define an individual face by selecting a suitable prototype and a limited number of operations or additional properties.



example(s) -- match of the day

The images in this section present a match of the day, which is part of the project split representation by the Dutch media artist Geert Mul. As explained in the email accompanying the images, about once a week, television images are recorded at random from satellite television and compared with each other. Some 1,000,000,000 (one billion) comparisons are done every day.

The split representation project uses the image analysis and image composition software NOTATION, which was developed by Geert Mul (concept) and Carlo Preize (programming & software design).

research directions -- multimedia repositories

What would be the proper format to store multimedia information? In other words, what shape should multimedia repositories take? Some of the issues involved are discussed in chapter 6, which deals with information system architectures. With respect to image repositories, we may rephrase the question as: what support must an image repository provide, minimally, to allow for efficient access and search? In  [MMDBMS], we find the following answer:

image repository

  • storage -- unsegmented images
  • description -- limited set of features
  • index -- feature-based index
  • retrieval -- distance between feature vectors
And, indeed, this seems to be what most image databases provide. Note that the actual encoding is not of importance. The same type of information can be encoded using XML, relational tables, or object databases. What is of importance is the functionality that is offered to the user, in terms of storage and retrieval as well as presentation facilities.

What is the relation between presentation facilities and the functionality of multimedia repositories? Consider the following mission statement, which is taken from my research and projects page.


Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...

Obviously, the underlying multimedia repository must provide adequate retrieval facilities and must also be able to deliver the desired objects in a format suitable for presentation and possibly incorporation in such an environment. Actually, at this stage, I have only some vague ideas about how to make this vision come true. Look, however, at chapter 7 and appendix platform for some initial ideas.

(C) Æliens 04/09/2009

You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.