topical media & game development



Animation in front of television news in ViP


merging video and 3D

In June 2003, I was approached by a theatre production company to advise on the use of "VR in theatre". As described in more detail in section 9.3, I explored what technology was available to realize such VR-augmented theatre. These explorations resulted in the development of the ViP system, which I once announced as follows:

The ViP system enhances your party with innovative multimedia presentations.

It supports multiple webcams and digital video cameras, mixed with video and images, enhanced by 3D animations and text, in an integrated fashion.

For your party, we create a ViP presentation, with your content and special effects, to entertain your audience.

Over time, I continued working on the system. Although I use it to enliven my lectures rather than for parties, it does include many of the features of a VJ system, such as the drumpad described in 3.2.

The major challenge, when I started its development, was to find an effective way to map live video from a low/medium resolution camera as textures onto 3D geometry. I started by looking at the ARToolkit, but at the time I was not satisfied with its frame rate. Then, after some first explorations, I discovered that mapping video onto 3D geometry was a new (to some extent still experimental) built-in feature of the DirectX 9 SDK, in the form of the VMR9 (video mixing renderer) filter.

the Video Mixing Renderer filter

The VMR filter is a compound class that handles connections, mixing, compositing, as well as synchronization and presentation in an integrated fashion. But before discussing the VMR9 in more detail, let's look first at how a single media stream is processed by the filter graph, as depicted in the figure below.



Basically, the process consists of three phases: parsing, decoding and rendering. For each phase a different filter may be used, depending on the source, the format and the display requirements, respectively. Synchronization can be determined either by the renderer, which pulls new frames in, or by the parser, which pushes data onto the stream, as in the case of live capture. Pushing may cause loss of data when decoding cannot keep up with the incoming stream.
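The push model and its potential loss of data can be illustrated with a small sketch. It is not DirectShow code: `Frame` and `PushBuffer` are hypothetical names, and dropping the oldest frame when the buffer is full is only one possible policy, assumed here because it keeps the output close to live.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a captured video frame; not a DirectShow type.
struct Frame {
    int sequence;  // capture order, used only to show which frames survive
};

// Bounded buffer between a pushing source (live capture) and a slower decoder.
class PushBuffer {
public:
    explicit PushBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Called by the source for every captured frame. When the buffer is
    // full, the oldest frame is discarded and counted as dropped.
    void push(Frame frame) {
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++dropped_;
        }
        frames_.push_back(frame);
    }

    // Called by the decoder when it is ready for the next frame.
    // Returns false if no frame is waiting.
    bool pull(Frame& out) {
        if (frames_.empty()) return false;
        out = frames_.front();
        frames_.pop_front();
        return true;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    std::deque<Frame> frames_;
};
```

With a capacity of two, pushing five frames before the decoder pulls anything leaves only the last two frames available; the first three are lost.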

The VMR was originally introduced to allow for mixing multiple video streams, and allowed for user-defined compositor and allocator/presenter components.


(a) VMR filter   (b) multiple VMRs


Before the VMR9, images could be obtained from the video stream by intercepting this stream and copying frames to a texture surface. The VMR9, however, renders the frames directly on Direct3D surfaces, with (obviously) less overhead. Although the VMR9 supports multiple video pins, for combining multiple video streams, it does not allow for independent search or access to these streams. To do this you must deploy multiple video mixing renderers that are connected to a common allocator/presenter component, as depicted on the right of the figure above (b).
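The arrangement with several renderers and one common allocator/presenter can be sketched, independently of DirectShow, as follows. In the VMR9 interfaces each renderer is identified by a user ID (the `dwUserID` argument of `AdviseSurfaceAllocator` and `PresentImage`); the sketch mimics that idea with plain stand-in types. `SharedPresenter`, `VideoTexture` and the method names are assumptions made for illustration, not part of the SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a Direct3D texture holding one video frame.
struct VideoTexture {
    int frameNumber = -1;  // -1 means no frame has been presented yet
};

// One presenter shared by several renderers. Each renderer registers under
// its own id and later presents frames under that id.
class SharedPresenter {
public:
    // Register a renderer; returns false if the id is already taken.
    bool advise(std::uintptr_t rendererId) {
        return latest_.emplace(rendererId, VideoTexture{}).second;
    }

    // Store the newest frame of one stream without touching the others.
    bool present(std::uintptr_t rendererId, int frameNumber) {
        assert(frameNumber >= 0);
        auto it = latest_.find(rendererId);
        if (it == latest_.end()) return false;  // unknown renderer
        it->second.frameNumber = frameNumber;
        return true;
    }

    // Look up the current texture of one stream; nullptr if not registered.
    const VideoTexture* texture(std::uintptr_t rendererId) const {
        auto it = latest_.find(rendererId);
        return it == latest_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::uintptr_t, VideoTexture> latest_;
};
```

Because every stream keeps its own entry, one stream can be paused or repositioned while the others continue to deliver frames, which is the independence that a single renderer with multiple pins does not offer.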

When using the VMR9 with Direct3D, the rendering of 3D scenes is driven by the rate at which the video frames are processed.


Lecture on digital dossier for Abramovic, in ViP


the ViP system

In developing the ViP system, I proceeded from the requirement to project live video capture in 3D space. As noted previously, this means that incoming video drives the rendering of 3D scenes and that, hence, capture speed determines the rendering frame rate.

I started by adapting the simple allocator/presenter example from the DirectX 9 SDK, and developed a scene management system that could handle incoming textures from the video stream. The scene class interface, which allows for (one-time) initialization, time-dependent compositing, restoring device settings and rendering textures, looks as follows:

  #include <d3d9.h>   // IDirect3DDevice9, IDirect3DTexture9

  class scene {
  public:
  	virtual int init(IDirect3DDevice9*);   // initialize scene (once)
  	virtual int compose(float time);  // compose (in the case of an animation)
  	virtual int restore(IDirect3DDevice9*);  // restore device settings
  	virtual int render(IDirect3DDevice9* device, IDirect3DTexture9* texture);  // render with the incoming video texture
  };

The scene graph itself was constructed as a tree, using both arrays of (sub) scenes as well as a dictionary for named scenes, which is traversed each time a video texture comes in. The requirements the scene management system had to satisfy are further indicated in section 9.3. Leaving further details aside, observe that for the simple case of one incoming video stream, transferring the texture by parameter suffices.
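A scene tree of this kind, with an array of sub-scenes plus a dictionary for named scenes, might be sketched as below. This is not the ViP source: `Device` and `Texture` are hypothetical stand-ins for `IDirect3DDevice9` and `IDirect3DTexture9`, and `Scene`, `Quad`, `add` and `find` are names chosen for illustration. The sketch only shows how a single incoming texture can be handed down the tree as a parameter.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IDirect3DDevice9 and IDirect3DTexture9.
struct Device { int drawCalls = 0; };
struct Texture { int frameNumber = 0; };

class Scene {
public:
    virtual ~Scene() = default;

    // Add an anonymous child, kept in traversal order.
    Scene* add(std::unique_ptr<Scene> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    // Add a child that can also be looked up by name.
    Scene* add(const std::string& name, std::unique_ptr<Scene> child) {
        Scene* raw = add(std::move(child));
        named_[name] = raw;
        return raw;
    }

    // Find a directly attached named child; nullptr if absent.
    Scene* find(const std::string& name) const {
        auto it = named_.find(name);
        return it == named_.end() ? nullptr : it->second;
    }

    // Draw this node, then its children, all with the same video texture.
    // Returns the number of nodes that actually drew something.
    virtual int render(Device* device, Texture* texture) {
        int drawn = draw(device, texture);
        for (auto& child : children_) drawn += child->render(device, texture);
        return drawn;
    }

protected:
    // Leaf classes override this; a plain grouping node draws nothing.
    virtual int draw(Device*, Texture*) { return 0; }

private:
    std::vector<std::unique_ptr<Scene>> children_;
    std::map<std::string, Scene*> named_;
};

// A leaf that records which frame it was last drawn with.
class Quad : public Scene {
public:
    int lastFrame = -1;

protected:
    int draw(Device* device, Texture* texture) override {
        ++device->drawCalls;
        lastFrame = texture->frameNumber;
        return 1;
    }
};
```

In this sketch, calling `render` on the root whenever a new video texture arrives traverses the whole tree once, so the rendering rate follows the rate of the incoming frames.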

Later on, I adapted the GamePlayer, which uses multiple video mixing renderers, and then the need arose to use a different way of indexing and accessing the textures from the video stream(s). So, since it is good practice in object-oriented software engineering to suppress parameters by using object instance variables, the interface for the scene class changed into:

  class scene {
  public:
  	virtual int load();   // initialize scene (once)
  	virtual int compose();  // compose (in the case of an animation)
  	virtual int restore();  // restore device settings
  	virtual int render();  // display the (sub) scene
  };

Adopting the scene class as the unifying interface for all 3D objects and compound scenes proved to be a convenient way to control the complexity of the ViP application. However, for manipulating the textures and allocating shader effects to scenes, I needed global data structures (dictionaries) to access these items by name whenever needed. As a final remark, which concerns the software engineering of such systems more than their functionality per se: to deal with the multiple variant libraries that existed in the various releases of DirectX 9, it was necessary to develop the ViP library and its components as a collection of DLLs, to avoid the name and linking clashes that would otherwise occur.
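To make the combination of a parameterless interface and a global dictionary concrete, here is a minimal sketch under stated assumptions. `TextureRegistry`, `VideoScreen` and the texture names are hypothetical, and `Texture` again stands in for a Direct3D texture; the actual ViP data structures are not shown in the text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a Direct3D texture holding one video frame.
struct Texture { int frameNumber = -1; };

// Global dictionary of textures, accessible by name from any scene.
class TextureRegistry {
public:
    static TextureRegistry& instance() {
        static TextureRegistry registry;
        return registry;
    }

    // Called when a video stream delivers a new frame for a named texture.
    void update(const std::string& name, int frameNumber) {
        assert(frameNumber >= 0);
        textures_[name].frameNumber = frameNumber;
    }

    // Returns nullptr if no stream has delivered this texture yet.
    const Texture* find(const std::string& name) const {
        auto it = textures_.find(name);
        return it == textures_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Texture> textures_;
};

// A scene that remembers which texture it shows as an instance variable,
// so that render() needs no parameters.
class VideoScreen {
public:
    explicit VideoScreen(std::string textureName)
        : textureName_(std::move(textureName)) {}

    // Returns 1 if a frame was available and shown, 0 otherwise.
    int render() {
        const Texture* texture = TextureRegistry::instance().find(textureName_);
        if (texture == nullptr) return 0;  // nothing to show yet
        shownFrame_ = texture->frameNumber;
        return 1;
    }

    int shownFrame() const { return shownFrame_; }

private:
    std::string textureName_;
    int shownFrame_ = -1;
};
```

The trade-off is the usual one for global state: scenes become simpler to call, but every scene now depends implicitly on the registry being filled by the video streams.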


installation   reality of TV news


example(s) -- reality of TV news

The Reality of TV news project by Peter Frucht uses an interesting mix of technology:
  • live video capture from the device of an external USB2.0 TV card
  • live audio capture from the soundcard (line in)
  • display of live audio and video with java3D (had to be invented)
  • autonomous 3D objects with a specified lifetime
  • collision behaviour (had to be invented)
  • changing of texture-, material- and sound characteristics at runtime
  • dual-screen display with each screen rotated toward the other by 45 degrees about the Y-axis
  • 3D sound
In the project, as phrased by Peter Frucht, the permanent flow of alternating adverts and news reports is captured live and displayed in a 3D virtual-reality installation. The currently captured audio and video data is displayed on the surface of 3D shapes as short loops. The stream enters the 3D universe piece by piece (like water drops), and in this way it is displaced in time and space: news reports and advertising are partly displayed at the same time. By colliding with each other, the 3D shapes exchange video material. This re-editing mixes the short loops together; for instance, some pieces of advertising will appear while the newsreader speaks.

The software was developed by Martin Bouma, Anthony Augustin and Peter Frucht himself, with jdk 1.5, java3d 1.31, Java Media Framework 2.1.1e. The primary technological background of the artist, Peter Frucht, was the book CodeArt,  [CodeArt], by his former professor from the Media Art School in Cologne, Germany. The book is unfortunately only available in German, and should be translated into English!

research directions -- augmented reality

In the theatre production that motivated the development of the ViP system, the idea was to have wearable LCD-projection glasses, with a head-mounted low resolution camera. This setup is common in augmented reality applications, where for example a historic site is enriched with graphics and text, laid on top of the (video rendered) view of the site. Since realtime image analysis is generally not feasible, either positioning and orientation information must be used, or simplified markers indicating the significant spots in the scene, to determine what information to use as an overlay and how it should be displayed.

The ARToolkit is an advanced, freely available toolkit that uses fast marker recognition to determine the viewpoint of a spectator. The information that is returned on the recognition of a marker includes both position and orientation, which may be used by the application to draw the overlay graphics in accordance with the spectator's viewpoint.
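How position and orientation are used for the overlay can be shown with a small worked example. It assumes that the marker pose is available as a 3x4 matrix [R | t], a rotation followed by a translation, which to my knowledge is the layout that ARToolkit's `arGetTransMat` fills in. `Point3` and `markerToCamera` are hypothetical names, and no ARToolkit call is made in the sketch.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };

// Apply a 3x4 marker-to-camera transform [R | t] to a point given in
// marker coordinates, yielding the point in camera coordinates.
Point3 markerToCamera(const double pose[3][4], const Point3& p) {
    assert(pose != nullptr);
    return Point3{
        pose[0][0] * p.x + pose[0][1] * p.y + pose[0][2] * p.z + pose[0][3],
        pose[1][0] * p.x + pose[1][1] * p.y + pose[1][2] * p.z + pose[1][3],
        pose[2][0] * p.x + pose[2][1] * p.y + pose[2][2] * p.z + pose[2][3]
    };
}
```

Every vertex of the overlay graphics, defined relative to the marker, would be transformed in this way before projection, so that the overlay follows the marker as the spectator moves.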

Augmented reality is likely to become a hot thing. In April 2005 it was featured on BBC World, with a tour through Basel.

(C) Æliens 04/09/2009

You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.