topical media & game development




intelligent agents

Visitors in virtual environments are often represented by so-called avatars. Wouldn't it be nice to have intelligent avatars that can show you around, and tell you more about the (virtual) world you're in?

Now, this is how the idea came up to merge the RIF project, which was about information retrieval, with the WASP project, another acronym, which stands for:


Web Agent Support Program

The WASP project aims at realizing intelligent services using both client-side and server-side agents, and possibly multiple agents. The technical vehicle for realizing agents is the language DLP, which stands for


Distributed Logic Programming

Merging the two projects required providing the full VRML EAI API in DLP, so that DLP could be used for programming the dynamic aspects of VRML worlds.


Historically, the WASP project precedes the RIF project, but we started working on it after the RIF project had already started. Merging these two projects had more consequences than we could predict at the time. Summing up:


The major consequence is that we shifted focus with respect to programming the dynamics of virtual environments. Instead of scripts (in Javascript), Java (through the EAI), and even C++ (to program blaxxun Community Server extensions), we introduced the distributed logic programming language DLP as a uniform computational platform. In particular, for programming intelligent agents a logic programming language is much better suited than any other language. All we had to do was merge DLP with VRML, which we did by lifting the Java EAI to DLP, so that the EAI function calls are available as built-ins in the logic programming language.

When experimenting with agents, and in particular with communication between agents, we found that agent communication may be used to maintain a shared state between multiple users. The idea is simple: for each user there is an agent that observes the world using its 'sensors' and that may change the world using its 'effectors'. When it is notified by some other agent (one that is co-located with some other user), it can update its world according to the notification. Enough background and ideas. Let's look at the prototypes that we developed.



multi-user soccer game

To demonstrate the viability of our approach we developed a multi-user soccer game, using the DLP+VRML platform. We chose this particular application because it offers us a range of challenges.

multi-user soccer game

Without going into detail, just imagine that you and some others wish to participate in a game, but no other players want to join. No problem: we just add some intelligent agent football players. And they might as well be taken out when other (human) players announce themselves.

For each agent player, depending on its role (which might be goal-keeper, defender, mid-fielder or forward), a simple cognitive loop is defined: sensing, thinking, acting. Based on the information the agent gets, which includes the agent's position, the location of the ball, and the location of the goal, the agent decides which action to take. This can be expressed rather succinctly as rules in the logic programming formalism, and the actions too can be effected using the built-in VRML functionality of DLP.
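To make the sense-think-act loop concrete, here is a minimal sketch in Python rather than DLP; the role names follow the text, but the rules, thresholds and helper names (Percept, decide) are invented for illustration, not taken from the actual soccer-game code.

```python
# Hypothetical sketch of the cognitive loop: sensing -> thinking -> acting.
# The percept mirrors the information mentioned in the text: the agent's
# own position, the location of the ball, and the location of the goal.

from dataclasses import dataclass

@dataclass
class Percept:
    my_pos: tuple      # agent's own position
    ball_pos: tuple    # location of the ball
    goal_pos: tuple    # location of the opponent's goal

def distance(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def decide(role, p):
    """Role-dependent rules, mirroring the logic-programming clauses."""
    if role == "goal-keeper" and distance(p.my_pos, p.ball_pos) > 5:
        return ("stay", p.my_pos)            # guard the goal
    if distance(p.my_pos, p.ball_pos) < 1:
        return ("kick", p.goal_pos)          # close enough: shoot at goal
    if role in ("forward", "mid-fielder"):
        return ("move", p.ball_pos)          # chase the ball
    return ("stay", p.my_pos)                # defenders hold position

# one iteration of the loop for a forward standing next to the ball
percept = Percept(my_pos=(0, 0), ball_pos=(0.5, 0), goal_pos=(10, 0))
action = decide("forward", percept)
print(action)   # ('kick', (10, 0))
```

In the actual system such rules are clauses in DLP, and the resulting action is effected through the VRML built-ins discussed below.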

Basically, the VRML-related built-ins allow for obtaining and modifying the values of control points in the VRML world.

control points

  • get/set -- position, rotation, viewpoint
These control points are in fact the identifiable nodes in the scenegraph (that is, technically, the nodes that have been given a name using the DEF construct).

This approach allows us to take an arbitrarily complex VRML world and manipulate it using the control points. On the other hand, there are also built-ins that allow for the creation of objects in VRML. In that case, we have much finer control from the logic programming language.
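As an illustration of what get/set access to control points amounts to, here is a toy Python model; the dictionary stands in for the VRML scene graph, and the node names ("Ball", "Player1") and helper functions are invented for the example, not the actual DLP built-ins.

```python
# Illustrative model of control points: DEF-named nodes whose fields
# (position, rotation, viewpoint) can be read and written by name.
# The real DLP built-ins talk to the VRML browser through the EAI.

scene = {
    "Ball":    {"position": (0.0, 0.0, 0.0), "rotation": (0, 1, 0, 0)},
    "Player1": {"position": (3.0, 0.0, -2.0), "rotation": (0, 1, 0, 1.57)},
}

def get_field(node, field):
    """Read a field of a DEF-named node."""
    return scene[node][field]

def set_field(node, field, value):
    """Write a field of a DEF-named node."""
    scene[node][field] = value

# move the ball without knowing anything else about the world's geometry
set_field("Ball", "position", (1.0, 0.0, 4.0))
print(get_field("Ball", "position"))   # (1.0, 0.0, 4.0)
```

The point of the scheme is exactly this decoupling: an arbitrarily complex world can be manipulated through nothing more than its named control points.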

All in all we estimate that, in comparison with other approaches, programming such a game in DLP takes far less time than it would have taken using the basic programming capabilities of VRML.

agents in virtual environments

Let us analyse in somewhat more detail why agents in virtual environments may be useful. First of all, observe that the phrase agents in virtual environments has two shades of meaning:

agents in virtual environments

  • virtual environments with embedded autonomous agents
  • virtual environments supported by ACL communication
where ACL stands for Agent Communication Language. Our idea, basically, is to use an ACL for realizing shared objects, such as the ball in the soccer game.

The general concept of multi-user virtual environments (in VRML) has been studied by the Living Worlds Working Group. Let's look at some definitions provided by this working group first. A scene is defined as a geometrically bounded, continuously navigable part of the world. A world, then, is defined as a collection of (linked) scenes.

Living Worlds

  • scene -- geometrically bounded, continuously navigable
  • world -- collection of (linked) scenes

Now, multi-user virtual environments distinguish themselves from single-user virtual environments by allowing for so-called Shared Objects in scenes, that is objects that can be seen and interacted with by multiple independent users, simultaneously. This requires synchronization among multiple clients, which may either be realized through a server or through client-to-client communication.

Commonly, a distinction is made between a pilot object and a drone object.

Shared Object

  • pilot -- instance that will be replicated
  • drone -- instance that replicates pilot
So, generally speaking, pilot objects control drone objects. There are many ways to realize a pilot-drone replication scheme. We have chosen to use agent technology, and correspondingly we make a distinction between pilot agents, that control the state of a shared object, and drone agents, that merely replicate the state of a shared object.

  • pilot agents -- control state of a shared object
  • drone agents -- replicate the state of a shared object
Since we have (for example in the soccer game) different types of shared objects, we make a further distinction between types of agents (for each of which there is a pilot and a drone version). So, we have object agents, which control a single shared object (like the soccer ball). For these agents the pilot is at the server, and the drone is at the client. We further have avatar agents, which control the users' avatars; for these the pilot is at the user/client side, and the drone either at the server or at the other clients. Finally, we have autonomous agents, like football players, with their own avatar. For those agents, the pilot is at the server, and the drones are at the clients.

  • object agents -- control a single shared object (like the soccer ball); pilot at server, drone at client
  • avatar agents -- control a user's avatar; pilot at user side, drone at server or clients
  • autonomous agents -- like football players, with their own avatar; pilot at server, drones at clients

Now, this classification of agents gives us a setup that allows for the realization of shared objects in virtual environments in an efficient manner. See  [Community] for details.
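The pilot/drone replication scheme can be sketched as follows; this is a hypothetical Python rendering of the idea (class names and the notify/update methods are invented), whereas in the actual system the notifications travel as ACL messages between agents.

```python
# Sketch of pilot/drone replication: the pilot owns the state of a
# shared object and notifies drones, which merely mirror that state.

class Drone:
    def __init__(self):
        self.state = None

    def notify(self, state):          # replicate the pilot's state
        self.state = state

class Pilot:
    def __init__(self):
        self.state = None
        self.drones = []

    def attach(self, drone):
        self.drones.append(drone)

    def update(self, state):          # only the pilot changes the state
        self.state = state
        for d in self.drones:         # broadcast to every client's drone
            d.notify(state)

# the ball in the soccer game: pilot at the server, drones at the clients
ball = Pilot()
client_a, client_b = Drone(), Drone()
ball.attach(client_a)
ball.attach(client_b)
ball.update({"position": (5, 0, 2)})
print(client_a.state == client_b.state == ball.state)   # True
```

Synchronization then reduces to making sure that every change to a shared object goes through its pilot.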

The programming platform needed to implement our proposal must satisfy the following requirements.

programming platform

  • VRML EAI support
  • distributed communication capabilities (TCP/IP)
  • multiple threads of control -- for multiple shared objects
  • declarative language -- for agent support
So, we adapted the distributed logic programming language DLP (which in its own right may be called an agent-oriented language avant la lettre), to include VRML capabilities. See the online reference to the AVID project for a further elaboration of these concepts.




The WASP project's chief focus is to develop architectural support for web-aware (multi) agent systems. So, when we (finally) got started with the project we developed a taxonomy along the following dimensions:

taxonomy of agents

  • 2D/3D -- to distinguish between text-based and avatar embodied agents
  • client/server -- to indicate where agents reside
  • single/multi -- as a measure of complexity
A classification along these dimensions results in a lattice, with the most complex category being a 3D-server-multi-agent system, of which the distributed soccer game is an example. See  [Taxonomy].

When we restrict ourselves to 3D-client-single-agent systems, we may think of, for example, navigation or presentation agents, which may help the user roam around in the world, or which provide support for presenting the results of a query as objects in a 3D scene.

Our original demonstrator for the WASP project was an agent of the latter kind, with the nickname PAMELA, which is an acronym for:


Personal Assistant for Multimedia Electronic Archives

The PAMELA functional requirements included: autonomous and on-demand search capabilities, (user and system) modifiable preferences, and multimedia presentation facilities.

  • autonomous and on-demand search capabilities
  • (user and system) modifiable preferences
  • multimedia presentation facilities
It was, however, only later that we added the requirement that PAMELA should be able to live in 3D space.

Like the soccer players, PAMELA has control over objects in 3D space. PAMELA now also provides animation facilities for its avatar embodiment. To realize the PAMELA representative, we studied how to effect facial animations and body movements following the Humanoid Animation (H-Anim) Working Group proposal.


  • control points -- joints, limbs and facial features
The H-Anim proposal lists a number of control points for (the representation of) the human body and face, which may be manipulated with up to six degrees of freedom. Six degrees of freedom allows for translation and rotation along any of the X, Y and Z axes. In practice, though, translation and rotation of body and face control points will be constrained.
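A constrained control point might look as follows; this is an illustrative Python sketch in which the joint name follows H-Anim naming, but the particular rotation limits and the Joint class are invented for the example (H-Anim defines the joints, not these limit values).

```python
# Sketch of a constrained body control point: rotations are clamped
# per axis, reflecting the constraints mentioned in the text.

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

class Joint:
    def __init__(self, name, limits):
        self.name = name
        self.limits = limits                  # (lo, hi) in radians per axis
        self.rotation = [0.0, 0.0, 0.0]       # current x, y, z rotation

    def rotate(self, axis, angle):
        lo, hi = self.limits[axis]
        self.rotation[axis] = clamp(self.rotation[axis] + angle, lo, hi)

# an elbow bends on one axis only; the other axes are locked at zero
elbow = Joint("l_elbow", {0: (0.0, 2.6), 1: (0.0, 0.0), 2: (0.0, 0.0)})
elbow.rotate(0, 3.5)            # the request exceeds the limit...
print(elbow.rotation[0])        # ...so it is clamped to 2.6
```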

presentation agent

Now, just imagine how such an assistant could be of help in multimedia information retrieval.

presentation agent

Given any collection of results, PAMELA could design some spatial layout and select suitable object types, including for example color-based relevance cues, to present the results in a scene. PAMELA could then navigate you through the scene, indicating the possible relevance of particular results.
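One way such a layout could work is sketched below in Python; the scoring, the color mapping (red for relevant, blue for irrelevant) and the placement rule are all invented for the example, not PAMELA's actual design.

```python
# Hypothetical sketch of presenting query results in a 3D scene:
# relevance drives both a color cue and the distance to the viewer.

def layout(results):
    """results: list of (title, relevance) with relevance in [0, 1]."""
    ranked = sorted(results, key=lambda r: -r[1])
    scene = []
    for i, (title, rel) in enumerate(ranked):
        color = (rel, 0.0, 1.0 - rel)        # relevance cue: red vs. blue
        position = (i * 2.0, 0.0, -i * 3.0)  # more relevant = placed closer
        scene.append({"title": title, "color": color, "position": position})
    return scene

objects = layout([("draft", 0.2), ("match report", 0.9)])
print(objects[0]["title"])   # 'match report' -- most relevant comes first
```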

persuasion games

But we could go one step further than this and, taking inspiration from the research field of persuasive technology, think about possible persuasion games we could play, using the (facial and body) animation facilities of PAMELA:

persuasion games

  • single avatar persuasive argumentation
  • multiple avatar dialog games
Just think of a news reader presenting a hot news item, or a news reader trying to provoke a comment on some hot issue. Playing another trick on the PAMELA acronym, we could think of


Persuasive Agent with Multimedia Enlightened Arguments

I agree, this sounds too flashy for my taste as well. But, what this finale is meant to express is, simply, that I see it as a challenge to create such synthetic actors using the DLP+VRML platform.



research directions -- embodied conversational agents

A variety of applications may benefit from deploying embodied conversational agents, either in the form of animated humanoid avatars or, more simply, as a 'talking head'. An interesting example is provided by Signing Avatar, a system that translates arbitrary text into both spoken language and the sign language for the deaf, presented by animated humanoid avatars.

Here the use of animated avatars is essential to communicate with a particular group of users, using the sign language for the deaf.

Other applications of embodied conversational agents include e-commerce and social marketing, although in these cases it may not always be evident that animated avatars or faces actually do provide added value.

Another usage of embodied conversational agents may be observed in virtual environments such as Active Worlds, blaxxun Community and Adobe Atmosphere. Despite the rich literary background of such environments, including Neal Stephenson's Snow Crash, the functionality of such agents is usually rather shallow, due to the poor repertoire of gestures and movements on the one hand and the restricted computational model underlying these agents on the other hand. In effect, the definition of agent avatars in virtual environments generally relies on a proprietary scripting language which, as in the case of blaxxun Agents, offers only limited pattern matching and a fixed repertoire of built-in actions.

In contrast, the scripting language for Signing Avatar is based on the H-Anim standard and allows for a precise definition of a complex repertoire of gestures, as exemplified by the sign language for the deaf. Nevertheless, this scripting language too is proprietary and does not allow for higher-order abstractions of semantically meaningful behavior.

scripting behavior

In this section we introduced a software platform for agents. This platform not only offers powerful computational capabilities but also an expressive scripting language (STEP) for defining gestures and driving the behavior of our humanoid agent avatars.

The design of the scripting language was motivated by the requirements listed below.


  • convenience -- for non-professional authors
  • compositional semantics -- combining operations
  • re-definability -- for high-level specification of actions
  • parametrization -- for the adaptation of actions
  • interaction -- with a (virtual) environment
Our scripting language STEP meets these requirements. STEP is based on dynamic logic  [DL] and allows for arbitrary abstractions using the primitives and composition operators provided by our logic. STEP is implemented on top of DLP.
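The flavor of such composition operators can be conveyed with a toy Python rendering; we assume just two operators here, sequential (seq) and parallel (par) composition, and represent actions as functions that report what they do. The operator names and the gesture itself are illustrative, not actual STEP syntax.

```python
# Toy rendering of STEP-style compositional scripting: primitive actions
# are combined by seq (one after another) and par (simultaneously), and
# higher-level gestures are defined by abstraction over the primitives.

def turn(joint, direction):
    return lambda: [f"turn {joint} {direction}"]

def seq(*actions):            # sequential composition: a1 ; a2 ; ...
    return lambda: [step for a in actions for step in a()]

def par(*actions):            # parallel composition: merged into one step
    return lambda: [" + ".join(step for a in actions for step in a())]

# a "walk" gesture defined in terms of two parallel sub-movements
step_left = par(turn("l_shoulder", "back"), turn("r_hip", "forward"))
step_right = par(turn("r_shoulder", "back"), turn("l_hip", "forward"))
walk = seq(step_left, step_right)

print(walk())
# ['turn l_shoulder back + turn r_hip forward',
#  'turn r_shoulder back + turn l_hip forward']
```

The point is re-definability: once walk is defined, it can itself be composed, parametrized and redefined without touching the primitives.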

As a last bit of propaganda:


The DLP+X3D platform, together with the STEP scripting language, provides the computational facilities for defining semantically meaningful behavior, and allows for a rich presentational environment, in particular 3D virtual environments that may include streaming video, text and speech.

See appendix D for more details.

evaluation criteria

The primary criterion against which to evaluate applications that involve embodied conversational agents is whether the application becomes more effective by using such agents: effective, that is, in terms of communication with the user. For the Signing Avatar application the answer seems quite obvious. For other applications, for example negotiation in e-commerce, this question might be more difficult to answer.

As concerns the embedding of conversational agents in VR, we might make a distinction between presentational VR, instructional VR and educational VR. An example of educational VR is described in  [EducationalVR], although no mention of agents is made in that reference. In instructional VR, explaining for example the use of a machine, the appearance of a conversational agent seems quite natural. In presentational VR, however, the appearance of such agents might be considered no more than a gimmick.

Considering the use of agents in applications in general, we must make a distinction between information agents, presentation agents and conversational agents. Although the boundaries between these categories are not clear-cut, they show an increasing degree of interactivity with the user.

From a system perspective, we might be interested in what range of agent categories the system covers. Does it provide support for managing information and possibly information retrieval? Another issue in this regard could be whether the system is built around open standards, such as XML and X3D, to allow for the incorporation of a variety of content.

Last but not least, from a user perspective, what seems to matter most is the naturalness of the (conversational) agents. This is determined by the graphical quality, as well as contextual parameters, that is how well the agent is embedded in its environment. More important even are emotive parameters, that is the mood and style (in gestures and possibly speech) with which the agents manifest themselves. In other words, the properties that determine whether an agent is (really) convincing.

(C) Æliens 04/09/2009

You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.