The PAMELA functional requirements included:
autonomous and on-demand search capabilities,
(user and system) modifiable preferences, and
multimedia presentation facilities.
Like the soccer players, PAMELA has control over
objects in 3D space.
PAMELA now also provides animation facilities
for its avatar embodiment.
To realize the PAMELA representative, we studied
how to effect facial animations and
body movements following the Humanoid
Animation Working Group (H-Anim) proposal.
The H-Anim proposal lists a number of control points
for (the representation of the) human body and face
that may be manipulated with up to six degrees of freedom.
Six degrees of freedom allow for translation along,
and rotation about, each of the X, Y and Z axes.
In practice, though, movement and rotation of body and face
control points will be constrained.
- control points -- joints, limbs and facial features
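To make the notion of constrained degrees of freedom concrete, here is a minimal sketch of a control point with three translational and three rotational degrees of freedom, each clamped to an allowed range. The class, its field names and the limit values are hypothetical illustrations; they are not taken from the H-Anim proposal, which only supplies the joint name `r_elbow`.

```python
from dataclasses import dataclass, field

# Hypothetical model of an H-Anim-style control point: three translational
# and three rotational degrees of freedom, each clamped to an allowed range.
# Names and limit values are illustrative assumptions, not part of H-Anim.

AXES = ("x", "y", "z")


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class ControlPoint:
    name: str
    # (min, max) limits per axis; (0.0, 0.0) locks that degree of freedom.
    translation_limits: dict = field(default_factory=lambda: {a: (0.0, 0.0) for a in AXES})
    rotation_limits: dict = field(default_factory=lambda: {a: (0.0, 0.0) for a in AXES})
    translation: dict = field(default_factory=lambda: {a: 0.0 for a in AXES})
    rotation: dict = field(default_factory=lambda: {a: 0.0 for a in AXES})

    def move(self, axis: str, amount: float) -> None:
        low, high = self.translation_limits[axis]
        self.translation[axis] = clamp(self.translation[axis] + amount, low, high)

    def rotate(self, axis: str, radians: float) -> None:
        low, high = self.rotation_limits[axis]
        self.rotation[axis] = clamp(self.rotation[axis] + radians, low, high)

    def free_dofs(self) -> int:
        """Count degrees of freedom whose range is not collapsed to a point."""
        limits = list(self.translation_limits.values()) + list(self.rotation_limits.values())
        return sum(1 for low, high in limits if high > low)


# An elbow-like joint: no translation, rotation about one axis only.
elbow = ControlPoint(
    name="r_elbow",
    rotation_limits={"x": (0.0, 2.6), "y": (0.0, 0.0), "z": (0.0, 0.0)},
)
elbow.rotate("x", 3.0)  # exceeds the limit and is clamped to 2.6
elbow.rotate("y", 1.0)  # locked axis, stays at 0.0
```

Of the six possible degrees of freedom, the elbow here keeps only one, which is exactly the kind of constraint meant above.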
Now, just imagine how such an assistant could be of help
in multimedia information retrieval.
Given any collection of results, PAMELA could
design some spatial layout and select suitable object types,
including for example color-based relevance cues,
to present the results in a scene.
PAMELA could then navigate you through the scene,
indicating the possible relevance of particular results.
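A minimal sketch of what such a layout step could look like, assuming results arrive as (title, relevance score in [0, 1]) pairs. The square grid and the red-to-green color ramp are our own illustrative choices, not a description of how PAMELA does it.

```python
import math

# Hypothetical sketch: place retrieval results on a square grid in the
# X/Z plane (most relevant first) and give each a color cue that
# shades from red (low relevance) to green (high relevance).


def relevance_color(score: float) -> tuple:
    """Map a relevance score in [0, 1] to an RGB triple in [0, 1]."""
    s = max(0.0, min(1.0, score))
    return (1.0 - s, s, 0.0)


def grid_layout(results, spacing=2.0):
    """Return (title, position, color) triples, most relevant result first."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    columns = math.ceil(math.sqrt(len(ranked))) or 1
    scene = []
    for index, (title, score) in enumerate(ranked):
        row, col = divmod(index, columns)
        position = (col * spacing, 0.0, row * spacing)
        scene.append((title, position, relevance_color(score)))
    return scene


results = [("clip-a", 0.2), ("clip-b", 0.9), ("clip-c", 0.5)]
for title, position, color in grid_layout(results):
    print(title, position, color)
```

Putting the most relevant result at the origin gives a natural starting point for a guided tour through the scene.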
But we could go one step further than this and,
taking inspiration from the research field
of persuasive technology, think about possible
persuasion games we could play
using the (facial and body) animation facilities of PAMELA.
Just think of a news reader presenting a hot news item,
or a news reader trying to provoke a comment
on some hot issue.
Playing another trick on the PAMELA acronym, we could think of
a Persuasive Agent with Multimedia Enlightened Arguments:
- single avatar persuasive argumentation
- multiple avatar dialog games
I agree, this sounds too flashy for my taste as well.
But what this finale is meant to express
is, simply, that I see it as a challenge
to create such synthetic actors using the …
- research directions -- embodied conversational agents
A variety of applications may benefit from deploying
embodied conversational agents, either in the form
of animated humanoid avatars or, more simply, as a 'talking head'.
An interesting example is Signing Avatar, a system that translates
arbitrary text into both spoken language and sign language for the deaf,
presented by animated avatars.
Here the use of animated avatars is essential to communicate
with a particular group of users, using the sign language
for the deaf.
Other applications of embodied conversational agents
include e-commerce and social marketing,
although in these cases it may not always be evident that
animated avatars or faces actually do provide added value.
Another use of embodied conversational agents
may be observed in virtual environments
such as Active Worlds and Adobe Atmosphere.
Despite the rich literary background of such environments,
including Neal Stephenson's Snow Crash,
the functionality of such agents is usually rather shallow,
due to the poor repertoire of gestures and movements
on the one hand and the restricted computational model
underlying these agents on the other hand.
In effect, the definition of agent avatars in virtual
environments generally relies on a proprietary
scripting language which, as in the case
of blaxxun Agents, offers only limited
pattern matching and a fixed repertoire of built-in actions.
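To illustrate why such scripting stays shallow, here is a hypothetical rule-based avatar script of the kind just described: each rule pairs a text pattern with one action from a closed, built-in set. This is NOT blaxxun's actual syntax; the rules, action names and replies are invented for illustration.

```python
import re

# Hypothetical rule-based avatar script: each rule pairs a text pattern
# with one action from a fixed, built-in repertoire. Not blaxxun syntax.

BUILT_IN_ACTIONS = {"wave", "nod", "say"}

RULES = [
    (re.compile(r"\b(hi|hello)\b", re.IGNORECASE), ("wave", None)),
    (re.compile(r"\bprice\b", re.IGNORECASE), ("say", "Please ask a sales agent.")),
    (re.compile(r"\?$"), ("nod", None)),
]

# The repertoire is closed: a rule may only use a built-in action.
for _pattern, (action, _arg) in RULES:
    if action not in BUILT_IN_ACTIONS:
        raise ValueError(f"unknown action: {action}")


def react(utterance: str):
    """Return the (action, argument) of the first matching rule, or None."""
    for pattern, (action, argument) in RULES:
        if pattern.search(utterance):
            return action, argument
    return None
```

Nothing in this scheme lets an author define a new gesture out of existing ones; that is the kind of abstraction the text finds missing.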
In contrast, the scripting language for Signing Avatar
is based on the H-Anim standard
for a precise definition of a complex repertoire
of gestures, as exemplified by the sign language for the deaf.
Nevertheless, this scripting language is also proprietary
and does not allow for higher-order
abstractions of semantically meaningful behavior.
In this section we introduce a software platform for agents.
This platform offers not only
powerful computational capabilities
but also an expressive
scripting language (STEP) for defining gestures and driving the behavior
of our humanoid agent avatars.
The design of the scripting language was motivated by
the requirements listed below.
- convenience -- for non-professional authors
- compositional semantics -- combining operations
- re-definability -- for high-level specification of actions
- parametrization -- for the adaptation of actions
- interaction -- with a (virtual) environment
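The requirements of compositional semantics, re-definability and parametrization can be illustrated with a small sketch. It is written in Python rather than STEP, and all names (`turn`, `seq`, `par`, `wave`, `Keyframe`) are hypothetical. An action is a timeline of joint rotations, and new parametrized gestures are defined from existing ones. The joint names are assumed to follow H-Anim naming (`r_shoulder`, `r_elbow`, `skullbase`).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch (not STEP): an action is a timeline of joint
# rotations; seq and par compose actions, and new parametrized actions
# are defined in terms of existing ones.


@dataclass(frozen=True)
class Keyframe:
    start: float  # seconds
    end: float    # seconds
    joint: str    # assumed H-Anim style joint name, e.g. "r_elbow"
    angle: float  # target rotation in radians


Action = List[Keyframe]


def duration(action: Action) -> float:
    return max((k.end for k in action), default=0.0)


def shift(action: Action, offset: float) -> Action:
    return [Keyframe(k.start + offset, k.end + offset, k.joint, k.angle) for k in action]


def turn(joint: str, angle: float, speed: float = 1.0) -> Action:
    """Primitive: rotate one joint; a higher speed gives a shorter duration."""
    return [Keyframe(0.0, 1.0 / speed, joint, angle)]


def seq(*actions: Action) -> Action:
    """Compose actions one after another."""
    result, clock = [], 0.0
    for action in actions:
        result += shift(action, clock)
        clock += duration(action)
    return result


def par(*actions: Action) -> Action:
    """Compose actions to run at the same time."""
    return [k for action in actions for k in action]


def wave(times: int = 2, speed: float = 2.0) -> Action:
    """A high-level, parametrized gesture defined from primitives."""
    raise_arm = turn("r_shoulder", 1.5, speed)
    swings = [seq(turn("r_elbow", 0.4, speed), turn("r_elbow", -0.4, speed))
              for _ in range(times)]
    return seq(raise_arm, *swings, turn("r_shoulder", 0.0, speed))


greeting = par(wave(times=3), turn("skullbase", 0.2))
```

Because `wave` is an ordinary action, it can itself be composed or redefined, which is the point of requiring compositional semantics.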
Our scripting language STEP
meets these requirements.
STEP is based on dynamic logic [DL] and allows
for arbitrary abstractions using the primitives and
composition operators provided by our logic.
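For readers unfamiliar with dynamic logic, the following sketch gives the standard relational semantics of its composition operators: a program maps a state to the set of states it may end in. It illustrates the logic STEP builds on; it is not STEP's implementation, and the function names (`guard` for the test operator, `bend`, `raise_to_90`) are our own.

```python
from typing import Any, Callable, FrozenSet

# Relational semantics of (propositional) dynamic logic, as a hypothetical
# sketch: a program maps a state to the set of states it may end in.
# This illustrates the logic that STEP builds on; it is not STEP itself.

State = Any  # any hashable value
Program = Callable[[State], FrozenSet[State]]
Formula = Callable[[State], bool]


def atomic(step: Callable[[State], State]) -> Program:
    """Deterministic primitive action."""
    return lambda s: frozenset({step(s)})


def seq(a: Program, b: Program) -> Program:
    """a ; b -- run a, then b from every state a may end in."""
    return lambda s: frozenset(t for u in a(s) for t in b(u))


def choice(a: Program, b: Program) -> Program:
    """a U b -- nondeterministically run a or b."""
    return lambda s: a(s) | b(s)


def guard(phi: Formula) -> Program:
    """phi? -- the dynamic-logic test: continue only if phi holds."""
    return lambda s: frozenset({s}) if phi(s) else frozenset()


def star(a: Program) -> Program:
    """a* -- repeat a zero or more times (needs finitely many reachable states)."""
    def run(s: State) -> FrozenSet[State]:
        reached, frontier = {s}, [s]
        while frontier:
            for t in a(frontier.pop()):
                if t not in reached:
                    reached.add(t)
                    frontier.append(t)
        return frozenset(reached)
    return run


def box(a: Program, phi: Formula) -> Formula:
    """[a] phi -- phi holds in every state a may end in."""
    return lambda s: all(phi(t) for t in a(s))


def diamond(a: Program, phi: Formula) -> Formula:
    """<a> phi -- phi holds in some state a may end in."""
    return lambda s: any(phi(t) for t in a(s))


# "while angle < 90 do bend by 10", encoded as (angle<90? ; bend)* ; angle>=90?
bend = atomic(lambda angle: angle + 10)
raise_to_90 = seq(star(seq(guard(lambda s: s < 90), bend)), guard(lambda s: s >= 90))
```

The while-loop example shows how familiar control structures arise as abstractions over these few primitives, which is the kind of abstraction the text attributes to STEP.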
STEP is implemented on top of DLP.
As a last bit of propaganda:
together with STEP, the DLP+X3D platform provides
the computational facilities for defining semantically meaningful
behaviors, and allows for rich presentation environments,
in particular 3D virtual environments that may include
streaming video, text and speech.
See appendix D for more details.
The primary criterion against which to evaluate
applications that involve embodied conversational
agents is whether the application becomes more effective,
in terms of communication with the user,
by using such agents.
For the Signing Avatar application the answer is evidently yes.
For other applications, for example negotiation in e-commerce,
this question might be more difficult to answer.
As concerns the embedding of conversational agents in VR,
we might distinguish between presentational VR,
instructional VR and educational VR.
An example of educational VR is described in [EducationalVR].
That reference, however, makes no mention of agents.
In instructional VR, explaining for example the use of a machine,
the appearance of a conversational agent seems to be quite natural.
In presentational VR, however,
the appearance of such agents might be considered as no more
than a gimmick.
Considering the use of agents in applications in general,
we may distinguish between information agents,
presentation agents and conversational agents.
Although the boundaries between these categories are not clear-cut,
they show an increasing degree of interactivity with the user.
From a system perspective, we might be interested in what range of
agent categories the system covers.
Does it provide support for managing information and possibly
also for presentation and conversation?
Another issue in this regard could be whether the system
is built around open standards, such as XML and X3D, to allow for
the incorporation of a variety of content.
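One practical benefit of open standards is that generic tools suffice to produce content. As a minimal sketch, the following uses Python's standard `xml.etree.ElementTree` to emit an X3D scene in its XML encoding, with one colored box per result. The helper names `result_box` and `scene_document` are hypothetical; the node and field names (`Transform`, `Shape`, `Appearance`, `Material`, `diffuseColor`, `Box`) are standard X3D.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: since X3D has an XML encoding, standard XML tools
# suffice to generate scene content. Each result becomes a colored box.


def result_box(x: float, z: float, rgb) -> ET.Element:
    transform = ET.Element("Transform", translation=f"{x} 0 {z}")
    shape = ET.SubElement(transform, "Shape")
    appearance = ET.SubElement(shape, "Appearance")
    ET.SubElement(appearance, "Material", diffuseColor=" ".join(str(c) for c in rgb))
    ET.SubElement(shape, "Box", size="1 1 1")
    return transform


def scene_document(boxes) -> str:
    root = ET.Element("X3D", profile="Interchange", version="3.0")
    scene = ET.SubElement(root, "Scene")
    for box in boxes:
        scene.append(box)
    return ET.tostring(root, encoding="unicode")


xml_text = scene_document([result_box(0, 0, (0.0, 1.0, 0.0)),
                           result_box(2, 0, (1.0, 0.0, 0.0))])
print(xml_text)
```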
Last but not least, from a user perspective, what seems
to matter most is the naturalness of the (conversational) agents.
This is determined by the graphical quality, as well as by
contextual parameters, that is, how well the agent is embedded
in its environment.
Even more important are emotive parameters,
that is, the mood and style (in gestures and possibly speech)
with which the agents manifest themselves.
In other words, these are the properties that determine whether
an agent is (really) convincing.
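How emotive parameters might affect the performance of a gesture can be sketched as follows. The `Style` parameters and the scaling rules are purely illustrative assumptions: a broader style enlarges the motion and a livelier one speeds it up.

```python
from dataclasses import dataclass

# Hypothetical sketch: emotive parameters (mood and style) modulate how a
# neutral gesture is performed. Parameter names and scaling rules are
# illustrative assumptions only.


@dataclass(frozen=True)
class Style:
    energy: float = 0.5         # 0 = subdued, 1 = lively
    expansiveness: float = 0.5  # 0 = restrained, 1 = broad gestures


def perform(base_angle: float, base_duration: float, style: Style):
    """Return (angle, duration) of a gesture adapted to the given style."""
    angle = base_angle * (0.5 + style.expansiveness)  # broader style, larger motion
    duration = base_duration / (0.5 + style.energy)   # livelier style, faster motion
    return angle, duration


neutral = perform(1.0, 1.0, Style())
excited = perform(1.0, 1.0, Style(energy=1.0, expansiveness=1.0))
```

The same gesture definition can thus be played in different moods, keeping the repertoire of gestures separate from the manner in which they are performed.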