Maxine allows the creation, organization, manipulation and animation of the different elements that constitute the virtual
scenario through a scene graph. The following elements can be represented in a scene graph:
• Images and texts. They can be shown and positioned as rectangles oriented in space. The tool allows the loading
of the main graphic formats (bmp, gif, jpeg, pic, png, rgb, tga, tiff) and can manage alpha-channel when it is supported
by the graphic format.
• Simple geometric primitives but also complex geometric models. Maxine can import the most popular formats
(3DS, flt, lwo, md2, obj, osg, DirectX), so that complex virtual scenarios with high level of detail can be created.
• Simple lights.
• 3D and ambient sound.
• Animated characters. They use the format of the Cal3D animation library and can be generated with common
commercial 3D modelling and animation toolkit. Different types of animations are available including secondary
animation to increase the characters’ expressivity and realism.
• Animated Actors. They are a specialization of the previous ones and they are provided with voice synthesis
and facial animation with lip-synch, following the VHML standard and the MPEG4 specification for
• Synthetic voices. Text can be provided by the user or by file, and can be associated to an actor (or not). Several tags
are used to configure and control some audio characteristics like voice selection, volume control, speech speed,
insertion of pauses, tone, word emphasis, pronunciation specification, etc.
The engine can manage several auxiliary elements like cameras, group of elements, animators (for animating group
of elements),... and can also include animations coming from motion capture systems.
The Maxine Modules: Managing the agents
The “Sensory Module” integrates all the information coming from inputs to the system. Special attention was paid
to creating multimodal user interaction, via mouse, via text thanks to the console and the scripting language, via voice by
means of simple orders or questions caught by a microphone and via webcam to obtain from user’s face information such
as his/her emotional state.
The “Perception Module” is responsible for extracting the relevant information present in the input information. The
interpretation of the voice input and the user’s image is of particular interest. The voice recognition engine has been
constructed on the basis of Loquendo ASR (Audio Speech Recognition) software. The objective of the image input
is to detect the user’s emotional state.
The “Deliberative Module” analyses the information from the Perception Module and is responsible for decision making.
It is basically in charge of generating the answers to the user’s questions and for classifying the user’s emotions. In
both cases, a static (fixed answers) and dynamic (precedent questions) knowledge base is required. The answer engine
system is based on the chatbot technology of the Artificial Intelligence Foundation using CyN under GNU
GPL licence. In our case, the system has been designed for short and specific conversations. The knowledge base of the
virtual character is stored in AIML files (Artificial Intelligence Markup Language).
The “Generative Module” is responsible for establishing the avatar’s reactions to the system inputs. It receives information from the Perception Module (in the case of purely reactive actions) or from the Deliberative Module (when decisions are considered). For the time being, the behaviour is based on an action/reaction schema and is managed by a
hierarchical state machine. This module plays around with the avatar’s body movements, facial expressions and voice.
In all three cases, the answer is modified in accordance with avatar’s emotional state.
The “Motor Module” manages and supervises the carrying out of low-level actions involving, in particular, the execution, synchronization and blending of the avatar’s body and face animations, lip-sync and the synchronization of execution
sequences, etc. The skeletal animation technique is used for both facial and body animation and the nomenclature
followed is that of the VHML standard.
The Maxine architecture: Open source libraries
© GIGA - Affective Lab | Departamento de Informática e Ingeniería de Sistemas C/ María de
Luna, 1 50015 Zaragoza