Multimodal affective input

The purpose of the Emotional Model (EM) is to provide multimodal affective fusion between the chosen input modalities, text and speech. The EM acts both as an affective fusion module and as an emotional representation.

It operates on two distinct linguistic modalities: affective speech parameters, extracted from the user's acoustic utterance, and sentiment analysis, i.e. affective tagging of the ASR transcript of that utterance. The result is an emotional category attached to the specific part of the utterance that instantiates an Information Extraction (IE) template.

The main purpose of the Emotional Model is to provide instantaneous emotions associated with an utterance, but it can also serve as a basis for temporal integration (mood representation) as part of the affective content of the User Model (UM).
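As a hedged illustration of what such temporal integration might look like, the sketch below accumulates instantaneous emotion categories into a mood value using an exponential moving average. The coordinate mapping and decay factor are illustrative assumptions, not part of the project's description.

```python
# Hypothetical sketch: mood as a running average over instantaneous emotions.
# The (valence, arousal) coordinates assigned to each category and the decay
# factor are assumptions made for illustration only.

COORDS = {
    "Neg.Active": (-1.0, 1.0),
    "Neg.Passive": (-1.0, -1.0),
    "Neutral": (0.0, 0.0),
    "Pos.Passive": (1.0, -1.0),
    "Pos.Active": (1.0, 1.0),
}

class Mood:
    """Exponential moving average over instantaneous emotion categories."""

    def __init__(self, decay=0.8):
        self.decay = decay      # weight given to the existing mood
        self.valence = 0.0
        self.arousal = 0.0

    def update(self, category):
        """Blend a new instantaneous emotion into the current mood."""
        v, a = COORDS[category]
        self.valence = self.decay * self.valence + (1 - self.decay) * v
        self.arousal = self.decay * self.arousal + (1 - self.decay) * a
```

A higher decay value makes the mood more stable against single outlier utterances, which is the usual motivation for separating mood from instantaneous emotion.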

Affective fusion is dictated by the contents of each modality. Emotional speech recognition (based on the EmoVoice system) produces emotional categories corresponding to specific quadrants of an arousal/valence surface, whilst sentiment analysis tags utterance segments with a valence category (POS vs NEG).

Affective fusion is rule-based and outputs an EmoVoice category from the set {Neg.Active, Neg.Passive, Neutral, Pos.Passive, Pos.Active}. The output is a modification of the original speech-analysis category based on the valence information from sentiment analysis, which is considered more reliable and therefore overrides an opposite valence value. The procedure also supports further experimentation once error data (confusion matrices) become available.
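The override rule described above can be sketched as follows. This is a minimal, hypothetical reconstruction of the rule base, assuming that a sentiment-analysis valence tag flips only the valence half of the speech category while the arousal half (Active/Passive) is retained; the project's actual rules may differ.

```python
# Hypothetical sketch of the rule-based affective fusion described above.
# Category names follow the EmoVoice set; the exact override behaviour is
# an assumption for illustration, not the project's implementation.

EMOVOICE_CATEGORIES = {
    "Neg.Active", "Neg.Passive", "Neutral", "Pos.Passive", "Pos.Active",
}

def fuse(speech_category, sa_valence=None):
    """Combine an EmoVoice category with a sentiment-analysis valence tag.

    speech_category: one of EMOVOICE_CATEGORIES
    sa_valence: "POS", "NEG", or None when sentiment analysis yields no tag
    """
    if speech_category not in EMOVOICE_CATEGORIES:
        raise ValueError(f"unknown category: {speech_category}")
    if sa_valence is None or speech_category == "Neutral":
        return speech_category  # nothing to override
    valence, _, arousal = speech_category.partition(".")
    sa = "Pos" if sa_valence == "POS" else "Neg"
    if sa != valence:
        # SA valence is treated as more reliable: flip the valence half,
        # keeping the arousal dimension from the speech analysis.
        return f"{sa}.{arousal}"
    return speech_category
```

For example, a speech result of Neg.Active combined with a POS sentiment tag would yield Pos.Active under this rule, while agreeing modalities leave the speech category unchanged.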


Professor Marc Cavazza (Project Leader)
University of Teesside
School of Computing
Middlesbrough TS1 3BA, UK
Phone: +44 (0) 1642 342657

Dr Debora Field (Project Manager)
University of Sheffield
Department of Computer Science
211 Portobello St
Regent Court
Sheffield S1 4DP, UK
Phone: +44 (0) 114 222 8359

European Commission • Sixth Framework Programme - Companions Project © 2010. All Rights Reserved