Spotlight: Interrupting the ECA
The Acoustic Turn-Taking detector (ATT) uses information supplied by the Acoustic Analysis (AA) module to estimate:
- when the user has stopped speaking (ie finished their turn)
- when the user has interrupted the avatar ('barge-in')
End-of-turn information is passed to the Segmenter/Dialogue Act Tagger (DAT), while interrupt notification is passed to the Interruption Manager (IM). In order to detect an interruption, the ATT needs to know when the avatar starts and stops talking. This information is supplied by the Embodied Conversational Agent module (ECA).
In the current (baseline) version of this module, end-of-turn information is based simply on the duration of pauses, ie segments of the acoustic input when the user is believed not to be talking. More sophisticated acoustic cues, such as falling pitch, are left for later versions.
The ATT module determines whether the user is talking by averaging all the intensity values in one message from the Acoustic Analyser, ie the intensity over a period determined by the buffer-window-size parameter of the Acoustic Analyser. It compares this average with a threshold value (the 'user-talking' threshold), above which the input is considered intense enough to have been produced by the user.
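The talking decision described above can be sketched as follows. This is only an illustration: the function name, the representation of a message as a list of intensity values, and the choice of a strict "greater than" comparison are assumptions, not details taken from the ATT implementation.

```python
from statistics import fmean


def user_is_talking(intensities, user_talking_threshold):
    """Decide whether the user is talking during one Acoustic Analyser message.

    `intensities` stands for the intensity values carried by a single
    message (hypothetical representation). The decision is the mean
    intensity compared against the 'user-talking' threshold.
    """
    if not intensities:
        # Assumption: an empty message is treated as silence.
        return False
    return fmean(intensities) > user_talking_threshold
```

For example, `user_is_talking([0.2, 0.4, 0.6], 0.3)` averages the three values before comparing, so a single loud or quiet frame within the window does not decide the outcome on its own.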
A pause is detected when the duration over which a user is deemed not to be talking is greater than the 'pause duration' parameter. Because the talking decision is made once per message, the frequency of Acoustic Analyser messages determines the temporal resolution of the system.
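A minimal sketch of this pause logic is given below, assuming that messages arrive at a fixed period and that a pause is reported once per silent stretch. The class name, the millisecond units and the once-only reporting are illustrative assumptions rather than documented behaviour of the ATT.

```python
class PauseDetector:
    """Accumulate silence across messages and flag a pause (hypothetical sketch)."""

    def __init__(self, pause_duration_ms, message_period_ms):
        self.pause_duration_ms = pause_duration_ms  # the 'pause duration' parameter
        self.message_period_ms = message_period_ms  # time covered by one message (assumed fixed)
        self.silence_ms = 0
        self.reported = False

    def update(self, talking):
        """Process one talking/not-talking decision; return True when a pause is detected."""
        if talking:
            # Any speech resets the silent stretch.
            self.silence_ms = 0
            self.reported = False
            return False
        self.silence_ms += self.message_period_ms
        if self.silence_ms > self.pause_duration_ms and not self.reported:
            self.reported = True  # assumption: report each pause only once
            return True
        return False
```

With a 200 ms message period and a 500 ms pause duration, the pause is flagged on the third silent message, ie after 600 ms of silence, which illustrates how the message frequency limits the resolution.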
An interruption is detected when all of the following hold:
- the user is believed to be talking
- the user is talking at the same time as the avatar
- the user either has talked for more than a certain duration, or is talking above a second intensity threshold (the 'interrupt' threshold).
These conditions aim to avoid treating back-channel acknowledgements of what the avatar said as interruptions.
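The interruption conditions above might be combined as in the following sketch. It reflects one reading of the text, namely that overlapping speech counts as an interruption only if it is either long enough or loud enough. All names, the millisecond unit and the strict comparisons are assumptions made for illustration.

```python
def is_interruption(mean_intensity, avatar_talking, user_talk_ms, *,
                    user_talking_threshold, interrupt_threshold, min_talk_ms):
    """Return True if the current message should count as a barge-in.

    mean_intensity -- average intensity of the current message
    avatar_talking -- whether the avatar is currently speaking
    user_talk_ms   -- how long the user has been talking so far (hypothetical input)
    """
    user_talking = mean_intensity > user_talking_threshold
    if not (user_talking and avatar_talking):
        # No overlap between user and avatar speech, so nothing to interrupt.
        return False
    long_enough = user_talk_ms > min_talk_ms
    loud_enough = mean_intensity > interrupt_threshold
    return long_enough or loud_enough
```

Under this reading, a short, quiet "mm-hm" during the avatar's speech passes the 'user-talking' threshold but fails both the duration and the 'interrupt' threshold tests, so it is not reported as an interruption.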
Communication with the ATT is possible at run-time, using an additional utility which sends commands via Inamode messages to alter the threshold parameter values and the pause duration.


