Evaluation of Dialogue Systems and Virtual Humans

We use a variety of methods to evaluate the effectiveness of our dialogue systems. These include component evaluation, such as scoring speech recognition and NLU output against a gold standard, and standard external measures such as user questionnaires and task success rate. We are also putting increasing effort into developing new methods for evaluating virtual humans in complex dialogue situations. Because most of our agents' functions are conversational rather than task-oriented, we focus on analyzing dialogue transcripts to uncover what contributes to the perceived quality of the overall interaction between virtual human and user. Through this process, we have developed several annotation schemes for evaluating virtual human performance.
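As an illustration of gold-standard component evaluation, the sketch below scores a recognizer hypothesis against a reference transcript using word error rate (WER). This page does not say which metric is used for speech recognition; WER is assumed here only because it is a common choice, and the function name and sample utterances are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: assumes whitespace tokenization and case-insensitive
    matching, which a real evaluation pipeline may handle differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical gold transcript and recognizer output.
    gold = "where is the clinic"
    recognized = "where is clinic"
    print(word_error_rate(gold, recognized))  # one deletion over four words: 0.25
```

A lower score means the hypothesis is closer to the gold transcript. Because insertions are counted, the score can exceed 1.0 when the hypothesis is much longer than the reference.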

Publications

Nouri E, Georgila K, Traum DR. Culture-specific models of negotiation for virtual characters: multi-attribute decision-making based on culture-specific values. AI & Society, Special Issue on Culturally Motivated Virtual Characters. In Press.
Robinson S, Roque A, Traum DR. Dialogues in Context: An Objective User-Oriented Evaluation Approach for Virtual Human Dialogue. In: 7th International Conference on Language Resources and Evaluation (LREC). Valletta, Malta; 2010.
Yao X, Bhutada P, Georgila K, Sagae K, Artstein R, Traum DR. Practical Evaluation of Speech Recognizers for Virtual Human Dialogue Systems. In: LREC-2010. Valletta, Malta; 2010.
Artstein R, Gandhe S, Gerten J, Leuski A, Traum DR. Semi-formal Evaluation of Conversational Characters. In: Grumberg O, Kaminski M, Katz S, Wintner S, editors. Languages: From Formal to Natural. Essays Dedicated to Nissim Francez on the Occasion of His 65th Birthday. Vol 5533. Heidelberg: Springer; 2009. p. 22-35. (Lecture Notes in Computer Science; vol 5533).
Gandhe S, Traum DR. An evaluation understudy for dialogue coherence models. In: SIGdial '08: Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue. Morristown, NJ, USA: Association for Computational Linguistics; 2008. p. 172-81.
Jan D, Herrera D, Martinovski B, Novick DG, Traum DR. A Computational Model of Culture-Specific Conversational Behavior. In: IVA. Paris, France: Springer; 2007. p. 45-56.
Robinson S, Roque A, Vaswani A, Hernandez C, Millspaugh B, Traum DR. Evaluation of a Spoken Dialogue System for Virtual Reality Call for Fire Training. In: 25th Army Science Conference. Orlando, FL; 2006.
Traum DR, Robinson S, Stephan J. Evaluation of multi-party virtual reality dialogue interaction. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004); 2004. p. 1699-702.
