Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems

Publication Type: Conference Paper
Year of Publication: 2012
Authors: Georgila, K., A. W. Black, K. Sagae, and D. R. Traum
Conference Name: International Conference on Language Resources and Evaluation (LREC)
Date Published: May 2012
Conference Location: Istanbul, Turkey

The current practice in virtual human dialogue systems is to use professional human recordings or limited-domain speech synthesis. Both approaches lead to good performance but at a high cost. To determine the best trade-off between performance and cost, we perform a systematic evaluation of human and synthesized voices with regard to naturalness, conversational quality, and likability. We also vary the type (in-domain vs. out-of-domain), length, and content of utterances, and take into account the age and native language of raters as well as their familiarity with speech synthesis. We present detailed results from two studies: a pilot study and one run on Amazon's Mechanical Turk. Our results suggest that a professional human voice can outperform both an amateur human voice and synthesized voices. Also, a high-quality general-purpose voice or a good limited-domain voice can perform better than amateur human recordings. We do not find any significant differences between the performance of a high-quality general-purpose voice and a limited-domain voice, both trained with speech recorded by actors. As expected, in most cases the high-quality general-purpose voice is rated higher than the limited-domain voice for out-of-domain sentences and lower for in-domain sentences. There is also a trend, though not statistically significant, for long or negative-content utterances to receive lower ratings.