Title | Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems |
Publication Type | Conference Paper |
Year of Publication | 2012 |
Authors | Georgila, K., A. W. Black, K. Sagae, and D. R. Traum |
Conference Name | International Conference on Language Resources and Evaluation (LREC) |
Date Published | May 2012 |
Conference Location | Istanbul, Turkey |
Abstract | The current practice in virtual human dialogue systems is to use professional human recordings or limited-domain speech synthesis. Both approaches lead to good performance but at a high cost. To determine the best trade-off between performance and cost, we perform a systematic evaluation of human and synthesized voices with regard to naturalness, conversational aspect, and likability. We also vary the type (in-domain vs. out-of-domain), length, and content of utterances, and take into account the age and native language of raters as well as their familiarity with speech synthesis. We present detailed results from two studies, a pilot one and one run on Amazon's Mechanical Turk. Our results suggest that a professional human voice can supersede both an amateur human voice and synthesized voices. Also, a high-quality general-purpose voice or a good limited-domain voice can perform better than amateur human recordings. We do not find any significant differences between the performance of a high-quality general-purpose voice and a limited-domain voice, both trained with speech recorded by actors. As expected, in most cases, the high-quality general-purpose voice is rated higher than the limited-domain voice for out-of-domain sentences and lower for in-domain sentences. There is also a not statistically significant trend |
URL | http://www.pdfdownload.org/pdf2html/pdf2html.php?url=http%3A%2F%2Fpeople.ict.usc.edu%2F~traum%2FPapers%2Flrec-speechsynthesis2012.pdf&images=yes |