Improving Spoken Dialogue Understanding Using Phonetic Mixture Models

Title: Improving Spoken Dialogue Understanding Using Phonetic Mixture Models
Publication Type: Book Chapter
Year of Publication: 2012
Authors: Wang, W. Y., R. Artstein, A. Leuski, and D. R. Traum
Book Title: Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches

Reasoning about sound similarities improves the performance of a Natural Language Understanding component that interprets speech recognizer output: the authors observed a 5% to 7% reduction in errors when they augmented the word strings with a phonetic representation, derived from the words by means of a dictionary. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.
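
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pronunciation dictionary, the training utterances, the class labels, and all function names below are hypothetical, and the combination rule (a weighted sum of average log-likelihoods from a word-level and a phone-level unigram model, i.e. a log-linear interpolation) is an assumption chosen for brevity rather than the mixture model described in the chapter. The sketch only shows how a dictionary-derived phone string lets a misrecognized word such as "wear" still count as evidence for an utterance containing "where".

```python
import math
from collections import Counter

UNK = "<unk>"

# Hypothetical toy pronunciation dictionary (ARPAbet-like symbols).
TOY_DICT = {
    "where": ["W", "EH", "R"],
    "wear": ["W", "EH", "R"],
    "are": ["AA", "R"],
    "you": ["Y", "UW"],
    "from": ["F", "R", "AH", "M"],
    "what": ["W", "AH", "T"],
    "is": ["IH", "Z"],
    "your": ["Y", "AO", "R"],
    "name": ["N", "EY", "M"],
}

# Hypothetical training data: class label -> example utterances.
TRAIN = {
    "origin": ["where are you from"],
    "name": ["what is your name"],
}


def to_phones(words):
    """Look up each word in the dictionary; unknown words become a placeholder token."""
    phones = []
    for w in words:
        phones.extend(TOY_DICT.get(w, ["<unk:" + w + ">"]))
    return phones


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def build_models(train):
    """Build one word model and one phone model per class."""
    per_class, word_vocab, phone_vocab = {}, {UNK}, {UNK}
    for label, utterances in train.items():
        words = [w for u in utterances for w in u.split()]
        phones = to_phones(words)
        per_class[label] = (words, phones)
        word_vocab.update(words)
        phone_vocab.update(phones)
    return {
        label: (unigram_model(words, word_vocab), unigram_model(phones, phone_vocab))
        for label, (words, phones) in per_class.items()
    }


def avg_logprob(tokens, model):
    """Average per-token log-probability; unseen tokens fall back to the UNK mass."""
    return sum(math.log(model.get(t, model[UNK])) for t in tokens) / max(len(tokens), 1)


def classify(utterance, models, lam=0.5):
    """Score each class by lam * word score + (1 - lam) * phone score."""
    words = utterance.split()
    phones = to_phones(words)
    scores = {
        label: lam * avg_logprob(words, wm) + (1 - lam) * avg_logprob(phones, pm)
        for label, (wm, pm) in models.items()
    }
    return max(scores, key=scores.get)


models = build_models(TRAIN)
print(classify("wear are you from", models))
```

Setting `lam=1.0` reduces the sketch to a word-only model and `lam=0.0` to a phone-only model. Because the phones come from a dictionary lookup on the recognizer's word output, nothing here depends on the internals of a particular speech recognizer, which mirrors the portability point made in the abstract.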