This paper describes a method for estimating the internal state of a spoken dialog system user before his or her first input utterance. When actually using a dialog-based system, the user is often perplexed by the prompt. A typical system provides more detailed information to a user who is taking time to make an input utterance, but such assistance is a nuisance if the user is merely considering how to answer the prompt. To respond appropriately, the spoken dialog system should be able to consider the user's internal state before the user's input. Conventional studies on user modeling have focused on the linguistic information of the utterance to estimate the user's internal state, but this approach cannot estimate the user's state until the end of the user's first utterance. We therefore focused on the user's nonverbal output, such as fillers, silence, or head movements, before the beginning of the input utterance. The experimental data were collected on a Wizard of Oz basis, and the labels were decided by five evaluators. Finally, we conducted a discrimination experiment with the trained user model using combined features. As a three-class discrimination result, we obtained about 85% accuracy in an open test.
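The three-class discrimination over combined nonverbal features can be illustrated with a minimal sketch. The feature set (filler count, silence duration, head-movement count), the class labels, and the toy training samples below are all illustrative assumptions, not the paper's actual corpus, features, or classifier; a simple nearest-centroid model stands in for whatever trained user model the authors used.

```python
# Hypothetical sketch of three-class discrimination of a user's internal
# state from nonverbal cues observed before the first utterance.
# Features, labels, and data are assumptions for illustration only.
from statistics import mean

# Each sample: (filler_count, silence_sec, head_movement_count).
# Class names are placeholders for internal states.
TRAIN = {
    "ready":     [(0, 0.5, 0), (1, 0.8, 0), (0, 1.0, 1)],
    "thinking":  [(2, 2.5, 1), (3, 3.0, 0), (2, 2.0, 1)],
    "perplexed": [(1, 4.5, 3), (0, 5.0, 4), (2, 4.0, 3)],
}

def centroids(train):
    """Compute the mean feature vector per class (nearest-centroid model)."""
    return {
        label: tuple(mean(s[i] for s in samples) for i in range(3))
        for label, samples in train.items()
    }

def classify(model, sample):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(model, key=lambda label: dist(model[label]))

model = centroids(TRAIN)
print(classify(model, (0, 0.6, 0)))   # near the "ready" centroid
print(classify(model, (1, 4.8, 3)))   # near the "perplexed" centroid
```

Because all three features are available before the user speaks any words, a model of this shape can produce an estimate at exactly the point the paper targets, whereas a linguistic model must wait for the first utterance to end.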