
Joseph Keshet (Toyota Technological Institute at Chicago)

11 Dec 2009 - 11:00

Discriminative Spoken Keyword Detection

Sala Grande, Friday, 11th Dec 2009, h:11.00 
 

The current state-of-the-art automatic speech recognizers are mostly based on hidden Markov models (HMMs). Despite their popularity, HMM-based approaches have several known drawbacks, such as a training objective that is not aimed at optimizing the evaluation objective. We propose a new approach for spoken keyword spotting, which is based on large margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure, in which the learning phase aims at achieving a high area under the ROC curve, as this quantity is the most common measure used to evaluate keyword spotters.
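As an illustration of the evaluation measure named above, the area under the ROC curve can be estimated as the fraction of (positive, negative) utterance pairs that the spotter ranks in the right order (the Wilcoxon-Mann-Whitney statistic). The sketch below is not taken from the talk; the function name `auc` and the confidence scores are made up for the example.

```python
def auc(pos_scores, neg_scores):
    """Estimate the area under the ROC curve from detector confidences.

    pos_scores: confidences for utterances that contain the keyword.
    neg_scores: confidences for utterances that do not contain it.
    Counts the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score of each kind")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confidences, chosen only to illustrate the computation.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(auc(pos, neg))  # 8 of the 9 pairs are ranked correctly
```

A perfect spotter ranks every keyword utterance above every non-keyword utterance and scores 1.0, while random scoring gives about 0.5.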
The keyword spotter we devise is based on mapping the input acoustic representation of the speech utterance, along with the target keyword, into a vector space. Building on techniques used for large margin and kernel methods for predicting whole sequences, our keyword spotter reduces to a classifier in this vector space, which separates speech utterances in which the keyword is uttered from those in which it is not. We describe a simple iterative algorithm for training the keyword spotter and discuss its formal properties, showing theoretically that it attains a high area under the ROC curve. Experimental results suggest that on a variety of standard speech recognition datasets our discriminative system outperforms the conventional context-independent HMM-based system.
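The abstract does not spell out the training update, so the following is only a minimal sketch of one plausible form: a passive-aggressive style pairwise update that pushes the score of each keyword utterance above the score of each non-keyword utterance by a margin of 1. The names `train_pairwise` and `dot`, the parameter `C`, and the two-dimensional vectors are assumptions made for illustration. In the approach described above, the vectors would come from a feature map over the utterance and the keyword, which is not reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def train_pairwise(pos_vecs, neg_vecs, epochs=10, C=1.0):
    """Learn a weight vector w so that w.x_pos exceeds w.x_neg by a margin.

    Assumption: a passive-aggressive style update on each violating pair;
    this is a stand-in, not necessarily the algorithm from the talk.
    """
    w = [0.0] * len(pos_vecs[0])
    for _ in range(epochs):
        for xp in pos_vecs:
            for xn in neg_vecs:
                d = [a - b for a, b in zip(xp, xn)]
                loss = max(0.0, 1.0 - dot(w, d))  # hinge loss on the pair
                norm_sq = dot(d, d)
                if loss > 0.0 and norm_sq > 0.0:
                    tau = min(C, loss / norm_sq)  # capped step size
                    w = [wi + tau * di for wi, di in zip(w, d)]
    return w


# Hypothetical feature vectors standing in for mapped utterances.
pos_vecs = [[2.0, 1.0], [1.5, 0.5]]  # keyword is uttered
neg_vecs = [[0.5, 1.0], [0.0, 0.5]]  # keyword is not uttered

w = train_pairwise(pos_vecs, neg_vecs)
pos_scores = [dot(w, x) for x in pos_vecs]
neg_scores = [dot(w, x) for x in neg_vecs]
print(pos_scores, neg_scores)
```

Because each update targets a misordered pair, reducing the pairwise loss directly reduces the number of ranking errors that lower the area under the ROC curve.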