Joseph Keshet (Toyota Technological Institute at Chicago)
Discriminative Spoken Keyword Detection
Sala Grande, Friday, 11th Dec 2009, h:11.00

The current state-of-the-art automatic speech recognizers are mostly
based on hidden Markov models (HMMs). Despite their popularity,
HMM-based approaches have several known drawbacks, such as a training
objective that is not aimed at optimizing the evaluation measure.
We propose a new approach to spoken keyword spotting that is based
on large margin and kernel methods rather than on HMMs. Unlike
previous approaches, the proposed method employs a discriminative
learning procedure in which the learning phase aims to achieve a
high area under the receiver operating characteristic (ROC) curve,
since this quantity is the most common measure for evaluating keyword
spotters.
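The area under the ROC curve has a convenient pairwise reading: it equals the probability that a randomly chosen utterance containing the keyword receives a higher detector score than a randomly chosen utterance that does not (the Wilcoxon-Mann-Whitney statistic). A minimal sketch of the empirical estimate, assuming the detector already outputs one real-valued score per utterance; the function name and the scores are hypothetical.

```python
from itertools import product


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for sp, sn in product(pos_scores, neg_scores):
        if sp > sn:
            total += 1.0  # keyword utterance correctly ranked above non-keyword one
        elif sp == sn:
            total += 0.5  # ties split the credit
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical detector scores for utterances that do / do not contain the keyword.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
auc = empirical_auc(pos, neg)  # 11 of the 12 pairs are ordered correctly, so 11/12
print(auc)
```

Only the ordering of scores matters, not their scale, which is why a ranking-style objective is a natural fit for this measure.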
The keyword spotter we devise is based on mapping the input acoustic
representation of the speech utterance, along with the target keyword,
into a vector space. Building on large margin and kernel techniques
for predicting whole sequences, our keyword spotter reduces to a
classifier in this vector space that separates speech utterances in
which the keyword is uttered from speech utterances in which it is
not. We describe a simple iterative algorithm for training the keyword
spotter and discuss its formal properties, showing theoretically that
it attains a high area under the ROC curve. Experimental results
suggest that, on a variety of standard speech recognition datasets,
our discriminative system outperforms the conventional
context-independent HMM-based system.
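Because the empirical AUC counts correctly ordered (positive, negative) pairs, one natural training target is a pairwise margin: an utterance containing the keyword should outscore an utterance without it by at least 1. The hinge loss on that margin upper-bounds the misordering indicator, so driving it down tends to raise the AUC. The sketch below applies a passive-aggressive (PA-I) style update to such pairs. It illustrates the idea and is not the speaker's exact algorithm. It assumes each utterance has already been mapped to a fixed, hypothetical feature vector, and it omits the keyword-dependent feature map and the search over keyword alignments that the full system would need.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_pairwise(pairs, dim, epochs=10, C=1.0):
    """PA-I style updates on (positive, negative) feature-vector pairs.

    Each pair holds hypothetical features for one utterance containing the
    keyword and one not containing it; the score is the linear form w . phi.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for phi_pos, phi_neg in pairs:
            diff = [p - n for p, n in zip(phi_pos, phi_neg)]
            loss = max(0.0, 1.0 - dot(w, diff))  # pairwise hinge loss
            norm_sq = dot(diff, diff)
            if loss > 0.0 and norm_sq > 0.0:
                tau = min(C, loss / norm_sq)  # step size, capped by aggressiveness C
                w = [wi + tau * di for wi, di in zip(w, diff)]
    return w


# Hypothetical 2-D features: (keyword utterance, non-keyword utterance) pairs.
pairs = [
    ([1.0, 0.2], [0.1, 0.3]),
    ([0.8, 0.5], [0.2, 0.1]),
    ([0.9, 0.0], [0.3, 0.4]),
]
w = train_pairwise(pairs, dim=2)
print(w, [dot(w, p) - dot(w, n) for p, n in pairs])
```

On these toy pairs the learned weights score every keyword vector above its partner. Each update moves `w` only as far as needed to fix the current pair's margin, which keeps the procedure simple and online.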


© 2008 Fondazione Bruno Kessler