Audio-visual speech recognition in vehicular noise using a multi-classifier approach
Authors: Karabalkan, Harun and Erdoğan, Hakan
Published in: Biennial on DSP for in-Vehicle and Mobile Systems
Publication year: 2007
Abstract: Visual speech information acquired from the lip region can increase speech recognition accuracy and improve noise robustness. Combining the audio and visual information sources requires efficient information fusion techniques. In this paper, we propose a novel SVM-HMM tandem hybrid feature extraction and combination method for an audio-visual speech recognition system. For each stream, multiple one-versus-rest support vector machine (SVM) binary classifiers are trained, with each word treated as a class in a limited-vocabulary speech recognition scenario. The outputs of these binary classifiers form a feature vector, which is combined with the corresponding vector from the other stream to build new combining binary classifiers. The outputs of these classifiers are then used as observed features in hidden Markov models (HMMs) representing words. The whole process can be viewed as a nonlinear feature dimension reduction system that extracts highly discriminative features from limited amounts of training data. To simulate the performance of the system in a real-world environment, we add vehicular noise at different SNRs to the speech data and perform extensive experiments.
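
The two SVM stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear SVM trained by Pegasos-style subgradient descent is a stand-in for whatever SVM the paper uses, the function names (`train_linear_svm`, `train_one_vs_rest`, `train_tandem`, `tandem_features`) and the toy data are hypothetical, the combining classifiers are trained on the same samples as the first stage for brevity, and the final HMM stage is omitted.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient training of a linear SVM; y values are +1/-1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def train_one_vs_rest(X, labels, n_classes):
    """One binary SVM per word: that word (+1) versus all other words (-1)."""
    return [train_linear_svm(X, [1 if l == k else -1 for l in labels])
            for k in range(n_classes)]

def svm_scores(models, x):
    """Signed decision values of every binary SVM for one input vector."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in models]

def train_tandem(audio, visual, labels, n_classes):
    """Stage 1: per-stream SVMs. Stage 2: combining SVMs on stacked scores."""
    audio_svms = train_one_vs_rest(audio, labels, n_classes)
    visual_svms = train_one_vs_rest(visual, labels, n_classes)
    stacked = [svm_scores(audio_svms, a) + svm_scores(visual_svms, v)
               for a, v in zip(audio, visual)]
    combiners = train_one_vs_rest(stacked, labels, n_classes)
    return audio_svms, visual_svms, combiners

def tandem_features(model, a, v):
    """Map one audio/visual vector pair to one fused score per word."""
    audio_svms, visual_svms, combiners = model
    stacked = svm_scores(audio_svms, a) + svm_scores(visual_svms, v)
    return svm_scores(combiners, stacked)

# Toy data (assumed): 3 words, 4-dim audio and 2-dim visual feature vectors.
rng = random.Random(1)
n_words = 3
labels = [k for k in range(n_words) for _ in range(20)]
audio = [[rng.gauss(3.0 * (k == j), 1.0) for j in range(4)] for k in labels]
visual = [[rng.gauss(2.0 * (k == j), 1.0) for j in range(2)] for k in labels]

model = train_tandem(audio, visual, labels, n_words)
feats = tandem_features(model, audio[0], visual[0])
print(len(feats))  # one fused feature per vocabulary word
```

In this sketch the fused vector always has as many entries as there are words in the vocabulary, regardless of the dimensions of the two input streams, which is the dimension-reduction effect the abstract refers to. In a full system, sequences of such vectors would serve as the observations of the word HMMs.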