Incorporating Language Constraints in Sub-word based Speech Recognition
Authors: H. Erdogan, O. Buyuk, K. Oflazer
Published in: IEEE Automatic Speech Recognition and Understanding Workshop
Publication year: 2005
Abstract: In large vocabulary continuous speech recognition (LVCSR) for agglutinative and inflectional languages, the theoretically infinite size of the full-word lexicon causes problems. Sub-word lexicon units may be used to dramatically reduce the out-of-vocabulary rate in test data, and language models based on sub-word units can then be developed to perform LVCSR. However, sub-word lexicon units have not always been beneficial, since shorter units are more acoustically confusable with one another and the language model history is effectively shorter than in full-word language models. To reduce these problems, we propose using the longest possible sub-word units in our lexicon, namely half-words and full-words only. We also incorporate linguistic rules of half-word combination into our statistical language model. The language constraints are represented with a rule-based WFSM, which can be combined with an N-gram language model to yield a better and smaller language model. We study the performance of the proposed system for Turkish LVCSR when the language constraint takes the form of enforcing vowel harmony between stems and endings. We also introduce novel error-rate metrics that are more appropriate than word-error-rate for agglutinative languages. Using half-words with a bi-gram model yields a significant reduction in word-error-rate compared to a bi-gram full-word model. In addition, combining a tri-gram half-word language model with the vowel-harmony WFSM further improves accuracy when rescoring the bi-gram lattices.
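
To illustrate the kind of constraint the abstract describes, the following is a minimal sketch of a vowel-harmony check between a stem and an ending. It is not the paper's rule-based WFSM: it assumes only the simplified front/back harmony rule of Turkish, lowercase input, and the hypothetical helper names last_vowel, first_vowel and harmonizes, none of which come from the paper.

```python
# Simplified front/back vowel harmony only. This is an illustrative
# assumption, not the rule set or the WFSM used in the paper.
# Input is assumed to be lowercase already (Python's str.lower() does not
# apply Turkish casing rules for dotted and dotless i).
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS


def last_vowel(text):
    """Return the last vowel in text, or None if text has no vowel."""
    for ch in reversed(text):
        if ch in VOWELS:
            return ch
    return None


def first_vowel(text):
    """Return the first vowel in text, or None if text has no vowel."""
    for ch in text:
        if ch in VOWELS:
            return ch
    return None


def harmonizes(stem, ending):
    """True if the ending's first vowel matches the stem's last vowel
    in backness (both back or both front)."""
    s = last_vowel(stem)
    e = first_vowel(ending)
    if s is None or e is None:
        return True  # no vowel on one side, so nothing to constrain
    return (s in BACK_VOWELS) == (e in BACK_VOWELS)


# Keep only the stem/ending pairs that the simplified rule allows.
stems = ["ev", "kitap"]
endings = ["ler", "lar"]
allowed = [(s, e) for s in stems for e in endings if harmonizes(s, e)]
print(allowed)
```

Under this simplified rule, "ev" pairs with "ler" and "kitap" pairs with "lar", while the mismatched combinations are rejected. A real system would also need to handle rounding harmony and exceptions such as loanwords, and would express the rule as an automaton so that it can be combined with an N-gram model.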