In audio-visual speech recognition (AVSR), it is beneficial to use lip boundary information in addition to texture-dependent features. In this research, we propose an automatic lip segmentation method that can be used in AVSR systems.
The algorithm consists of the following steps: face detection, lip corners extraction, adaptive color space training for lip and non-lip regions using Gaussian mixture models (GMMs), and curve evolution using level-set formulation based on region and image gradients fields. Region-based fields are obtained using adapted GMM likelihoods. We have tested the proposed algorithm on a database (SU-TAV) of 100 facial images and obtained objective performance results by comparing automatic lip segmentations with hand-marked ground truth segmentations. Experimental results are promising and much work has to be done to improve the robustness of the proposed method.