Audio-Visual Speech Recognition: Audio Noise, Video Noise, and Pronunciation Variability

This talk will describe methods for compensating for audio noise, video noise, and pronunciation variability in audio-visual speech recognition. The methods for compensating for audio noise are traditional: beamforming, postfiltering, and voice activity detection. The methods for compensating for video noise and pronunciation variability are both graph-theoretic, but they use graphs with very different semantics. Compensation for video noise uses a dimensionality reduction technique designed to optimally separate the nearest-neighbor graphs representing different phonemes. Compensation for pronunciation variability, on the other hand, uses a phonologically inspired dynamic Bayesian network called the Articulatory Feature Model. Improvements in word error rate are demonstrated using all three compensation techniques.
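
As a rough illustration of one of the traditional audio techniques named in the abstract, voice activity detection, the following is a minimal energy-threshold sketch. It is not the method used in the talk, which the abstract does not specify. The function names (`frame_energies`, `energy_vad`), the 160-sample frame length, and the -30 dB threshold are all assumptions chosen for this example.

```python
import math


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame (hypothetical helper)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def energy_vad(signal, frame_len=160, threshold_db=-30.0):
    """Mark a frame as speech if its energy is within threshold_db of the loudest frame.

    The relative threshold is an assumption for this sketch; practical detectors
    usually track a noise floor over time instead.
    """
    energies = frame_energies(signal, frame_len)
    peak = max(energies, default=0.0)
    if peak <= 0.0:
        # All-silent (or empty) input: nothing is flagged as speech.
        return [False] * len(energies)
    floor = peak * 10.0 ** (threshold_db / 10.0)
    return [e > floor for e in energies]


# Synthetic input: three near-silent frames followed by three frames of a
# 200 Hz tone, assuming an 8 kHz sampling rate.
sample_rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(480)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(480)]
decisions = energy_vad(quiet + loud)
```

Here `decisions` holds one boolean per 20 ms frame. In a recognizer, such flags would typically gate which frames are passed on to feature extraction.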