Niloy Mukherjee and Deb Roy (2003). A Visual Context-Aware Multimodal System for Spoken Language Processing. Proc. Eurospeech, 6 pages.