Deb Roy. (2002). A Trainable Visually-Grounded Spoken Language Generation System. In Proceedings of the International Conference on Spoken Language Processing.