Wonjune Kang, Brandon C. Roy, and Wesley Chow. Multimodal Speaker Diarization of Real-World Meetings using D-Vectors with Spatial Features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 6509-6513. IEEE, 2020.