We present a general method for integrating visual components into a multi-modal cognitive system. The integration is generic and can combine an arbitrary set of modalities. We illustrate our approach with a specific instantiation of the architecture schema that focuses on the integration of vision and language: a cognitive system able to collaborate with a human, to learn, and to display some understanding of its surroundings. As examples of cross-modal interaction, we describe mechanisms for clarification and visual learning.
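To make the idea of generic, modality-agnostic integration concrete, the following minimal sketch shows one way subsystems for different modalities could export representations to a shared binder that relates them across modalities. It is an illustrative sketch only, assuming a hypothetical Modality interface, Proxy structure, and Binder class; none of these names or this matching strategy are taken from the paper's actual architecture.

```python
# Illustrative sketch only: a hypothetical "binder" pattern for combining an
# arbitrary set of modality subsystems. All names here are invented for this
# example and do not reflect the system's actual API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Proxy:
    """A modality-specific representation exported for cross-modal binding."""
    modality: str
    features: dict[str, str]  # e.g. {"colour": "red", "shape": "ball"}


class Modality(Protocol):
    name: str
    def process(self, raw_input: object) -> list[Proxy]: ...


class Binder:
    """Groups proxies from different modalities that share feature values."""
    def __init__(self) -> None:
        self.proxies: list[Proxy] = []

    def add(self, proxies: list[Proxy]) -> None:
        self.proxies.extend(proxies)

    def bind(self) -> list[list[Proxy]]:
        # Greedily group proxies whose feature sets overlap across modalities;
        # a real system would use probabilistic matching instead.
        unions: list[list[Proxy]] = []
        for p in self.proxies:
            for union in unions:
                if any(set(p.features.items()) & set(q.features.items())
                       for q in union if q.modality != p.modality):
                    union.append(p)
                    break
            else:
                unions.append([p])
        return unions


class VisionModality:
    name = "vision"
    def process(self, raw_input: object) -> list[Proxy]:
        # Stand-in for segmenting and recognising an object in a scene.
        return [Proxy("vision", {"colour": "red", "shape": "ball"})]


class LanguageModality:
    name = "language"
    def process(self, raw_input: object) -> list[Proxy]:
        # Stand-in for parsing an utterance such as "the red ball".
        return [Proxy("language", {"colour": "red"})]


binder = Binder()
for modality in (VisionModality(), LanguageModality()):
    binder.add(modality.process(None))
for union in binder.bind():
    print([(p.modality, p.features) for p in union])
```

Because each modality only has to implement the same small interface, adding a further modality amounts to registering another producer of proxies, which is the sense in which the integration can combine an arbitrary set of modalities.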