Vision and Language Interaction in Open Worlds (VLIOW)


Abstract

Vision and language interaction has become an increasingly important research area that brings together the machine vision and natural language processing communities in an effort to build integrated models that exploit the synergies of both modalities. The integration of vision and language is particularly important for assistive environments, where humans produce natural language utterances that are influenced by their visual perceptions and, conversely, where natural language understanding is modulated by perception of the environment. Embodied assistive technologies in particular, such as robotic devices with cameras and natural language interaction capabilities, need to be sensitive to the different ways in which humans process multi-modal information. While recent transformer models (e.g., so-called “V&L BERT” models such as LXMERT, ViLBERT, VL-BERT, and others) have shown impressive results on the datasets and benchmarks defined for them, it is not clear how well these models would work when embedded in larger assistive systems (e.g., natural-language-controlled robotic devices, especially in real-time human-robot interaction contexts). Moreover, it is still unclear to what extent these systems really utilize both modalities for their performance, and what kind of training data they would need in order to generalize to novel tasks and environments, especially the open-world environments often encountered in assistive contexts.


Goals

Take stock of the state of the art in vision-language models and their integration of the two modalities, with a particular focus on how these models can be used in assistive applications.


List of Topics

  • V&L transformer models and architectures
  • Neuro-symbolic vision-language integration methods
  • Applications of vision-language integration
  • Datasets and benchmarks for vision-language integration
  • Human vision-language integration

Workshop Organizers

Matthias Scheutz
Tufts University
matthias.scheutz@tufts.edu

Chitta Baral
Arizona State University
chitta@asu.edu

Yezhou Yang
Arizona State University
yang@asu.edu