Evaluating the Representational Hub of Language and Vision Models
The multimodal models used in the emerging field at the intersection of
computational linguistics and computer vision implement the bottom-up
processing of the 'Hub and Spoke' architecture proposed in cognitive science to
represent how the brain processes and combines multi-sensory inputs. In
particular, the Hub is implemented as a neural network encoder. We investigate
the effect on this encoder of various vision-and-language tasks proposed in the
literature: visual question answering, visual reference resolution, and
visually grounded dialogue. To measure the quality of the representations
learned by the encoder, we use two kinds of analyses. First, we evaluate the
encoder pre-trained on the different vision-and-language tasks on an existing
diagnostic task designed to assess multimodal semantic understanding. Second,
we carry out a battery of analyses aimed at studying how the encoder merges and
exploits the two modalities. Comment: Accepted to IWCS 201
Explainable Server Cooling Schedule Prediction Using Machine Learned Model Conditioned on Multimodal Data
Server cooling management frameworks that utilize neural networks are trained with a stream of multimodal sensor data. However, predictions from such models lack explainability. This disclosure describes a fine-tuned model conditioned on multimodal sensor data to perform cooling schedule prediction. The model can also provide explainability by responding to natural language queries. The approach utilizes a transformer decoder architecture, with the model conditioned on multimodal sensor data from a natural server environment. The conditioning enables localizing the understanding of a language model to the specific context of use for server cooling scheduling decisions.
SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding
In the evolving landscape of artificial intelligence, multimodal and
Neuro-Symbolic paradigms stand at the forefront, with a particular emphasis on
the identification and interaction with entities and their relations across
diverse modalities. Addressing the need for complex querying and interaction in
this context, we introduce SNeL (Structured Neuro-symbolic Language), a
versatile query language designed to facilitate nuanced interactions with
neural networks processing multimodal data. SNeL's expressive interface enables
the construction of intricate queries, supporting logical and arithmetic
operators, comparators, nesting, and more. This allows users to target specific
entities, specify their properties, and limit results, thereby efficiently
extracting information from a scene. By aligning high-level symbolic reasoning
with low-level neural processing, SNeL effectively bridges the Neuro-Symbolic
divide. The language's versatility extends to a variety of data types,
including images, audio, and text, making it a powerful tool for multimodal
scene understanding. Our evaluations demonstrate SNeL's potential to reshape
the way we interact with complex neural networks, underscoring its efficacy in
driving targeted information extraction and facilitating a deeper understanding
of the rich semantics encapsulated in multimodal AI models.