Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes
Robotic Information Gathering (RIG) is a foundational research topic that
answers how a robot (team) collects informative data to efficiently build an
accurate model of an unknown target function under robot embodiment
constraints. RIG has many applications, including but not limited to autonomous
exploration and mapping, 3D reconstruction or inspection, search and rescue,
and environmental monitoring. A RIG system relies on a probabilistic model's
prediction uncertainty to identify critical areas for informative data
collection. Gaussian Processes (GPs) with stationary kernels have been widely
adopted for spatial modeling. However, real-world spatial data is typically
non-stationary -- different locations do not have the same degree of
variability. As a result, the prediction uncertainty does not accurately reveal
prediction error, limiting the success of RIG algorithms. We propose a family
of non-stationary kernels named Attentive Kernel (AK), which is simple, robust,
and can extend any existing kernel to a non-stationary one. We evaluate the new
kernel in elevation mapping tasks, where AK provides better accuracy and
uncertainty quantification than the commonly used stationary kernels and the
leading non-stationary kernels. The improved uncertainty quantification guides
the downstream informative planner to collect more valuable data around the
high-error area, further increasing prediction accuracy. A field experiment
demonstrates that the proposed method can guide an Autonomous Surface Vehicle
(ASV) to prioritize data collection in locations with significant spatial
variations, enabling the model to characterize salient environmental features.
Comment: International Journal of Robotics Research (IJRR). arXiv admin note:
text overlap with arXiv:2205.0642
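The abstract's point that a stationary kernel's uncertainty need not track prediction error can be illustrated with a minimal sketch. It is not the Attentive Kernel and not the authors' code: it assumes a plain RBF kernel with fixed hyperparameters, and the helper names (`rbf`, `solve`, `gp_posterior`) and the toy data are hypothetical. Under those assumptions, the GP posterior variance depends only on where samples were taken, not on the values observed there.

```python
import math

def rbf(a, b, lengthscale=1.0):
    """Stationary RBF kernel: depends only on the distance |a - b|."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_star, noise=1e-2):
    """GP posterior mean and variance at x_star (hyperparameters held fixed)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)       # (K + noise*I)^-1 y
    v = solve(K, k_star)       # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

xs = [0.0, 1.0, 2.0, 3.0]       # sampling locations (toy data)
flat = [0.0, 0.0, 0.0, 0.0]     # smooth terrain
rough = [0.0, 5.0, -5.0, 5.0]   # highly variable terrain
m_flat, v_flat = gp_posterior(xs, flat, 1.0)
m_rough, v_rough = gp_posterior(xs, rough, 1.0)
print(m_flat, m_rough)          # the means differ with the data
print(v_flat == v_rough)        # the variances are identical
```

Both terrains yield the same variance at the query location even though the second is far harder to predict, which is the mismatch a non-stationary kernel is meant to address.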
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
Minimum Bayes Risk (MBR) decoding can significantly improve translation
performance of Multilingual Large Language Models (MLLMs). However, MBR
decoding is computationally expensive. In this paper, we show how a recently
developed Reinforcement Learning (RL) technique, Direct Preference Optimization
(DPO), can be used to fine-tune MLLMs so that we get the gains from MBR without
the additional computation in inference. Our fine-tuned models have
significantly improved performance on multiple NMT test sets compared to base
MLLMs without preference optimization. Our method boosts the translation
performance of MLLMs using relatively small monolingual fine-tuning sets.
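For readers unfamiliar with MBR decoding, a minimal sketch follows. It assumes a toy unigram-F1 utility and hand-written candidate strings; real systems typically use a learned or n-gram-based translation metric and model-sampled hypotheses. The function names `overlap_f1` and `mbr_decode` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def overlap_f1(hyp, ref):
    """Toy utility (an assumption): unigram F1 between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates, utility=overlap_f1):
    """Return the candidate with the highest average utility against all others.

    Requires at least two candidates. The other candidates act as
    pseudo-references, so the cost grows quadratically with their number.
    """
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)
    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]

samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran in the park",
]
choice = mbr_decode(samples)
print(choice)
```

The quadratic number of utility evaluations in `mbr_decode` is the inference-time cost that the abstract aims to avoid by fine-tuning.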
DiSProD: Differentiable Symbolic Propagation of Distributions for Planning
The paper introduces DiSProD, an online planner developed for environments
with probabilistic transitions in continuous state and action spaces. DiSProD
builds a symbolic graph that captures the distribution of future trajectories,
conditioned on a given policy, using independence assumptions and approximate
propagation of distributions. The symbolic graph provides a differentiable
representation of the policy's value, enabling efficient gradient-based
optimization for long-horizon search. The propagation of approximate
distributions can be seen as an aggregation of many trajectories, making it
well-suited for dealing with sparse rewards and stochastic environments. An
extensive experimental evaluation compares DiSProD to state-of-the-art planners
in discrete-time planning and real-time control of robotic systems. The
proposed method improves over existing planners in handling stochastic
environments, sensitivity to search depth, sparsity of rewards, and large
action spaces. Additional real-world experiments demonstrate that DiSProD can
control ground vehicles and surface vessels to successfully navigate around
obstacles.
Comment: International Joint Conference on Artificial Intelligence (IJCAI)
2023. For project website, see https://pecey.github.io/DiSProD
Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata
The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model
that performs well in Neural Machine Translation. Two issues prevent its
application to general Natural Language Generation (NLG) tasks: frequent
Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity
names. We introduce Control-DAG, a constrained decoding algorithm for our
Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length
control. We show that Control-DAG significantly enhances DA-T5 on the Schema
Guided Dialogue and the DART datasets, establishing strong NAR results for
Task-Oriented Dialogue and Data-to-Text NLG.
Comment: 11 pages. NAACL 202
Model-Agnostic Multi-Agent Perception Framework
Existing multi-agent perception systems assume that every agent utilizes the
same model with identical parameters and architecture. The performance can be
degraded with different perception models due to the mismatch in their
confidence scores. In this work, we propose a model-agnostic multi-agent
perception framework to reduce the negative effect caused by the model
discrepancies without sharing the model information. Specifically, we propose a
confidence calibrator that can eliminate the prediction confidence score bias.
Each agent performs such calibration independently on a standard public
database to protect intellectual property. We also propose a corresponding
bounding box aggregation algorithm that considers the confidence scores and the
spatial agreement of neighboring boxes. Our experiments shed light on the
necessity of model calibration across different agents, and the results show
that the proposed framework improves the baseline 3D object detection
performance of heterogeneous agents.
A Two Dimensional Feature Engineering Method for Relation Extraction
Transforming a sentence into a two-dimensional (2D) representation (e.g.,
table filling) unfolds a semantic plane in which each element is a word-pair
representation that may denote a possible relation between two named
entities. The 2D
representation is effective in resolving overlapped relation instances.
However, in related work, the representation is transformed directly from the
raw input, which makes it hard to exploit prior knowledge that is important for
the relation extraction task. In this paper, we propose a two-dimensional
feature engineering method in the 2D sentence representation for relation
extraction. Our proposed method is evaluated on three public datasets (ACE05
Chinese, ACE05 English, and SanWen) and achieves state-of-the-art
performance. The results indicate that two-dimensional feature engineering can
take advantage of a two-dimensional sentence representation and make full use
of prior knowledge in traditional feature engineering. Our code is publicly
available at
https://github.com/Wang-ck123/A-Two-Dimensional-Feature-Engineering-Method-for-Entity-Relation-Extractio
Research on digital tool in cognitive assessment: a bibliometric analysis
Objective: The number of studies on new cognitive assessment tools has increased rapidly in recent years, sparking great interest among professionals. However, there is still little literature revealing the current status and future trends of digital technology use in cognitive assessment. The aim of this study was to summarize the development of digital cognitive assessment tools through the bibliometric method.
Methods: We carried out a comprehensive search in the Web of Science Core Collection to identify relevant papers published in English between January 1, 2003, and April 3, 2023. We used search terms such as “digital,” “computer,” and “cognitive,” and 13,244 related publications were collected. We then conducted the bibliometric analysis with the Bibliometrix R package, VOSviewer, and CiteSpace, revealing the prominent countries, authors, institutions, and journals.
Results: 11,045 articles and 2,199 reviews were included in our analyses. The number of annual publications in this field rose rapidly. The most productive countries, authors, and institutions were primarily located in economically developed regions, especially North America, Europe, and Australia. Research cooperation tended to occur in these areas as well. The application of digital technology in cognitive assessment attracted growing attention during the COVID-19 epidemic.
Conclusion: Digital technology has had a great impact on cognitive assessment and health care. A substantial number of papers have been published in these areas in recent years. The findings of the study indicate the great potential of digital technology in cognitive assessment.
Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning
Hateful memes have emerged as a significant concern on the Internet. These
memes, which are a combination of image and text, often convey messages vastly
different from their individual meanings. Thus, detecting hateful memes
requires the system to jointly understand the visual and textual modalities.
However, our investigation reveals that the embedding space of existing
CLIP-based systems lacks sensitivity to subtle differences in memes that are
vital for correct hatefulness classification. To address this issue, we propose
constructing a hatefulness-aware embedding space through retrieval-guided
contrastive training. Specifically, we add an auxiliary loss that utilizes hard
negative and pseudo-gold samples to train the embedding space. Our approach
achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC
of 86.7. Notably, our approach outperforms much larger fine-tuned Large
Multimodal Models like Flamingo and LLaVA. Finally, we demonstrate a
retrieval-based hateful memes detection system, which is capable of making
hatefulness classification based on data unseen in training from a database.
This allows developers to update the hateful memes detection system by simply
adding new data without retraining, a desirable feature for real services in
the constantly-evolving landscape of hateful memes on the Internet.
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to
utilize knowledge from external knowledge bases to answer visually-grounded
questions. Retrieval-Augmented Visual Question Answering (RA-VQA), a strong
framework to tackle KB-VQA, first retrieves related documents with Dense
Passage Retrieval (DPR) and then uses them to answer questions. This paper
proposes Fine-grained Late-interaction Multi-modal Retrieval (FLMR) which
significantly improves knowledge retrieval in RA-VQA. FLMR addresses two major
limitations in RA-VQA's retriever: (1) the image representations obtained via
image-to-text transforms can be incomplete and inaccurate and (2) relevance
scores between queries and documents are computed with one-dimensional
embeddings, which can be insensitive to finer-grained relevance. FLMR overcomes
these limitations by obtaining image representations that complement those from
the image-to-text transforms using a vision model aligned with an existing
text-based retriever through a simple alignment network. FLMR also encodes
images and questions using multi-dimensional embeddings to capture
finer-grained relevance between queries and documents. FLMR significantly
improves the original RA-VQA retriever's PRRecall@5 by approximately 8%.
Finally, we equipped RA-VQA with two state-of-the-art large
multi-modal/language models to achieve VQA score in the OK-VQA
dataset.
Comment: To appear at NeurIPS 2023. This is the camera-ready version. We fixed
some numbers and added more experiments to address reviewers' comments.
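The contrast between single-vector and multi-vector relevance scoring can be sketched as follows. This assumes a ColBERT-style MaxSim late-interaction score and mean pooling for the single-vector baseline; the abstract does not spell out FLMR's exact scoring function, so the function names and the toy 2-D embeddings here are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean_pool(token_vecs):
    """Collapse a list of token embeddings into one vector by averaging."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def single_vector_score(query_tokens, doc_tokens):
    """Baseline: one pooled embedding per query and per document."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))

def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim: each query token is matched to its most similar document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy embeddings (assumptions): doc_a matches each query token exactly,
# doc_b only matches the query on average.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.5, 0.5], [0.5, 0.5]]

print(single_vector_score(query, doc_a), single_vector_score(query, doc_b))
print(late_interaction_score(query, doc_a), late_interaction_score(query, doc_b))
```

In this toy case the pooled scores tie while the token-level score prefers `doc_a`, illustrating the finer-grained relevance that multi-dimensional embeddings are meant to capture.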