132 research outputs found

Adaptive Robotic Information Gathering via Non-Stationary Gaussian Processes

Authors: Chen Weizhe; Khardon Roni; Liu Lantao
Publication date: 02/06/2023

Robotic Information Gathering (RIG) is a foundational research topic that answers how a robot (team) collects informative data to efficiently build an accurate model of an unknown target function under robot embodiment constraints. RIG has many applications, including but not limited to autonomous exploration and mapping, 3D reconstruction or inspection, search and rescue, and environmental monitoring. A RIG system relies on a probabilistic model's prediction uncertainty to identify critical areas for informative data collection. Gaussian Processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data is typically non-stationary: different locations do not have the same degree of variability. As a result, the prediction uncertainty does not accurately reveal prediction error, limiting the success of RIG algorithms. We propose a family of non-stationary kernels named Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a non-stationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification than the commonly used stationary kernels and the leading non-stationary kernels. The improved uncertainty quantification guides the downstream informative planner to collect more valuable data around the high-error area, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with significant spatial variations, enabling the model to characterize salient environmental features. Comment: International Journal of Robotics Research (IJRR). arXiv admin note: text overlap with arXiv:2205.0642
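To make the stationarity point in this abstract concrete, here is a minimal sketch. It is not the paper's Attentive Kernel. It shows a squared-exponential (RBF) kernel, a common stationary choice, whose value depends only on the offset between two inputs. The function name `rbf_kernel` and the default length scale are illustrative assumptions.

```python
import math


def rbf_kernel(x1: float, x2: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel on scalars; the value depends only on x1 - x2."""
    diff = x1 - x2
    return math.exp(-(diff * diff) / (2.0 * length_scale ** 2))


# The same offset yields the same covariance at any location.
# That location-independence is what "stationary" means here.
near_origin = rbf_kernel(0.0, 1.0)
far_away = rbf_kernel(5.0, 6.0)
same_everywhere = near_origin == far_away
```

Because the covariance ignores where the two points lie, a model built on such a kernel cannot by itself express that one region varies more sharply than another, which is the limitation the abstract attributes to stationary kernels.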

arXiv.org e-Print Archive

Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Authors: Byrne Bill; Chen Jinghong; Lin Weizhe; Yang Guangyu
Publication date: 14/11/2023

Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. In this paper, we show how a recently developed Reinforcement Learning (RL) technique, Direct Preference Optimization (DPO), can be used to fine-tune MLLMs so that we get the gains from MBR without the additional computation at inference. Our fine-tuned models have significantly improved performance on multiple NMT test sets compared to base MLLMs without preference optimization. Our method boosts the translation performance of MLLMs using relatively small monolingual fine-tuning sets.
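As a rough illustration of the MBR decoding idea this abstract refers to, the sketch below picks, from a set of sampled candidates, the one with the highest average utility against all candidates. The utility function `token_overlap` (Jaccard overlap of token sets) is a toy stand-in chosen for this example, not the metric used in the paper, and the names `mbr_decode` and `samples` are hypothetical.

```python
from typing import Callable, Sequence


def token_overlap(hyp: str, ref: str) -> float:
    """Toy utility: Jaccard overlap of whitespace-token sets (stand-in for a real MT metric)."""
    a, b = set(hyp.split()), set(ref.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mbr_decode(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_overlap,
) -> str:
    """Return the candidate with the highest average utility against all candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(hyp: str) -> float:
        # Every candidate also serves as a pseudo-reference, with uniform weight.
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)


samples = ["the cat sat", "the cat sat down", "a dog ran", "the cat slept"]
best = mbr_decode(samples)
```

Each candidate is scored against every other candidate, so the number of utility evaluations grows quadratically with the number of samples. That is one source of the inference cost the abstract describes.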

arXiv.org e-Print Archive

DiSProD: Differentiable Symbolic Propagation of Distributions for Planning

Authors: Chapagain Ashutosh; Chatterjee Palash; Chen Weizhe; Khardon Roni
Publication date: 04/08/2023

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles. Comment: International Joint Conference on Artificial Intelligence (IJCAI) 2023. For project website, see https://pecey.github.io/DiSProD

arXiv.org e-Print Archive

Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Authors: Byrne Bill; Chen Jinghong; Lin Weizhe; Mei Jingbiao
Publication date: 10/04/2024

The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation. Two issues prevent its application to general Natural Language Generation (NLG) tasks: frequent Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity names. We introduce Control-DAG, a constrained decoding algorithm for our Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length control. We show that Control-DAG significantly enhances DA-T5 on the Schema Guided Dialogue and the DART datasets, establishing strong NAR results for Task-Oriented Dialogue and Data-to-Text NLG. Comment: 11 pages. NAACL 202

arXiv.org e-Print Archive

Model-Agnostic Multi-Agent Perception Framework

Authors: Chen Weizhe; Liu Lantao; Ma Jiaqi; Xiang Hao; Xu Runsheng
Publication date: 20/09/2022

Existing multi-agent perception systems assume that every agent utilizes the same model with identical parameters and architecture. When agents use different perception models, performance can degrade because their confidence scores are mismatched. In this work, we propose a model-agnostic multi-agent perception framework to reduce the negative effect caused by the model discrepancies without sharing the model information. Specifically, we propose a confidence calibrator that can eliminate the prediction confidence score bias. Each agent performs such calibration independently on a standard public database to protect intellectual property. We also propose a corresponding bounding box aggregation algorithm that considers the confidence scores and the spatial agreement of neighboring boxes. Our experiments shed light on the necessity of model calibration across different agents, and the results show that the proposed framework improves the baseline 3D object detection performance of heterogeneous agents.

arXiv.org e-Print Archive

A Two Dimensional Feature Engineering Method for Relation Extraction

Authors: Chen Yanping; Huang Ruizhang; Qin Yongbin; Wang Hao; Yang Weizhe
Publication date: 07/04/2024

Transforming a sentence into a two-dimensional (2D) representation (e.g., by table filling) unfolds a semantic plane in which each element is a word-pair representation that may denote a relation between two named entities. The 2D representation is effective in resolving overlapped relation instances. However, in related work the representation is transformed directly from the raw input, which makes poor use of the prior knowledge that is important for relation extraction. In this paper, we propose a two-dimensional feature engineering method in the 2D sentence representation for relation extraction. Our proposed method is evaluated on three public datasets (ACE05 Chinese, ACE05 English, and SanWen) and achieves state-of-the-art performance. The results indicate that two-dimensional feature engineering can take advantage of a two-dimensional sentence representation and make full use of prior knowledge in traditional feature engineering. Our code is publicly available at https://github.com/Wang-ck123/A-Two-Dimensional-Feature-Engineering-Method-for-Entity-Relation-Extractio

arXiv.org e-Print Archive

Research on digital tool in cognitive assessment: a bibliometric analysis

Authors: Dantao Peng; Leian Chen; Weizhe Zhen
Publication venue: Frontiers Media S.A.
Publication date: 01/08/2023

Objective: Research into new cognitive assessment tools has increased rapidly in recent years, sparking great interest among professionals. However, there is still little literature revealing the current status and future trends of digital technology use in cognitive assessment. The aim of this study was to summarize the development of digital cognitive assessment tools through the bibliometric method. Methods: We carried out a comprehensive search in the Web of Science Core Collection to identify relevant papers published in English between January 1, 2003, and April 3, 2023. We searched on subject terms such as “digital,” “computer,” and “cognitive,” and collected 13,244 related publications. We then conducted the bibliometric analysis with the “Bibliometrix” R package, VOSviewer, and CiteSpace software, revealing the prominent countries, authors, institutions, and journals. Results: 11,045 articles and 2,199 reviews were included in our analyses. The number of annual publications in this field was rising rapidly. The results showed that the most productive countries, authors, and institutions were primarily located in economically developed regions, especially North American, European, and Australian countries. Research cooperation tended to occur in these areas as well. The application of digital technology in cognitive assessment attracted growing attention during the outbreak of the COVID-19 epidemic. Conclusion: The use of digital technology has had a great impact on cognitive assessment and health care. A substantial number of papers have been published in these areas in recent years. The findings of the study indicate the great potential of digital technology in cognitive assessment.

Directory of Open Access Journals

Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Authors: Byrne Bill; Chen Jinghong; Lin Weizhe; Mei Jingbiao; Tomalin Marcus
Publication date: 14/11/2023

Hateful memes have emerged as a significant concern on the Internet. These memes, which are a combination of image and text, often convey messages vastly different from their individual meanings. Thus, detecting hateful memes requires the system to jointly understand the visual and textual modalities. However, our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. To address this issue, we propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training. Specifically, we add an auxiliary loss that utilizes hard negative and pseudo-gold samples to train the embedding space. Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 86.7. Notably, our approach outperforms much larger fine-tuned Large Multimodal Models like Flamingo and LLaVA. Finally, we demonstrate a retrieval-based hateful memes detection system, which is capable of making hatefulness classification based on data unseen in training from a database. This allows developers to update the hateful memes detection system by simply adding new data without retraining, a desirable feature for real services in the constantly evolving landscape of hateful memes on the Internet.

arXiv.org e-Print Archive

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Authors: Byrne Bill; Chen Jinghong; Coca Alexandru; Lin Weizhe; Mei Jingbiao
Publication date: 28/10/2023

Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to utilize knowledge from external knowledge bases to answer visually-grounded questions. Retrieval-Augmented Visual Question Answering (RA-VQA), a strong framework to tackle KB-VQA, first retrieves related documents with Dense Passage Retrieval (DPR) and then uses them to answer questions. This paper proposes Fine-grained Late-interaction Multi-modal Retrieval (FLMR) which significantly improves knowledge retrieval in RA-VQA. FLMR addresses two major limitations in RA-VQA's retriever: (1) the image representations obtained via image-to-text transforms can be incomplete and inaccurate and (2) relevance scores between queries and documents are computed with one-dimensional embeddings, which can be insensitive to finer-grained relevance. FLMR overcomes these limitations by obtaining image representations that complement those from the image-to-text transforms using a vision model aligned with an existing text-based retriever through a simple alignment network. FLMR also encodes images and questions using multi-dimensional embeddings to capture finer-grained relevance between queries and documents. FLMR significantly improves the original RA-VQA retriever's PRRecall@5 by approximately 8%. Finally, we equipped RA-VQA with two state-of-the-art large multi-modal/language models to achieve ~61% VQA score in the OK-VQA dataset. Comment: To appear at NeurIPS 2023. This is the camera-ready version. We fixed some numbers and added more experiments to address reviewers' comment
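The contrast this abstract draws between a single embedding per query and multi-vector, late-interaction scoring can be sketched as follows. The abstract does not spell out the scoring function, so this example assumes a ColBERT-style "MaxSim" rule, in which each query vector is matched to its best document vector and the matches are summed. The function names and toy vectors are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Assumed MaxSim rule: sum, over query vectors, of the best dot product with any doc vector."""
    if not doc_vecs:
        raise ValueError("document must have at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy example: two query token vectors scored against two document token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.5, 0.5]]
score = late_interaction_score(query, document)
```

With one vector per query and per document, this rule reduces to a single dot product, which corresponds to the coarser relevance signal the abstract contrasts with finer-grained matching.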

arXiv.org e-Print Archive