250 research outputs found

LS-DTKMS: A Local Search Algorithm for Diversified Top-k MaxSAT Problem

Author: He Bo
Liang Jiaxin
Yin Minghao
Zhou Junping
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th International Conference on Theory and Applications of Satisfiability Testing (SAT 2023)
Publication date: 01/01/2023

Maximum Satisfiability (MaxSAT), an important optimization problem, has a range of applications, including network routing, planning and scheduling, and combinatorial auctions. In these applications, one usually benefits from having not just a single solution but k diverse solutions. Motivated by this, we study an extension of MaxSAT, the Diversified Top-k MaxSAT (DTKMS) problem, which is to find k feasible assignments of a given formula such that each assignment satisfies all hard clauses and all of them together satisfy the maximum number of soft clauses. This paper presents a local search algorithm, LS-DTKMS, for the DTKMS problem, which exploits novel scoring functions to select variables and assignments. Experiments demonstrate that LS-DTKMS outperforms the top-k MaxSAT based DTKMS solvers and state-of-the-art solvers for the diversified top-k clique problem.
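
To make the DTKMS objective described in this abstract concrete, the following minimal Python sketch scores a candidate set of k assignments: the set is rejected if any assignment violates a hard clause, and otherwise the score is the number of soft clauses satisfied by at least one assignment. The clause encoding (DIMACS-style signed integers) and the names `satisfies` and `dtkms_score` are assumptions made for illustration; this is not the LS-DTKMS algorithm or its scoring functions.

```python
from typing import Dict, List, Optional

Clause = List[int]            # DIMACS-style literals: 3 means x3, -3 means NOT x3
Assignment = Dict[int, bool]  # variable index -> truth value


def satisfies(assignment: Assignment, clause: Clause) -> bool:
    """Return True if at least one literal of the clause is true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def dtkms_score(
    assignments: List[Assignment],
    hard: List[Clause],
    soft: List[Clause],
) -> Optional[int]:
    """Score a set of k assignments under the DTKMS objective.

    Returns None if any assignment violates a hard clause (infeasible set);
    otherwise returns how many soft clauses are satisfied by at least one
    of the assignments.
    """
    for a in assignments:
        if not all(satisfies(a, c) for c in hard):
            return None
    return sum(1 for c in soft if any(satisfies(a, c) for a in assignments))


# Hypothetical toy instance: one hard clause (x1 OR x2), four unit soft clauses.
hard_clauses = [[1, 2]]
soft_clauses = [[1], [2], [-1], [-2]]
a1 = {1: True, 2: False}
a2 = {1: False, 2: True}
two_diverse = dtkms_score([a1, a2], hard_clauses, soft_clauses)
one_only = dtkms_score([a1], hard_clauses, soft_clauses)
```

In this toy instance the two assignments cover different soft clauses, so the pair scores higher than either assignment alone, which is the point of asking for k diverse solutions.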

DROPS Dagstuhl Research Online Publication Server

Energy-efficient Connected Cruise Control with Lean Penetration of Connected Vehicles

Author: Bell A. Harvey
He Chaozhe R.
Molnar Tamas
Orosz Gabor
Shen Minghao
Publication date: 06/05/2022

This paper focuses on energy-efficient longitudinal controller design for a connected automated truck that travels in mixed traffic consisting of connected and non-connected vehicles. The truck has access to information about connected vehicles beyond line of sight using vehicle-to-vehicle (V2V) communication. A novel connected cruise control design is proposed which incorporates additional delays into the control law when responding to distant connected vehicles to account for the finite propagation of traffic waves. The speeds of non-connected vehicles are modeled as stochastic processes. A fundamental theorem is proven which links the spectral properties of the motion signals to the average energy consumption. This enables us to tune controller parameters and maximize energy efficiency. Simulations with synthetic data and real traffic data are used to demonstrate the energy efficiency of the control design. It is demonstrated that even with lean penetration of connected vehicles, our controller can bring significant energy savings.
Comment: Submitted to IEEE Transactions on Intelligent Transportation Systems.

arXiv.org e-Print Archive

Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

Author: He Yongquan
Lin Yang
Tang Minghao
Xu Hongbo
Xu Yongxiu
Zhang Wenyuan
Publication date: 23/10/2023

Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing.
Comment: Accepted by Findings of EMNLP 2023, 11 pages.

arXiv.org e-Print Archive

A Boundary Offset Prediction Network for Named Entity Recognition

Author: He Yongquan
Lin Yang
Tang Minghao
Xu Hongbo
Xu Yongxiu
Zhang Wenyuan
Publication date: 23/10/2023

Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans. By leveraging the guiding semantics of boundary offsets, BOPN establishes connections between non-entity and entity spans, enabling non-entity spans to function as additional positive samples for entity detection. Furthermore, our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets. We conduct experiments on eight widely used NER datasets, and the results demonstrate that our proposed BOPN outperforms previous state-of-the-art methods.
Comment: Accepted by Findings of EMNLP 2023, 13 pages.

arXiv.org e-Print Archive

A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

Author: Chen Minghao
Gao Zepeng
He Xiaofei
Lin Binbin
Qiu Qibo
Wang Wenxiao
Zhao Shuai
Publication date: 18/09/2023

Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods necessitate a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the metric based on mutual information of the model prediction. Through empirical analysis, we identify three prevalent issues with this metric: 1) It does not account for the source structure. 2) It can be easily attacked. 3) It fails to detect negative transfer caused by the over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving superior performance compared to manually tuned sets across four common benchmarks. Code will be available soon.
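
As an illustration of the starting point named in this abstract, a metric based on the mutual information of model predictions, the sketch below uses one common estimator: the entropy of the average predicted class distribution minus the average entropy of the per-sample predictions. This estimator and the function names are assumptions chosen for illustration; the abstract does not give the exact formula the authors use, and the sketch does not implement ACM.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def prediction_mutual_information(probs: List[List[float]]) -> float:
    """Estimate I(input; predicted label) from softmax outputs.

    probs[i][c] is the predicted probability of class c for target sample i.
    The estimate is H(mean prediction) - mean(H(prediction)): it is high when
    individual predictions are confident and the classes are used evenly.
    """
    n = len(probs)
    num_classes = len(probs[0])
    marginal = [sum(row[c] for row in probs) / n for c in range(num_classes)]
    mean_conditional = sum(entropy(row) for row in probs) / n
    return entropy(marginal) - mean_conditional


# Hypothetical predictions on four unlabeled target samples, two classes.
confident_balanced = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.5, 0.5]] * 4
collapsed = [[1.0, 0.0]] * 4
```

Note that the estimate needs no target labels, and that both uncertain predictions and predictions collapsed onto one class score zero under this estimator.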

arXiv.org e-Print Archive

Energy-efficient Reactive and Predictive Connected Cruise Control

Author: Dollar R. Austin
He Chaozhe R.
Molnar Tamas G.
Orosz Gabor
Shen Minghao
Vahidi Ardalan
Publication date: 09/10/2022

In this paper, we propose a framework for the longitudinal control of connected and automated vehicles traveling in mixed traffic consisting of connected and non-connected human-driven vehicles. Reactive and predictive controllers are proposed. Reactive controllers are given by explicit feedback control laws. In predictive controllers, the control input is optimized in a receding-horizon fashion, which depends on the predictions of motions of preceding vehicles. Beyond-line-of-sight information is obtained via vehicle-to-vehicle (V2V) communication, and is utilized in the proposed reactive and predictive controllers. Simulations utilizing real traffic data are used to show that connectivity can bring significant energy savings.
Comment: 18 pages, 12 figures, submitted to Transportation Research Part C: Emerging Technologies.

arXiv.org e-Print Archive

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

Author: Chi Zewen
He Conghui
Huang Heyan
Mao Xian-Ling
Xu Minghao
Zhang Wentao
Zheng Heqi
Zhuo Le
Publication date: 27/02/2024

We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance against protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks.
Comment: https://protllm.github.io/project

arXiv.org e-Print Archive

Semantic Segmentation to Extract Coronary Arteries in Invasive Coronary Angiograms

Author: ang Haipeng
Bober Robert
Dong Minghao
et al.
He Zhuo
Tang Jinshan
Zhang Chaoyang
Zhao Chen
Zhou Weihua
Publication venue: Digital Commons @ Michigan Tech
Publication date: 24/05/2022

Accurate semantic segmentation of each coronary artery using invasive coronary angiography (ICA) is important for stenosis assessment and coronary artery disease (CAD) diagnosis. In this paper, we propose a multi-step semantic segmentation algorithm based on analyzing arterial segments extracted from ICAs. The proposed algorithm first extracts the entire arterial binary mask (binary vascular tree) using a deep learning-based method. Then we extract the centerline of the binary vascular tree and separate it into different arterial segments. Finally, by extracting the underlying arterial topology, position, and pixel features, we construct a powerful coronary artery segment classifier based on a support vector machine. Each arterial segment is classified into the left coronary artery (LCA), left anterior descending (LAD), and other types of arterial segments. The proposed method was tested on a dataset with 225 ICAs and achieved a mean accuracy of 70.33% for the multi-class artery classification and a mean intersection over union of 0.6868 for semantic segmentation of arteries. The experimental results show the effectiveness of the proposed algorithm, which provides impressive performance for analyzing the individual arteries in ICAs.
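
For readers unfamiliar with the mean intersection over union figure reported in this abstract, the sketch below shows how the measure is typically computed for binary masks. The mask representation (nested lists of 0/1), the function names, and the convention of scoring two empty masks as 1.0 are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = artery pixel, 0 = background


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average IoU over paired predicted and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Hypothetical 2x2 masks: one overlapping pixel out of three marked pixels.
predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [1, 0]]
single_score = iou(predicted, ground_truth)
average_score = mean_iou([predicted, ground_truth], [ground_truth, ground_truth])
```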

Michigan Technological University