
94 research outputs found

Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph

Author: Li Mingchen
Publication venue: ScholarWorks @ Georgia State University
Publication date: 14/12/2022

Building query graphs from questions is an important step in complex question answering over knowledge graphs (Complex KGQA). In general, a question can be answered correctly if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches to query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracy. In this paper, we define six semantic structures from common questions in KGQA and develop a novel Structure-BERT to predict the semantic structure of a question. Candidates that do not match the predicted structure are filtered out, and the remaining candidates are ranked with a BERT-based ranking model. Extensive experiments on two popular benchmarks, MetaQA and WebQuestionsSP, demonstrate the effectiveness of our method compared to state-of-the-art approaches.

TemPL: A Novel Deep Learning Model for Zero-Shot Prediction of Protein Stability and Activity Based on Temperature-Guided Language Modeling

Author: Hong Liang
Li Mingchen
Tan Pan
Zhang Liang
Publication date: 07/04/2023

We introduce TemPL, a novel deep learning approach for zero-shot prediction of protein stability and activity, harnessing temperature-guided language modeling. By assembling an extensive dataset of ten million sequence-host bacterial strain optimal growth temperatures (OGTs) and ΔTm data for point mutations under consistent experimental conditions, we effectively compared TemPL with state-of-the-art models. Notably, TemPL demonstrated superior performance in predicting protein stability. An ablation study was conducted to elucidate the influence of OGT prediction and language modeling modules on TemPL's performance, revealing the importance of integrating both components. Consequently, TemPL offers considerable promise for protein engineering applications, facilitating the design of mutation sequences with enhanced stability and activity.

arXiv.org e-Print Archive

PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction

Author: Chen M.
Li Mingchen
Zhang Rui
Zhou Huixue
Publication date: 27/10/2023

The automatic extraction of biomedical entities and their interactions from unstructured data remains a challenging task due to the limited availability of expert-labeled standard datasets. In this paper, we introduce PETAILOR, a retrieval-based language framework that is augmented by a tailored chunk scorer. Unlike previous retrieval-augmented language models (LMs) that retrieve relevant documents by calculating the similarity between the input sentence and the candidate document set, PETAILOR segments the sentence into chunks and retrieves the relevant chunk from our pre-computed chunk-based relational key-value memory. Moreover, in order to comprehend the specific requirements of the LM, PETAILOR adapts the tailored chunk scorer to the LM. We also introduce GM-CIHT, an expert-annotated biomedical triple extraction dataset with more relation types. This dataset is centered on the non-drug treatment and general biomedical domain. Additionally, we investigate the efficacy of triple extraction models trained on general domains when applied to the biomedical domain. Our experiments reveal that PETAILOR achieves state-of-the-art performance on GM-CIHT. Comment: this is the first preprint version.

arXiv.org e-Print Archive