LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
A recent study has revealed the great potential of universally modeling all
typical information extraction tasks (UIE) with a single generative language
model (GLM), in which the various IE predictions are unified into a linearized
hierarchical expression. Syntactic structure information, an effective feature
that has been used extensively in the IE community, should also benefit UIE.
In this work, we propose a novel structure-aware GLM that fully unleashes the
power of syntactic knowledge for UIE. We explore a heterogeneous structure
inductor that induces rich heterogeneous structural representations without
supervision by post-training an existing GLM. In particular, we devise a
structural broadcaster that compacts various latent trees into explicit
high-order forests, helping to guide better generation during decoding.
Finally, we introduce a task-oriented structure fine-tuning mechanism that
further adjusts the learned structures to best match the needs of the end task.
On 12 IE benchmarks across 7 tasks, our system shows significant improvements
over the baseline UIE system. Further in-depth analyses show that our GLM learns
a rich, task-adaptive structural bias that substantially addresses the crux of
UIE: long-range dependencies and boundary identification. Source code is
available at https://github.com/ChocoWu/LasUIE. Comment: NeurIPS 2022 conference paper
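The abstract says UIE turns every IE prediction into a linearized hierarchical expression that a GLM can generate. The minimal sketch below shows what such a linearization might look like for entities with nested relations. The bracket syntax, the `Span` type, and the example sentence are assumptions made for illustration. They are not the exact target format used by LasUIE or by the earlier UIE study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """A predicted span: its type label, surface text, and nested (e.g. relation) children.
    This structure is hypothetical, chosen only to illustrate linearization."""
    label: str
    text: str
    children: list[Span] = field(default_factory=list)


def linearize_span(span: Span) -> str:
    """Serialize one span and, recursively, its children as a bracketed expression."""
    body = f"{span.label}: {span.text}"
    for child in span.children:
        body += " " + linearize_span(child)
    return f"({body})"


def linearize(spans: list[Span]) -> str:
    """Wrap all top-level spans of a sentence into one flat, generatable string."""
    return "(" + " ".join(linearize_span(s) for s in spans) + ")"


if __name__ == "__main__":
    prediction = [
        Span("person", "Steve Jobs", [Span("work for", "Apple")]),
        Span("organization", "Apple"),
    ]
    print(linearize(prediction))
    # ((person: Steve Jobs (work for: Apple)) (organization: Apple))
```

Because the output is a plain string, a single decoder can be trained to emit it for any task. Parsing the string back into spans recovers the structured prediction.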
Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter
This dissertation focuses on disambiguating language use on Twitter about drug use, types of drug consumption, and drug legalization, using ontology-enhanced approaches and data-driven prediction analysis built on novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific ontology model, DsOn, that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment analysis of drug consumption on Twitter. We collected 7.5 million data records from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. We also aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. In addition, the study included a fourth aim: a use case study analyzing tweet content about people living with HIV (PLWH), identifying medication patterns, and tracking keyword trends in Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University’s Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K U.S.-based, HIV-related records from February 2022 to July 2022 via the Twitter streaming API.
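Aim (a) concerns improving the quality of crawled posts. A common baseline for this step, sketched here under stated assumptions, keeps only posts that match a seed lexicon through word-boundary regular expressions. The seed terms and the `min_hits` threshold are hypothetical. The sketch is not the PatRDis classifier described in the dissertation.

```python
import re

# Hypothetical seed lexicon; a real study would use curated, validated slang terms.
SEED_TERMS = ["marijuana", "cannabis", "weed", "dab"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any seed term as a whole word.
    Longer terms come first so alternation prefers the longest match."""
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_posts(posts, pattern, min_hits=1):
    """Keep only posts containing at least `min_hits` seed-term matches."""
    return [p for p in posts if len(pattern.findall(p)) >= min_hits]


if __name__ == "__main__":
    posts = [
        "Legalize cannabis now",
        "Pulling weeds in the garden",  # "weeds" is not a whole-word match for "weed"
        "New dab pen review",
    ]
    print(filter_posts(posts, build_pattern(SEED_TERMS)))
    # ['Legalize cannabis now', 'New dab pen review']
```

Word boundaries remove some noise, but a post such as "time to weed the garden" would still pass the filter. This is the kind of ambiguity that motivates a learned classifier instead of keyword matching alone.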
For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent advances in NLP and the semantic web have helped extend domain knowledge of medical terms. These techniques require a set of seed terms for medical knowledge discovery, and varied initial seeds pull in irrelevant, noisy data that degrades prediction performance. The aim one methodology, the PatRDis classifier, addresses noise and ambiguity, and the aim two methodology, the DsOn ontology model, handles semantic parsing and enrichment of online medical content to classify data for HIV care medication engagement and symptom detection on Twitter. By applying the methodology of aims 2 and 3, we addressed the ambiguity challenges and explored more than 1,500 cannabis and cannabinoid slang terms. Sentiment was also measured before the election: for example, states with high levels of positive sentiment preceding the election were those engaged in advancing their legalization status. We also used the same dataset for prediction analysis of marijuana legalization and consumption trends (Ohio public polling data). In aim four, we ran three experiments (ensemble learning, RNN-LSTM, and NNBERT-CNN models) with five techniques to identify tweets associated with medication adherence and HIV symptoms. Long short-term memory (LSTM) models and CNNs for sentence classification produce accurate results and have recently been used in NLP tasks. CNN models use convolutional layers followed by max pooling or max-over-time pooling layers to extract higher-level features, while LSTM models capture long-term dependencies in word sequences and are therefore well suited to text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT with an additional attention mechanism.
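The CNN description above (convolutional layers followed by max-over-time pooling) can be illustrated with a small NumPy sketch. Each filter slides over a sentence's word-embedding matrix, a ReLU is applied, and only the largest activation per filter is kept. The result is a fixed-size feature vector whatever the sentence length. The shapes and random weights below are illustrative assumptions, not the dissertation's models.

```python
import numpy as np


def conv1d_max_over_time(embeddings, filters, bias):
    """
    embeddings: (seq_len, emb_dim) word vectors for one sentence
    filters:    (num_filters, width, emb_dim) convolution kernels
    bias:       (num_filters,)
    Returns a (num_filters,) vector: each filter's maximum ReLU activation over time.
    """
    seq_len, _ = embeddings.shape
    num_filters, width, _ = filters.shape
    n_windows = seq_len - width + 1
    if n_windows < 1:
        raise ValueError("sentence is shorter than the filter width")
    feature_maps = np.empty((num_filters, n_windows))
    for t in range(n_windows):
        window = embeddings[t : t + width]  # (width, emb_dim)
        # Each filter's response is the sum of its elementwise product with the window.
        feature_maps[:, t] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU nonlinearity
    return feature_maps.max(axis=1)  # max-over-time pooling


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    filters = rng.normal(size=(4, 3, 8))  # 4 filters spanning 3 words of 8-dim embeddings
    bias = np.zeros(4)
    short_sentence = rng.normal(size=(5, 8))
    long_sentence = rng.normal(size=(12, 8))
    print(conv1d_max_over_time(short_sentence, filters, bias).shape)  # (4,)
    print(conv1d_max_over_time(long_sentence, filters, bias).shape)   # (4,)
```

Pooling over time is what lets sentences of different lengths feed the same downstream classifier layer. An LSTM, by contrast, summarizes the sequence through its recurrent state.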
We trained the model using NNBERT to evaluate the proposed model's performance. The test results showed that the proposed models produce more accurate classification results, and that BERT obtained higher recall and F1 scores than the MLP or LSTM models. In addition, we developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence.
PPI-IRO: A two-stage method for protein-protein interaction extraction based on interaction relation ontology
Mining protein-protein interactions (PPIs) from fast-growing biomedical literature resources has proven to be an effective approach for identifying biological regulatory networks. This paper presents a novel method based on the idea of an Interaction Relation Ontology (IRO), which specifies and organises words describing various protein interaction relationships. Our method is a two-stage PPI extraction method. First, IRO is applied in a binary classifier to determine whether a sentence contains a relation. Then, IRO guides PPI extraction by building a dependency parse tree for each sentence. Comprehensive quantitative evaluations and detailed analyses demonstrate the significant performance of IRO on relation sentence classification and PPI extraction. Our PPI extraction method yielded a recall of around 80% and 90% and an F1 of around 54% and 66% on the AIMed and Bioinfer corpora, respectively, which is superior to most existing extraction methods. Copyright © 2014 Inderscience Enterprises Ltd
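The two-stage structure can be sketched with a toy relation lexicon. Stage 1 keeps only sentences that mention at least two proteins and an interaction word. Stage 2 pairs the proteins nearest to each interaction word. Both the lexicon `RELATION_WORDS` and the nearest-mention heuristic are hypothetical stand-ins. The paper's IRO is a richer ontology, and its second stage works over dependency parse trees rather than token distance.

```python
# Hypothetical stand-in for the Interaction Relation Ontology (IRO).
RELATION_WORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}


def stage1_has_relation(tokens, proteins):
    """Stage 1: crude binary filter for sentences that may express a PPI."""
    mentions = [t for t in tokens if t in proteins]
    return len(mentions) >= 2 and any(t.lower() in RELATION_WORDS for t in tokens)


def stage2_extract(tokens, proteins):
    """Stage 2: for each relation word, pair the nearest protein on each side.
    (Token distance here replaces the paper's dependency-tree guidance.)"""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in RELATION_WORDS:
            continue
        left = next((tokens[j] for j in range(i - 1, -1, -1) if tokens[j] in proteins), None)
        right = next((tokens[j] for j in range(i + 1, len(tokens)) if tokens[j] in proteins), None)
        if left and right and left != right:
            pairs.append((left, tok.lower(), right))
    return pairs


def extract_ppis(sentence, proteins):
    """Run both stages on one sentence, using simple whitespace tokenization."""
    tokens = sentence.replace(",", " ").replace(".", " ").split()
    if not stage1_has_relation(tokens, proteins):
        return []
    return stage2_extract(tokens, proteins)


if __name__ == "__main__":
    proteins = {"RAF1", "MEK1", "ERK2"}
    print(extract_ppis("RAF1 phosphorylates MEK1, which activates ERK2.", proteins))
    # [('RAF1', 'phosphorylates', 'MEK1'), ('MEK1', 'activates', 'ERK2')]
    print(extract_ppis("RAF1 and MEK1 were measured.", proteins))
    # []
```

Filtering first means the more expensive extraction step only runs on sentences likely to contain a relation. This split mirrors the classifier-then-extractor design the abstract describes.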
Document-Level Relation Extraction with Reconstruction
In document-level relation extraction (DocRE), a graph structure is generally
used to encode relation information in the input document to classify the
relation category between each entity pair, and this has greatly advanced the
DocRE task over the past several years. However, the learned graph representation
universally models relation information between all entity pairs, regardless of
whether there are relationships between them. Thus, entity pairs without
relationships disperse the attention of the encoder-classifier DocRE model away
from those with relationships, which may further hinder improvements in DocRE.
To alleviate this issue, we propose a novel
encoder-classifier-reconstructor model for DocRE. The reconstructor
reconstructs the ground-truth path dependencies from the graph representation,
ensuring that the proposed DocRE model pays more attention to encoding entity
pairs with relationships during training. Furthermore, the reconstructor serves
as a relationship indicator that assists relation classification during
inference, which can further improve the performance of the DocRE model.
Experimental results on a large-scale DocRE dataset show that the proposed
model can significantly improve the accuracy of relation extraction over a strong
heterogeneous graph-based baseline. Comment: 9 pages, 5 figures, 6 tables. Accepted by AAAI 2021 (Long Paper)
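One way to read the encoder-classifier-reconstructor design is as a joint training objective. It combines a classification loss over every candidate entity pair with a reconstruction loss computed only on pairs that carry a gold relation, and that second term is what steers the encoder toward those pairs. The sketch assumes the per-pair losses are already computed. The weight `lam` and the simple averaging are hypothetical choices, not taken from the paper, and the inference-time use of the reconstructor is not shown.

```python
from itertools import permutations


def candidate_pairs(entities):
    """Every ordered (head, tail) pair of distinct entities is a classification candidate."""
    return list(permutations(entities, 2))


def joint_loss(cls_nll, path_logprob, related, lam=1.0):
    """
    cls_nll:      {pair: classifier negative log-likelihood of the gold label}, for all pairs
    path_logprob: {pair: reconstructor log-probability of the gold path}, for related pairs
    related:      set of pairs that carry a gold relation
    lam:          hypothetical weight on the reconstruction term
    """
    cls_loss = sum(cls_nll.values()) / len(cls_nll)
    if not related:
        return cls_loss  # no related pairs, so nothing to reconstruct
    # Only related pairs contribute, so the reconstruction term rewards encoding them well.
    rec_loss = -sum(path_logprob[p] for p in related) / len(related)
    return cls_loss + lam * rec_loss


if __name__ == "__main__":
    pairs = candidate_pairs(["Paris", "France", "Seine"])
    print(len(pairs))  # 6
    cls_nll = {p: 0.5 for p in pairs}
    related = {("Paris", "France")}
    path_logprob = {("Paris", "France"): -2.0}
    print(joint_loss(cls_nll, path_logprob, related, lam=0.5))  # 1.5
```

The number of candidate pairs grows quadratically with the number of entities, while related pairs are usually few. This imbalance is the dispersion problem the abstract describes.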