62 research outputs found

    Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction

    Automatically extracting the relationships between chemicals and diseases is important to many areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance biomedical research. KBs contain large amounts of structured information about entities and relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous studies have paid little attention to the prior knowledge in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism can significantly improve CDR extraction performance while achieving results comparable with state-of-the-art systems. Comment: Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11 pages, 5 figures
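The attention step described in this abstract can be illustrated with a minimal sketch. The abstract does not give NAM's scoring function, so the dot-product scoring, the toy two-dimensional vectors, and the names `softmax` and `attend` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(context_vectors, relation_vector):
    """Weight context words by their similarity to a relation representation.

    Assumption: similarity is a plain dot product; the paper may use another scorer.
    """
    scores = [sum(c * r for c, r in zip(vec, relation_vector))
              for vec in context_vectors]
    weights = softmax(scores)
    dim = len(relation_vector)
    # Attention-weighted sum of the context vectors.
    summary = [sum(w * vec[i] for w, vec in zip(weights, context_vectors))
               for i in range(dim)]
    return weights, summary

# Hypothetical toy embeddings for three context words.
context = {"induced": [0.9, 0.1], "the": [0.0, 0.0], "patient": [0.1, 0.2]}
relation = [1.0, 0.0]  # hypothetical relation representation learned from a KB
weights, summary = attend(list(context.values()), relation)
```

In this toy setup the word whose vector aligns best with the relation vector ("induced") receives the largest weight, which is the selection behaviour the abstract describes.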

    Hedge Scope Detection in Biomedical Texts: An Effective Dependency-Based Method.

    Hedge detection is used to distinguish uncertain information from facts, which is essential in biomedical information extraction. The task is often divided into two subtasks: detecting uncertainty cues and detecting their linguistic scope. A hedge scope is a sequence of tokens in a sentence that includes the hedge cue. Previous hedge scope detection methods usually take all tokens in a sentence as candidate boundaries, which inevitably generates a large number of negative instances for classifiers. The resulting class imbalance seriously misleads classifiers and lowers performance. This paper proposes a dependency-based candidate boundary selection method (DCBS), which uses the dependency tree to select the most likely tokens as candidate boundaries and to remove exceptional tokens that are unlikely to improve performance. In addition, we employ a composite kernel to integrate lexical and syntactic information and demonstrate the effectiveness of structured syntactic features for hedge scope detection. Experiments on the CoNLL-2010 Shared Task corpus show that our method achieves a 71.92% F1-score on the gold-standard cues, which is 4.11% higher than the system without DCBS. Although the candidate boundary selection method is evaluated only on hedge scope detection here, it can be extended to other kinds of scope learning tasks.
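The imbalance problem mentioned in this abstract can be made concrete with a small sketch. It only counts positive and negative boundary instances; the example sentence, the gold boundary index, and the hand-picked candidate list are hypothetical, and the filtering shown is a stand-in for a candidate selector, not the DCBS algorithm itself.

```python
def boundary_instances(tokens, gold_boundary, candidates=None):
    """Label each candidate token index 1 if it is the gold boundary, else 0.

    With candidates=None every token in the sentence is a candidate.
    """
    if candidates is None:
        candidates = range(len(tokens))
    return [(i, int(i == gold_boundary)) for i in candidates]

def count_labels(instances):
    """Return (number of positives, number of negatives)."""
    positives = sum(label for _, label in instances)
    return positives, len(instances) - positives

tokens = ["These", "results", "suggest", "that", "the",
          "drug", "may", "cause", "toxicity", "."]
gold_left = 6  # hypothetical gold left boundary (the cue "may")

all_instances = boundary_instances(tokens, gold_left)
# Hypothetical output of a candidate selector, picked by hand for illustration.
filtered = boundary_instances(tokens, gold_left, candidates=[2, 6, 7])

all_counts = count_labels(all_instances)
filtered_counts = count_labels(filtered)
```

With every token as a candidate, this ten-token sentence yields one positive and nine negatives; restricting the candidate set keeps the positive while discarding most negatives, which is the effect a candidate boundary selector aims for.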

    A risk stratification and prognostic prediction model for lung adenocarcinoma based on aging-related lncRNA

    This study aimed to create a risk model of aging-related long non-coding RNAs (arlncRNAs) and to determine whether they might be useful as markers for risk stratification, prognosis prediction, and targeted therapy guidance for patients with lung adenocarcinoma (LUAD). Data on aging genes and lncRNAs from LUAD patients were obtained from Human Aging Genomic Resources 3 and The Cancer Genome Atlas, and differential co-expression analysis was performed to establish differentially expressed arlncRNAs (DEarlncRNAs). These were then paired by cyclic single pairing to build a 0-or-1 matrix. The risk coefficient for each LUAD sample was obtained, and a risk model was constructed by performing univariate regression, least absolute shrinkage and selection operator regression analysis, and univariate and multivariate Cox regression analysis. Areas under the curve were calculated for the 1-, 3-, and 5-year receiver operating characteristic curves, and Akaike information criterion-based cutoffs were determined to identify high- and low-risk groups. The survival rate, correlation of clinical characteristics, tumor-infiltrating immune-cell expression, ICI-related gene expression, and chemotherapeutic drug sensitivity were compared between the high- and low-risk groups. We found that 99 DEarlncRNAs were upregulated and 12 were downregulated. Twenty DEarlncRNA pairs were used to create a prognostic model. The 1-, 3-, and 5-year areas under the curve for LUAD individuals were 0.805, 0.793, and 0.855, respectively. The cutoff value used to classify patients into two groups was 0.992. The mortality rate was higher in the high-risk group. The risk score was an independent predictor of LUAD outcome (p < 0.001). Tumor-infiltrating immune cells and ICI-related gene expression differed substantially between the groups. The high-risk group was highly sensitive to docetaxel, erlotinib, gefitinib, and paclitaxel. Risk models constructed from arlncRNAs can be used for risk stratification in patients with LUAD and serve as prognostic markers to identify patients who might benefit from targeted and chemotherapeutic agents.
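The pairing and stratification steps in this abstract can be sketched as follows. The abstract does not state the pairing rule or the score formula, so three things here are assumptions: a pair is coded 1 when the first lncRNA is expressed higher than the second, the risk score is a coefficient-weighted sum of pair values, and scores above the cutoff count as high risk. The lncRNA names, expression values, and coefficients are hypothetical; only the cutoff of 0.992 comes from the abstract.

```python
from itertools import combinations

def pair_matrix(expression):
    """Build a 0-or-1 value per sample for every lncRNA pair.

    expression: {lncRNA name: [expression value per sample]}
    Assumption: a pair (a, b) is coded 1 when a is expressed higher than b.
    """
    pairs = {}
    for a, b in combinations(sorted(expression), 2):
        pairs[(a, b)] = [int(x > y)
                         for x, y in zip(expression[a], expression[b])]
    return pairs

def risk_scores(pairs, coefficients):
    """Assumed score: sum of coefficient * pair value over the modelled pairs."""
    n_samples = len(next(iter(pairs.values())))
    return [sum(coefficients.get(p, 0.0) * vals[s] for p, vals in pairs.items())
            for s in range(n_samples)]

def stratify(scores, cutoff=0.992):
    """Split samples into risk groups; 0.992 is the cutoff reported in the abstract."""
    return ["high" if s > cutoff else "low" for s in scores]

# Hypothetical expression values for three lncRNAs across three samples.
expression = {"lncA": [5.0, 1.0, 3.0],
              "lncB": [2.0, 4.0, 3.5],
              "lncC": [1.0, 6.0, 0.5]}
pairs = pair_matrix(expression)
# Hypothetical regression coefficients for the pairs kept in the model.
coefficients = {("lncA", "lncB"): 1.2, ("lncB", "lncC"): 0.4}
scores = risk_scores(pairs, coefficients)
groups = stratify(scores)
```

Coding pairs as 0 or 1 makes the score depend only on the relative ordering of two lncRNAs within a sample, so it does not depend on the absolute expression scale.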

    Voting-Based Ensemble Classifiers to Detect Hedges and Their Scopes in Biomedical Texts

    System architecture.


    An example of the L-scope candidate selection process with DCBS.

    (a) Initialize all nodes’ color; Select the L-scope candidate boundary from (b) to (f); (g) Output the L-scope candidate nodes.