7,480 research outputs found

    Exploring Pattern Structures of Syntactic Trees for Relation Extraction

    Get PDF
    International audienceIn this paper we explore the possibility of defining an original pattern structure for managing syntactic trees.More precisely, we are interested in the extraction of relations such as drug-drug interactions (DDIs) in medical texts where sentences are represented as syntactic trees.In this specific pattern structure, called STPS, the similarity operator is based on rooted tree intersection.Moreover, we introduce "Lazy Pattern Structure Classification" (LPSC), which is a symbolic method able to extract and classify DDI sentences w.r.t. STPS.To decrease computation time, a projection and a set of tree-simplification operations are proposed.We evaluated the method by means of a 10-fold cross validation on the corpus of the DDI extraction challenge 2011, and we obtained very encouraging results that are reported at the end of the paper

    A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency

    Full text link
    Definition Extraction (DE) is one of the well-known topics in Information Extraction that aims to identify terms and their corresponding definitions in unstructured texts. This task can be formalized either as a sentence classification task (i.e., containing term-definition pairs or not) or a sequential labeling task (i.e., identifying the boundaries of the terms and definitions). The previous works for DE have only focused on one of the two approaches, failing to model the inter-dependencies between the two tasks. In this work, we propose a novel model for DE that simultaneously performs the two tasks in a single framework to benefit from their inter-dependencies. Our model features deep learning architectures to exploit the global structures of the input sentences as well as the semantic consistencies between the terms and the definitions, thereby improving the quality of the representation vectors for DE. Besides the joint inference between sentence classification and sequential labeling, the proposed model is fundamentally different from the prior work for DE in that the prior work has only employed the local structures of the input sentences (i.e., word-to-word relations), and not yet considered the semantic consistencies between terms and definitions. In order to implement these novel ideas, our model presents a multi-task learning framework that employs graph convolutional neural networks and predicts the dependency paths between the terms and the definitions. We also seek to enforce the consistency between the representations of the terms and definitions both globally (i.e., increasing semantic consistency between the representations of the entire sentences and the terms/definitions) and locally (i.e., promoting the similarity between the representations of the terms and the definitions)

    Using percolated dependencies for phrase extraction in SMT

    Get PDF
    Statistical Machine Translation (SMT) systems rely heavily on the quality of the phrase pairs induced from large amounts of training data. Apart from the widely used method of heuristic learning of n-gram phrase translations from word alignments, there are numerous methods for extracting these phrase pairs. One such class of approaches uses translation information encoded in parallel treebanks to extract phrase pairs. Work to date has demonstrated the usefulness of translation models induced from both constituency structure trees and dependency structure trees. Both syntactic annotations rely on the existence of natural language parsers for both the source and target languages. We depart from the norm by directly obtaining dependency parses from constituency structures using head percolation tables. The paper investigates the use of aligned chunks induced from percolated dependencies in French–English SMT and contrasts it with the aforementioned extracted phrases. We observe that adding phrase pairs from any other method improves translation performance over the baseline n-gram-based system, percolated dependencies are a good substitute for parsed dependencies, and that supplementing with our novel head percolation-induced chunks shows a general trend toward improving all system types across two data sets up to a 5.26% relative increase in BLEU

    Extraction of Transcript Diversity from Scientific Literature

    Get PDF
    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/
    corecore