
    Learning Dictionaries for Named Entity Recognition using Minimal Supervision

    This paper describes an approach for automatically constructing dictionaries for Named Entity Recognition (NER) from large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower-dimensional embeddings (representations) of candidate phrases and classify these phrases using a small number of labeled examples. Our method achieves F1 score improvements of 16.5% and 11.3% over co-training on disease and virus NER, respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
    Comment: In the 14th Conference of the European Chapter of the Association for Computational Linguistics, 201
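
    The seed-based classification step can be illustrated with a minimal sketch. It assumes each candidate phrase already has a low-dimensional vector; the paper obtains these with CCA, which is not implemented here, and the vectors in the usage note are hypothetical stand-ins. Phrases are labeled by cosine similarity to the centroid of a few seed phrases per class. Nearest-centroid classification is a substitute chosen for brevity, not necessarily the classifier the authors used.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_dictionary(embeddings, seeds):
    """Assign every candidate phrase to the class whose seed centroid is most similar.

    embeddings: phrase -> vector (hypothetical stand-ins for CCA embeddings)
    seeds: class label -> list of seed phrases that appear in `embeddings`
    """
    centroids = {label: centroid([embeddings[p] for p in phrases])
                 for label, phrases in seeds.items()}
    dictionary = {label: [] for label in seeds}
    for phrase, vec in embeddings.items():
        best = max(centroids, key=lambda label: cosine(vec, centroids[label]))
        dictionary[best].append(phrase)
    return dictionary
```

    For example, with hypothetical vectors `{"HIV": [0.9, 0.1], "rotavirus": [0.8, 0.3], "diabetes": [0.1, 0.9], "asthma": [0.2, 0.7]}` and one seed per class (`{"virus": ["HIV"], "disease": ["diabetes"]}`), "rotavirus" joins the virus dictionary and "asthma" the disease dictionary.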

    Unregistered Biological Words Recognition by Q-Learning with Transfer Learning

    Unregistered biological word recognition is the process of identifying out-of-vocabulary terms. Although many approaches have been developed, their performance is not yet satisfactory. Because the identification process can be viewed as a Markov process, we propose a Q-learning algorithm with transfer learning to detect unregistered biological words in texts. With Q-learning, the recognizer can reach the optimal identification solution through its interaction with the texts and their contexts. During processing, a transfer learning approach is used to take full advantage of the knowledge gained in a source task and speed up learning in a different but related target task. The mapping required by many transfer learning methods, which relates features of the source task to those of the target task, is performed automatically within the reinforcement learning framework. We examined the performance of three approaches on the GENIA corpus and the JNLPBA04 data. The proposed approach improved performance in both experiments: its precision, recall, and F score surpassed those of a conventional unregistered word recognizer as well as those of a Q-learning approach without transfer learning.
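
    The two ingredients named above, the Q-learning update and reuse of a source-task Q-table in a target task, can be sketched as follows. The abstract does not describe the states, actions, rewards, or learning rates, so every state name, the actions "tag"/"skip", and the values of `alpha` and `gamma` are hypothetical. The paper learns the source-to-target mapping automatically; here `state_map` is simply given by hand, so this shows only the general tabular technique, not the authors' method.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

def transfer(source_q, state_map):
    """Warm-start a target Q-table by copying source values through a state mapping.

    state_map: source state -> target state (hand-specified, hypothetical);
    source states without a mapping are ignored.
    """
    target_q = defaultdict(float)
    for (state, action), value in source_q.items():
        if state in state_map:
            target_q[(state_map[state], action)] = value
    return target_q
```

    With `actions = ("tag", "skip")` and an empty `defaultdict(float)` table, two identical updates with reward 1.0 move Q("capitalized", "tag") from 0.5 to 0.75; `transfer` then starts the target task from that value instead of zero.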

    Enhancing HMM-based biomedical named entity recognition by studying special phenomena

    Journal of Biomedical Informatics, 37(6), 411-422. doi:10.1016/j.jbi.2004.08.005

    A Novel Approach for Protein-Named Entity Recognition and Protein-Protein Interaction Extraction

    Many researchers focus on developing protein-named entity recognition (Protein-NER) or protein-protein interaction (PPI) extraction systems. However, work on these two topics has not been well integrated, so the Protein-NER components of existing PPI extraction systems still need improvement. In this paper, we develop a PPI extraction system named PPIMiner, based on a Support Vector Machine (SVM) and parse trees. PPIMiner consists of three main models: a natural language processing (NLP) model, a Protein-NER model, and a PPI discovery model. The Protein-NER model, named ProNER, identifies protein names using two methods: a dictionary-based method and a machine learning-based method. ProNER identifies more proteins than the dictionary-based Protein-NER models of other existing systems. The PPIs extracted by the PPI discovery model are presented in detail, showing the protein interaction types and their occurrence frequency through two different methods. Experimental results show that ProNER and the PPI discovery model perform better than other existing tools. PPIMiner applies this protein-named entity recognition approach and a parse-tree-based PPI extraction method to improve PPI extraction performance. We also provide an easy-to-use interface to the PPI database and an online system for PPI extraction and Protein-NER.
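
    The idea of combining a dictionary-based method with a machine learning-based method can be illustrated with a small sketch. It assumes a greedy longest-match dictionary lookup and a hypothetical merge policy in which a model-predicted span is kept only if it does not overlap a dictionary match. The SVM itself is not implemented; its output is passed in as `model_spans`. This is an illustration of hybrid NER in general, not ProNER's actual procedure.

```python
def dictionary_spans(tokens, dictionary, max_len=4):
    """Greedy longest-match lookup of (possibly multi-token) names.

    dictionary: set of lowercase, space-joined names (hypothetical).
    Returns (start, end) token spans, end exclusive.
    """
    spans, i = [], 0
    lowered = [t.lower() for t in tokens]
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in dictionary:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no match starting here
    return spans

def merge_spans(dict_spans, model_spans):
    """Union of both span sets; model spans overlapping a dictionary span are dropped."""
    merged = list(dict_spans)
    for s, e in model_spans:
        if all(e <= ds or s >= de for ds, de in dict_spans):
            merged.append((s, e))
    return sorted(merged)
```

    For the tokens of "Binding of p53 to MDM2 and Rad51 ." and the dictionary `{"p53", "mdm2"}`, the lookup finds spans `(2, 3)` and `(4, 5)`; a classifier that proposes `(2, 3)` and `(6, 7)` contributes only the new "Rad51" span. Preferring dictionary matches on overlap is one reasonable design choice; a system could equally rank conflicting spans by classifier confidence.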

    Multi-criteria-based active learning for named entity recognition

    Master's thesis (Master of Science)

    Biomedical name recognition: A machine learning approach

    Master's thesis (Master of Science)

    Identifying disease-relevant interactions in schizophrenia

    Analyses of genome-wide association study data have demonstrated that there are potentially thousands of loci associated with schizophrenia (Sullivan et al. 2003). Although risk is partially explained by the additive effects of top-ranking polymorphisms, genetic interactions may help to explain additional heritability (Hemani et al. 2014; Zuk et al. 2012). However, attempts to identify disease-associated pairwise interactions through exhaustive testing have so far been unsuccessful, owing to the large multiple-testing burden and the absence of easily discoverable interactions of large effect (Moskvina et al. 2011). Here we investigate whether evidence for a contribution of SNP-SNP interactions to disease risk can be found by searching for sets of genes enriched for nominally associated interactions. When performing interaction analyses, covariates were introduced to account for population structure. Where the effect of covariates needs to be accounted for, the most widely used method modifies the basic logistic regression interaction analysis by simply adding covariate terms to the model. The performance of this method was compared with two alternative approaches: adding covariate-SNP interaction terms in addition to the individual covariate terms, as suggested by Yzerbyt et al. (2004); and testing for interactions in each population separately, then using meta-analysis to combine the interaction effects. Results and running time were similar whether or not SNP-covariate terms were included, while the meta-analytic approach was found to be the most efficient in terms of running time. To identify sets of genes enriched for nominally associated interactions, two approaches were investigated: one based on genetic information alone, and one based on functional information from protein-protein interactions (PPI).
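
    The meta-analytic alternative described above, testing each population separately and then combining the interaction effects, can be sketched with a fixed-effect inverse-variance meta-analysis. The abstract does not say which meta-analysis model was used, so the fixed-effect choice is an assumption; the per-population inputs would be the interaction coefficients (log odds ratios) and standard errors from each population's logistic regression.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis (assumed model).

    betas: per-population interaction coefficients (log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value).
    """
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p
```

    For instance, two populations with estimates 0.2 and 0.4, each with standard error 0.1, give a pooled estimate of 0.3 with standard error sqrt(0.005). Part of this approach's speed advantage is plausibly that each per-population model has fewer terms than a single model with covariate-SNP interactions, although the abstract does not state the reason.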
The first approach analyzed the distribution of interaction p-values after ranking them by the gene-wide main effects of the contributing genes, allowing a comparison between genes with high and low gene-wide association. The second approach asked whether genes encoding directly interacting proteins were enriched for nominally associated interactions, drawing on two PPI datasets: one from a large experimental (yeast two-hybrid) screen, the other consisting of PPI data curated from the literature. In both genetic datasets studied, there was evidence for enrichment of nominally associated interactions among genes with the highest gene-wide association for schizophrenia. There was no evidence for an excess of nominally associated interactions in either PPI dataset.
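
    The high/low comparison in the first approach amounts to asking whether nominally associated interactions (for example p < 0.05) are over-represented among gene pairs from the high-association group. The abstract does not name the statistical test, so the one-sided Fisher exact test below, the 0.05 threshold, and the split into two groups are assumptions for illustration. The same 2x2 table could compare gene pairs whose proteins interact against pairs whose proteins do not.

```python
from math import comb

def enrichment_pvalue(high_pvals, low_pvals, alpha=0.05):
    """One-sided Fisher exact test (assumed): are interaction p-values below
    `alpha` over-represented in the high-association group?"""
    a = sum(p < alpha for p in high_pvals)  # high group, nominal
    b = len(high_pvals) - a                 # high group, not nominal
    c = sum(p < alpha for p in low_pvals)   # low group, nominal
    d = len(low_pvals) - c                  # low group, not nominal
    n_high = a + b
    n_nominal = a + c
    total = a + b + c + d
    # Hypergeometric upper tail P(X >= a), X = nominal count in the high group.
    tail = sum(comb(n_nominal, x) * comb(total - n_nominal, n_high - x)
               for x in range(a, min(n_high, n_nominal) + 1))
    return tail / comb(total, n_high)
```

    With hypothetical p-values `[0.01, 0.03, 0.2, 0.04]` for the high group and `[0.5, 0.6, 0.02, 0.9]` for the low group, three of four versus one of four interactions are nominal, and the one-sided p-value is 17/70 (about 0.243). With realistic numbers of SNP pairs, permutation of group labels is a common alternative when interactions are not independent.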