    A Practical Incremental Learning Framework For Sparse Entity Extraction

    This work addresses challenges that arise when extracting entities from textual data, including the high cost of data annotation, model accuracy, the choice of appropriate evaluation criteria, and the overall quality of annotation. We present a framework that integrates Entity Set Expansion (ESE) and Active Learning (AL) to reduce the annotation cost of sparse data, and that provides an online evaluation method as feedback. This incremental and interactive learning framework allows rapid annotation and subsequent extraction of sparse data while maintaining high accuracy. We evaluate our framework on three publicly available datasets and show that it reduces the cost of sparse entity annotation by an average of 85% and 45% to reach F-scores of 0.9 and 1.0, respectively. Moreover, the method performed robustly across all datasets.
    Comment: https://www.aclweb.org/anthology/C18-1059
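To illustrate what an entity set expansion (ESE) step does in general, the sketch below ranks candidate terms by their mean cosine similarity to a set of seed entities. It is not the authors' framework: the embeddings, the scoring rule and the function names are all assumptions, and the active learning side of the loop (choosing what to show the annotator and retraining on the feedback) is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_entity_set(seeds, candidates, embeddings, top_k=3):
    """Rank candidate terms by mean cosine similarity to the seed entities.

    Hypothetical illustration: `embeddings` maps every term to a vector, and the
    returned terms are the ones that would be proposed to an annotator next.
    """
    scored = []
    for term in candidates:
        if term in seeds:  # already part of the entity set
            continue
        sims = [cosine(embeddings[term], embeddings[s]) for s in seeds]
        scored.append((sum(sims) / len(sims), term))
    scored.sort(reverse=True)  # highest mean similarity first
    return [term for _, term in scored[:top_k]]


# Toy 2-D embeddings, invented purely for illustration.
embeddings = {
    "aspirin": [1.0, 0.1],
    "ibuprofen": [0.9, 0.2],
    "paracetamol": [0.95, 0.15],
    "london": [0.0, 1.0],
    "paris": [0.1, 0.9],
}
print(expand_entity_set({"aspirin", "ibuprofen"}, ["paracetamol", "london", "paris"], embeddings, top_k=1))
# ['paracetamol']
```

In an interactive setting, accepted suggestions would be added to the seed set and the ranking recomputed.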

    Bioinformatics tools in predictive ecology: Applications to fisheries

    This article is made available through the Brunel Open Access Publishing Fund. Copyright © 2012 Tucker et al. There has been a major effort to advance analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more exploratory approach is required, for example through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’, with an application to fisheries data. In particular, the focus is on developing models that identify functionally equivalent species in different fish communities, with the aim of predicting functional collapse.

    ISMB/ECCB 2009 Stockholm

    The International Society for Computational Biology (ISCB; http://www.iscb.org) presents the Seventeenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), organized jointly with the Eighth Annual European Conference on Computational Biology (ECCB; http://bioinf.mpi-inf.mpg.de/conferences/eccb/eccb.htm), in Stockholm, Sweden, 27 June to 2 July 2009. The organizers are putting the finishing touches on the year's premier computational biology conference, with an expected attendance of 1400 computer scientists, mathematicians, statisticians, biologists and scientists from other disciplines related to and reliant on this multidisciplinary science. ISMB/ECCB 2009 (http://www.iscb.org/ismbeccb2009/) follows the framework introduced at ISMB/ECCB 2007 (http://www.iscb.org/ismbeccb2007/) in Vienna and further refined at ISMB 2008 (http://www.iscb.org/ismb2008/) in Toronto, a framework developed specifically to encourage greater participation from disciplines often under-represented at computational biology conferences. During the main ISMB conference dates of 29 June to 2 July, keynote talks from highly regarded scientists, including ISCB Award winners, are the featured presentations that bring all attendees together twice a day. The remainder of each day offers a carefully balanced selection of parallel sessions: proceedings papers, special sessions on emerging topics, highlights of the past year's published research, special interest group meetings, technology demonstrations, workshops and several unique sessions of value to the broad audience of students, faculty and industry researchers. Several hundred posters, displayed for the duration of the conference, have become a standard of the ISMB and ECCB conference series, and an extensive commercial exhibition showcases the latest bioinformatics publications, software, hardware and services available on the market today.
The main conference is preceded by 2 days of Special Interest Group (SIG) and Satellite meetings, running in parallel with the fifth Student Council Symposium on 27 June and with Tutorials on 28 June. All scientific sessions take place at the Stockholmsmässan/Stockholm International Fairs conference and exposition facility.

    Improved residue contact prediction using support vector machines and a large feature set

    BACKGROUND: Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and for understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved. RESULTS: Here we develop a new contact map predictor (SVMcon) that uses support vector machines to predict medium- and long-range contacts. SVMcon integrates profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful features. On the same test data set, SVMcon's accuracy is 4% higher than that of the latest version of the CMAPpro contact map predictor. SVMcon recently participated in the seventh edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment and was evaluated along with seven other contact map predictors. SVMcon was ranked as one of the top predictors, yielding the second-best coverage and accuracy for contacts with sequence separation >= 12 on 13 de novo domains. CONCLUSION: We describe SVMcon, a new contact map predictor that uses SVMs and a large set of informative features. SVMcon performs well on medium- to long-range contact prediction and can be incorporated modularly into a structure prediction pipeline.
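To make the two reported metrics concrete, the sketch below scores predicted contacts against true ones within a sequence-separation class. It assumes the usual definitions, accuracy as the fraction of predicted contacts that are correct and coverage as the fraction of true contacts recovered; the function name and the (i, j) pair representation are hypothetical, and since the abstract does not give the distance threshold that defines a true contact, true contacts are simply passed in.

```python
def evaluate_contacts(predicted, true_contacts, min_separation=12):
    """Return (accuracy, coverage) for predicted residue-residue contacts.

    predicted, true_contacts: iterables of (i, j) residue-index pairs.
    Only pairs with |i - j| >= min_separation are scored.
    accuracy = correct predictions / all predictions   (precision)
    coverage = correct predictions / all true contacts (recall)
    """
    def normalize(pairs):
        # Store each pair once as (low, high) and drop short-range pairs.
        return {(min(i, j), max(i, j)) for i, j in pairs if abs(i - j) >= min_separation}

    pred = normalize(predicted)
    true = normalize(true_contacts)
    hits = len(pred & true)
    accuracy = hits / len(pred) if pred else 0.0
    coverage = hits / len(true) if true else 0.0
    return accuracy, coverage


predicted = [(1, 20), (5, 30), (2, 8), (40, 10)]  # (2, 8) is short-range and ignored
true_contacts = [(1, 20), (10, 40), (15, 50), (3, 60)]
print(evaluate_contacts(predicted, true_contacts))
# (0.6666666666666666, 0.5)
```

Normalizing each pair to (low, high) means that (40, 10) and (10, 40) count as the same contact.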

    Gene expression-based prediction of malignancies

    Molecular classification of malignancies can potentially stratify patients into distinct subclasses not detectable using traditional classification of tumors, opening new perspectives on the diagnosis and personalized therapy of polygenic diseases. In this paper we present a brief overview of our work on gene-expression-based prediction of malignancies, from the dichotomous classification of normal versus tumoural tissues to multiclass cancer diagnosis, functional class discovery and gene selection. The last part of this work presents preliminary results on the application of ensembles of SVMs, based on the bias-variance decomposition of the error, to the analysis of gene expression data from malignant tissues.

    Anaphora Resolution for Biomedical Literature by Exploiting Multiple Resources

    Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

    Researchers are frequently faced with analysing microarray data on a relatively large number of genes measured in a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model-based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes whose variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays from well-known data sets: colon tissue samples with and without tumors, and tumor tissue samples from patients with leukemia. Although the two approaches were developed from different perspectives, the results show a clear correspondence between the gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. For the ribosomal protein and smooth muscle genes in the colon data set, both methods are shown to classify genes into co-regulated families. It is further shown that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Applied to the leukemia tissue data, both methods produce a division of tissues that corresponds closely to the external classification into acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). In addition, we identify genes specific to the subgroup of ALL T-cell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed, allows variable cluster sizes, can incorporate partial or full supervision, and finds clusters of genes whose expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The EMMIX-GENE method is intended to cluster the tissue samples. It performs a filtering step that yields a subset of relevant genes, followed by gene clustering and then tissue clustering, and it compares favorably in the accuracy with which it ranks the clusters it produces.
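As a rough illustration of the gene shaving idea discussed above, the pure-Python sketch below follows the commonly described shaving loop: center each gene (row), compute the leading principal component of the remaining genes across tissue samples, discard the fraction `alpha` of genes least aligned with it, and repeat until one gene is left. It is not the GeneClust implementation; `alpha = 0.1`, the helper names and the toy matrix are assumptions, and the later steps of the full method (choosing a cluster size from the nested sequence, orthogonalizing the data to find further clusters, and supervision) are not shown.

```python
import math


def center_rows(matrix):
    """Subtract each gene's mean expression so every row has zero mean."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def leading_direction(rows, iters=200):
    """Leading eigenvector of X^T X (X = rows, genes x samples) by power iteration."""
    # Start from the largest-norm row: it lies in the row space, so unlike a
    # constant vector it is not orthogonal to centered data.
    v = max(rows, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return v
    v = [x / norm for x in v]
    n = len(v)
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in rows]  # X v
        w = [sum(s * row[k] for s, row in zip(scores, rows)) for k in range(n)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def gene_shave(matrix, alpha=0.1):
    """Return the nested sequence of gene clusters (row indices), largest first."""
    rows = center_rows(matrix)
    current = list(range(len(rows)))
    nested = [current[:]]
    while len(current) > 1:
        v = leading_direction([rows[i] for i in current])
        # Alignment of each remaining gene with the leading component.
        score = {i: abs(sum(a * b for a, b in zip(rows[i], v))) for i in current}
        n_drop = max(1, int(alpha * len(current)))  # always shave at least one gene
        keep = sorted(current, key=lambda i: score[i], reverse=True)[: len(current) - n_drop]
        current = sorted(keep)
        nested.append(current[:])
    return nested


# Toy data (hypothetical): genes 0-2 share a strongly varying pattern, genes 3-4 are nearly flat.
matrix = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 3.0, 4.5, 6.0],
    [0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
for cluster in gene_shave(matrix):
    print(cluster)
```

On this toy matrix the two nearly flat genes are shaved first, leaving the coherent, high-variance cluster [0, 1, 2], which mirrors the property of gene shaving summarized in the abstract.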