116 research outputs found

    Information Extraction from Text for Improving Research on Small Molecules and Histone Modifications

    The growing number of publications, particularly in the life sciences, requires efficient methods for automated information extraction and semantic information retrieval. Recognizing and identifying the information-carrying units in text (concept denominations and named entities) that are relevant to a given domain is a fundamental step. This thesis focuses on the recognition of chemical entities and of a new biological named entity type, histone modifications, both of which are important in drug discovery. Because the emergence of new research fields and the discovery and generation of novel entities go along with the coinage of new terms, continually adapting named entity recognition approaches to new domains is an important step for information extraction. Two methodologies were investigated in this regard: the state-of-the-art machine learning method Conditional Random Fields (CRF), and an approximate string search method based on dictionaries. Recognition methods that rely on dictionaries depend strongly on the availability and quality of entity terminology collections. For chemical entities, the terminology is distributed over more than 7 publicly available data sources. Joining entries and their accompanying terminology from selected resources enables the generation of a new dictionary of chemical named entities. Combined with automatic processing of the terminology (dictionary curation), the recognition performance reached an F1 measure of 0.54, an improvement of 29 % over the raw dictionary. The highest recall, 0.79, was achieved for the class of TRIVIAL-names. The recognition and identification of chemical named entities is a prerequisite for extracting related pharmacologically relevant information from the literature.
Therefore, lexico-syntactic patterns were defined that support the automated extraction of hypernymic phrases comprising pharmacological function terminology related to chemical compounds. It was shown that 29-50 % of the automatically extracted terms can be proposed as novel functional annotations for chemical entities provided by the reference database DrugBank. Furthermore, they provide a basis for building concept hierarchies and ontologies or for extending existing ones. Subsequently, the pharmacological function and biological activity concepts obtained from text were incorporated into a novel descriptor for chemical compounds. Its successful application to predicting the pharmacological function of molecules and to extending chemical classification schemes, such as the Anatomical Therapeutic Chemical (ATC) classification, is demonstrated. In contrast to chemical entities, no comprehensive terminology resource has been available for histone modifications. Thus, histone modification terminology was first recognized in text via CRFs, with an F1 measure of 0.86. Subsequently, linguistic variants of the extracted histone modification terms were mapped to standard representations, which were organized into a newly assembled histone modification hierarchy. The mapping was accomplished with a newly developed term mapping approach described in the thesis. The combination of term recognition and term variant resolution constitutes a new procedure for assembling novel terminology collections. It supports the generation of a term list that can be used in dictionary-based methods. For the recognition of histone modifications in text, it could be shown that the dictionary-based named entity recognition method is superior to the machine learning approach used. In conclusion, the thesis provides techniques that enable enhanced use of textual data, thereby supporting research in epigenomics and drug discovery.
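The dictionary-based approximate string search mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the 0.88 similarity threshold, the use of Python's standard `difflib` module, and the example terms are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase a term, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def find_entities(text, dictionary, threshold=0.88):
    """Return (text span, dictionary term, similarity) for approximate matches.

    Every window of up to max_len tokens is compared against every
    dictionary entry; a window is reported when its character-level
    similarity ratio reaches the (assumed) threshold.
    """
    tokens = text.split()
    entries = [(normalize(term), term) for term in dictionary]
    max_len = max(len(norm.split()) for norm, _ in entries)
    hits = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            # Strip surrounding punctuation from the candidate span.
            span = " ".join(tokens[start:start + length]).strip(".,;:()")
            norm_span = normalize(span)
            for norm_term, term in entries:
                score = SequenceMatcher(None, norm_span, norm_term).ratio()
                if score >= threshold:
                    hits.append((span, term, score))
    return hits
```

For instance, calling `find_entities("Patients received acetylsalicylic acid and ibuprofin daily.", ["acetylsalicylic acid", "ibuprofen"])` matches the exact two-word name and also maps the misspelled "ibuprofin" to "ibuprofen". A brute-force scan like this is only practical for small dictionaries; large terminology collections would need an index.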

    miRDiabetes : A microRNA-Diabetes Association Database Constructed With Data Mining on Literature

    MicroRNAs (miRNAs) are a growing class of non-coding RNAs that regulate gene expression by translational repression. A role for miRNAs in diabetes was first established in 2004, and research on miRNA-diabetes associations has attracted increasing interest since then. However, no effort or computational tool has been put forward to retrieve and gather the literature on this topic. In this research, we designed and implemented a method that applies data mining techniques to textual data on this subject and can automatically determine the relevance of new entries with high accuracy. With this method, we constructed miRDiabetes, the first comprehensive database to collect information from PubMed publications that profile relations between miRNAs and diabetes. We also developed an application to facilitate future updates and built a website where researchers can search and download the miRDiabetes database. M.S.

    MilkMine: text-mining, milk proteins and hypothesis generation

    The vast and increasing volume of biological data can make it a struggle for scientists to keep up to date with the latest research, and as a consequence they may miss significant biological links, particularly those that extend outwith their own area of expertise. MilkMine is an attempt to provide a single informatics resource that helps milk protein scientists mine this information mountain more effectively by integrating standard experimental data types with data generated by emerging text-mining techniques. A method was first developed to identify milk-related terminology in peer-reviewed biological literature, and this was used to complement the Unified Medical Language System (UMLS), a large thesaurus of biological concepts, their variant names and their types. The resulting enriched ontology was then mapped to the free text of peer-reviewed biological literature using the MMTx program, producing a database of semantically enriched sentences. A co-occurrence relation extraction algorithm was written to identify relationships between milk proteins and peptides and other biological concepts, such as diseases or biological processes. From these literature relation sets, new hypotheses can be generated using the basic principle that if “A is linked to B” and “B is linked to C”, then we can infer an association between A and C. Filtering and downstream processing of the many generated relationships bring the significant interactions to the fore. These literature relations and hypotheses are integrated with biological data in the MilkMine database. The MilkMine database is built upon a generic data warehousing system, InterMine. This tool enabled the integration of traditional data types, such as protein sequence or structural data, from a variety of sources (e.g. UniProt). However, the standard InterMine model was also extended by the author to include other data sources (e.g. the Protein Data Bank) and to incorporate the output of the text-mining algorithm.
This integration of otherwise disparate information allows more complex querying of the data across many data types. For example, protein sequences are mapped to instances of the names, synonyms or symbols of the protein in text; therefore, a raw fragment of amino acid sequence (e.g. a particular binding region) can be used to search the MilkMine database for literature information as well as for the interactions and hypotheses of those proteins that contain the sequence. The MilkMine resource is accessible online (www.bioinformatics.ed.ac.uk/milkmine) through a professional-level query interface offering many features, such as an interactive query builder, standard ready-to-run queries, bulk downloads and the ability to store user preferences and query histories. Evaluation of MilkMine showed that the text-mining algorithm, as well as the data integration, could provide the user with interesting connections for further study.
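The "A is linked to B, B is linked to C" principle described in the abstract above can be sketched in a few lines. This is an illustrative assumption rather than the MilkMine algorithm: the function names are hypothetical, sentences are assumed to be already annotated as sets of concept names, and the filtering step the abstract mentions is omitted.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_links(sentences):
    """Link every pair of concepts that appear in the same sentence.

    `sentences` is an iterable of sets of concept names (assumed to come
    from an upstream annotation step).
    """
    links = defaultdict(set)
    for concepts in sentences:
        for a, b in combinations(sorted(concepts), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def infer_hypotheses(links, source):
    """Return {C: set of bridging B concepts} for each C that is linked
    to a neighbour B of `source` but is not directly linked to `source`."""
    hypotheses = defaultdict(set)
    for b in links[source]:
        for c in links[b]:
            if c != source and c not in links[source]:
                hypotheses[c].add(b)
    return dict(hypotheses)
```

With made-up annotated sentences such as `[{"beta-lactoglobulin", "retinol"}, {"retinol", "night blindness"}]`, `infer_hypotheses(links, "beta-lactoglobulin")` proposes "night blindness" as a candidate association bridged by "retinol". In practice the raw candidates would need ranking or filtering, as the abstract notes.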

    Improving literature searching in systematic reviews: the application of tailored literature searching compared to ‘the conventional approach’

    Background: Literature searching is acknowledged as a crucial step in a systematic review. Information professionals, in response to the needs of intervention effectiveness systematic reviews, have developed a systematic process of literature searching which aims to be comprehensive, transparent and reproducible, and to minimise the introduction of bias in systematic reviews. The process which has evolved has not been examined in detail before, but it has been adopted as the principal approach to literature searching in other types of systematic review. It is not clear whether this is appropriate or whether an alternative approach might be more effective. Thesis aims: The aims of this thesis are to: 1) examine approaches to systematic literature searching for systematic reviews; and 2) propose and test a method of systematic literature searching for reviews which do not focus on the effectiveness of clinical interventions. Methods: Two literature reviews, one systematic review and two comparative case studies were undertaken to meet the aims of the thesis. Results: A critical literature review identified and described a conventional approach to literature searching common to nine leading handbooks of systematic review. An alternative, tailored approach to literature searching was developed. Two case studies illustrated that the tailored approach was more effective, and potentially offered better value, than the conventional approach. Conclusions: Information professionals can develop tailored literature search approaches for use in systematic reviews as a useful alternative to the conventional approach, particularly for reviews including study designs beyond controlled trials. The role of the information professional as decision maker, the involvement of the research team and experts, preparation for literature searching and the use of supplementary search methods are important to the success of tailored literature search approaches.

    The concept of justifiable healthcare and how big data can help us to achieve it

    Over the last decades, the face of health care has changed dramatically, with major improvements in what is technically feasible. However, there are indications that the current approach to evaluating evidence in health care is not holistic and that, in the long run, health care will therefore not be sustainable. New conceptual and normative frameworks for the evaluation of health care need to be developed and investigated. This paper presents a novel framework of justifiable health care and explores how the use of artificial intelligence and big data can contribute to achieving the goals of this framework.

    Special Libraries, Summer 1994

    Volume 85, Issue 3
