
    Boosting Drug Named Entity Recognition using an Aggregate Classifier

    Objective: Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations. Methods: We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we develop a voting system that combines a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employed genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and used them as an additional means of recognition. Materials: Our approach uses a dictionary of drug names (DrugBank), a small manually annotated corpus (the pharmacokinetic corpus) and part of the UKPMC database as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers. Results: Aggregating drug NER methods based on gold-standard annotations, dictionary knowledge and patterns improved on the performance of models trained on gold-standard annotations only, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations, dictionary knowledge and patterns is shown to achieve performance comparable to that of models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names. Conclusion: We conclude that gold-standard data are not a hard requirement for drug NER. Combining heterogeneous models built on dictionary knowledge can achieve classification performance comparable to that of the best-performing model trained on gold-standard annotations.
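The abstract describes combining heterogeneous recognisers through voting, with suffix patterns added to raise recall. The Python sketch below illustrates that idea at the token level under several assumptions. The three-word dictionary stands in for DrugBank. The four suffix patterns (`-mab`, `-pril`, `-olol`, `-statin`) are hypothetical examples, not the 11 patterns evolved by genetic programming. `make_set_tagger` is a hypothetical stand-in for a trained maximum entropy classifier. The threshold vote in `aggregate` is an assumed scheme, not necessarily the one used in the study.

```python
import re
from typing import Callable, List, Set

# A tagger casts one vote per token: True means "this token is a drug name".
Tagger = Callable[[str], bool]

# Hypothetical stand-in for a drug dictionary such as DrugBank (three entries only).
DRUG_DICTIONARY: Set[str] = {"aspirin", "ibuprofen", "warfarin"}

# Hypothetical suffix patterns built from common drug-name stems.
# Illustrative only: these are NOT the 11 patterns evolved in the study.
SUFFIX_PATTERNS = [re.compile(p) for p in (r"\w+mab", r"\w+pril", r"\w+olol", r"\w+statin")]


def dictionary_tagger(token: str) -> bool:
    """Vote DRUG if the lower-cased token appears in the dictionary."""
    return token.lower() in DRUG_DICTIONARY


def suffix_tagger(token: str) -> bool:
    """Vote DRUG if the whole lower-cased token matches one of the suffix patterns."""
    lowered = token.lower()
    return any(p.fullmatch(lowered) for p in SUFFIX_PATTERNS)


def make_set_tagger(predicted: Set[str]) -> Tagger:
    """Hypothetical stand-in for a trained classifier: votes DRUG for tokens it 'predicted'."""
    lowered = {p.lower() for p in predicted}
    return lambda token: token.lower() in lowered


def aggregate(tokens: List[str], taggers: List[Tagger], min_votes: int) -> List[bool]:
    """Label a token DRUG when at least `min_votes` taggers vote for it."""
    return [sum(tag(tok) for tag in taggers) >= min_votes for tok in tokens]


def f_score(pred: List[bool], gold: List[bool]) -> float:
    """Token-level F1: the harmonic mean of precision and recall."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["Patients", "received", "warfarin", "and", "propranolol", "or", "rituximab"]
    gold = [False, False, True, False, True, False, True]
    # The stand-in "model" misses rituximab and wrongly tags "Patients".
    model = make_set_tagger({"Patients", "warfarin", "propranolol"})
    taggers = [dictionary_tagger, suffix_tagger, model]

    majority = aggregate(tokens, taggers, min_votes=2)
    union = aggregate(tokens, taggers, min_votes=1)
    print(f"majority F1 = {f_score(majority, gold):.2f}")  # majority F1 = 0.80
    print(f"union F1    = {f_score(union, gold):.2f}")     # union F1    = 0.86
```

Lowering `min_votes` from a majority (2 of 3) to a union (1 of 3) recovers `rituximab` through the suffix pattern at the cost of a false positive on `Patients`. This illustrates how adding pattern-based voters can shift the balance between recall and precision.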

    Biomolecular Event Extraction using Natural Language Processing

    Biomedical research and discoveries are communicated through scholarly publications, and this literature is voluminous, rich in scientific text and growing exponentially by the day. Biomedical journals publish nearly three thousand research articles daily, making literature search a challenging proposition for researchers. Biomolecular events involve genes, proteins, metabolites and enzymes; they provide invaluable insights into biological processes and explain physiological functional mechanisms. Text mining (TM), the automatic extraction of such events from big data, is the only quick and viable way to gather useful information from this literature. Events extracted from the biological literature have a broad range of applications, such as database curation, ontology construction, semantic web search and interactive systems. However, automatic extraction is challenging because of the ambiguity and diversity of natural language and of associated linguistic phenomena such as speculation and negation, which are common in biomedical texts and lead to erroneous interpretation. Over the last decade, many strategies have been proposed in this field, using paradigms such as biomedical natural language processing (BioNLP), machine learning and deep learning. In addition, new parallel computing architectures such as graphics processing units (GPUs) have emerged as possible candidates for accelerating the event extraction pipeline. This paper reviews and summarizes the key approaches to complex biomolecular big-data event extraction tasks and recommends an architecture balanced in terms of accuracy, speed, computational cost and memory usage for developing a robust GPU-accelerated BioNLP system.

    A semiotic analysis of the genetic information

    Terms loaded with informational connotations are often employed to refer to genes and their dynamics. Indeed, biologists usually perceive genes as basically ‘the carriers of hereditary information.’ Nevertheless, a number of researchers consider such talk inadequate and ‘just metaphorical,’ thus expressing skepticism about the use of the term ‘information’ and its derivatives in biology as a natural science. This skepticism has two grounds. First, the meaning of that term in biology is not as precise as it is, for instance, in the mathematical theory of communication. Second, the term seems to refer to a purported semantic property of genes without clarifying theoretically whether any genuinely intrinsic semantics is involved. Biosemiotics, a field that attempts to analyze biological systems as semiotic systems, makes it possible to advance our understanding of the concept of information in biology. From the perspective of Peircean biosemiotics, we develop here an account of genes as signs, including a detailed analysis of two fundamental processes in the genetic information system (transcription and protein synthesis), an analysis that has not been carried out so far in this field of research. Furthermore, we propose an account of information based on Peircean semiotics and apply it to our analysis of transcription and protein synthesis.

    Twenty years of "Lipid World": a fertile partnership with David Deamer

    "The Lipid World" was published in 2001, stemming from a highly effective collaboration with David Deamer during a sabbatical year spent 20 years ago at the Weizmann Institute of Science in Israel. The present review paper highlights the benefits of this scientific interaction and assesses the impact of the lipid world paper on the present understanding of the possible roles of amphiphiles and their assemblies in the origin of life. The lipid world is defined as a putative stage in the progression towards life's origin, during which diverse amphiphiles or other spontaneously aggregating small molecules could have concurrently played multiple key roles, including compartment formation, the appearance of mutually catalytic networks, molecular information processing, and the rise of collective self-reproduction and compositional inheritance. This review brings back into a broader perspective some key points originally made in the lipid world paper, stressing the distinction between the widely accepted role of lipids in forming compartments and their expanded capacities as delineated above. In the light of recent advancements, we discuss the topical relevance of the lipid world view as an alternative to broadly accepted scenarios, and the need for further experimental and computer-based validation of the feasibility and implications of the individual attributes of this point of view. Finally, we point to possible avenues for exploring transition paths from small-molecule-based noncovalent structures to more complex biopolymer-containing proto-cellular systems.
    Funding: 711473 - Minerva Foundation; 80NSSC17K0295, 80NSSC17K0296, 1724150 - National Science Foundation. Published version.