Nanoinformatics 2010 Program

Author: Baker Nathan A
Chaka Anne
Cohen Yoram
Colvin Vicki
Fritts Martin
Geraci Charles L.
Hoover Mark D
Ku Sharon
Kulinowski Kristen M
Lippell Phil
Luo James
McLennan Michael
Morse Jeffrey
Ostraat Michele L
Rajan Krishna
Reznik-Zellen Rebecca
Schad Peter
Tuominen Mark T.
Publication date: 01/11/2010

Text mining for biology - the way forward: opinions from leading scientists

Author: Altman Russ B
Bergman Casey M
Blake Judith
Blaschke Christian
Cohen Aaron
Gannon Frank
Grivell Les
Hahn Udo
Hersh William
Hirschman Lynette
Jensen Lars Juhl
Krallinger Martin
Mons Barend
O'Donoghue Seán I
Peitsch Manuel C
Rebholz-Schuhmann Dietrich
Shatkay Hagit
Valencia Alfonso
Publication venue: BioMed Central
Publication date: 01/01/2008

This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger workflows; and suggestions for additional challenge evaluations, new applications, and additional resources needed to make progress.

Springer - Publisher Connector

PubMed Central

Copenhagen University Research Information System

EUR Research Repository

UNSWorks

The University of Manchester - Institutional Repository

Overview of the interactive task in BioCreative V

Author: Van Auken Kimberly
Wang Qinghua
Wang Xiaodong
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and how difficult it was to complete. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss major findings for the systems tested.

Caltech Authors

Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

Author: Damaschun A.
Fontaine J.F.
Kurtz A.
Lekschas F.
Leser U.
Mah N.
Neves M.
Seltmann S.
Stachelscheid H.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and yet highly important task. Previous experience has shown that text mining can assist in its many phases, especially in the triage of relevant documents and the extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cells or anatomical parts have been extracted. Validation of half of this data resulted in a precision of ~50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org

CiteSeerX

SNU Open Repository and Archive

PubMed Central

MDC Repository

Overview of the interactive task in BioCreative V

Author: Afroza K. Irin
Andrew Chatr-Aryamontri
Barbra Ferrell
Cathy H. Wu
Cecilia N. Arighi
Chu-Hsien Su
Comeau
David Campos
David Salgado
Emiliano Pereira
Evangelos Pafilis
Fabio Rinaldi
Gabriela Contreras
Georgios Gkoutos
Hamsa D. Tadepally
Hong-Jie Dai
Hui-Jou Chou
Ingrid Keseler
Jeyakumar Natarajan
Johanna McEntyre
Juliane Fluck
Karen Rothfels
Kimberly Van Auken
Krallinger
Lara Almeida
Lars J. Jensen
Laurel Cooper
Loukia Tsaprouni
Lucy Chilton
Lynette Hirschman
Marija Milacic
Mary Schaeffer
Matthew Mort
Nancy George
Nicole Vasilevsky
Onkar Singh
Peter McQuilton
Qinghua Wang
Raquel M. Silva
Raul Rodriguez-Esteban
Raymund Stefancsik
Riza Batista-Navarro
Sandra Orchard
Sangya Pundir
Shabbir S. Abdul
Sherri Matis-Mitchell
Shruti Rao
Silvia Jimenez
Socorro Gama-Castro
Sophia Ananiadou
Stanley J. F. Laulederkind
Sumit Madan
Suresh Subramani
Sérgio Matos
Toni R. Jue
Xiaodong Wang
Yalbi I. Balderas-Martínez
Zhiyong Lu
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

University of Birmingham Research Portal

HAL AMU

The University of Manchester - Institutional Repository

MPG.PuRe

Hal-Diderot

University of Bedfordshire Repository

Crossref

Online Research @ Cardiff

HAL-Inserm

Copenhagen University Research Information System

PubMed Central

Oxford University Research Archive

Text-mining assisted regulatory annotation

Author: Aerts Stein
Bergman Casey M.
Griffith Obi L.
Haeussler Maximilian
Hulpiau Paco
Jones Steven J M
Montgomery Stephen B.
van Vooren Steven
Publication venue: BioMed Central
Publication date: 01/01/2008

Text-mining technologies can be integrated with genome annotation systems, increasing the availability of annotated cis-regulatory data.

Lirias

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

The University of Manchester - Institutional Repository

ProdInra

Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system

Author: Brusic Vladimir
Nagashima Takeshi
Petrovsky Nikolai
Schonbach Christian
Silva Diego
Socha L
Publication venue: Springer Science and Business Media LLC
Publication date: 11/12/2015

BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. RESULTS: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%) disorders. CONCLUSIONS: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.

The Australian National University

A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: using machine learning and graph analysis methodologies to reconstruct the bibliome

Author: Fernández Riverola Florentino
Ferreira Tânia
Igrejas Gilberto
Perez Perez Martín
Publication venue: Elsevier BV
Publication date: 06/06/2023

Background: In return for their nutritional properties and broad availability, cereal crops have been associated with different alimentary disorders and symptoms, with the majority of the responsibility being attributed to gluten. Therefore, gluten-related literature continues to be produced at ever-growing rates, driven in part by the recent exploratory studies that link gluten to non-traditional diseases and the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this sense, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produces a favourable scenario for disinformation and misinformation.
Objectives: Aligned with the European Union strategy “Delivering on EU Food Safety and Nutrition in 2050”, which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform includes different external database knowledge, bibliometrics statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in relation to the gluten domain.
Methods: For this purpose, the presented study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, which is also complemented by data from the social discussion.
Results and conclusions: In this sense, 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes based on the literature. In addition, the automatic processing of the literature combined with the knowledge representation methodologies proposed has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/
Funding: Fundação para a Ciência e a Tecnologia | Ref. UIDB/50006/2020; Xunta de Galicia | Ref. ED481B-2019-032; Xunta de Galicia | Ref. ED431G2019/06; Xunta de Galicia | Ref. ED431C 2022/03; Universidade de Vigo/CISU

Investigo