12 research outputs found
Global text mining and development of pharmacogenomic knowledge resource for precision medicine
Understanding patients' genomic variations and their effect in protecting or predisposing them to drug response phenotypes is important for providing personalized healthcare. Several studies have manually curated such genotype-phenotype relationships into organized databases from clinical trial data or published literature. However, there are no text mining tools available to extract high-accuracy information from such existing knowledge. In this work, we used a semiautomated text mining approach to build a comprehensive pharmacogenomic (PGx) resource integrating disease-drug-gene-polymorphism relationships, deriving a global perspective to ease therapeutic approaches. We used an R package, pubmed.mineR, to automatically retrieve PGx-related literature. We identified 1,753 disease types and 666 drugs associated with 4,132 genes and 33,942 polymorphisms, collated from 180,088 publications. With further manual curation, we obtained a total of 2,304 PGx relationships. We evaluated the performance of our approach (precision = 0.806) against benchmark datasets: the Pharmacogenomic Knowledgebase (PharmGKB) (0.904), Online Mendelian Inheritance in Man (OMIM) (0.600), and the Comparative Toxicogenomics Database (CTD) (0.729). We validated our study by comparing our results with 362 US Food and Drug Administration (FDA)-approved drug-labeling biomarkers in commercial use. Of the 2,304 PGx relationships identified, 127 belonged to the FDA list of 362 approved pharmacogenomic markers, indicating that our semiautomated text mining approach may reveal significant PGx information, with markers for drug response prediction. In addition, it is a scalable and state-of-the-art approach to curation for PGx clinical utility.
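The evaluation step described above (precision against PharmGKB, OMIM, and CTD) amounts to checking extracted relationship tuples against a benchmark set. A minimal sketch in Python; the relationship tuples below are hypothetical placeholders, not data from the study:

```python
# Sketch: precision of text-mined pharmacogenomic (PGx) relationships
# against a benchmark. Tuples are (disease, drug, gene, polymorphism);
# the example data is hypothetical, not taken from the study.

def precision(extracted, benchmark):
    """Fraction of extracted relationships confirmed by the benchmark."""
    if not extracted:
        return 0.0
    confirmed = sum(1 for rel in extracted if rel in benchmark)
    return confirmed / len(extracted)

extracted = {
    ("asthma", "salbutamol", "ADRB2", "rs1042713"),
    ("depression", "citalopram", "CYP2C19", "rs4244285"),
    ("gout", "allopurinol", "HLA-B", "rs2395029"),
    ("migraine", "sumatriptan", "HTR1B", "rs6296"),
}
benchmark = {
    ("asthma", "salbutamol", "ADRB2", "rs1042713"),
    ("depression", "citalopram", "CYP2C19", "rs4244285"),
    ("gout", "allopurinol", "HLA-B", "rs2395029"),
}

print(precision(extracted, benchmark))  # 3 of 4 confirmed -> 0.75
```

The same comparison, run per benchmark, would yield the per-dataset precision figures the abstract reports.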
Discovery of novel biomarkers and phenotypes by semantic technologies.
Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and thus constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, there is an even larger source of valuable information that can potentially be tapped for such discoveries: the repositories constituted by research documents.
An association-adjusted consensus deleterious scheme to classify homozygous missense mutations for personal genome interpretation
BACKGROUND: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely overestimate the fraction of the genome that is deleterious. METHOD: This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous missense mutations in 575 proteins found in the genomes of 12 healthy adults. RESULTS: Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals. CONCLUSIONS: The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions.
The ranking scheme, in-depth literature searches, and structural validations of highly prioritized missense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
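The idea of an 8-level scheme built from the presence or absence of evidence can be illustrated with three binary criteria, which yield 2^3 = 8 classes. This is a hypothetical sketch only; the actual AACDS level definitions and criteria in the paper differ:

```python
# Illustrative ranking in the spirit of AACDS: three binary evidence
# flags map to a level from 1 (strongest candidate) to 8 (weakest).
# The criteria names and weighting here are assumptions, not the
# paper's actual scheme.

def rank_variant(consensus_deleterious, disease_association, trait_association):
    """Map three binary evidence flags to a level 1 (highest) .. 8 (lowest)."""
    score = 4 * consensus_deleterious + 2 * disease_association + trait_association
    return 8 - score

print(rank_variant(True, True, True))    # 1: all evidence present
print(rank_variant(True, False, False))  # 4: prediction only
print(rank_variant(False, False, False)) # 8: no supporting evidence
```

Ranking variants this way makes it straightforward to report, per genome, how many fall into the top levels, as the abstract does for the 0-8 range per individual.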
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
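The starting point the Review describes, recognizing chemical entity mentions in text, can be illustrated with a minimal dictionary lookup. Real CHEMDNER systems rely on machine learning; the lexicon and example sentence here are toy assumptions:

```python
import re

# Toy dictionary-based chemical named-entity recognizer. Production
# systems from the CHEMDNER tasks use machine-learned taggers; this
# small lexicon is purely illustrative.

LEXICON = {"aspirin", "ibuprofen", "acetylsalicylic acid", "caffeine"}

def find_chemicals(text):
    """Return (start, end, mention) spans for lexicon entries in text."""
    spans = []
    for name in sorted(LEXICON, key=len, reverse=True):  # longest names first
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)  # order by position in the text

text = "Aspirin (acetylsalicylic acid) interacts with caffeine."
for span in find_chemicals(text):
    print(span)
```

The extracted spans are exactly what downstream cheminformatics steps consume when mapping names to chemical structures.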
AI and precision oncology in clinical cancer genomics: from prevention to targeted cancer therapies - an outcomes-based patient care
Precision medicine is the personalization of medicine to suit a specific group of people or even an individual patient, based on genetic or molecular profiling. This can be done using genomic, transcriptomic, epigenomic or proteomic information. Personalized medicine holds great promise, especially in cancer therapy and control, where precision oncology would allow medical practitioners to use this information to optimize the treatment of a patient. Personalized oncology for groups of individuals would also allow for the use of population group specific diagnostic or prognostic biomarkers. Additionally, this information can be used to track the progress of the disease or monitor the response of the patient to treatment. This can be used to establish the molecular basis for drug resistance and allow the targeting of the genes or pathways responsible for drug resistance. Personalized medicine requires the use of large data sets, which must be processed and analysed in order to identify the particular molecular patterns that can inform the decisions required for personalized care. However, the analysis of these large data sets is difficult and time consuming. This is further compounded by the increasing size of these datasets due to technologies such as next generation sequencing (NGS). These difficulties can be met through the use of artificial intelligence (AI) and machine learning (ML). These computational tools use specific neural networks, learning methods, decision making tools and algorithms to construct and improve on models for the analysis of different types of large data sets. These tools can also be used to answer specific questions. Artificial intelligence can also be used to predict the effects of genetic changes on protein structure and therefore function. 
This review discusses the current state of the application of AI to omics data, specifically genomic data, and how this is applied to the development of personalized or precision medicine for the treatment of cancer.
Funded by the South African Medical Research Council (SAMRC) and the National Research Foundation (NRF).
Mining biomedical information from scientific literature
Joint MAP-i doctoral programme. The rapid evolution and proliferation of a world-wide computerized network,
the Internet, resulted in an overwhelming and constantly growing
amount of publicly available data and information, a fact that was also verified
in biomedicine. However, the lack of structure of textual data inhibits
its direct processing by computational solutions. Information extraction is
the task of text mining that intends to automatically collect information
from unstructured text data sources. The goal of the work described in this
thesis was to build innovative solutions for biomedical information extraction
from scientific literature, through the development of simple software
artifacts for developers and biocurators, delivering more accurate, usable
and faster results. We started by tackling named entity recognition - a crucial
initial task - with the development of Gimli, a machine-learning-based
solution that follows an incremental approach to optimize extracted linguistic
characteristics for each concept type. Afterwards, Totum was built to
harmonize concept names provided by heterogeneous systems, delivering a
robust solution with improved performance results. This approach takes
advantage of heterogeneous corpora to deliver cross-corpus harmonization
that is not constrained to specific characteristics. Since previous solutions
do not provide links to knowledge bases, Neji was built to streamline the
development of complex and custom solutions for biomedical concept name
recognition and normalization. This was achieved through a modular and
flexible framework focused on speed and performance, integrating a large
amount of processing modules optimized for the biomedical domain. To
offer on-demand heterogeneous biomedical concept identification, we developed
BeCAS, a web application, service and widget. We also tackled relation
mining by developing TrigNER, a machine-learning-based solution for
biomedical event trigger recognition, which applies an automatic algorithm
to obtain the best linguistic features and model parameters for each event
type. Finally, in order to assist biocurators, Egas was developed to support
rapid, interactive and real-time collaborative curation of biomedical documents,
through manual and automatic in-line annotation of concepts and
relations. Overall, the research work presented in this thesis contributed
to a more accurate update of current biomedical knowledge bases, towards
improved hypothesis generation and knowledge discovery.
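The feature-engineering step that taggers like Gimli automate can be sketched as a token-level feature extractor of the kind typically fed to a CRF. The feature set below is a generic assumption, not Gimli's actual per-concept-type configuration:

```python
# Sketch of token-level features for ML-based biomedical NER, in the
# style of CRF taggers such as Gimli. The exact linguistic features
# Gimli optimizes per concept type differ; this set is generic.

def token_features(tokens, i):
    """Orthographic and contextual features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),            # e.g. gene symbols like BRCA1
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:],                  # morphological cue
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Mutations in BRCA1 increase cancer risk".split()
f = token_features(tokens, 2)
print(f["lower"], f["is_upper"], f["has_digit"])  # brca1 True True
```

An incremental approach like the one the thesis describes would add or drop features of this kind per concept type and keep whichever combination scores best on a held-out corpus.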