12 research outputs found
Global text mining and development of pharmacogenomic knowledge resource for precision medicine
Understanding patients' genomic variations and their effect in protecting or predisposing them to drug response phenotypes is important for providing personalized healthcare. Several studies have manually curated such genotype-phenotype relationships into organized databases from clinical trial data or published literature. However, there are no text mining tools available to extract high-accuracy information from such existing knowledge. In this work, we used a semiautomated text mining approach to build a comprehensive pharmacogenomic (PGx) resource integrating disease-drug-gene-polymorphism relationships, deriving a global perspective to ease therapeutic approaches. We used an R package, pubmed.mineR, to automatically retrieve PGx-related literature. We identified 1,753 disease types and 666 drugs associated with 4,132 genes and 33,942 polymorphisms, collated from 180,088 publications. With further manual curation, we obtained a total of 2,304 PGx relationships. We evaluated the performance of our approach (precision = 0.806) against benchmark datasets: the Pharmacogenomic Knowledgebase (PharmGKB) (0.904), Online Mendelian Inheritance in Man (OMIM) (0.600), and the Comparative Toxicogenomics Database (CTD) (0.729). We validated our study by comparing our results with 362 US Food and Drug Administration (FDA)-approved drug-labeling biomarkers in commercial use. Of the 2,304 PGx relationships identified, 127 belonged to the FDA list of 362 approved pharmacogenomic markers, indicating that our semiautomated text mining approach may reveal significant PGx information, with markers for drug response prediction. In addition, it is a scalable and state-of-the-art approach to curation for PGx clinical utility.
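The evaluation step described above (precision against PharmGKB, OMIM, and CTD) amounts to checking extracted relationship tuples against a benchmark set. A minimal sketch in Python; the relationship tuples below are hypothetical placeholders, not data from the study:

```python
# Sketch: precision of text-mined pharmacogenomic (PGx) relationships
# against a benchmark. Tuples are (disease, drug, gene, polymorphism);
# the example data is hypothetical, not taken from the study.

def precision(extracted, benchmark):
    """Fraction of extracted relationships confirmed by the benchmark."""
    if not extracted:
        return 0.0
    confirmed = sum(1 for rel in extracted if rel in benchmark)
    return confirmed / len(extracted)

extracted = {
    ("asthma", "salbutamol", "ADRB2", "rs1042713"),
    ("depression", "citalopram", "CYP2C19", "rs4244285"),
    ("gout", "allopurinol", "HLA-B", "rs2395029"),
    ("migraine", "sumatriptan", "HTR1B", "rs6296"),
}
benchmark = {
    ("asthma", "salbutamol", "ADRB2", "rs1042713"),
    ("depression", "citalopram", "CYP2C19", "rs4244285"),
    ("gout", "allopurinol", "HLA-B", "rs2395029"),
}

print(precision(extracted, benchmark))  # 3 of 4 confirmed -> 0.75
```

The same comparison, run per benchmark, would yield the per-dataset precision figures the abstract reports.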
Discovery of novel biomarkers and phenotypes by semantic technologies.
Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and thus constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, there is an even larger source of valuable information that can potentially be tapped for such discoveries: the repositories constituted by research documents.
An association-adjusted consensus deleterious scheme to classify homozygous missense mutations for personal genome interpretation
BACKGROUND: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely overestimate the fraction of the genome that is deleterious. METHOD: This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous missense mutations in 575 proteins found in the genomes of 12 healthy adults. RESULTS: Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals. CONCLUSIONS: The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions.
The ranking scheme, in-depth literature searches, and structural validations of highly prioritized missense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
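The idea of an 8-level scheme built from the presence or absence of evidence can be illustrated with three binary criteria, which yield 2^3 = 8 classes. This is a hypothetical sketch only; the actual AACDS level definitions and criteria in the paper differ:

```python
# Illustrative ranking in the spirit of AACDS: three binary evidence
# flags map to a level from 1 (strongest candidate) to 8 (weakest).
# The criteria names and weighting here are assumptions, not the
# paper's actual scheme.

def rank_variant(consensus_deleterious, disease_association, trait_association):
    """Map three binary evidence flags to a level 1 (highest) .. 8 (lowest)."""
    score = 4 * consensus_deleterious + 2 * disease_association + trait_association
    return 8 - score

print(rank_variant(True, True, True))    # 1: all evidence present
print(rank_variant(True, False, False))  # 4: prediction only
print(rank_variant(False, False, False)) # 8: no supporting evidence
```

Ranking variants this way makes it straightforward to report, per genome, how many fall into the top levels, as the abstract does for the 0-8 range per individual.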
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
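The starting point the Review describes, recognizing chemical entity mentions in text, can be illustrated with a minimal dictionary lookup. Real CHEMDNER systems rely on machine learning; the lexicon and example sentence here are toy assumptions:

```python
import re

# Toy dictionary-based chemical named-entity recognizer. Production
# systems from the CHEMDNER tasks use machine-learned taggers; this
# small lexicon is purely illustrative.

LEXICON = {"aspirin", "ibuprofen", "acetylsalicylic acid", "caffeine"}

def find_chemicals(text):
    """Return (start, end, mention) spans for lexicon entries in text."""
    spans = []
    for name in sorted(LEXICON, key=len, reverse=True):  # longest names first
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)  # order by position in the text

text = "Aspirin (acetylsalicylic acid) interacts with caffeine."
for span in find_chemicals(text):
    print(span)
```

The extracted spans are exactly what downstream cheminformatics steps consume when mapping names to chemical structures.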
AI and precision oncology in clinical cancer genomics: from prevention to targeted cancer therapies - an outcomes-based patient care
Precision medicine is the personalization of medicine to suit a specific group of people or even an individual patient, based on genetic or molecular profiling. This can be done using genomic, transcriptomic, epigenomic or proteomic information. Personalized medicine holds great promise, especially in cancer therapy and control, where precision oncology would allow medical practitioners to use this information to optimize the treatment of a patient. Personalized oncology for groups of individuals would also allow for the use of population group specific diagnostic or prognostic biomarkers. Additionally, this information can be used to track the progress of the disease or monitor the response of the patient to treatment. This can be used to establish the molecular basis for drug resistance and allow the targeting of the genes or pathways responsible for drug resistance. Personalized medicine requires the use of large data sets, which must be processed and analysed in order to identify the particular molecular patterns that can inform the decisions required for personalized care. However, the analysis of these large data sets is difficult and time consuming. This is further compounded by the increasing size of these datasets due to technologies such as next generation sequencing (NGS). These difficulties can be met through the use of artificial intelligence (AI) and machine learning (ML). These computational tools use specific neural networks, learning methods, decision making tools and algorithms to construct and improve on models for the analysis of different types of large data sets. These tools can also be used to answer specific questions. Artificial intelligence can also be used to predict the effects of genetic changes on protein structure and therefore function. 
This review discusses the current state of the application of AI to omics data, specifically genomic data, and how this is applied to the development of personalized or precision medicine for the treatment of cancer.
Funded by the South African Medical Research Council (SAMRC) and the National Research Foundation (NRF).
Mining biomedical information from scientific literature
Joint MAP-i doctoral programme. The rapid evolution and proliferation of a world-wide computerized network,
the Internet, resulted in an overwhelming and constantly growing
amount of publicly available data and information, a fact that was also verified
in biomedicine. However, the lack of structure of textual data inhibits
its direct processing by computational solutions. Information extraction is
the task of text mining that intends to automatically collect information
from unstructured text data sources. The goal of the work described in this
thesis was to build innovative solutions for biomedical information extraction
from scientific literature, through the development of simple software
artifacts for developers and biocurators, delivering more accurate, usable
and faster results. We started by tackling named entity recognition - a crucial
initial task - with the development of Gimli, a machine-learning-based
solution that follows an incremental approach to optimize extracted linguistic
characteristics for each concept type. Afterwards, Totum was built to
harmonize concept names provided by heterogeneous systems, delivering a
robust solution with improved performance results. This approach takes
advantage of heterogeneous corpora to deliver cross-corpus harmonization
that is not constrained to specific characteristics. Since previous solutions
do not provide links to knowledge bases, Neji was built to streamline the
development of complex and custom solutions for biomedical concept name
recognition and normalization. This was achieved through a modular and
flexible framework focused on speed and performance, integrating a large
amount of processing modules optimized for the biomedical domain. To
offer on-demand heterogeneous biomedical concept identification, we developed
BeCAS, a web application, service and widget. We also tackled relation
mining by developing TrigNER, a machine-learning-based solution for
biomedical event trigger recognition, which applies an automatic algorithm
to obtain the best linguistic features and model parameters for each event
type. Finally, in order to assist biocurators, Egas was developed to support
rapid, interactive and real-time collaborative curation of biomedical documents,
through manual and automatic in-line annotation of concepts and
relations. Overall, the research work presented in this thesis contributed
to a more accurate update of current biomedical knowledge bases, towards
improved hypothesis generation and knowledge discovery.
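The feature-engineering step that taggers like Gimli automate can be sketched as a token-level feature extractor of the kind typically fed to a CRF. The feature set below is a generic assumption, not Gimli's actual per-concept-type configuration:

```python
# Sketch of token-level features for ML-based biomedical NER, in the
# style of CRF taggers such as Gimli. The exact linguistic features
# Gimli optimizes per concept type differ; this set is generic.

def token_features(tokens, i):
    """Orthographic and contextual features for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),            # e.g. gene symbols like BRCA1
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:],                  # morphological cue
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Mutations in BRCA1 increase cancer risk".split()
f = token_features(tokens, 2)
print(f["lower"], f["is_upper"], f["has_digit"])  # brca1 True True
```

An incremental approach like the one the thesis describes would add or drop features of this kind per concept type and keep whichever combination scores best on a held-out corpus.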