498 research outputs found

    J Biomed Inform

    Targeted anticancer drugs such as imatinib, trastuzumab, and erlotinib have dramatically improved treatment outcomes in cancer patients; however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. A comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate the complex pathways underlying the toxicities induced by these drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high-profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Clinical Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signal filtering, and signal prioritization algorithms to extract drug-SE pairs from the downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. 
In conclusion, this unique database, built from a large number of high-profile oncological articles, could facilitate the development of computational models to understand the toxic effects associated with targeted anticancer drugs.
Grant support: DP2 HD084068/HD/NICHD NIH HHS/United States; DP2HD084068/DP/NCCDPHP CDC HHS/United States; R25 CA094186/CA/NCI NIH HHS/United States; R25CA094186-06/CA/NCI NIH HHS/United States; UL1 RR024989/RR/NCRR NIH HHS/United States; UL1 TR000439/TR/NCATS NIH HHS/United States. 2016-06-01T00:00:00Z; PMID 25817969; PMC458266

    BMC Bioinformatics

    Background: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature. Data and methods: For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts. Results: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be used directly in drug repositioning. Conclusion: In summary, we automatically constructed a large-scale, higher-level drug phenotype relationship knowledge base, which has great potential in computational drug discovery.
Grant support: DP2HD084068/DP/NCCDPHP CDC HHS/United States; R25 CA094186-06/CA/NCI NIH HHS/United States; UL1 RR024989/RR/NCRR NIH HHS/United States. PMID 25860223; PMC440259
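The precision, recall, and F1 figures quoted in the abstract above follow the standard definitions, with F1 being the harmonic mean of precision and recall. The following is a minimal sketch of that formula, not the authors' evaluation code; the function name `f1_score` is an assumption made for illustration. Because the abstract reports averaged figures, the harmonic mean of the averaged precision and recall need not reproduce the reported average F1 exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged precision and recall as quoted in the abstract.
kd_f1 = f1_score(0.335, 0.509)   # knowledge-driven approach
svm_f1 = f1_score(0.135, 0.900)  # SVM-based baseline

print(f"KD: {kd_f1:.3f}  SVM: {svm_f1:.3f}")
```

The harmonic mean penalizes an imbalance between precision and recall, which is why the SVM baseline's high recall does not compensate for its low precision.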

    Doctor of Philosophy

    Nanoinformatics is a relatively young field of study that is important due to its implications for nanomedicine, specifically the development of nanoparticle drug delivery systems. As more structural, biochemical, and physicochemical data on nanoparticles become available, the knowledge gained from nanoinformatics methods will grow. While nanoparticle data pose challenges, including the heterogeneity of the data and the complexity of the particles, nanoinformatics will be at the forefront of processing these data and will aid in the design of nanoparticles for biomedical applications. In this dissertation, a review of data mining and machine learning studies performed in the field of nanomedicine is presented. Next, the use of natural language processing methods to extract numeric values of biomedical property terms of poly(amido amine) (PAMAM) dendrimers from the nanomedicine literature is demonstrated, along with successful extraction results. This is followed by an implementation, with results, of data mining techniques for developing predictive models of the cytotoxicity of PAMAM dendrimers from their chemical and structural properties. Finally, a method and its results for using molecular dynamics simulations to test the ability of EDTA, as a gold standard, and generation 3.5 (G3.5) PAMAM dendrimers to chelate calcium

    Translational drug interaction study using text mining technology

    Indiana University-Purdue University Indianapolis (IUPUI). Drug-drug interaction (DDI) is one of the major causes of adverse drug reactions (ADRs) and has been shown to threaten public health. It causes an estimated 195,000 hospitalizations and 74,000 emergency room visits each year in the USA alone. Current DDI research investigates drug interactions at different scopes: molecular-level pharmacogenetic interactions (PG), pharmacokinetic interactions (PK), and clinical pharmacodynamic consequences (PD). All three types of experiments are important, but they play different roles in DDI research. Because diverse disciplines and varied studies are involved, interaction evidence is often not available across all three types, which creates knowledge gaps that hinder both DDI and pharmacogenetics research. In this dissertation, we proposed to distinguish the three types of DDI evidence (in vitro PK, in vivo PK, and clinical PD studies) and to identify all knowledge gaps in the experimental evidence for them. This is a collective intelligence effort, whereby a text mining tool will be developed for the large-scale mining and analysis of drug-interaction information, so that it can be applied to retrieve, categorize, and extract DDI information from the published literature available on PubMed. To this end, three tasks will be carried out in this research work. First, the lexica, ontology, and corpora needed for distinguishing the three types of studies were prepared. Despite the lexica prepared in this work, a comprehensive dictionary of drug metabolites and reactions, which is critical to in vitro PK studies, is still lacking in public databases. Thus, second, a named entity recognition tool will be proposed to identify drug metabolites and reactions in free text. Third, text mining tools for retrieving DDI articles and extracting DDI evidence are developed. 
In this work, the knowledge gaps across all three types of DDI evidence can be identified, and the gaps between knowledge of the molecular mechanisms underlying DDIs and their clinical consequences can be closed using DDI predictions based on the retrieved drug-gene interaction information, exemplifying how the tools and methods can advance DDI pharmacogenetics research.

    Mapping Nanomedicine Terminology in the Regulatory Landscape

    A common terminology is essential in any field of science and technology for mutual understanding among different communities of experts and regulators, harmonisation of policy actions, standardisation of quality procedures and experimental testing, and communication with the general public. It also allows effective review of information for policy making and optimises research fund allocation. In emerging scientific fields with high innovation potential in particular, new terms, descriptions and definitions are generated quickly and are then used ambiguously by stakeholders who have diverse interests and come from different scientific disciplines and/or regions. The application of nanotechnology in health, often called nanomedicine, is considered such an emerging and multidisciplinary field, with growing interest from various communities. To support a better understanding of terms used in the regulatory domain, the Nanomedicines Working Group of the International Pharmaceutical Regulators Forum (IPRF) has prioritised the need to map, compile and discuss the terminology currently used by regulatory scientists from different geographic areas. The JRC has taken the lead in identifying and compiling frequently used terms in the field by using web crawling and text mining tools as well as manual extraction of terms. The websites of 13 regulatory authorities and clinical trial registries involved globally in regulating nanomedicines have been crawled. The compilation and analysis of the extracted terms demonstrated sectorial and geographical differences in the frequency and type of nanomedicine-related terms used in a regulatory context. Finally, the 31 most relevant and most frequently used terms from various agencies have been compiled, discussed and analysed for their similarities and differences. These descriptions will support the development of a harmonised use of terminology in the future. 
The report provides the background information necessary to advance the discussion among stakeholders. It will strengthen activities aiming to develop harmonised standards in the field of nanomedicine, which is an essential factor in stimulating innovation and industrial competitiveness.
JRC.F.2-Consumer Products Safet

    Information Extraction from Text for Improving Research on Small Molecules and Histone Modifications

    The growing volume of publications, in particular in the life sciences, requires efficient methods for automated information extraction and semantic information retrieval. The recognition and identification of information-carrying units in text (concept denominations and named entities) relevant to a certain domain is a fundamental step. The focus of this thesis lies on the recognition of chemical entities and of a new biological named entity type, histone modifications, both of which are important in the field of drug discovery. Because the emergence of new research fields and the discovery and generation of novel entities go along with the coinage of new terms, the continual adaptation of named entity recognition approaches to new domains is an important step for information extraction. Two methodologies have been investigated in this regard: the state-of-the-art machine learning method Conditional Random Fields (CRF), and an approximate string search method based on dictionaries. Recognition methods that rely on dictionaries depend strongly on the availability and quality of entity terminology collections. In the case of chemical entities, the terminology is distributed over more than 7 publicly available data sources. Merging entries and their accompanying terminology from selected resources enables the generation of a new dictionary comprising chemical named entities. Combined with automatic processing of the respective terminology (dictionary curation), the recognition performance reached an F1 measure of 0.54, an improvement of 29 % over the raw dictionary. The highest recall, 0.79, was achieved for the class of TRIVIAL-names. The recognition and identification of chemical named entities is a prerequisite for the extraction of related pharmacologically relevant information from literature data. 
Therefore, lexico-syntactic patterns were defined that support the automated extraction of hypernymic phrases comprising pharmacological function terminology related to chemical compounds. It was shown that 29-50 % of the automatically extracted terms can be proposed for novel functional annotation of chemical entities provided by the reference database DrugBank. Furthermore, they are a basis for building concept hierarchies and ontologies or for extending existing ones. Subsequently, the pharmacological function and biological activity concepts obtained from text were included in a novel descriptor for chemical compounds. Its successful application to the prediction of the pharmacological function of molecules and to the extension of chemical classification schemes, such as the Anatomical Therapeutic Chemical (ATC) classification, is demonstrated. In contrast to chemical entities, no comprehensive terminology resource has been available for histone modifications. Thus, histone modification concept terminology was first recognized in text via CRFs, with an F1 measure of 0.86. Subsequently, linguistic variants of the extracted histone modification terms were mapped to standard representations that were organized into a newly assembled histone modification hierarchy. The mapping was accomplished by a newly developed term mapping approach described in the thesis. The combination of term recognition and term variant resolution constitutes a new procedure for assembling novel terminology collections. It supports the generation of a term list that is applicable in dictionary-based methods. For the recognition of histone modifications in text, it could be shown that the dictionary-based named entity recognition method is superior to the machine learning approach used. In conclusion, the present thesis provides techniques that enable enhanced utilization of textual data, thereby supporting research in epigenomics and drug discovery.
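The dictionary-based approximate string search mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the three-entry dictionary, the function name `find_chemical_mentions`, and the similarity cutoff of 0.85 are all assumptions chosen for illustration, and matching is done with Python's standard `difflib` module rather than whatever method the thesis used.

```python
import difflib
import re

# Hypothetical, tiny dictionary; a real one would be merged from many sources.
CHEMICAL_DICTIONARY = ["aspirin", "ibuprofen", "caffeine"]


def find_chemical_mentions(text, dictionary=CHEMICAL_DICTIONARY, cutoff=0.85):
    """Return (token, dictionary_entry) pairs for tokens that approximately
    match a dictionary entry with similarity >= cutoff."""
    mentions = []
    for token in re.findall(r"[A-Za-z]+", text):
        # get_close_matches ranks candidates by SequenceMatcher ratio.
        matches = difflib.get_close_matches(
            token.lower(), dictionary, n=1, cutoff=cutoff
        )
        if matches:
            mentions.append((token, matches[0]))
    return mentions


mentions = find_chemical_mentions("Patients took asprin with Caffeine and water.")
print(mentions)
```

The cutoff trades recall against precision: lowering it catches more spelling variants but also admits more false matches, which is one reason the quality of the dictionary matters so much for this family of methods.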

    Ontologies and Computational Methods for Traditional Chinese Medicine

    Traditional Chinese medicine (TCM) is a form of treatment thousands of years old whose purpose is to maintain health, prevent disease, and treat health problems. Numerous studies published each year support the effectiveness of its treatments, and TCM is steadily gaining popularity worldwide. In China, TCM has long been a popular form of treatment, and today it is practiced alongside Western medicine. With the development and spread of information technology in recent decades, the methods of TCM have also changed, and information technology has begun to be used in TCM research. TCM knowledge has been stored in digital form, which has given rise to a large number of different databases. The information is spread across databases whose terminology is not consistent. This causes problems in finding information and in developing applications that use it. This work examines what TCM is and what its position is today in China and elsewhere in the world. Its purpose is to study the development of information technology applications for TCM and the related challenges. The work looks at how TCM ontologies and semantic tools function, and at the computational methods of TCM and the possibilities they offer. In addition, it describes the latest internationally significant projects and considers future prospects. The TCM information technology applications already developed offer new possibilities for searching for information and improve researchers' ability to share information and collaborate. Computer-aided diagnostic tools and expert systems offer ways to verify a diagnosis made by a physician. In the future, computational methods could be used to offer online services that promote health and well-being.
Traditional Chinese Medicine (TCM) has been used for thousands of years in China for the purposes of health maintenance, disease prevention, and treatment of health problems. Several published studies support the effectiveness of TCM treatments, and the global use of TCM is constantly increasing. In China, Western and Chinese medicine are practiced in parallel. During the past few decades, the use of information technology in medicine has increased rapidly. The development of information technology has opened up new possibilities for information storage and sharing, as well as for communication and interaction between people. Along with the growing use of information technology, a wide variety of patient databases and other electronic sources of information have emerged. However, the information is fragmented and dispersed, and the terminology is ambiguous. The objective of the thesis is to examine the position of TCM today and to find out what changes and new opportunities modern information technology brings to different aspects of TCM. This study describes how ontologies and semantic tools can be used to collect existing knowledge and combine different databases. Different computational methods and TCM expert systems are also introduced. Finally, the most recent projects in the field of TCM are discussed and future challenges are considered. Computational methods for TCM, such as diagnostic tools and expert systems, could be very useful in anticipating and preventing health problems. E-science and knowledge discovery offer new ways of sharing knowledge and cooperating. TCM expert systems can be used to generate diagnoses or automatic clinical alerts. In the future, a comprehensive and easily accessible online health service system could be developed and used to improve people's health and well-being.