15 research outputs found
Mining Biomedical Literature to Extract Pharmacokinetic Drug-Drug Interactions
Indiana University-Purdue University Indianapolis (IUPUI)
Polypharmacy is common in clinical practice, so there is a high chance that co-administered drugs will interfere with each other; this phenomenon is called drug-drug interaction (DDI). A DDI occurs when administered drugs change each other's pharmacokinetic (PK) or pharmacodynamic (PD) response. DDIs can reduce the overall effectiveness of a drug or pose a risk of serious side effects to patients, which makes them a challenge for both drug development and clinical patient care. The biomedical literature is a rich source of in vitro and in vivo DDI reports, and there is a growing need for automated methods to extract DDI-related information from unstructured text. In this work we present an ontology (the PK ontology) that defines guidelines for annotating PK DDI studies. Using the ontology, we have assembled a corpus of PK DDI studies, which serves as a resource for training machine learning-based DDI extraction algorithms. Finally, we demonstrate the use of the PK ontology and corpus for extracting PK DDIs from the biomedical literature using machine learning algorithms.
An ontology for drug-drug interactions
Proceedings of: The 6th International Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013), held December 11-12, 2013, in Edinburgh, UK. Event web site: http://www.swat4ls.org/workshops/edinburgh2013/
Drug-drug interactions form a significant risk group for adverse effects associated with pharmaceutical treatment. These interactions are often reported in the literature; however, they are sparsely represented in machine-readable resources such as online databases, thesauri, or ontologies. These knowledge sources play a pivotal role in Natural Language Processing (NLP) systems, since they provide a knowledge representation of the world or of a particular domain. While ontologies for drugs and their effects have proliferated in recent years, there is no ontology capable of describing and categorizing drug-drug interactions. Moreover, there is no artifact that represents all the possible mechanisms that can lead to a DDI. To fill this gap we propose DINTO, an ontology for drug-drug interactions and their mechanisms. In this paper we describe the classes, relationships, and overall structure of DINTO. The ontology is free for use and available at https://code.google.com/p/dinto/
This work was supported by the Regional Government of Madrid under the Research Network MA2VICMR [S2009/TIC-1542], by the Spanish Ministry of Education under the project MULTIMEDICA [TIN2010-20644-C03-01], and by the European Commission Seventh Framework Programme under the project TrendMiner_Enlarged (EU FP7-ICT 612336).
Propensity score-adjusted three-component mixture model for drug-drug interaction data mining in FDA Adverse Event Reporting System
With the increasing trend of polypharmacy, drug-drug interaction (DDI)-induced adverse drug events (ADEs) are considered a major challenge for clinical practice. Because premarketing clinical trials usually have stringent inclusion/exclusion criteria, limited capture of comedication data, and often small sample sizes, they have limited value for studying DDIs. ADE reports collected by spontaneous reporting systems (SRSs), on the other hand, have become an important source for DDI studies. There are two major challenges in detecting DDI signals from SRSs: confounding bias and false positive rate. In this article, we propose a novel approach, the propensity score-adjusted three-component mixture model (PS-3CMM). This model can simultaneously adjust for confounding bias and estimate the false discovery rate for all drug-drug-ADE combinations in the FDA Adverse Event Reporting System (FAERS), a preeminent SRS database. In simulation studies, PS-3CMM performs better in detecting true DDIs than the existing approach. It is more sensitive in selecting DDI signals that have nonpositive individual drug relative ADE risk (NPIRR). The application of PS-3CMM is illustrated by analyzing the FAERS database. Compared with existing approaches, PS-3CMM prioritizes DDI signals differently, giving high priority to DDI signals that have NPIRR. Both the simulation studies and the FAERS data analysis indicate that PS-3CMM is complementary to existing DDI signal detection methods.
Translational drug interaction study using text mining technology
Indiana University-Purdue University Indianapolis (IUPUI)
Drug-drug interaction (DDI) is one of the major causes of adverse drug reactions (ADRs) and
has been shown to threaten public health. It causes an estimated 195,000
hospitalizations and 74,000 emergency room visits each year in the USA alone. Current
DDI research investigates drug interactions at different levels: molecular-level
pharmacogenetic (PG) interactions, pharmacokinetic (PK) interactions, and clinical
pharmacodynamic (PD) consequences. All three types of experiments are important, but
they play different roles in DDI research. Because diverse disciplines and varied studies
are involved, interaction evidence is often not available across all three types of evidence,
which creates knowledge gaps that hinder both DDI and pharmacogenetics
research.
In this dissertation, we propose to distinguish the three types of DDI evidence (in vitro
PK, in vivo PK, and clinical PD studies) and identify all knowledge gaps in experimental
evidence for them. This is a collective intelligence effort in which a text mining tool is
developed for the large-scale mining and analysis of drug-interaction information, so
that it can be applied to retrieve, categorize, and extract DDI information from
published literature available on PubMed. To this end, the research comprises three
tasks. First, the lexica, ontology, and corpora needed to distinguish the three
types of studies were prepared. Despite the lexica prepared in this work, a
comprehensive dictionary of drug metabolites and reactions, which is critical to in vitro
PK studies, is still lacking in public databases. Second, therefore, a named entity
recognition tool is proposed to identify drug metabolites and reactions in free text.
Third, text mining tools for retrieving DDI articles and extracting DDI evidence are
developed. With these tools, the knowledge gaps across all three types of DDI evidence
can be identified, and the gaps between knowledge of the molecular mechanisms
underlying DDIs and their clinical consequences can be closed using DDI predictions
based on the retrieved drug-gene interaction information. This exemplifies how the
tools and methods can advance DDI pharmacogenetics research.
A review of the analytical techniques for the detection of anabolic-androgenic steroids within biological matrices
Anabolic-androgenic steroids (AASs) and other image and performance enhancing drugs (IPEDs) are controlled by governments and sport institutions such as the World Anti-doping Agency (WADA). Although elite athletes and professional bodybuilders are the most visible AAS abusers, the introduction of the internet has increased the accessibility of AASs, with use being observed among recreational gym goers at increasing prevalence. Despite reported increase in use, routine analysis for these substances is uncommon, with many forensic laboratories opting to outsource AAS analysis. This review collates information regarding the extraction and analysis of AASs from various biological matrices with the considered purpose of providing a reference for the development of AAS methods to allow for routine detection by forensic laboratories.
Mining social media data for biomedical signals and health-related behavior
Social media data has been increasingly used to study biomedical and
health-related phenomena. From cohort level discussions of a condition to
planetary level analyses of sentiment, social media has provided scientists
with unprecedented amounts of data to study human behavior and response
associated with a variety of health conditions and medical treatments. Here we
review recent work in mining social media for biomedical, epidemiological, and
social phenomena information relevant to the multilevel complexity of human
health. We pay particular attention to topics where social media data analysis
has shown the most progress, including pharmacovigilance, sentiment analysis
especially for mental health, and other areas. We also discuss a variety of
innovative uses of social media data for health-related applications and
important limitations in social media data access and use.
Comment: To appear in the Annual Review of Biomedical Data Science
Translational high-dimensional drug interaction discovery and validation using health record databases and pharmacokinetics models
Indiana University-Purdue University Indianapolis (IUPUI)
Polypharmacy leads to increased risk of drug-drug interactions (DDIs). In this
dissertation, we create a database quantifying the fraction of metabolism (fm) by CYP450
isozymes for FDA-approved drugs. A reproducible data collection protocol was
developed to extract key information from publicly available in vitro selective CYP
enzyme inhibition studies. The fm was then estimated from the curated data. We then
proposed a random control selection approach for the nested case-control design in
electronic health record (EHR) and electronic medical record (EMR) databases. By
relaxing the restriction of matching on the case's index time, random control selection
dramatically reduces the computational burden compared with traditional control
selection approaches. Using the Observational Medical Outcomes Partnership gold
standard and an EMR database, random control selection is also shown to perform better.
Finally, combining epidemiological studies and pharmacokinetic modeling with the fm
database, we detected and evaluated high-dimensional drug-drug interactions among
thirty high-frequency drugs. Multi-drug combinations that increased the risk of myopathy
were identified in the FAERS and EMR databases by a mixture drug-count response
model (MDCM). Twenty-eight 3-way and 43 4-way DDIs increased the ratio of area
under the plasma concentration-time curve (AUCR) more than 2-fold and had significant
myopathy risk in both databases. The predicted AUCR of omeprazole in the presence of
fluconazole and clonidine was 9.35, and the increased risk of myopathy was 6.41
(LFDR = 0.002) in FAERS and 18.46 (LFDR = 0.005) in EMR. We demonstrate that combining
health record informatics and pharmacokinetic modeling is a powerful translational
approach to detect high-dimensional DDIs.
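As an illustration of how a fraction of metabolism (fm) can feed a predicted AUC ratio (AUCR), the sketch below uses the standard mechanistic static model for reversible inhibition of a single metabolic pathway (often attributed to Rowland and Matin). The abstract does not state which pharmacokinetic model the dissertation uses, so this is only a minimal sketch under that assumption; the function name, parameter names, and example inputs are hypothetical and are not values from the study.

```python
def aucr_single_pathway(fm: float, inhibitor_conc: float, ki: float) -> float:
    """Predicted AUC ratio (inhibited / control) for a victim drug.

    Static model for reversible inhibition of one pathway (an assumption,
    not necessarily the dissertation's model):

        AUCR = 1 / (fm / (1 + [I]/Ki) + (1 - fm))

    fm             -- fraction of victim clearance via the inhibited enzyme
    inhibitor_conc -- inhibitor concentration [I]
    ki             -- inhibition constant Ki, same units as inhibitor_conc
    """
    if not 0.0 <= fm <= 1.0:
        raise ValueError("fm must lie in [0, 1]")
    if inhibitor_conc < 0.0 or ki <= 0.0:
        raise ValueError("inhibitor_conc must be >= 0 and ki must be > 0")
    # Factor by which clearance through the inhibited pathway is reduced.
    fold_inhibition = 1.0 + inhibitor_conc / ki
    return 1.0 / (fm / fold_inhibition + (1.0 - fm))


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the shape of the relationship.
    for fm in (0.2, 0.5, 0.9):
        print(fm, round(aucr_single_pathway(fm, inhibitor_conc=9.0, ki=1.0), 2))
```

Under this model the AUCR is bounded above by 1 / (1 - fm), so a victim drug with a small fm cannot show a large AUCR from inhibition of that pathway alone, however strong the inhibitor.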
A study on extracting clinical information from unstructured text for pharmacovigilance
Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Applied Bioengineering, 2023. 2. 이형기.
Pharmacovigilance is a scientific activity to detect, evaluate, and understand the occurrence of adverse drug events or other problems related to drug safety. However, concerns have been raised over the quality of drug safety information for pharmacovigilance, and there is also a need to secure new data sources for acquiring drug safety information. Meanwhile, the rise of pre-trained language models based on a transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I define two problems in pharmacovigilance as NLP tasks and provide baseline models for them: 1) extracting comprehensive drug safety information from adverse drug event narratives reported through a spontaneous reporting system (SRS) and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extract clinical information from unstructured free texts by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and the NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the data quality of reported drug safety information and expanding the data sources.
Chapter 1
1.1 Contributions of this dissertation
1.2 Overview of this dissertation
1.3 Other works
Chapter 2
2.1 Pharmacovigilance
2.2 Biomedical NLP for pharmacovigilance
2.2.1 Pre-trained language models
2.2.2 Corpora to extract clinical information for pharmacovigilance
Chapter 3
3.1 Motivation
3.2 Proposed Methods
3.2.1 Data source and text corpus
3.2.2 Annotation of ADE narratives
3.2.3 Quality control of annotation
3.2.4 Pretraining KAERS-BERT
3.2.6 Named entity recognition
3.2.7 Entity label classification and sentence extraction
3.2.8 Relation extraction
3.2.9 Model evaluation
3.2.10 Ablation experiment
3.3 Results
3.3.1 Annotated ICSRs
3.3.2 Corpus statistics
3.3.3 Performance of NLP models to extract drug safety information
3.3.4 Ablation experiment
3.4 Discussion
3.5 Conclusion
Chapter 4
4.1 Motivation
4.2 Proposed Methods
4.2.1 Data source
4.2.2 Annotation
4.2.3 Quality control of annotation
4.2.4 Baseline model development
4.3 Results
4.3.1 Corpus statistics
4.3.2 Annotation Quality
4.3.3 Performance of baseline models
4.3.4 Qualitative error analysis
4.4 Discussion
4.5 Conclusion
Chapter 5
5.1 Issues around defining a word entity
5.2 Issues around defining a relation between word entities
5.3 Issues around defining entity labels
5.4 Issues around selecting and preprocessing annotated documents
Chapter 6
6.1 Dissertation summary
6.2 Limitation and future works
6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information
6.2.2 Application of in-context learning framework in clinical information extraction
Chapter 7
7.1 Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System"
7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abstracts of Biomedical Articles"
Computational biology approaches in drug repurposing and gene essentiality screening
Indiana University-Purdue University Indianapolis (IUPUI)
The rapid innovations in biotechnology have led to an exponential growth of data
and electronically accessible scientific literature. Within this enormous body of
scientific data, knowledge can be exploited and novel discoveries can be made. In my
dissertation, I have focused on novel molecular mechanisms and therapeutic discoveries
from big data for complex diseases. It is evident today that complex diseases have many
contributing factors, including genetics and environmental effects. The discovery of
these factors is challenging and critical in personalized medicine. The increasing cost
and time needed to develop new drugs pose a further challenge to effectively treating
complex diseases. In this dissertation, we demonstrate the use of existing data and
literature as a resource for discovering novel therapies and repositioning existing
drugs. The key to identifying novel knowledge is integrating information from decades of
research across different scientific disciplines to uncover interactions that are not
explicitly stated. This puts critical information at the fingertips of researchers and
clinicians, who can use this newly acquired knowledge to make informed decisions.
This dissertation uses computational biology methods to identify and integrate
existing scientific data and literature resources in the discovery of novel molecular
targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively
sifted through the scientific literature and identified a novel interaction between
vitamin A and CYP19A1 that could lead to a potential increase in the production of
estrogens. In chapter 2, by exploring a microarray dataset from an estradiol gene
sensitivity study, I identified a potential novel anti-estrogenic indication for the
commonly used urinary analgesic phenazopyridine. Both discoveries were experimentally
validated in the laboratory. In chapter 3 of my dissertation, using a manually curated
corpus and machine learning algorithms, I identified and extracted genes that are
essential for cell survival. These results show that novel knowledge with
potential clinical applications can be discovered from existing data and literature by
integrating information across various scientific disciplines.