
    A Survey on Identification of Motifs and Ontology in Medical Database

    Motifs and ontologies are used in medical databases to identify and diagnose diseases. A motif is a network pattern used in the analysis of a disease; it can also identify patterns in a signal. Based on motifs, a disease can be predicted, classified, and diagnosed. An ontology is a knowledge-based representation and is used as a user interface for diagnosing disease; ontologies also help medical experts diagnose and analyse diseases more easily. Gene ontology is used to express the genes of a disease.
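As a concrete (and deliberately simplified) illustration of motif matching, the sketch below searches a discretized signal for a symbolic pattern. The wildcard convention, the example signal, and all names are assumptions for illustration, not part of the surveyed methods.

```python
# Minimal motif-matching sketch (assumption: motifs as symbolic patterns with
# "." wildcards, matched over a discretized biomedical signal).
def find_motif(sequence, motif):
    """Return start indices where `motif` occurs; '.' matches any symbol."""
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if all(m == "." or m == s for m, s in zip(motif, window)):
            hits.append(i)
    return hits

# Example: a signal discretized into up/down/flat symbols.
signal = "uudffuudfu"
print(find_motif(signal, "uud"))  # -> [0, 5]
```

Real motif-discovery methods operate on continuous signals or sequences and score approximate matches, but the sliding-window core is the same.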

    ARIANA: Adaptive Robust and Integrative Analysis for finding Novel Associations

    The effective mining of biological literature can provide a range of services, such as hypothesis generation, semantic-sensitive information retrieval, and knowledge discovery, which can be important to understanding the confluence of different diseases, genes, and risk factors. Furthermore, the integration of different tools at specific levels can be valuable. The main focus of the dissertation is developing and integrating tools for finding networks of semantically related entities. The key contribution is the design and implementation of ARIANA: Adaptive Robust and Integrative Analysis for finding Novel Associations. ARIANA is a software architecture and a web-based system for efficient and scalable knowledge discovery. It integrates semantic-sensitive analysis of text data through ontology mapping with database search technology to ensure the required specificity. ARIANA was prototyped using the Medical Subject Headings (MeSH) ontology and the PubMed database and has demonstrated great success as a dynamic-data-driven system. ARIANA has five main components: (i) Data Stratification, (ii) Ontology-Mapping, (iii) Parameter Optimized Latent Semantic Analysis, (iv) Relevance Model and (v) Interface and Visualization. The other contribution is the integration of ARIANA with the Online Mendelian Inheritance in Man database and the MeSH ontology to provide gene-disease associations. Empirical studies produced some exciting knowledge-discovery instances, among them the connection between hexamethonium and pulmonary inflammation and fibrosis. In 2001, a research study at Johns Hopkins used the drug hexamethonium on a healthy volunteer, ending in a tragic death due to pulmonary inflammation and fibrosis. This accident might have been prevented had the researcher known of a published case report: since the original case report in 1955, there had been no further publications regarding that association. ARIANA extracted this knowledge even though its database contains publications from 1960 to 2012. Out of 2,545 concepts, ARIANA ranked “Scleroderma, Systemic”, “Neoplasms, Fibrous Tissue”, “Pneumonia”, “Fibroma”, and “Pulmonary Fibrosis” as the 13th, 16th, 38th, 174th and 257th concepts, respectively. Had the researcher had access to such knowledge, this drug would likely not have been used on healthy subjects. In today's world, where data and knowledge are moving away from each other, semantic-sensitive tools such as ARIANA can bridge that gap and advance the dissemination of knowledge.
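The latent-semantic ranking idea behind ARIANA's component (iii) can be sketched on toy data: embed term vectors via a truncated SVD and rank concepts by cosine similarity to a query concept. This is not ARIANA's implementation; the term-document counts, the choice of k, and the concept names are illustrative assumptions.

```python
import numpy as np

# Minimal LSA sketch (assumption: toy term-document counts; ARIANA's actual
# parameter-optimized model is far larger and tuned on MeSH-mapped PubMed data).
def lsa_embed(term_doc, k):
    """Project term vectors into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]          # one k-dim row per term/concept

def rank_by_similarity(embed, query_idx):
    """Rank all concepts by cosine similarity to the query concept."""
    q = embed[query_idx]
    norms = np.linalg.norm(embed, axis=1) * np.linalg.norm(q)
    sims = embed @ q / np.where(norms == 0, 1, norms)
    return np.argsort(-sims)

# Rows: hypothetical concepts; columns: documents (counts are made up).
terms = ["hexamethonium", "pulmonary fibrosis", "pneumonia", "fibroma"]
X = np.array([[3, 0, 2, 1],
              [2, 0, 3, 0],
              [1, 1, 2, 0],
              [0, 4, 0, 1]], dtype=float)
order = rank_by_similarity(lsa_embed(X, k=2), query_idx=0)
print([terms[i] for i in order])  # query concept should rank itself first
```

The real system ranks thousands of MeSH concepts against a query, as in the hexamethonium example above, rather than four toy rows.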

    Computing Network of Diseases and Pharmacological Entities through the Integration of Distributed Literature Mining and Ontology Mapping

    The proliferation of -omics fields (such as genomics and proteomics) and -ologies (such as systems biology, cell biology, and pharmacology) has spawned new frontiers of research in drug discovery and personalized medicine. A vast number of published research results (21 million) are archived in PubMed, and the archive is continually growing. To improve the accessibility and utility of such a large body of literature, it is critical to develop a suite of semantic-sensitive technologies that can discover knowledge and infer possible new relationships based on statistical co-occurrences of meaningful terms or concepts. In this context, this thesis presents a unified framework for mining a large number of publications through the integration of latent semantic analysis (LSA) and ontology mapping. In particular, a parameter-optimized, robust, scalable, and distributed LSA (DiLSA) technique was designed and implemented on a carefully selected 7.4 million PubMed records related to pharmacology. The DiLSA model was integrated with MeSH to make the model effective and efficient for a specific domain. An optimized multi-gram dictionary was customized by mapping the MeSH ontology to build the DiLSA model. A fully integrated web-based application, called PharmNet, was developed to bridge the gap between biological knowledge and clinical practice. Preliminary analysis using PharmNet shows improved performance over a global LSA model. A limited expert evaluation was performed to validate the retrieved results and networks against the biological literature. A thorough performance evaluation and validation of results is in progress.
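The multi-gram dictionary step described above might look roughly like the following sketch: scan tokens and keep dictionary-known multi-word terms as single features. The tiny dictionary stands in for the MeSH-derived one, and greedy longest-match is one plausible reading, not the paper's confirmed algorithm.

```python
# Minimal sketch of multi-gram dictionary matching (assumption: a tiny stand-in
# for the MeSH-derived dictionary; greedy longest-match at each position).
def match_multigrams(tokens, dictionary, max_n=3):
    """Emit the longest dictionary n-gram at each position, else the unigram."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if n == 1 or gram in dictionary:
                out.append(gram)
                i += n
                break
    return out

mesh_like = {"pulmonary fibrosis", "adverse drug reaction"}
text = "hexamethonium induced pulmonary fibrosis in trial".split()
print(match_multigrams(text, mesh_like))  # "pulmonary fibrosis" kept whole
```

Treating known multi-word concepts as single dictionary terms is what lets an LSA model built over such features stay domain-specific.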

    The Current Landscape of Genetic Testing in Cardiovascular Malformations: Opportunities and Challenges

    Human cardiovascular malformations (CVMs) frequently have a genetic contribution. Through the application of novel technologies, such as next-generation sequencing (NGS), DNA sequence variants associated with CVMs are being identified at a rapid pace. While clinicians are now able to offer testing with NGS gene panels or whole exome sequencing to any patient with a CVM, the interpretation of genetic variation remains problematic. Variable phenotypic expression, reduced penetrance, inconsistent phenotyping methods, and the lack of high-throughput functional testing of variants contribute to these challenges. This article elaborates critical issues that impact the decision to broadly implement clinical molecular genetic testing in CVMs. Major benefits of testing include establishing a genetic diagnosis, facilitating cost-effective screening of family members who may have subclinical disease, predicting recurrence risk in offspring, enabling early diagnosis and anticipatory management of CV and non-CV disease phenotypes, predicting long-term outcomes, and facilitating the development of novel therapies aimed at disease improvement or prevention. Limitations include financial cost, psychosocial cost, and ambiguity of interpretation of results. Multiplex families and patients with syndromic features are two groups in which disease causation could potentially be firmly established. However, these account for a minority of the overall CVM population, and there is increasing recognition that genotypes previously associated with syndromes also exist in patients who lack non-CV findings. In all circumstances, ongoing dialog between cardiologists and clinical geneticists will be needed to accurately interpret genetic testing and improve these patients' health. This may be most effectively implemented by the creation and support of CV genetics services at centers committed to pursuing testing for patients.

    Methodologically Grounded Semantic Analysis of Large Volume of Chilean Medical Literature Data Applied to the Analysis of Medical Research Funding Efficiency in Chile

    Background: Medical knowledge accumulates in scientific research papers over time. In order to exploit this knowledge with automated systems, there is growing interest in developing text mining methodologies to extract, structure, and analyze, in the shortest time possible, the knowledge encoded in the large volume of medical literature. In this paper, we use the Latent Dirichlet Allocation (LDA) approach to analyze the correlation between funding efforts and actually published research results, in order to provide policy makers with a systematic and rigorous tool to assess the efficiency of funding programs in the medical area. Results: We tested our methodology on the Revista Medica de Chile, years 2012-2015. 50 relevant semantic topics were identified within 643 medical scientific research papers. Relationships between the identified semantic topics were uncovered using visualization methods. We were also able to analyze the funding patterns of the scientific research underlying these publications. We found that only 29% of the publications declare funding sources, and we identified five topic clusters that concentrate 86% of the declared funds. Conclusions: Our methodology allows analyzing and interpreting the current state of medical research at a national level. The funding source analysis may be useful at the policy-making level to assess the impact of current funding policies and to design new ones. This research was partially funded by CONICYT, Programa de Formacion de Capital Humano avanzado (CONICYT-PCHA/Doctorado Nacional/2015-21150115). MG's work on this paper has been partially supported by FEDER funds for the MINECO project TIN2017-85827-P, and projects KK-2018/00071 and KK2018/00082 of the Elkartek 2018 funding program. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 777720. The funding bodies played no role in the design of the study, the collection, analysis, or interpretation of data, or the writing of the manuscript.
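The topic-extraction step can be sketched with a toy collapsed Gibbs sampler for Latent Dirichlet Allocation. This is a minimal stand-in for illustration only: the corpus, hyperparameters, and topic count below are assumptions, and an analysis of 643 papers would in practice use a tuned library implementation.

```python
import random
from collections import defaultdict

# Toy collapsed Gibbs sampler for LDA (illustrative stand-in; real analyses
# use a library implementation on the full corpus).
def lda_gibbs(docs, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    nwt = defaultdict(int)   # (word, topic) counts
    ndt = defaultdict(int)   # (doc, topic) counts
    nt = [0] * K             # total words assigned to each topic
    z = [[rng.randrange(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]  # remove current assignment, then resample
                nwt[w, t] -= 1; ndt[d, t] -= 1; nt[t] -= 1
                weights = [(nwt[w, k] + beta) / (nt[k] + V * beta)
                           * (ndt[d, k] + alpha) for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[d][i] = t
                nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    return z, vocab

docs = [["funding", "grant", "policy"], ["gene", "protein", "gene"],
        ["grant", "policy", "funding"], ["protein", "gene", "cell"]]
assignments, vocab = lda_gibbs(docs, K=2)
```

Topic-document counts from such a run are what underlie the clustering of publications into topics, and joining them with declared funding sources gives the concentration figures reported above.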

    Semantic systems biology of prokaryotes: heterogeneous data integration to understand bacterial metabolism

    The goal of this thesis is to improve the prediction of genotype-to-phenotype associations, with a focus on metabolic phenotypes of prokaryotes. This goal is achieved through data integration, which in turn required the development of supporting solutions based on semantic web technologies. Chapter 1 provides an introduction to the challenges associated with data integration. Semantic web technologies provide solutions to some of these challenges, and the basics of these technologies are explained in the introduction, together with the basics of constraint-based metabolic modeling and the construction of genome-scale models (GEMs). The chapters of the thesis fall into three related topics: chapters 2, 3 and 4 focus on data integration based on heterogeneous networks and their application to the human pathogen M. tuberculosis; chapters 5, 6, 7, 8 and 9 focus on semantic web based solutions to genome annotation and applications thereof; and chapter 10 focuses on the final goal of associating genotypes to phenotypes using GEMs.

    Chapter 2 provides the prototype of a workflow to efficiently analyze information generated by different inference and prediction methods. This method relies on giving the user the means to simultaneously visualize and analyze the coexisting networks generated by different algorithms, heterogeneous data sets, and a suite of analysis tools. As a showcase, we analyzed the gene co-expression networks of M. tuberculosis generated using over 600 expression experiments, gaining new knowledge about the regulation of the DNA repair, dormancy, iron uptake and zinc uptake systems. Furthermore, this enabled us to develop a pipeline to integrate ChIP-seq data and a tool to uncover multiple regulatory layers. In chapter 3 the prototype presented in chapter 2 is further developed into the Synchronous Network Data Integration (SyNDI) framework, which is based on Cytoscape and Galaxy. The functionality and usability of the framework are highlighted with three biological examples. We analyzed the distinct connectivity of plasma metabolites in networks associated with high or low latent cardiovascular disease risk. We obtained deeper insights from a few similar inflammatory response pathways in Staphylococcus aureus infection common to human and mouse. We identified not yet reported regulatory motifs associated with transcriptional adaptations of M. tuberculosis.

    In chapter 4 we present a review providing a systems-level overview of the molecular and cellular components involved in divalent metal homeostasis and their role in regulating the three main virulence strategies of M. tuberculosis: immune modulation, dormancy and phagosome escape. With the use of the tools presented in chapters 2 and 3, we identified a single regulatory cascade for these three virulence strategies that responds to limited availability of divalent metals in the phagosome.

    The tools presented in chapters 2 and 3 achieve data integration through the use of multiple similarity, coexistence, coexpression and interaction gene and protein networks. However, these tools cannot store additional (genome) annotations. Therefore, we applied semantic web technologies to store and integrate heterogeneous annotation data sets. An increasing number of widely used biological resources are already available in the RDF data model. There are, however, no tools available that provide structural overviews of these resources. Such structural overviews are essential to efficiently query these resources and to assess their structural integrity and design. Therefore, in chapter 5, I present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview enables users to create complex queries on these resources and to structurally validate newly created resources.

    Direct functional comparisons support genotype-to-phenotype predictions. A prerequisite for a direct functional comparison is consistent annotation of the genetic elements with evidence statements. However, the standard structured formats used by the public sequence databases to present genome annotations provide limited support for data mining, hampering comparative analyses at large scale. To enable interoperability of genome annotations for data mining applications, we developed the Genome Biology Ontology Language (GBOL) and its associated infrastructure (the GBOL stack), presented in chapter 6. GBOL is provenance-aware and thus provides a consistent representation of functional genome annotations linked to their provenance. The provenance of a genome annotation describes the contextual details and derivation history of the process that resulted in the annotation. GBOL is modular in design, extensible, and linked to existing ontologies. The GBOL stack of supporting tools enforces consistency within and between the GBOL definitions in the ontology.

    Based on GBOL, we developed the genome annotation pipeline SAPP (Semantic Annotation Platform with Provenance), presented in chapter 7. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with semantic web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof, and facilitates multi-level large-scale comparative analysis, which in turn can be used to make genotype-to-phenotype predictions. GBOL and SAPP were developed simultaneously. During development we realized that we had to constantly validate the data exported to RDF to ensure coherence with the ontology. As this was an extremely time-consuming and error-prone process, we developed the Empusa code generator, presented in chapter 8.

    SAPP has been successfully used to annotate 432 sequenced Pseudomonas strains and to integrate the resulting annotations in a large-scale functional comparison using protein domains, presented in chapter 9. Additionally, data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments were integrated with the genome annotations. In this way, we linked gene essentiality, persistence and expression variability, gaining insight into the diversity, versatility and evolutionary history of the Pseudomonas genus, which contains some important pathogens as well as some species useful for bioengineering and bioremediation. Genome annotations can be used to create GEMs, which can better link genotypes to phenotypes. Bio-Growmatch, presented in chapter 10, is a tool that can automatically suggest modifications to improve a GEM based on phenotype data, thereby integrating growth data into the complete process of modelling the metabolism of an organism.

    Chapter 11 presents a general discussion of how the chapters contributed to the central goal. I then discuss provenance requirements for data reuse and integration, and how these can be used to further improve knowledge generation. The acquired knowledge could, in turn, be used to design new experiments. The principles of the dry-lab cycle, and how semantic technologies can contribute to establishing such cycles, are also discussed in chapter 11. Finally, a discussion is presented on how to apply these principles to improve the creation and usability of GEMs.
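RDF2Graph's core idea of recovering the class-level structure of an RDF resource can be sketched over a handful of in-memory triples. The triples and names below are toy assumptions; the real tool works against full RDF resources and also derives cardinalities, which this version omits.

```python
# Minimal sketch of structure recovery from RDF triples (toy data; the real
# RDF2Graph queries whole resources and handles multi-typed subjects).
RDF_TYPE = "rdf:type"

def class_structure(triples):
    """Summarize instance triples as (subject class, predicate, object class)."""
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    return {(types[s], p, types.get(o, "Literal"))
            for s, p, o in triples
            if p != RDF_TYPE and s in types}

triples = [
    ("gene1", RDF_TYPE, "Gene"),
    ("prot1", RDF_TYPE, "Protein"),
    ("gene1", "encodes", "prot1"),
    ("prot1", "hasMass", "50kDa"),
]
print(sorted(class_structure(triples)))
# -> [('Gene', 'encodes', 'Protein'), ('Protein', 'hasMass', 'Literal')]
```

A class-level overview like this is exactly what enables users to write complex queries against a resource without first reading all of its instance data.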

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted; the protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a clinical trial protocol is important because it specifies the necessary conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free-text form, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search-efficiency issues; searching for and selecting appropriate clinical trials for a patient from a relatively large number of available trials is thus a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have not fully integrated the entire matching process and have not exploited state-of-the-art Natural Language Processing (NLP) techniques that may improve the matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques.
Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis. Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion. Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment. In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. This domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering of essay 2. The dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine--Clinical Terms (SNOMED CT); the results showed that it adds a significant number of new terms, which is useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion with synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify them with the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters; the focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process between clinical trial clusters and patient medical records. Patient records collected from a prior study were used to test our approach. The records were pre-processed by tokenization and lemmatization, then further enhanced by matching against the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched each patient record with the clinical trial clusters to select the best-matched cluster(s), and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
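The feature pipeline of essays 1-2 (n-gram generation, synonym-based expansion, and a set-similarity measure suitable for agglomerative clustering) can be sketched as follows. The synonym map is a toy stand-in for UMLS, and the function names are illustrative, not the dissertation's actual code.

```python
# Minimal sketch of the essay 1-2 feature pipeline (assumption: toy synonym
# map standing in for UMLS; real features come from a breast-cancer lexicon).
def ngrams(tokens, n_max=2):
    """Generate all word n-grams up to n_max as a feature set."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}

def expand(features, synonyms):
    """Map each feature to its canonical concept when a synonym entry exists."""
    return {synonyms.get(f, f) for f in features}

def jaccard(a, b):
    """Set similarity usable to cluster trials by shared expanded features."""
    return len(a & b) / len(a | b)

umls_like = {"breast carcinoma": "breast cancer", "tumour": "tumor"}
crit = "history of breast carcinoma".split()
feats = expand(ngrams(crit), umls_like)
```

Normalizing synonyms to one canonical form is what lets two trials phrased differently ("breast carcinoma" vs. "breast cancer") land in the same cluster under a similarity measure like `jaccard`.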

    Minimal Peroxide Exposure of Neuronal Cells Induces Multifaceted Adaptive Responses

    Oxidative exposure of cells occurs naturally and may be associated with cellular damage and dysfunction. Protracted low-level oxidative exposure can induce accumulated cell disruption, affecting multiple cellular functions. Accumulated oxidative exposure has also been proposed as one of the potential hallmarks of the physiological/pathophysiological aging process. We investigated the multifactorial effects of long-term minimal peroxide exposure upon SH-SY5Y neural cells to understand how they respond to the continued presence of oxidative stressors. We show that minimal protracted oxidative stresses induce complex molecular and physiological alterations in cell functionality. Upon chronic exposure to minimal doses of hydrogen peroxide, SH-SY5Y cells displayed a multifactorial response to the stressor. To fully appreciate the peroxide-mediated cellular effects, we assessed these adaptive effects at the genomic, proteomic and cellular signal processing levels. Combined analyses of these multiple levels of investigation revealed a complex cellular adaptive response to the protracted peroxide exposure. This adaptive response involved changes in cytoskeletal structure, energy metabolic shifts towards glycolysis and selective alterations in transmembrane receptor activity. Our analyses of the global responses to chronic stressor exposure, at multiple biological levels, revealed a viable neural phenotype in part reminiscent of aged or damaged neural tissue. Our paradigm indicates how cellular physiology can subtly change in different contexts and may aid the appreciation of stress-response adaptations.