38 research outputs found

    CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations

    Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and applying it to construct a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.

    How exome sequencing is shedding light on the complexity of Mendelian disorders: some examples from Sardinia

    The total number of Mendelian disorders is estimated to be around 7,000 and, while each is individually rare, together these genetic conditions contribute significantly to morbidity, mortality, and healthcare costs. In the last decade there has been a paradigm shift in their investigation due to the development of powerful new DNA sequencing technologies, such as whole exome sequencing (WES). Although our knowledge of the diversity of Mendelian phenotypes is progressively increasing, substantial gaps remain. Up to 50% of patients affected by a rare genetic disorder never receive a diagnosis. We focused our attention on such Mendelian disorders and, in a collaborative effort, we studied by WES a cohort of heterogeneous samples affected by Crisponi/Cold-induced sweating syndrome-like, syndromic Intellectual Disabilities and Epileptic Encephalopathies. The results of our work, along with others reported in the literature, are helping to reveal the extensive clinical variability and genetic complexity underlying Mendelian phenotypes and inheritance, to provide insight into study design and analytical strategies, and to identify novel mechanisms. Our increasing knowledge of the genetic basis of rare disorders is shedding light on the “complex” nature of the “simple” Mendelian disorders and showing that “true monogenic” disorders are very rare, underscoring the current challenges of clinical diagnostics and discovery.

    Systems approaches to drug repositioning

    PhD Thesis. Drug discovery has overall become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is that of drug repositioning, which focusses on identifying novel uses for existing drugs. By focussing on existing drugs that have already reached the market, drug repositioning has the potential to reduce both the timeframe and cost of getting a disease treatment to those that need it. Many marketed examples of repositioned drugs have been found via serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person can identify portions of the graph that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was the development of novel computational methods to identify new therapeutic uses for existing drugs (with particular focus on active small molecules) using data integration. A framework for integrating disparate data relevant to drug repositioning, the Drug Repositioning Network Integration Framework (DReNInF), was developed as part of this work. 
This framework includes a high-level ontology, Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. This framework enables the production of integrated networks maintaining strict semantics that are important in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description Framework (RDF) dataset. A Web-based front end was developed, which includes a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this dataset. To automate the mining of drug repositioning datasets, a formal framework for the definition of semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. 9,643,061 putative D-T interactions were identified and ranked, with a strong correlation between highly scored associations and those supported by literature observed. The 20 top ranked associations were analysed in more detail with 14 found to be novel and six found to be supported by the literature. It was also shown that this approach better prioritises known D-T interactions, than other state-of-the-art methodologies. The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. As target-based approaches are utilised heavily in the field of drug discovery, it is necessary to have a systematic method to rank Gene-Disease (G-D) associations. 
Although methods already exist to collect, integrate and score these associations, these scores are often not a reliable reflection of expert knowledge. Therefore, an integrated data-driven approach to drug repositioning was developed using a Bayesian statistics approach and applied to rank 309,885 G-D associations using existing knowledge. Ranked associations were then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network it was shown that diseases of the central nervous system (CNS) provide an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. 275,934 Dr-D associations were identified and ranked, with those more likely to be side-effects filtered. Work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable datasets relevant to drug repositioning have neglected, such as clinical trial data and drug indications. Furthermore, the dataset may be easily extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. is the mutation a loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any types of associations captured in the target semantic network. Moreover, the approaches presented here should be more generically applicable to other fields that require algorithms for the integration and mining of semantically rich networks. Engineering and Physical Sciences Research Council (EPSRC) and GS

    Integrative bioinformatics and graph-based methods for predicting adverse effects of developmental drugs

    Adverse drug effects are complex phenomena that involve the interplay between drug molecules and their protein targets at various levels of biological organisation, from molecular to organismal. Many factors are known to contribute toward the safety profile of a drug, including the chemical properties of the drug molecule itself, the biological properties of drug targets and other proteins that are involved in pharmacodynamics and pharmacokinetics aspects of drug action, and the characteristics of the intended patient population. A multitude of scattered publicly available resources exist that cover these important aspects of drug activity. These include manually curated biological databases, high-throughput experimental results from gene expression and human genetics resources as well as drug labels and registered clinical trial records. This thesis proposes an integrated analysis of these disparate sources of information to help bridge the gap between the molecular and the clinical aspects of drug action. For example, to address the commonly held assumption that narrowly expressed proteins make safer drug targets, an integrative data-driven analysis was conducted to systematically investigate the relationship between the tissue expression profile of drug targets and the organs affected by clinically observed adverse drug reactions. Similarly, human genetics data were used extensively throughout the thesis to compare adverse symptoms induced by drug molecules with the phenotypes associated with the genes encoding their target proteins. One of the main outcomes of this thesis was the generation of a large knowledge graph, which incorporates diverse molecular and phenotypic data in a structured network format. To leverage the integrated information, two graph-based machine learning methods were developed to predict a wide range of adverse drug effects caused by approved and developmental therapies

    In Silico Toxicology Data Resources to Support Read-Across and (Q)SAR

    A plethora of databases exist online that can assist in in silico chemical or drug safety assessment. However, a systematic review and grouping of databases, based on purpose and information content, consolidated in a single source has been lacking. To resolve this issue, this review provides a comprehensive listing of the key in silico data resources relevant to: chemical identity and properties, drug action, toxicology (including nano-material toxicity), exposure, omics, pathways, Absorption, Distribution, Metabolism and Elimination (ADME) properties, clinical trials, pharmacovigilance, patents-related databases, biological (genes, enzymes, proteins, other macromolecules etc.) databases, protein-protein interactions (PPIs), environmental exposure related databases, and finally databases relating to animal alternatives in support of 3Rs policies. More than nine hundred databases were identified and reviewed against criteria relating to accessibility, data coverage, interoperability or application programming interface (API), appropriate identifiers, types of in vitro, in vivo and clinical data recorded and suitability for modelling, read-across or similarity searching. This review also specifically addresses the need for solutions for mapping and integration of databases into a common platform for better translatability of preclinical data to clinical data.

    The Healthgrid White Paper

    DEVELOPMENT OF A COMPUTATIONAL RESOURCE FOR PERSONALIZED DIETARY RECOMMENDATIONS

    There is a global increase in the incidence of non-communicable diseases associated with unhealthy food intakes. Conditions such as diabetes, heart disease, high blood pressure, and strokes represent a high societal impact and an economic burden for health-care systems around the world. To understand these diseases, one needs to account for the several factors that influence how the human body processes food, some of which are determined by the genome and patterns of gene expression that translate to the ability, or lack thereof, to degrade and absorb certain nutrients. Other factors, like the gut microbiota, are more volatile because their composition is highly moldable by diet and lifestyle. Multi-omics technologies can support the comprehensive collection of dietary intake data and monitoring of the health status of individuals. A correct analysis of this data could also lead to new insights about the complex processes involved in the digestion of dietary components and their involvement in the prevention or the appearance of health problems, but its integration and interpretation are still problematic. Thus, in this thesis, we propose the utilization of Constraint-Based Reconstruction and Analysis (COBRA) methods as a framework for the integration of this complex data. To achieve this goal, we have created a knowledge-base, the Virtual Metabolic Human (VMH), that combines information from large-scale models of metabolism from the human organism and typical gut microbes with food composition information and a disease compendium. VMH’s unique combination of resources supports the exploration of metabolic pathways from different organisms, the inclusion of dietary information into in-silico experiments through its own diet designer tool, visualization and analysis of experimental and simulation data, and the exploration of disease mechanisms and potential treatment strategies. 
VMH is a step forward in providing the necessary tools to investigate the mechanisms behind the influence of diet in health and disease. Tools such as the diet designer can be used as a basis for diet optimization by predicting combinations of foods that can contribute to specific metabolic outcomes, which has the potential to be integrated and translated into treatment development and dietary recommendations in the foreseeable future

    Characterization of MEF2C-Related Disorders: Genotype, Phenotype, and Gene Pathway Dysregulation

    MEF2C-related disorders are characterized by intellectual disability, developmental delay, lack of speech, seizures, stereotypic movements, hypotonia, and brain abnormalities and are caused by pathogenic alterations involving the MEF2C gene. Despite published cases, MEF2C-related disorders are difficult to recognize clinically. These studies sought to further characterize MEF2C-related disorders by investigating the genotypes, phenotypes, and gene functions (or dysfunctions) associated with the disorder. Tremors have been reported in some patients with MEF2C-related disorders, but the concept of tremors has been complicated by vague definitions and numerous categorization methods. We performed a concept analysis following the Walker and Avant method to clarify the concept and develop an operational definition of tremors. We concluded that tremors are a movement disorder characterized by shaking motions that are involuntary, oscillatory, rhythmic, non-painful, always present although vary in severity, and can be repressed by changing posture or going into a rest position. We then performed a systematic literature review to record the genotypes and comprehensive phenotype of MEF2C-related disorders reported in the literature. Forty-three articles characterizing 117 patients met the inclusion criteria. Common features included intellectual disability, developmental delay, seizures, hypotonia, absent speech, inability to walk, stereotypic movements, and MRI abnormalities. Nonclassical findings included question mark ear, jugular pit, and a unique neuroendocrine finding. Next, we developed a survey based on validated instruments to gather developmental and clinical information from the parents of children with MEF2C-related disorders. Seventy-three parents completed the survey. Limited speech, seizures, bruxism, repetitive movements, and high pain tolerance were some of the prominent features identified from the survey data. 
Statistical analyses showed that patients with MEF2C variants were similarly affected as patients with deletions and females showed higher verbal abilities. This natural history study details phenotypic and developmental information of the largest single cohort reported to date. Lastly, we discussed current techniques used to investigate the mouse Mef2c gene expression and regulation in the brain. Previous unbiased RNA sequencing of whole cortex from Mef2c global heterozygous mice showed hundreds of dysregulated genes, particularly autism risk genes and microglial genes. The Cowan lab is currently performing single nuclei RNA sequencing (snRNAseq) to better understand the role of Mef2c in neurons and microglia. Techniques used include nuclei dissociation, fluorescence-activated cell sorting, library preparation and sequencing, and bioinformatic analysis of the snRNAseq data. Additional research techniques include perfusion fixation, brain extraction and slicing, and immunohistochemistry. These studies characterize the phenotype and document the severity of the disorder. The information reported will help providers diagnose and care for patients with MEF2C-related disorders. Additionally, the systematic review and survey data can be useful for further genotype-phenotype correlations, as baseline data for treatment trials, and to develop future studies

    Translational software infrastructure for medical genetics

    Deep inside the core of our cells resides the deoxyribonucleic acid (DNA) molecule known as the genome. DNA encodes the information that allows life to grow, survive, diversify and evolve. Unfortunately, the same mechanisms that let us adapt to a changing environment can also cause genetic disorders. While we are able to diagnose a number of these disorders using modern technological advancements, much remains to be discovered and understood. This thesis presents software infrastructure for investigating the molecular etiology of genetic disease using data from model organisms, demonstrates how to translate findings from fundamental research into new software tools for genome diagnostics, and introduces a downstream genome analysis framework that assists the automation and validation of the latest tools for applied patient care. We first develop data models and software to help determine which region of the genome is responsible for diseases and other physical traits. We then extend these principles towards model organisms. By using molecular similarities, we discover new ways to use nematodes for research into human diseases. Additionally, we can use our knowledge of the genome and evolution to predict how pathogenic new mutations are. The result is a public website where DNA can be scanned quickly and accurately for probable pathogenic mutations. Finally, we present a complete system for automated DNA analysis, including a protocol specific for genome diagnostics to produce clear patient reports for medical experts with which a diagnosis is made faster and easier.