38 research outputs found
CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases
How exome sequencing is shedding light on the complexity of Mendelian disorders: some examples from Sardinia
The total number of Mendelian disorders is estimated to be around 7,000 and while each is individually rare, together, these genetic conditions contribute significantly to morbidity, mortality, and healthcare costs.
In the last decade there has been a paradigm shift in their investigation due to the development of powerful new DNA sequencing technologies, such as whole exome sequencing. Although our knowledge of the diversity of Mendelian phenotypes is progressively increasing, substantial gaps remain. Up to 50% of patients affected by a rare genetic disorder never receive a diagnosis.
We focused on such Mendelian disorders and, in a collaborative effort, used whole exome sequencing (WES) to study a cohort of heterogeneous samples affected by Crisponi/Cold-induced sweating syndrome-like, syndromic Intellectual Disabilities and Epileptic Encephalopathies.
The results of our work, along with others reported in the literature, are helping to reveal the extensive clinical variability and genetic complexity underlying Mendelian phenotypes and inheritance, to provide insight into study design, approach and analytical strategies, and to identify novel mechanisms.
Our increasing knowledge of the genetic basis of rare disorders is shedding light on the “complex” nature of the “simple” Mendelian disorders and showing that “true monogenic” disorders are very rare, underscoring the current challenges of clinical diagnostics and discovery
Systems approaches to drug repositioning
PhD Thesis
Drug discovery has overall become less fruitful and more costly, despite vastly increased
biomedical knowledge and evolving approaches to Research and Development (R&D).
One complementary approach to drug discovery is that of drug repositioning which
focusses on identifying novel uses for existing drugs. By focussing on existing drugs
that have already reached the market, drug repositioning has the potential to both
reduce the timeframe and cost of getting a disease treatment to those that need it.
Many marketed examples of repositioned drugs have been found via serendipitous or
rational observations, highlighting the need for more systematic methodologies.
Systems approaches have the potential to enable the development of novel methods to
understand the action of therapeutic compounds, but require an integrative approach
to biological data. Integrated networks can facilitate systems-level analyses by combining
multiple sources of evidence to provide a rich description of drugs, their targets and
their interactions. Classically, such networks can be mined manually where a skilled
person can identify portions of the graph that are indicative of relationships between
drugs and highlight possible repositioning opportunities. However, this approach is
not scalable. Automated procedures are required to mine integrated networks systematically
for these subgraphs and bring them to the attention of the user. The aim
of this project was the development of novel computational methods to identify new
therapeutic uses for existing drugs (with particular focus on active small molecules)
using data integration.
A framework for integrating disparate data relevant to drug repositioning, Drug Repositioning
Network Integration Framework (DReNInF) was developed as part of this
work. This framework includes a high-level ontology, Drug Repositioning Network
Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite
of parsers; and a generic semantic graph integration platform. This framework enables
the production of integrated networks maintaining strict semantics that are important
in, but not exclusive to, drug repositioning. DReNInF was then used to create Drug
Repositioning Network Integration (DReNIn), a semantically-rich Resource Description
Framework (RDF) dataset. A Web-based front end was developed, which includes
a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this
dataset.
To automate the mining of drug repositioning datasets, a formal framework for the
definition of semantic subgraphs was established and a method for Drug Repositioning
Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining
semantically-rich networks for occurrences of a given semantic subgraph. This algorithm
allows instances of complex semantic subgraphs that contain data about putative
drug repositioning opportunities to be identified in a computationally tractable
fashion, scaling close to linearly with network data.
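
As a rough illustration of what mining a typed network for occurrences of a semantic subgraph involves, the sketch below uses plain backtracking over a toy graph. It is not the DReSMin algorithm and makes no claim about its scaling; the data model, the function name find_matches, and the drug, target and disease identifiers are all hypothetical and chosen only for this example.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy data model (not DReSMin's own structures):
# node id -> semantic type, plus a set of typed, directed edges.
NodeTypes = Dict[str, str]
Edges = Set[Tuple[str, str, str]]  # (source, relation, target)


def find_matches(g_types: NodeTypes, g_edges: Edges,
                 q_types: NodeTypes, q_edges: Edges) -> Iterator[Dict[str, str]]:
    """Yield every injective mapping of query nodes onto graph nodes that
    preserves node types and all typed edges of the query subgraph."""
    q_nodes: List[str] = list(q_types)

    def consistent(mapping: Dict[str, str]) -> bool:
        # Every query edge whose endpoints are both mapped must exist in the graph.
        return all((mapping[s], rel, mapping[t]) in g_edges
                   for (s, rel, t) in q_edges
                   if s in mapping and t in mapping)

    def extend(i: int, mapping: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(q_nodes):
            yield dict(mapping)
            return
        q = q_nodes[i]
        for g, g_type in g_types.items():
            # Candidate must have the required semantic type and be unused so far.
            if g_type != q_types[q] or g in mapping.values():
                continue
            mapping[q] = g
            if consistent(mapping):
                yield from extend(i + 1, mapping)
            del mapping[q]

    yield from extend(0, {})


# Toy network: two drugs share a target; only drugB has a known indication.
graph_types = {"drugA": "Drug", "drugB": "Drug",
               "targetX": "Target", "diseaseY": "Disease"}
graph_edges = {("drugA", "binds", "targetX"),
               ("drugB", "binds", "targetX"),
               ("drugB", "treats", "diseaseY")}

# Query subgraph: drug d1 shares a target with drug d2, which treats a disease.
query_types = {"d1": "Drug", "d2": "Drug", "t": "Target", "dis": "Disease"}
query_edges = {("d1", "binds", "t"), ("d2", "binds", "t"), ("d2", "treats", "dis")}

matches = list(find_matches(graph_types, graph_edges, query_types, query_edges))
for m in matches:
    print(f"{m['d1']} may be a repositioning candidate for {m['dis']}")
```

In this toy case the single match pairs drugA with diseaseY through the shared target. Exhaustive backtracking of this kind grows quickly with network size, which is why the tractability of a dedicated mining algorithm matters.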
The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated.
9,643,061 putative D-T interactions were identified and ranked, with a strong
correlation between highly scored associations and those supported by literature observed.
The 20 top ranked associations were analysed in more detail with 14 found
to be novel and six found to be supported by the literature. It was also shown that
this approach better prioritises known D-T interactions than other state-of-the-art
methodologies.
The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also
investigated. As target-based approaches are utilised heavily in the field of drug discovery,
it is necessary to have a systematic method to rank Gene-Disease (G-D) associations.
Although methods already exist to collect, integrate and score these associations,
these scores are often not a reliable reflection of expert knowledge. Therefore, an
integrated data-driven approach to drug repositioning was developed using a Bayesian
statistics approach and applied to rank 309,885 G-D associations using existing knowledge.
Ranked associations were then integrated with other biological data to produce
a semantically-rich drug discovery network. Using this network it was shown that
diseases of the central nervous system (CNS) provide an area of interest. The network
was then systematically mined for semantic subgraphs that capture novel Dr-D relations.
275,934 Dr-D associations were identified and ranked, with those more likely to
be side-effects filtered.
Work presented here includes novel tools and algorithms to enable research within
the field of drug repositioning. DReNIn, for example, includes data that previous
comparable datasets relevant to drug repositioning have neglected, such as clinical
trial data and drug indications. Furthermore, the dataset may be easily extended
using DReNInF to include future data as and when it becomes available, such as G-D
association directionality (i.e. whether the mutation is loss-of-function or gain-of-function).
Unlike other algorithms and approaches developed for drug repositioning, DReSMin
can be used to infer any types of associations captured in the target semantic network.
Moreover, the approaches presented here should be more generically applicable to
other fields that require algorithms for the integration and mining of semantically rich
networks.
Engineering and Physical Sciences Research Council (EPSRC) and GS
Integrative bioinformatics and graph-based methods for predicting adverse effects of developmental drugs
Adverse drug effects are complex phenomena that involve the interplay between drug molecules and their protein targets at various levels of biological organisation, from molecular to organismal. Many factors are known to contribute toward the safety profile of a drug, including the chemical properties of the drug molecule itself, the biological properties of drug targets and other proteins that are involved in pharmacodynamics and pharmacokinetics aspects of drug action, and the characteristics of the intended patient population. A multitude of scattered publicly available resources exist that cover these important aspects of drug activity. These include manually curated biological databases, high-throughput experimental results from gene expression and human genetics resources as well as drug labels and registered clinical trial records. This thesis proposes an integrated analysis of these disparate sources of information to help bridge the gap between the molecular and the clinical aspects of drug action. For example, to address the commonly held assumption that narrowly expressed proteins make safer drug targets, an integrative data-driven analysis was conducted to systematically investigate the relationship between the tissue expression profile of drug targets and the organs affected by clinically observed adverse drug reactions. Similarly, human genetics data were used extensively throughout the thesis to compare adverse symptoms induced by drug molecules with the phenotypes associated with the genes encoding their target proteins. One of the main outcomes of this thesis was the generation of a large knowledge graph, which incorporates diverse molecular and phenotypic data in a structured network format. To leverage the integrated information, two graph-based machine learning methods were developed to predict a wide range of adverse drug effects caused by approved and developmental therapies
In Silico Toxicology Data Resources to Support Read-Across and (Q)SAR
A plethora of databases exist online that can assist in in silico chemical or drug safety assessment. However, a systematic review and grouping of databases, based on purpose and information content, consolidated in a single source has been lacking. To resolve this issue, this review provides a comprehensive listing of the key in silico data resources relevant to: chemical identity and properties, drug action, toxicology (including nano-material toxicity), exposure, omics, pathways, Absorption, Distribution, Metabolism and Elimination (ADME) properties, clinical trials, pharmacovigilance, patents-related databases, biological (genes, enzymes, proteins, other macromolecules etc.) databases, protein-protein interactions (PPIs), environmental exposure related, and finally databases relating to animal alternatives in support of 3Rs policies. More than nine hundred databases were identified and reviewed against criteria relating to accessibility, data coverage, interoperability or application programming interface (API), appropriate identifiers, types of in vitro/in vivo/clinical data recorded and suitability for modelling, read-across or similarity searching. This review also specifically addresses the need for solutions for mapping and integration of databases into a common platform for better translatability of preclinical data to clinical data
DEVELOPMENT OF A COMPUTATIONAL RESOURCE FOR PERSONALIZED DIETARY RECOMMENDATIONS
There is a global increase in the incidence of non-communicable diseases associated with unhealthy food intakes. Conditions such as diabetes, heart disease, high blood pressure, and strokes represent a high societal impact and an economic burden for health-care systems around the world. To understand these diseases, one needs to account for the several factors that influence how the human body processes food, some of which are determined by the genome and patterns of gene expression that translate to the ability, or lack thereof, to degrade and absorb certain nutrients. Other factors, like the gut microbiota, are more volatile because its composition is highly moldable by diet and lifestyle.
Multi-omics technologies can support the comprehensive collection of dietary intake data and monitoring of the health status of individuals. Also, a correct analysis of this data could lead to new insights about the complex processes involved in the digestion of dietary components and their involvement in the prevention or the appearance of health problems, but its integration and interpretation are still problematic.
Thus, in this thesis, we propose the utilization of Constraint-Based Reconstruction and Analysis (COBRA) methods as a framework for the integration of this complex data. To achieve this goal, we have created a knowledge-base, the Virtual Metabolic Human (VMH), that combines information from large-scale models of metabolism from the human organism and typical gut microbes, with food composition information, and a disease compendium.
VMH’s unique combination of resources leverages the exploration of metabolic pathways from different organisms, the inclusion of dietary information into in-silico experiments through its own diet designer tool, visualization and analysis of experimental and simulation data, and exploring disease mechanisms and potential treatment strategies. VMH is a step forward in providing the necessary tools to investigate the mechanisms behind the influence of diet in health and disease. Tools such as the diet designer can be used as a basis for diet optimization by predicting combinations of foods that can contribute to specific metabolic outcomes, which has the potential to be integrated and translated into treatment development and dietary recommendations in the foreseeable future
Characterization of MEF2C-Related Disorders: Genotype, Phenotype, and Gene Pathway Dysregulation
MEF2C-related disorders are characterized by intellectual disability, developmental delay, lack of speech, seizures, stereotypic movements, hypotonia, and brain abnormalities and are caused by pathogenic alterations involving the MEF2C gene. Despite published cases, MEF2C-related disorders are difficult to recognize clinically. These studies sought to further characterize MEF2C-related disorders by investigating the genotypes, phenotypes, and gene functions (or dysfunctions) associated with the disorder.
Tremors have been reported in some patients with MEF2C-related disorders, but the concept of tremors has been complicated by vague definitions and numerous categorization methods. We performed a concept analysis following the Walker and Avant method to clarify the concept and develop an operational definition of tremors. We concluded that tremors are a movement disorder characterized by shaking motions that are involuntary, oscillatory, rhythmic, non-painful, always present although vary in severity, and can be repressed by changing posture or going into a rest position.
We then performed a systematic literature review to record the genotypes and comprehensive phenotype of MEF2C-related disorders reported in the literature. Forty-three articles characterizing 117 patients met the inclusion criteria. Common features included intellectual disability, developmental delay, seizures, hypotonia, absent speech, inability to walk, stereotypic movements, and MRI abnormalities. Nonclassical findings included question mark ear, jugular pit, and a unique neuroendocrine finding.
Next, we developed a survey based on validated instruments to gather developmental and clinical information from the parents of children with MEF2C-related disorders. Seventy-three parents completed the survey. Limited speech, seizures, bruxism, repetitive movements, and high pain tolerance were some of the prominent features identified from the survey data. Statistical analyses showed that patients with MEF2C variants were similarly affected as patients with deletions and females showed higher verbal abilities. This natural history study details phenotypic and developmental information of the largest single cohort reported to date.
Lastly, we discussed current techniques used to investigate the mouse Mef2c gene expression and regulation in the brain. Previous unbiased RNA sequencing of whole cortex from Mef2c global heterozygous mice showed hundreds of dysregulated genes, particularly autism risk genes and microglial genes. The Cowan lab is currently performing single nuclei RNA sequencing (snRNAseq) to better understand the role of Mef2c in neurons and microglia. Techniques used include nuclei dissociation, fluorescence-activated cell sorting, library preparation and sequencing, and bioinformatic analysis of the snRNAseq data. Additional research techniques include perfusion fixation, brain extraction and slicing, and immunohistochemistry.
These studies characterize the phenotype and document the severity of the disorder. The information reported will help providers diagnose and care for patients with MEF2C-related disorders. Additionally, the systematic review and survey data can be useful for further genotype-phenotype correlations, as baseline data for treatment trials, and to develop future studies
Translational software infrastructure for medical genetics
Deep inside the core of our cells resides the deoxyribonucleic acid (DNA) molecule known as the genome. DNA encodes the information that allows life to grow, survive, diversify and evolve. Unfortunately, the same mechanisms that let us adapt to a changing environment can also cause genetic disorders. While we are able to diagnose a number of these disorders using modern technological advancements, much remains to be discovered and understood. This thesis presents software infrastructure for investigating the molecular etiology of genetic disease using data from model organisms, demonstrates how to translate findings from fundamental research into new software tools for genome diagnostics, and introduces a downstream genome analysis framework that assists the automation and validation of the latest tools for applied patient care. We first develop data models and software to help determine which region of the genome is responsible for diseases and other physical traits. We then extend these principles towards model organisms. By using molecular similarities, we discover new ways to use nematodes for research into human diseases. Additionally, we can use our knowledge of the genome and evolution to predict how pathogenic new mutations are. The result is a public website where DNA can be scanned quickly and accurately for probable pathogenic mutations. Finally, we present a complete system for automated DNA analysis, including a protocol specific for genome diagnostics to produce clear patient reports for medical experts with which a diagnosis is made faster and easier