7 research outputs found

    ExprEssence - Revealing the essence of differential experimental data in the context of an interaction/regulation network

    Background: Experimentalists are overwhelmed by high-throughput data, and there is an urgent need to condense information into simple hypotheses. For example, large amounts of microarray and deep sequencing data are becoming available, describing a variety of experimental conditions such as gene knockout and knockdown, the effect of interventions, and the differences between tissues and cell lines. Results: To address this challenge, we developed a method, implemented as a Cytoscape plugin called ExprEssence. As input we take a network of interaction, stimulation and/or inhibition links between genes/proteins, and differential data, such as gene expression data, tracking an intervention or development in time. We condense the network, highlighting those links across which the largest changes can be observed. Highlighting is based on a simple formula inspired by the law of mass action. We can interactively modify the threshold for highlighting and instantaneously visualize results. We applied ExprEssence to three scenarios describing kidney podocyte biology, pluripotency and ageing: 1) We identify putative processes involved in podocyte (de-)differentiation and validate one prediction experimentally. 2) We predict and validate the expression level of a transcription factor involved in pluripotency. 3) Finally, we generate plausible hypotheses on the role of apoptosis, cell cycle deregulation and DNA repair in ageing data obtained from the hippocampus. Conclusion: Reducing the size of gene/protein networks to the few links affected by large changes makes it possible to screen for putative mechanistic relationships among the genes/proteins that are involved in adaptation to different experimental conditions, yielding important hypotheses, insights and suggestions for new experiments. We note that we do not focus on the identification of 'active subnetworks'. Instead we focus on the identification of single links (which may or may not form subnetworks), and these single links are much easier to validate experimentally than submodules. ExprEssence is available at http://sourceforge.net/projects/expressence/.
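The condensing step described in this abstract (score every link by how much its endpoints change between two conditions, then keep only the links above an adjustable threshold) can be illustrated with a minimal sketch. The abstract does not state the formula, so the score used here, the change in the product of the two endpoint levels, is only a hypothetical stand-in loosely modelled on mass-action kinetics. The function names `link_scores` and `condense`, the node names and the expression values are all invented for illustration and are not part of ExprEssence.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def link_scores(
    links: List[Link],
    before: Dict[str, float],
    after: Dict[str, float],
) -> Dict[Link, float]:
    """Score each link by the change in the product of its endpoint levels.

    ASSUMPTION: this product-difference score is a stand-in chosen for
    illustration; it is not the formula used by ExprEssence.
    """
    scores: Dict[Link, float] = {}
    for a, b in links:
        scores[(a, b)] = after[a] * after[b] - before[a] * before[b]
    return scores


def condense(scores: Dict[Link, float], threshold: float) -> List[Link]:
    """Keep only links whose absolute score reaches the threshold."""
    return [link for link, score in scores.items() if abs(score) >= threshold]


# Hypothetical toy network and expression levels for two conditions.
links = [("A", "B"), ("B", "C"), ("A", "C")]
before = {"A": 1.0, "B": 1.0, "C": 1.0}
after = {"A": 3.0, "B": 2.0, "C": 1.0}

scores = link_scores(links, before, after)
kept = condense(scores, threshold=2.0)
print(kept)
```

Raising the threshold passed to `condense` shrinks the set of highlighted links, which mirrors the interactive threshold adjustment mentioned in the abstract.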

    Drug repurposing prediction for COVID-19 using probabilistic networks and crowdsourced curation

    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for the coronavirus disease 2019 (COVID-19) pandemic, represents an unprecedented global health challenge. Consequently, a large amount of research into the disease pathogenesis and potential treatments has been carried out in a short time frame. However, developing novel drugs is a costly and lengthy process, and is unlikely to deliver a timely treatment for the pandemic. Drug repurposing, by contrast, provides an attractive alternative, as existing drugs have already met many of the regulatory requirements. In this work we used a combination of network algorithms and human curation to search integrated knowledge graphs, identifying drug repurposing opportunities for COVID-19. We demonstrate the value of this approach, reporting on eight potential repurposing opportunities identified, and discuss how this approach could be incorporated into future studies.

    Knowledge extraction from biomedical data using machine learning

    PhD Thesis. Thanks to the breakthroughs in biotechnology of recent years, biomedical data is accumulating at a previously unseen pace. In the field of biomedicine, decades-old statistical methods are still commonly used to analyse such data. However, the simplicity of these approaches often limits the amount of useful information that can be extracted from the data. Machine learning methods represent an important alternative due to their ability to capture complex patterns within the data that are likely to be missed by simpler methods. This thesis focuses on the extraction of useful knowledge from biomedical data using machine learning. Within the biomedical context, the vast majority of machine learning applications focus their effort on the generation and validation of prediction models. The inferred models are rarely used to discover meaningful biomedical knowledge. The work presented in this thesis goes beyond this scenario and devises new methodologies to mine machine learning models for the extraction of useful knowledge. The thesis targets two important and challenging biomedical analytic tasks: (1) the inference of biological networks and (2) the discovery of biomarkers. The first task aims to identify associations between different biological entities, while the second one tries to discover sets of variables that are relevant for specific biomedical conditions. Successful solutions for both problems rely on the ability to recognise complex interactions within the data, hence the use of multivariate machine learning methods. The network inference problem is addressed with FuNeL: a protocol to generate networks based on the analysis of rule-based machine learning models. The second task, biomarker discovery, is studied with RGIFE, a heuristic that exploits the information extracted from machine learning models to guide its search for minimal subsets of variables. The extensive analysis conducted for this dissertation shows that the networks inferred with FuNeL capture relevant knowledge complementary to that extracted by standard inference methods. Furthermore, the associations defined by FuNeL are found to be more pertinent in a disease context. The biomarkers selected by RGIFE are found to be disease-relevant and to have a high predictive power. When applied to osteoarthritis data, RGIFE confirmed the importance of previously identified biomarkers, whilst also extracting novel biomarkers with possible future clinical applications. Overall, the thesis presents new effective methods to leverage the information encapsulated within machine learning models, which often remains buried, and to discover useful biomedical knowledge. Part of this work was funded by the European Union Seventh Framework Programme (FP7/2007-2013) under the “D-BOARD” project (grant agreement number 305815).

    Comparative genomics for studying the proteomes of mucosal microorganisms

    A tremendous number of microorganisms are known to interact with their animal hosts. The outcomes of the interactions between microbes and their animal hosts range from modulating the maintenance of homeostasis to the establishment of processes leading to pathogenesis. Of the numerous species known to inhabit humans, the great majority live on mucosal surfaces, which are highly defended. Despite their importance in human health, little is known about the molecular and cellular basis of most host-microbe interactions across the tremendous diversity of mucosal-adapted microorganisms. The ever-increasing availability of genome sequence data allows systematic comparative genomics studies to identify proteins with potentially important molecular functions at the host-microbe interface. In this study, a genome-wide analysis was performed on 3,021,490 protein sequences derived from 867 complete microbial genome sequences across the three domains of cellular life. The ability of microbes to thrive successfully in a mucosal environment was examined in relation to functional genomics data from a range of publicly available databases. Particular emphasis was placed on the extracytoplasmic proteins of microorganisms that thrive on human mucosal surfaces. These proteins form the interface between the complex host-microbe and microbe-microbe interactions. The large amounts of data involved, combined with the numerous analytical techniques that need to be performed, make the study intractable with conventional bioinformatics. The lack of habitat annotations for microorganisms further compounds the problem of identifying the microbial extracytoplasmic proteins playing important roles in the mucosal environments. In order to address these problems, a distributed high-throughput computational workflow was developed, and a system for mining biomedical literature was trained to automatically identify microorganisms’ habitats. The workflow integrated existing bioinformatics tools to identify and characterise protein-targeting signals, cell surface-anchoring features, protein domains and protein families. This study successfully demonstrated a large-scale comparative genomics approach utilising a system called Microbase to harness Grid and Cloud computing technologies. A number of conserved protein domains and families that are significantly associated with a specific set of mucosa-inhabiting microorganisms were identified. These conserved protein regions, whose functions were either characterised or unknown, were quite narrow in their taxonomic distribution, with only a few protein domains more widely distributed, suggesting that mucosal microorganisms evolved different strategies and mechanisms for their survival in the host mucosal environments. Metabolic and biological processes common to many mucosal microorganisms included: carbohydrate and amino acid metabolism, signal transduction, adhesion to host tissues or contents in mucosal environments (e.g. food remnants, mucins), and resistance to host defence mechanisms. Invasive or virulence factors were also identified in pathogenic strains. Several extracytoplasmic protein families were shared among prominent bacterial members of gut microbiota and microbial eukaryotes known to thrive in the same environment, suggesting that the ability of microbes to adapt to particular niches can be influenced by lateral gene transfer. A large number of conserved regions or protein families that potentially play important roles in the mucosa-microbe interactions were revealed by this study. Several of these candidates were proteins of unknown function. The identified candidates were subjected to more detailed computational analysis, providing hypotheses for their function that will be tested experimentally in order to contribute to our understanding of the complex host-microbe interactions. Among the candidates of unknown function, a novel M60-like domain was identified. The domain was deposited in the Pfam database with accession number PF13402. The M60-like domain is shared amongst a broad range of mucosal microorganisms as well as their vertebrate hosts. Bioinformatics analyses of the M60-like domain suggested a potential catalytic function of the conserved motif as in gluzincin metalloproteases. Targeting signals were detected across microbial M60-like-containing proteins. A mucosa-related carbohydrate-binding module (CBM), CBM32, was also identified on several proteins containing M60-like domains encoded by known mucosal commensals and pathogens. The co-occurrence of the CBMs and the M60-like domain, as well as the annotated potential peptidase function, unveiled a new functional context for the CBM, which is typically connected with carbohydrate-processing enzymes but not proteases. The CBM domains linked with members of different protease families are likely to enable these proteases to bind to specific glycoproteins from host animals, further highlighting the importance of proteases and CBMs (CBM32 and CBM5_12) in host-microbe interactions.
    EThOS - Electronic Theses Online Service; Medical School, Newcastle University; GB; United Kingdom