215 research outputs found

    State of the art in silico tools for the study of signaling pathways in cancer

    In the last several years, researchers have shown intense interest in the evolutionarily conserved signaling pathways that have crucial roles during embryonic development. Interestingly, the malfunctioning of these signaling pathways leads to several human diseases, including cancer. The chemical and biophysical events that occur during cellular signaling, as well as the number of interactions within a signaling pathway, make these systems complex to study. In silico resources are tools used to aid the understanding of cellular signaling pathways. Systems approaches have provided a deeper knowledge of diverse biochemical processes, including individual metabolic pathways, signaling networks and genome-scale metabolic networks. In the future, these tools will be enormously valuable if they continue to be developed in parallel with growing biological knowledge. This study provides an overview of the bioinformatics resources currently available for the analysis of biological networks.

    Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future.

    "Α picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable for enhancing visualization further

    Multi-dimensional omics approaches to dissect natural immune control mechanisms associated with RNA virus infections

    In recent decades, global health has been challenged by emerging and re-emerging viruses such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV2), human immunodeficiency virus type 1 (HIV-1), and Crimean–Congo hemorrhagic fever virus (CCHFV). Studies have shown dysregulation of host metabolic processes in SARS-CoV2 and HIV-1 infections, while research on CCHFV infection is still in its infancy. Hence, understanding host metabolic reprogramming at the reaction level in infectious disease has therapeutic importance. The thesis uses systems biology methods to investigate the host metabolic alterations in response to SARS-CoV2, HIV-1, and CCHFV infections. The three distinct viruses induce distinct effects on human metabolism that, nevertheless, show some commonalities. We have identified alterations in various immune cell types in patients during infection with each of the three viruses. Further, differential expression analysis identified that COVID-19 causes disruptions in pathways related to antiviral response and metabolism (fructose mannose metabolism, oxidative phosphorylation (OXPHOS), and pentose phosphate pathway). Up-regulation of OXPHOS and ROS pathways, with most changes in OXPHOS complexes I, III, and IV, was identified in people living with HIV on treatment (PLWHART). The acute phase of CCHFV infection was found to be linked with OXPHOS, glycolysis, N-glycan biosynthesis, and NOD-like receptor signaling pathways. The dynamic nature of the metabolic process and adaptive immune response in CCHFV pathogenesis was also observed. Further, we have identified different metabolic flux in reactions transporting TCA cycle intermediates from the cytosol to mitochondria in COVID-19 patients. Genes such as monocarboxylate transporter (SLC16A6) and nucleoside transporter (SLC29A1) and metabolites such as α-ketoglutarate, succinate, and malate were found to be linked with COVID-19 disease response.
Metabolic reactions associated with amino acid, carbohydrate, and energy metabolism pathways and various transporter reactions were observed to be uniquely disrupted in PLWHART, along with increased production of α-ketoglutarate (αKG) and ATP molecules. Changes in essential (leucine and threonine) and non-essential (arginine, alanine, and glutamine) amino acid transport were found to be caused by acute CCHFV infection. Altered flux of reactions involving TCA cycle compounds such as pyruvate, isocitrate, and α-ketoglutarate was also observed in CCHFV infection. The research described in the thesis revealed dysregulation of similar metabolic processes in the three viral infections. However, further downstream analysis unveiled unique alterations in several metabolic reactions specific to each virus within the same metabolic pathways, showing the importance of increasing the resolution of knowledge about host metabolism in infectious diseases.

    Leveraging Multilayered “Omics” Data for Atopic Dermatitis: A Road Map to Precision Medicine

    Atopic dermatitis (AD) is a complex multifactorial inflammatory skin disease that affects ~280 million people worldwide. About 85% of AD cases begin in childhood, a significant portion of which can persist into adulthood. Moreover, a typical progression of children with AD to food allergy, asthma or allergic rhinitis has been reported (“allergic march” or “atopic march”). AD comprises highly heterogeneous sub-phenotypes/endotypes resulting from complex interplay between intrinsic and extrinsic factors, such as environmental stimuli, and genetic factors regulating cutaneous functions (impaired barrier function, epidermal lipid, and protease abnormalities), immune functions and the microbiome. Though the role of high-throughput “omics” integration in defining endotypes is recognized, current analyses are primarily based on individual omics data and binary clinical outcomes. Although individual omics analyses, such as genome-wide association studies (GWAS), can effectively map variants correlated with AD, the identified variants do not explain the majority of the heritability, and their functional relevance is largely unknown. The limited success of singular approaches underscores the need for holistic and integrated approaches to investigate complex phenotypes using trans-omics data integration strategies. Integrating omics layers (e.g., genome, epigenome, transcriptome, proteome, metabolome, lipidome, exposome, microbiome), which often have complementary and synergistic effects, might provide the opportunity to capture the flow of information underlying AD disease manifestation. Overlapping genes/candidates in AD pathogenesis derived from multiple omics types include FLG, SPINK5, S100A8, and SERPINB3. Overlapping pathways include macrophage, endothelial cell and fibroblast activation pathways, in addition to well-known Th1/Th2 and NFkB activation pathways.
Interestingly, there was more multi-omics overlap at the pathway level than at the gene level. Further analysis of multi-omics overlap at the tissue level showed that among 30 tissue types from the GTEx database, skin and esophagus were significantly enriched, indicating the biological interconnection between AD and food allergy. The present work explores multi-omics integration and provides new biological insights to better define the biological basis of AD etiology and confirm previously reported AD genes/pathways. In this context, we also discuss opportunities and challenges introduced by “big omics data” and their integration.

    Unified feature association networks through integration of transcriptomic and proteomic data

    High-throughput multi-omics studies and corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities composed of nodes of the same type. We describe here network building methods that can maximize edges between nodes of different data types, leading to integrated networks, that is, networks with a large number of edges that link nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated Dengue virus cell invasion and receptor-mediated Dengue virus invasion. A number of functional pathways showed centrality differences between the two networks, including genes responding to both GM-CSF and IL-4, which had a higher centrality value in the antibody-mediated than in the receptor-mediated Dengue network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges associated with how such multi-omic networks can be assembled and how the greatest number of interactions can be inferred from different data types.
The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.

    EyeMirDB: a Web-Based Platform of Experimentally Supported Eye Disease-miRNA Information

    Background: Studies of microRNA biology have increased in numerous scientific research domains, including eye science. MicroRNAs (miRNAs) are small non-coding RNAs that operate as post-transcriptional regulators of gene expression by destroying or blocking the translation of target messenger RNA. Despite significant efforts to investigate the miRNAs of eye disease, a complete platform covering frequent ocular diseases together with their genes, pathways, and miRNAs is still unavailable. Material and Methods: Three well-known databases were used as the main data sources: DisGeNET, OMIM, and KEGG. The curated genes involved in each disease were manually collected. Then, annotation information such as each gene's sequence, description, chromosome number, and start and end loci was extracted from the Ensembl data source. Pathway information for each gene was obtained from KEGG and Reactome. Finally, experimentally validated gene-related miRNAs were collected from miRecords, miRTarBase, and TarBase. In order to consider miRNAs expression in ocular tissues. Results: We present EyeMirDB (http://eyemirdb.databanks.behrc.ir/), a web-based platform consisting of all predicted and validated miRNAs. Information on the annotation of miRNA-related genes was also collected in order to better understand the effects of miRNA. Pathways in which these genes are active were also identified. Currently, EyeMirDB contains 429 curated genes, 1258 pathways, and 2596 validated miRNAs of 25 prevalent ocular diseases. Conclusion: We introduce EyeMirDB, a web-based platform of eye disease-related interactions, including curated disease-gene, gene-miRNA, and gene-pathway information and annotations, with the option of studying all these entities from different viewpoints. This data portal is a good entry point for ocular disease researchers.

    Network-based methods for biological data integration in precision medicine

    The vast and continuously increasing volume of available biomedical data produced during the last decades opens new opportunities for large-scale modeling of disease biology, facilitating a more comprehensive and integrative understanding of its processes. Nevertheless, this type of modelling requires highly efficient computational systems capable of dealing with such data volumes. Computational techniques commonly used in machine learning and data analysis, namely dimensionality reduction and network-based approaches, have been developed with the goal of effectively integrating biomedical data. Among these methods, network-based machine learning stands out due to its major advantage in terms of biomedical interpretability. These methodologies provide a highly intuitive framework for the integration and modelling of biological processes. This PhD thesis aims to explore the potential of integrating complementary available biomedical knowledge with patient-specific data to provide novel computational approaches to solve biomedical scenarios characterized by data scarcity. The primary focus is on studying how high-order graph analysis (i.e., community detection in multiplex and multilayer networks) may help elucidate the interplay of different types of data in contexts where statistical power is heavily impacted by small sample sizes, such as rare diseases and precision oncology. The central focus of this thesis is to illustrate how network biology, among the several data integration approaches with the potential to achieve this task, can play a pivotal role in addressing this challenge given its advantages in molecular interpretability.
Through its insights and methodologies, it introduces how network biology, and in particular models based on multilayer networks, helps bring the vision of precision medicine to these complex scenarios, providing a natural approach for the discovery of new biomedical relationships that overcomes the difficulties of studying cohorts with limited sample sizes (data-scarce scenarios). Delving into the potential of current artificial intelligence (AI) and network biology applications to address data granularity issues in the precision medicine field, this PhD thesis presents pivotal research works, based on multilayer networks, for the analysis of two rare disease scenarios with specific data granularities, effectively overcoming the classical constraints hindering rare disease and precision oncology research. The first research article presents a personalized medicine study of the molecular determinants of severity in congenital myasthenic syndromes (CMS), a group of rare disorders of the neuromuscular junction (NMJ). The analysis of severity in rare diseases, despite its importance, is typically neglected due to limited data availability. In this study, modelling biomedical knowledge via multilayer networks made it possible to understand the functional implications of individual mutations in the cohort under study, as well as their relationships with the causal mutations of the disease and the different levels of severity observed. Moreover, the study presents experimental evidence of the role of a previously unsuspected gene in NMJ activity, validating the hypothetical role predicted using the newly introduced methodologies. The second research article focuses on the applicability of multilayer networks for gene prioritization.
Building on concepts for the analysis of different data granularities first introduced in the previous article, the presented research provides a methodology based on the persistence of network community structures across a range of modularity resolutions, effectively providing a new framework for gene prioritization for patient stratification. In summary, this PhD thesis presents major advances in the use of multilayer network-based approaches for the application of precision medicine to data-scarce scenarios, exploring the potential of integrating extensive available biomedical knowledge with patient-specific data.

    Machine Learning Applications for Drug Repurposing

    The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has had limited success under the conventional reductionist one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under this simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small-molecule drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected until a very late stage of drug discovery, where the discovery of drug-induced side effects and potential drug resistance can decrease the value of the drug and even completely invalidate its use. Thus, a new paradigm in drug discovery is needed. Structural systems pharmacology is a new paradigm in drug discovery in which drug activities are studied using data-driven, large-scale models that take the structures and drugs into consideration. Structural systems pharmacology will model, on a genome scale, the energetic and dynamic modifications of protein targets by drug molecules as well as the subsequent collective effects of drug-target interactions on the phenotypic drug responses. To date, however, few experimental and computational methods can determine genome-wide protein-ligand interaction networks and the clinical outcomes mediated by them. As a result, the majority of proteins have not been charted for their small-molecule ligands, and we have a limited understanding of drug actions. To address this challenge, this dissertation seeks to develop and experimentally validate innovative computational methods to infer genome-wide protein-ligand interactions and multi-scale drug-phenotype associations, including drug-induced side effects.
The hypothesis is that the integration of data-driven bioinformatics tools with structure-and-mechanism-based molecular modeling methods will lead to an optimal tool for accurately predicting drug actions and drug-associated phenotypic responses, such as side effects. This dissertation starts by reviewing the current status of computational drug discovery for complex diseases in Chapter 1. In Chapter 2, we present REMAP, a one-class collaborative filtering method to predict off-target interactions from a protein-ligand interaction network. In our later work, REMAP was integrated with structural genomics and statistical machine learning methods to design a dual-indication polypharmacological anticancer therapy. In Chapter 3, we extend REMAP, the core method in Chapter 2, into a multi-ranked collaborative filtering algorithm, WINTF, and present relevant mathematical justifications. Chapter 4 is an application of WINTF to repurpose the FDA-approved drug diazoxide as a potential treatment for triple negative breast cancer, a deadly subtype of breast cancer. In Chapter 5, we present a multilayer extension of REMAP, applied to predict drug-induced side effects and the associated biological pathways. In Chapter 6, we close this dissertation by presenting a deep learning application to learn biochemical features from protein sequence representation using a natural language processing method.

    Cancer proteogenomics: connecting genotype to molecular phenotype

    The central dogma of molecular biology describes the one-way road from DNA to RNA and finally to protein. Yet, how this flow of information encoded in DNA as genes (genotype) is regulated in order to produce the observable traits of an individual (phenotype) remains unanswered. Recent advances in high-throughput data, i.e., ‘omics’, have allowed the quantification of DNA, RNA and protein levels, leading to integrative analyses that essentially probe the central dogma along all of its constituent molecules. Evidence from these analyses suggests that mRNA abundances are at best a moderate proxy for proteins, which are the main functional units of cells and thus closer to the phenotype. Cancer proteogenomic studies consider the ensemble of proteins, the so-called proteome, as the readout of the functional molecular phenotype to investigate how it is influenced by upstream events, for example DNA copy number alterations. In typical proteogenomic studies, however, the identified proteome is a simplification of its actual composition, because these studies methodologically disregard events such as splicing, proteolytic cleavage and post-translational modifications that generate unique protein species (proteoforms). The scope of this thesis is to study proteome diversity in terms of: a) the complex genetic background of three tumor types, i.e. breast cancer, childhood acute lymphoblastic leukemia and lung cancer, and b) the proteoform composition, describing a computational method for detecting protein species based on their distinct quantitative profiles. In Paper I, we present a proteogenomic landscape of 45 breast cancer samples representative of the five PAM50 intrinsic subtypes. We studied the effect of copy number alterations (CNA) on mRNA and protein levels, overlaying a public dataset of drug-perturbed protein degradation.
In Paper II, we describe a proteogenomic analysis of 27 B-cell precursor acute lymphoblastic leukemia clinical samples that compares high hyperdiploid versus ETV6/RUNX1-positive cases. We examined the impact of the amplified chromosomes on mRNA and protein abundance, specifically the linear trend between the amplification level and the dosage effect. Moreover, we investigated mRNA-protein quantitative discrepancies with regard to post-transcriptional and post-translational effects such as mRNA/protein stability and miRNA targeting. In Paper III, we describe a proteogenomic cohort of 141 non-small cell lung cancer clinical samples. We used clustering methods to identify six distinct proteome-based subtypes. We integrated the protein abundances in pathways using protein-protein correlation networks, bioinformatically deconvoluted the immune composition and characterized the neoantigen burden. In Paper IV, we developed a pipeline for proteoform detection from bottom-up mass-spectrometry-based proteomics. Using an in-depth proteomics dataset of 18 cancer cell lines, we identified proteoforms related to splice variant peptides supported by RNA-seq data. This thesis adds to the previous literature of proteogenomic studies by analyzing the tumor proteome and its regulation along the flow of the central dogma of molecular biology. It is anticipated that some of these findings will lead to novel insights about tumor biology and set the stage for clinical applications to improve current cancer patient care.