3,743 research outputs found

    The ever-evolving concept of the gene: The use of RNA/Protein experimental techniques to understand genome functions

    Get PDF
    The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as "junk" DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years

    Bridging the Gap between Genotype and Phenotype via Network Approaches

    Get PDF
    In the last few years we have witnessed tremendous progress in detecting associations between genetic variations and complex traits. While genome-wide association studies have been able to discover genomic regions that may influence many common human diseases, these discoveries created an urgent need for methods that extend the knowledge of genotype-phenotype relationships to the level of the molecular mechanisms behind them. To address this emerging need, computational approaches increasingly utilize a pathway-centric perspective. These new methods often utilize known or predicted interactions between genes and/or gene products. In this review, we survey recently developed network based methods that attempt to bridge the genotype-phenotype gap. We note that although these methods help narrow the gap between genotype and phenotype relationships, these approaches alone cannot provide the precise details of underlying mechanisms and current research is still far from closing the gap

    SLDR: a computational technique to identify novel genetic regulatory relationships

    Get PDF
    We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantages of functional genomics data for the same species under different perturbation conditions, therefore complementary to current popular computational techniques. It can particularly identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. In SLDR, we model each candidate target gene as being controlled by N binary-state regulators that lead to ≤2N observable states ("step-levels") for the target. We applied SLDR to the study of the GEO microarray data set GSE25644, which consists of 158 different mutant S. cerevisiae gene expressional profiles. For each target gene t, we first clustered ordered samples into various clusters, each approximating an observable step-level of t to screen out the "de-centric" target. Then, we ordered each gene x as a candidate regulator and aligned t to x for the purpose of examining the step-level correlations between low expression set of x (Ro) and high expression set of x (Rh) from the regulator x to t, by finding max f(t, x): |Ro-Rh| over all candidate × in the genome for each t. We therefore obtained activation and inhibitions events from different combinations of Ro and Rh. Furthermore, we developed criteria for filtering out less-confident regulators, estimated the number of regulators for each target t, and evaluated identified top-ranking regulator-target relationship. Our results can be cross-validated with the Yeast Fitness database. SLDR is also computationally efficient with o(N²) complexity. In summary, we believe SLDR can be applied to the mining of functional genomics big data for future network biology and network medicine applications

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Get PDF
    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach

    Identifying Causal Genes and Dysregulated Pathways in Complex Diseases

    Get PDF
    In complex diseases, various combinations of genomic perturbations often lead to the same phenotype. On a molecular level, combinations of genomic perturbations are assumed to dys-regulate the same cellular pathways. Such a pathway-centric perspective is fundamental to understanding the mechanisms of complex diseases and the identification of potential drug targets. In order to provide an integrated perspective on complex disease mechanisms, we developed a novel computational method to simultaneously identify causal genes and dys-regulated pathways. First, we identified a representative set of genes that are differentially expressed in cancer compared to non-tumor control cases. Assuming that disease-associated gene expression changes are caused by genomic alterations, we determined potential paths from such genomic causes to target genes through a network of molecular interactions. Applying our method to sets of genomic alterations and gene expression profiles of 158 Glioblastoma multiforme (GBM) patients we uncovered candidate causal genes and causal paths that are potentially responsible for the altered expression of disease genes. We discovered a set of putative causal genes that potentially play a role in the disease. Combining an expression Quantitative Trait Loci (eQTL) analysis with pathway information, our approach allowed us not only to identify potential causal genes but also to find intermediate nodes and pathways mediating the information flow between causal and target genes. Our results indicate that different genomic perturbations indeed dys-regulate the same functional pathways, supporting a pathway-centric perspective of cancer. While copy number alterations and gene expression data of glioblastoma patients provided opportunities to test our approach, our method can be applied to any disease system where genetic variations play a fundamental causal role

    Consequences of refining biological networks through detailed pathway information : From genes to proteoforms

    Get PDF
    Biologiske nettverk kan brukes til å modellere molekylære prosesser, forstå sykdomsprogresjon og finne nye behandlingsstrategier. Denne avhandlingen har undersøkt hvordan utformingen av slike nettverk påvirker deres struktur, og hvordan dette kan benyttes til å forbedre spesifisiteten for påfølgende analyser av slike modeller. Det første som ble undersøkt var potensialet ved å bruke mer detaljerte molekylære data når man modellerer humane biokjemiske reaksjonsnettverk. Resultatene bekrefter at det er nok informasjon om proteoformer, det vil si proteiner i spesifikke post-translasjonelle tilstander, for systematiske analyser og viste også store forskjeller i strukturen mellom en gensentrisk og en proteoformsentrisk representasjon. Deretter utviklet vi programmatisk tilgang og søk i slike nettverk basert på ulike typer av biomolekyler, samt en generisk algoritme som muliggjør fleksibel kartlegging av eksperimentelle data knyttet til den teoretiske representasjonen av proteoformer i referansedatabaser. Til slutt ble det konstruert såkalte pathway-spesifikke nettverk ved bruk av ulike detaljnivåer ved representasjonen av biokjemiske reaksjoner. Her ble informasjon som vanligvis blir oversett i standard nettverksrepresentasjoner inkludert: små molekyler, isoformer og modifikasjoner. Strukturelle egenskaper, som nettverksstørrelse, graddistribusjon og tilkobling i både globale og lokale undernettverk, ble deretter analysert for å kvantifisere virkningene av endringene.Biological networks can be used to model molecular processes, understand disease progression, and find new treatment strategies. This thesis investigated how refining the design of biological networks influences their structure, and how this can be used to improve the specificity of pathway analyses. First, we investigate the potential to use more detailed molecular data in current human biological pathways. We verified that there are enough proteoform annotations, i.e. information about proteins in specific post-translational states, for systematic analyses and characterized the structure of gene-centric versus proteoform-centric network representations of pathways. Next, we enabled the programmatic search and mining of pathways using different models for biomolecules including proteoforms. We notably designed a generic proteoform matching algorithm enabling the flexible mapping of experimental data to the theoretic representation in reference databases. Finally, we constructed pathway-based networks using different degrees of detail in the representation of biochemical reactions. We included information overlooked in most standard network representations: small molecules, isoforms, and post-translational modifications. Structural properties such as network size, degree distribution, and connectivity in both global and local subnetworks, were analysed to quantify the impact of the added molecular entities.Doktorgradsavhandlin

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    Get PDF
    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progresses toward a grid-enabled chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa

    Methods for Joint Normalization and Comparison of Hi-C data

    Get PDF
    The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extended to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates. We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76)

    Interpretation of psychiatric genome-wide association studies with multispecies heterogeneous functional genomic data integration.

    Get PDF
    Genome-wide association studies and other discovery genetics methods provide a means to identify previously unknown biological mechanisms underlying behavioral disorders that may point to new therapeutic avenues, augment diagnostic tools, and yield a deeper understanding of the biology of psychiatric conditions. Recent advances in psychiatric genetics have been made possible through large-scale collaborative efforts. These studies have begun to unearth many novel genetic variants associated with psychiatric disorders and behavioral traits in human populations. Significant challenges remain in characterizing the resulting disease-associated genetic variants and prioritizing functional follow-up to make them useful for mechanistic understanding and development of therapeutics. Model organism research has generated extensive genomic data that can provide insight into the neurobiological mechanisms of variant action, but a cohesive effort must be made to establish which aspects of the biological modulation of behavioral traits are evolutionarily conserved across species. Scalable computing, new data integration strategies, and advanced analysis methods outlined in this review provide a framework to efficiently harness model organism data in support of clinically relevant psychiatric phenotypes
    • …
    corecore