261 research outputs found

    Computational Biology Methods and Their Application to the Comparative Genomics of Endocellular Symbiotic Bacteria of Insects

    Comparative genomics has become a tantalizing challenge in the postgenomic era, one magnified by the plethora of new genomes becoming available on a daily basis. The overwhelming list of new genomes to compare has pushed the field of bioinformatics and computational biology toward the design and development of methods capable of identifying patterns in a sea of noisy data. Despite many advances, persistent exceptions to the general patterns continue to make it difficult to generalize methods for comparative genomics. In this review, we discuss the different tools devised to undertake the challenge of comparative genomics and some of the exceptions that compromise the generality of such methods. We focus on endosymbiotic bacteria of insects because of the peculiarities of their genome dynamics when compared to free-living organisms.

    Advanced Computational Biology Methods Identify Molecular Switches for Malignancy in an EGF Mouse Model of Liver Cancer

    The molecular causes by which the epidermal growth factor receptor tyrosine kinase induces malignant transformation are largely unknown. To better understand EGF's transforming capacity, whole genome scans were applied to a transgenic mouse model of liver cancer and subjected to advanced methods of computational analysis to construct de novo gene regulatory networks based on a combination of sequence analysis and entrained graph-topological algorithms. Here we identified transcription factors, processes, key nodes and molecules to connect as yet unknown interacting partners at the level of protein-DNA interaction. Many of those could be confirmed by electromobility band shift assay at recognition sites of gene specific promoters and by western blotting of nuclear proteins. A novel cellular regulatory circuitry could therefore be proposed that connects cell cycle regulated genes with components of the EGF signaling pathway. Promoter analysis of differentially expressed genes suggested that the majority of regulated transcription factors display specificity to either the pre-tumor or the tumor state. A subsequent search for signal transduction key nodes upstream of the identified transcription factors and their targets suggested that the insulin-like growth factor pathway renders the tumor cells independent of EGF receptor activity. Notably, expression of IGF2, in addition to many components of this pathway, was highly upregulated in tumors. Together, we propose a switch in autocrine signaling to foster tumor growth that was initially triggered by EGF, and demonstrate the knowledge gain from promoter analysis combined with upstream key node identification.

    Genetic determinants of human phenotypes: understanding human infectious diseases by computational biology methods

    This thesis aims to understand the genetic basis of complex human traits using systems biology (SB). SB exploits quantitative measurements, computational models, and high-throughput screening to decipher biological systems. Next-generation sequencing technologies allow human molecular traits to be profiled cost-effectively at large scales, which has brought biology research into the multi-omics era. Omics studies began with genomics. Combining genomics with transcriptomics and epigenomics enables us to study the regulatory networks underlying phenotypes. This led me to study genetic regulatory relationships in health and disease by identifying allele-specific expression and allele-specific open chromatin. Advances in omics methods also offer opportunities to identify and study phenotype- or disease-associated genetic variants using methods like genome-wide association studies (GWAS). GWAS have generated numerous genetic associations with phenotypes and diseases. However, it is not yet possible to interpret most of these findings due to limited biological knowledge. Therefore, dissecting the molecular functions of GWAS variants is important in the post-GWAS era. Importantly, integrating multi-omics data can answer the question: how does information flow in biological systems? The integration is biologically informative because it represents the biological signals underlying phenotypes of interest or disease conditions. Moreover, single-cell methods deepen our biological knowledge by capturing characteristics and functions per cell. In this thesis, I explored genetic determinants of human molecular traits: allelic imbalance and host responses to pathogenic viruses.

    Methods in Computational Biology

    Modern biology is rapidly becoming a study of large sets of data. Understanding these data sets is a major challenge for most life sciences, including the medical, environmental, and bioprocess fields. Computational biology approaches are essential for leveraging this ongoing revolution in omics data. A primary goal of this Special Issue, entitled “Methods in Computational Biology”, is the communication of computational biology methods, which can extract biological design principles from complex data sets, described in enough detail to permit the reproduction of the results. This issue integrates interdisciplinary researchers such as biologists, computer scientists, engineers, and mathematicians to advance biological systems analysis. The Special Issue contains the following sections:
    • Reviews of Computational Methods
    • Computational Analysis of Biological Dynamics: From Molecular to Cellular to Tissue/Consortia Levels
    • The Interface of Biotic and Abiotic Processes
    • Processing of Large Data Sets for Enhanced Analysis
    • Parameter Optimization and Measuremen

    Glucocorticoids and Airway Smooth Muscle: A Few More Answers, Still More Questions


    Methods detecting rhythmic gene expression are biologically relevant only for strong signal

    Author summary: To be active, genes have to be transcribed to RNA. For some genes, the transcription rate follows a circadian rhythm with a periodicity of approximately 24 hours; we call these genes “rhythmic”. In this study, we compared methods designed to detect rhythmic genes in gene expression data. The data are measures of the number of RNA molecules for each gene, given at several time-points, usually spaced 2 to 4 hours apart, over one or several periods of 24 hours. There are many such methods, but it is not known which ones work best to detect genes whose rhythmic expression is biologically functional. We compared these methods using a reference group of evolutionarily conserved rhythmic genes. We compared data from baboon, mouse, rat, zebrafish, fly, and mosquitoes. Surprisingly, no method was particularly effective. Furthermore, we found that only very strong rhythmic signals were relevant with each method. More precisely, when we use a usual cut-off to define rhythmic genes, the group of genes considered as rhythmic contains many genes whose rhythmicity cannot be confirmed to be biologically relevant. We also show that rhythmic genes are mainly highly expressed genes. Finally, based on our results, we provide recommendations on which methods to use and how, and suggestions for future experimental designs.

    A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome

    Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of the RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than previously believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites on the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model has the ability to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.

    Etude de l’expression des gènes nycthéméraux à la lumière de l’évolution [Study of nycthemeral gene expression in the light of evolution]

    Circadian clocks are now an important part of the understanding of biological systems. They are ubiquitous, found in a wide range of biological processes, from molecular systems to behavior, and are also found almost everywhere in nature: in animals, plants, bacteria and fungi. This thesis focuses on biological systems that respond to factors oscillating on a 24-hour time scale. The detection of genes expressed with a periodicity of 24 hours remains a complicated aspect of analytical work. We show that most detection methods are efficient only for strong signals and that, outside of these genes, the algorithms seem to detect rhythmic genes in a rather random way. We have also tried to understand why genes have periodic variations in the amount of their RNA or of the protein they encode. Indeed, 20% to 50% of cyclically accumulated (i.e. nycthemeral) proteins are translated from non-oscillating mRNAs and, conversely, there are many mRNAs that oscillate while the proteins they encode do not. Why is that? My results suggest that the nycthemeral variation of proteins concerns on average highly expressed proteins, which remain on average costlier for the cell to produce (in terms of energy and molecular material) than proteins produced in a non-rhythmic way. Moreover, these rhythmic proteins would be even more expensive to produce if the cell had to constantly maintain a sufficiently high effective level of these proteins to ensure their function. The costs of protein production are large enough to be under natural selection, whereas the costs of mRNA production are not. So, why do cells periodically produce some mRNAs? My results suggest that the periodic oscillation in mRNA quantity concerns genes that have on average weaker cell-to-cell variability (noise) than genes with constant mRNA levels. Since causality is not very clear, it is still possible that the rhythmicity of mRNAs may optimize expression precision for noise-sensitive functions over a period of time, repeatedly, every 24 hours. Finally, mRNA rhythmicity concerns genes that have undergone strong purifying selection. This strong purifying selection does not seem to concern genes that have periodic protein levels, although there is insufficient data to go further in formulating an evolutionary explanation. Overall, I suggest the hypothesis that rhythmicity of gene expression provides an adaptive advantage only to species living in highly changing environments (over 24 hours). In such environments, i.e. for a large part of marine and terrestrial ecosystems, it is possible that the rhythmicity of gene expression could have allowed the preservation of complex and costly new properties that would otherwise have been eliminated. The evolutionary trade-offs take into account the advantages provided by the function, its expression costs and the precision required, but perhaps also the variability of expression leading to phenotypic diversity, which improves adaptability in a fluctuating environment.

    The use of mixture density networks in the emulation of complex epidemiological individual-based models

    Complex, computationally intensive, individual-based models are abundant in epidemiology. For epidemics such as macro-parasitic diseases, detailed modelling of human behaviour and pathogen life-cycle is required in order to produce accurate results. This can often lead to models that are computationally expensive to analyse and fit, and that often require many simulation runs in order to build up sufficient statistics. Emulation can provide a more computationally efficient output of the individual-based model, by approximating it using a statistical model. Previous work has used Gaussian processes (GPs) in order to achieve this, but these cannot deal with multi-modal, heavy-tailed, or discrete distributions. Here, we introduce the concept of a mixture density network (MDN) and its application in the emulation of epidemiological models. MDNs incorporate both a mixture model and a neural network to provide a flexible tool for emulating a variety of models and outputs. We develop an MDN emulation methodology and demonstrate its use on a number of simple models incorporating normal, gamma, and beta distribution outputs. We then explore its use on the stochastic SIR model to predict the final size distribution and infection dynamics. MDNs have the potential to faithfully reproduce multiple outputs of an individual-based model and allow for rapid analysis by a range of users. As such, an open-access library of the method has been released alongside this manuscript.
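    To make the idea of a mixture density output concrete, the sketch below shows how raw network outputs might be mapped to the parameters of a Gaussian mixture and scored by negative log-likelihood. It is an illustration only and is not taken from the library released with the manuscript: the function names (`mdn_params`, `mixture_pdf`, `negative_log_likelihood`) are hypothetical, only Gaussian components are covered, and the neural network that would produce the raw outputs is omitted.

```python
import math


def mdn_params(raw_weights, raw_means, raw_log_sigmas):
    """Map raw network outputs to valid Gaussian-mixture parameters.

    All three arguments are hypothetical output-layer values, one per
    mixture component.
    """
    # Softmax keeps the mixture weights positive and summing to one.
    shift = max(raw_weights)
    exps = [math.exp(w - shift) for w in raw_weights]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Exponentiating keeps every standard deviation strictly positive.
    sigmas = [math.exp(s) for s in raw_log_sigmas]
    return weights, list(raw_means), sigmas


def mixture_pdf(y, weights, means, sigmas):
    """Density of the Gaussian mixture evaluated at the scalar y."""
    return sum(
        w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, mu, s in zip(weights, means, sigmas)
    )


def negative_log_likelihood(y, weights, means, sigmas):
    """Loss an MDN would minimise for one observed simulator output y.

    Note: for y far from every component the density can underflow to
    zero; a production implementation would work in log space.
    """
    return -math.log(mixture_pdf(y, weights, means, sigmas))


# Two equally weighted components centred at -2 and +2: a bimodal
# output of the kind a single Gaussian cannot represent.
weights, means, sigmas = mdn_params(
    [0.0, 0.0], [-2.0, 2.0], [math.log(0.5), math.log(0.5)]
)
loss_at_mode = negative_log_likelihood(-2.0, weights, means, sigmas)
loss_between_modes = negative_log_likelihood(0.0, weights, means, sigmas)
```

    In this toy case the loss is lower at a mode than midway between the modes, which is the behaviour that lets a mixture output follow multi-modal simulator results.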