
    HENA, heterogeneous network-based data set for Alzheimer's disease.

    Alzheimer's disease and other types of dementia are the leading cause of disability in later life, and various types of experiments have been performed to understand the underlying mechanisms of the disease with the aim of identifying potential drug targets. These experiments have been carried out by scientists working in different domains such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in databases designed for collecting data of similar types. However, in order to get a systematic view of the disease from these independent but complementary data sets, it is necessary to combine them. In this study we describe a heterogeneous network-based data set for Alzheimer's disease (HENA). Additionally, we demonstrate the application of state-of-the-art graph convolutional networks, i.e. deep learning methods, for the analysis of such large heterogeneous biological data sets. We expect HENA to allow scientists to explore and analyze their own results in the broader context of Alzheimer's disease research.
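    The graph convolutional networks mentioned above propagate node features over the network's adjacency structure. A minimal sketch of one such layer in the common normalized form (illustrative only, not HENA's actual pipeline; the graph and feature values are toy data):

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 . H . W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^-1/2 from row degrees
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)   # ReLU activation

# toy graph: 4 nodes, 3 input features, 2 output features
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = gcn_layer(adj, rng.normal(size=(4, 3)), rng.normal(size=(3, 2)))
print(h.shape)  # (4, 2)
```

    Stacking such layers lets each node's representation absorb information from progressively larger neighborhoods of the heterogeneous network.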

    Charting the Host Adaptation of Influenza Viruses

    Four influenza pandemics have struck the human population during the last 100 years, causing substantial morbidity and mortality. The pandemics were caused by the introduction of a new virus into the human population from an avian or swine host or through the mixing of virus segments from an animal host with a human virus to create a new reassortant subtype virus. Understanding which changes have contributed to the adaptation of the virus to the human host is essential in assessing the pandemic potential of current and future animal viruses. Here, we develop a measure of the level of adaptation of a given virus strain to a particular host. We show that adaptation to the human host has been gradual with a timescale of decades and that none of the virus proteins have yet achieved full adaptation to the selective constraints. When the measure is applied to historical data, our results indicate that the 1918 influenza virus had undergone a period of preadaptation prior to the 1918 pandemic. Yet, ancestral reconstruction of the avian virus that founded the classical swine and 1918 human influenza lineages shows no evidence that this virus was exceptionally preadapted to humans. These results indicate that adaptation to humans occurred following the initial host shift from birds to mammals, including a significant amount prior to 1918. The 2009 pandemic virus seems to have undergone preadaptation to human-like selective constraints during its period of circulation in swine. Ancestral reconstruction along the human virus tree indicates that mutations that have increased the adaptation of the virus have occurred preferentially along the trunk of the tree. The method should be helpful in assessing the potential of current viruses to found future epidemics or pandemics.
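    A host-adaptation measure of this general kind can be pictured as a per-site log-odds score against host-specific residue frequencies. This is a hedged sketch with invented toy frequencies, not the authors' actual measure or data:

```python
import math

# Hypothetical per-site residue frequencies in human- vs avian-adapted
# strains (toy numbers for two positions, not taken from the paper).
human_freq = [{'K': 0.8, 'E': 0.2}, {'S': 0.6, 'N': 0.4}]
avian_freq = [{'K': 0.1, 'E': 0.9}, {'S': 0.3, 'N': 0.7}]

def adaptation_score(seq, host_f, other_f, eps=1e-3):
    """Mean per-site log-odds of the sequence under host-specific
    frequencies: positive values mean the strain looks host-adapted."""
    total = 0.0
    for pos, aa in enumerate(seq):
        p = host_f[pos].get(aa, eps)   # frequency in the focal host
        q = other_f[pos].get(aa, eps)  # frequency in the other host
        total += math.log(p / q)
    return total / len(seq)

print(adaptation_score("KS", human_freq, avian_freq) > 0)  # human-like residues score positive
```

    Tracking such a score for strains sampled over time is one way to make "gradual adaptation with a timescale of decades" concrete.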

    Background frequencies for residue variability estimates: BLOSUM revisited

    Background: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolution power.
    Results: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties that, at the same time, includes observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices and re-interpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure.
    Conclusion: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular when the available sequences are of narrow evolutionary scope.
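    The plain column entropy, and a background-aware variant in the spirit described (using a generic toy background distribution in place of the BLOSUM-derived probabilities), can be sketched as:

```python
import math
from collections import Counter

def column_entropy(column):
    """Plain Shannon entropy of one alignment column, in bits."""
    counts = Counter(column)
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def relative_entropy(column, background):
    """Divergence of the column distribution from a background
    distribution (here a toy one; the paper derives it from BLOSUM)."""
    counts = Counter(column)
    n = len(column)
    return sum(c / n * math.log2((c / n) / background[aa])
               for aa, c in counts.items())

bg = {'A': 0.5, 'V': 0.3, 'L': 0.2}       # toy background frequencies
print(round(column_entropy("AAVV"), 3))    # 1.0 bit for a 50/50 column
```

    A fully conserved column has zero Shannon entropy regardless of which residue it contains; weighting by background frequencies is what lets the measure distinguish conservation of a rare residue from conservation of a common one.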

    Refining transcriptional regulatory networks using network evolutionary models and gene histories

    Background: Computational inference of transcriptional regulatory networks remains a challenging problem, in part due to the lack of strong network models. In this paper we present evolutionary approaches to improve the inference of regulatory networks for a family of organisms by developing an evolutionary model for these networks and taking advantage of established phylogenetic relationships among these organisms. In previous work, we used a simple evolutionary model and provided extensive simulation results showing that phylogenetic information, combined with such a model, could be used to gain significant improvements on the performance of current inference algorithms.
    Results: In this paper, we extend the evolutionary model so as to take into account gene duplications and losses, which are viewed as major drivers in the evolution of regulatory networks. We show how to adapt our evolutionary approach to this new model and provide detailed simulation results, which show significant improvement on the reference network inference algorithms. Different evolutionary histories for gene duplications and losses are studied, showing that our adapted approach is feasible under a broad range of conditions. We also provide results on biological data (cis-regulatory modules for 12 species of Drosophila), confirming our simulation results.
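    One way to picture how phylogenetic information can refine a noisily inferred network is to blend an organism's inferred edge scores with the scores predicted for its phylogenetic parent, trusting the parent more over short branches. This is a toy illustration, not the paper's evolutionary model; all names and scores are hypothetical:

```python
import math

def refine_edges(inferred, parental, branch_len, decay=1.0):
    """Blend an organism's inferred regulatory-edge scores with the
    scores predicted for its phylogenetic parent; shorter branches put
    more weight on the parent. Toy sketch only."""
    w = math.exp(-decay * branch_len)   # weight on the parental network
    return {e: (1 - w) * inferred.get(e, 0.0) + w * parental.get(e, 0.0)
            for e in set(inferred) | set(parental)}

# hypothetical edge scores (transcription factor -> target gene)
inferred = {('tfA', 'g1'): 0.9, ('tfA', 'g2'): 0.2}
parental = {('tfA', 'g1'): 0.8, ('tfB', 'g1'): 0.7}
scores = refine_edges(inferred, parental, branch_len=0.5)
print(sorted(scores))
```

    Edges supported by both the organism's own data and its relatives keep high scores, while edges seen in only one source are damped, which is the intuition behind using phylogeny as a prior on network structure.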

    Discovering local patterns of co-evolution: computational aspects and biological examples

    Background: Co-evolution is the process in which two (or more) sets of orthologs exhibit a similar or correlative pattern of evolution. Co-evolution is a powerful way to learn about the functional interdependencies between sets of genes and cellular functions and to predict physical interactions. More generally, it can be used for answering fundamental questions about the evolution of biological systems.
    Orthologs that exhibit a strong signal of co-evolution in a certain part of the evolutionary tree may show a mild signal of co-evolution in other branches of the tree. The major reasons for this phenomenon are noise in the biological input, genes that gain or lose functions, and the fact that some measures of co-evolution relate to rare events such as positive selection. Previous publications in the field dealt with the problem of finding sets of genes that co-evolved along an entire underlying phylogenetic tree, without considering the fact that co-evolution is often local.
    Results: In this work, we describe a new set of biological problems related to finding patterns of local co-evolution. We discuss their computational complexity and design algorithms for solving them. These algorithms outperform other bi-clustering methods, as they are designed specifically for solving the set of problems mentioned above.
    We use our approach to trace the co-evolution of fungal, eukaryotic, and mammalian genes at high resolution across the different parts of the corresponding phylogenetic trees. Specifically, we discover regions in the fungi tree that are enriched with positive evolution. We show that metabolic genes exhibit a remarkable level of co-evolution and different patterns of co-evolution in various biological datasets.
    In addition, we find that protein complexes related to gene expression exhibit non-homogeneous levels of co-evolution across different parts of the fungi evolutionary line. In the case of mammalian evolution, signaling pathways related to neurotransmission exhibit a relatively higher level of co-evolution along the primate subtree.
    Conclusions: We show that finding local patterns of co-evolution is a computationally challenging task, and we offer novel algorithms that solve this problem, thus opening a new approach for analyzing the evolution of biological systems.
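    A simple local co-evolution signal can be illustrated as the correlation of two genes' per-branch evolutionary rates, restricted to a subset of tree branches. This is an illustrative sketch only (the rate values and branch names are invented), not the authors' bi-clustering algorithms:

```python
def local_coevolution(rates_a, rates_b, branches):
    """Pearson correlation of two genes' per-branch evolutionary rates,
    restricted to a subset of tree branches (a 'local' signal)."""
    xs = [rates_a[b] for b in branches]
    ys = [rates_b[b] for b in branches]
    n = len(branches)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# toy per-branch rates for two genes on a four-branch tree
rates_a = {'b1': 0.1, 'b2': 0.5, 'b3': 0.9, 'b4': 0.2}
rates_b = {'b1': 0.2, 'b2': 0.6, 'b3': 1.0, 'b4': 0.9}
print(round(local_coevolution(rates_a, rates_b, ['b1', 'b2', 'b3']), 2))  # 1.0 on the first three branches
```

    Here the two genes co-evolve perfectly on branches b1-b3 but the signal degrades once b4 is included, which is exactly the kind of locality the abstract argues whole-tree methods miss.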

    Detecting microsatellites within genomes: significant variation among algorithms

    Background: Microsatellites are short, tandemly repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker).
    Results: Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared with fixed parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif.
    Conclusion: Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
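    The parameter sensitivity discussed above shows up even in a naive detector. A minimal regex-based scanner for perfect tandem repeats (illustrative only, far simpler than TRF or Mreps); note that the same AC run is reported at several unit sizes, so the choice of unit range and minimum length directly shapes the resulting distribution:

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_len=12):
    """Detect perfect tandem repeats with unit sizes min_unit..max_unit
    and a minimum total repeat length. Returns (start, unit, copies)."""
    hits = []
    for unit in range(min_unit, max_unit + 1):
        min_copies = max(2, -(-min_len // unit))   # ceil(min_len / unit)
        pattern = re.compile(r'([ACGT]{%d})\1{%d,}' % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(1), len(m.group(0)) // unit))
    return hits

seq = "GGTACACACACACACAGG"
# the same AC tract is reported with 2, 4 and 6 bp units
print(find_microsatellites(seq, min_len=10))
```

    Lowering min_len or widening the unit range inflates the count of short repeats, mirroring the abstract's finding that detected distributions diverge most for microsatellites below 20 bp.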

    Prediction of binding hot spot residues by using structural and evolutionary parameters

    In this work, we present a method for predicting hot spot residues by using a set of structural and evolutionary parameters. Unlike previous studies, we use a set of parameters which do not depend on the structure of the protein in complex, so that the predictor can also be used when the interface region is unknown. Despite the fact that no information concerning proteins in complex is used for prediction, the application of the method to a compiled dataset described in the literature achieved a performance of 60.4%, as measured by F-Measure, corresponding to a recall of 78.1% and a precision of 49.5%. This result is higher than those reported by previous studies using the same data set.
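    The reported F-Measure follows directly from the stated precision and recall (the quoted figures are rounded, so the recomputed value lands near, not exactly at, 60.4%):

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta score: the harmonic-style mean of precision and recall,
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# precision and recall as reported in the abstract
print(round(f_measure(0.495, 0.781), 3))  # 0.606
```

    The harmonic-style combination penalizes the imbalance here: the high recall (78.1%) cannot fully compensate for the modest precision (49.5%), which is why F1 sits closer to the lower of the two.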

    Efficient algorithms for reconstructing gene content by co-evolution

    Background: In a previous study we demonstrated that co-evolutionary information can be utilized to improve the accuracy of ancestral gene content reconstruction. To this end, we defined a new computational problem, the Ancestral Co-Evolutionary (ACE) problem, and developed algorithms for solving it.
    Results: In the current paper we generalize our previous study in several ways. First, we describe new efficient computational approaches for solving the ACE problem, based on reductions to classical methods such as linear programming relaxation, quadratic programming, and min-cut. Second, we report new computational hardness results for the ACE problem, as well as practical cases where it can be solved in polynomial time.
    Third, we generalize the ACE problem and demonstrate how our approach can be used for inferring parts of the genomes of non-ancestral organisms. To this end, we describe a heuristic for finding the portion of the genome (the 'dominant set') that can be used to reconstruct the rest of the genome with the lowest error rate. This heuristic utilizes both evolutionary and co-evolutionary information.
    We implemented these algorithms on a large input of the ACE problem (95 unicellular organisms, 4,873 protein families, and 10,576 co-evolutionary relations), demonstrating that some of them outperform the algorithm used in our previous study. In addition, we show that, based on our approach, a 'dominant set' can be used to reconstruct a major fraction of a genome (up to 79%) with a relatively low error rate (e.g. 0.11). We find that the 'dominant set' tends to include metabolic and regulatory genes with high evolutionary rates, low protein abundance, and few protein-protein interactions.
    Conclusions: The ACE problem can be efficiently extended for inferring the genomes of organisms that exist today, and it can be solved in polynomial time in many practical cases. Metabolic and regulatory genes were found to be the most important groups of genes for reconstructing the gene content of an organism based on other related genomes.
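    For contrast with the co-evolution-aware approach, the standard baseline for ancestral gene-content reconstruction is small parsimony (Fitch) on presence/absence states per gene family. A minimal sketch (the tree and leaf labels are toy data, not from the paper's 95-organism input):

```python
def fitch_presence(tree, leaf_states, node):
    """Fitch bottom-up pass for one gene family: leaves are labelled
    present (1) or absent (0); each internal node gets the set of most
    parsimonious states. This ignores co-evolutionary relations, which
    is exactly what the ACE formulation adds on top."""
    if node not in tree:                       # leaf: state is known
        return {leaf_states[node]}
    child_sets = [fitch_presence(tree, leaf_states, c) for c in tree[node]]
    inter = set.intersection(*child_sets)      # agreeing children: keep overlap
    return inter if inter else set.union(*child_sets)

# toy species tree: root -> (anc -> (spA, spB), spD)
tree = {'root': ['anc', 'spD'], 'anc': ['spA', 'spB']}
leaves = {'spA': 1, 'spB': 1, 'spD': 0}
print(fitch_presence(tree, leaves, 'root'))  # {0, 1}: root state is ambiguous
```

    Treating each family independently, as above, leaves such ties unresolved; scoring joint assignments against co-evolutionary relations between families is one way to break them, which motivates the ACE problem.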