59 research outputs found

    Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

    Background: Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately these new algorithms often require large amounts of time and memory to run, making it nearly impossible to run these algorithms on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in terms of sequence length and in sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem.
    Results: Crumble and Prune each provide a significant performance improvement with little loss of accuracy. In some cases, a gain in accuracy was observed. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time.
    Conclusions: These methods enabled us to solve gigabase alignment problems. These methods could enable a new generation of biologically realistic alignment algorithms to be applied to real world, large scale alignment problems.

    Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++

    Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves one to one correspondence between predicted elements with known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments
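
    The bottom-up approach mentioned in this abstract (score each alignment position, then report contiguous runs of highly scoring positions) can be illustrated with a minimal sketch. This is the simple baseline the abstract contrasts against, not GERP++'s dynamic programming method; the function name, the `threshold` and `min_length` parameters, and the example scores are all hypothetical.

```python
from typing import List, Sequence, Tuple


def contiguous_high_scoring_segments(
    scores: Sequence[float],
    threshold: float,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of contiguous positions
    whose score is at least `threshold` and whose length is at least
    `min_length`. Both parameters are illustrative assumptions."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current high-scoring run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at position i - 1; keep it if long enough.
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the score list.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-position constraint scores.
example_scores = [0.1, 2.5, 3.0, 0.2, 2.1, 2.2, 2.9, 0.0]
example_segments = contiguous_high_scoring_segments(
    example_scores, threshold=2.0, min_length=2
)
```

    A fixed threshold like this decides element boundaries purely locally, which is the kind of heuristic the abstract says GERP++ replaces with a ranking of candidate breakpoints by statistical significance.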

    Brown-Vialetto-Van Laere and Fazio Londe syndrome is associated with a riboflavin transporter defect mimicking mild MADD: a new inborn error of metabolism with potential treatment

    We report on three patients (two siblings and one unrelated) presenting in infancy with progressive muscle weakness and paralysis of the diaphragm. Metabolic studies revealed a profile of plasma acylcarnitines and urine organic acids suggestive of a mild form of the multiple acyl-CoA dehydrogenation defect (MADD, ethylmalonic/adipic acid syndrome). Subsequently, a profound flavin deficiency in spite of a normal dietary riboflavin intake was established in the plasma of all three children, suggesting a riboflavin transporter defect. Genetic analysis of these patients demonstrated mutations in the C20orf54 gene which encodes the human homolog of a rat riboflavin transporter. This gene was recently implicated in the Brown-Vialetto-Van Laere syndrome, a rare neurological disorder which may either present in infancy with neurological deterioration with hypotonia, respiratory insufficiency and early death, or later in life with deafness and progressive ponto-bulbar palsy. Supplementation of riboflavin rapidly improved the clinical symptoms as well as the biochemical abnormalities in our patients, demonstrating that high dose riboflavin is a potential treatment for the Brown-Vialetto-Van Laere syndrome as well as for the Fazio Londe syndrome which is considered to be the same disease entity without the deafness.

    Dynamic simulations on the mitochondrial fatty acid Beta-oxidation network

    Background: The oxidation of fatty acids in mitochondria plays an important role in energy metabolism and genetic disorders of this pathway may cause metabolic diseases. Enzyme deficiencies can block the metabolism at defined reactions in the mitochondrion and lead to accumulation of specific substrates causing severe clinical manifestations. Ten of the disorders directly affecting mitochondrial fatty acid oxidation have been well-defined, implicating episodic hypoketotic hypoglycemia provoked by catabolic stress, multiple organ failure, muscle weakness, or hypertrophic cardiomyopathy. Additionally, syndromes of severe maternal illness (HELLP syndrome and AFLP) have been associated with pregnancies carrying a fetus affected by fatty acid oxidation deficiencies. However, little is known about fatty acids kinetics, especially during fasting or exercise when the demand for fatty acid oxidation is increased (catabolic stress).
    Results: A computational kinetic network of 64 reactions with 91 compounds and 301 parameters was constructed to study dynamic properties of mitochondrial fatty acid β-oxidation. Various deficiencies of acyl-CoA dehydrogenase were simulated and verified with measured concentrations of indicative metabolites of screened newborns in Middle Europe and South Australia. The simulated accumulation of specific acyl-CoAs according to the investigated enzyme deficiencies are in agreement with experimental data and findings in literature. Investigation of the dynamic properties of the fatty acid β-oxidation reveals that the formation of acetyl-CoA – substrate for energy production – is highly impaired within the first hours of fasting corresponding to the rapid progress to coma within 1–2 hours. LCAD deficiency exhibits the highest accumulation of fatty acids along with marked increase of these substrates during catabolic stress and the lowest production rate of acetyl-CoA. These findings might confirm gestational loss to be the explanation that no human cases of LCAD deficiency have been described.
    Conclusion: In summary, this work provides a detailed kinetic model of mitochondrial metabolism with specific focus on fatty acid β-oxidation to simulate and predict the dynamic response of that metabolic network in the context of human disease. Our findings offer insight into the disease process (e.g. rapid progress to coma) and might confirm new explanations (no human cases of LCAD deficiency), which can hardly be obtained from experimental data alone.

    Dual DNA Methylation Patterns in the CNS Reveal Developmentally Poised Chromatin and Monoallelic Expression of Critical Genes

    As a first step towards discovery of genes expressed from only one allele in the CNS, we used a tiling array assay for DNA sequences that are both methylated and unmethylated (the MAUD assay). We analyzed regulatory regions of the entire mouse brain transcriptome, and found that approximately 10% of the genes assayed showed dual DNA methylation patterns. They include a large subset of genes that display marks of both active and silent, i.e., poised, chromatin during development, consistent with a link between differential DNA methylation and lineage-specific differentiation within the CNS. Sixty-five of the MAUD hits and 57 other genes whose function is of relevance to CNS development and/or disorders were tested for allele-specific expression in F1 hybrid clonal neural stem cell (NSC) lines. Eight MAUD hits and one additional gene showed such expression. They include Lgi1, which causes a subtype of inherited epilepsy that displays autosomal dominance with incomplete penetrance; Gfra2, a receptor for glial cell line-derived neurotrophic factor GDNF that has been linked to kindling epilepsy; Unc5a, a netrin-1 receptor important in neurodevelopment; and Cspg4, a membrane chondroitin sulfate proteoglycan associated with malignant melanoma and astrocytoma in human. Three of the genes, Camk2a, Kcnc4, and Unc5a, show preferential expression of the same allele in all clonal NSC lines tested. The other six genes show a stochastic pattern of monoallelic expression in some NSC lines and bi-allelic expression in others. These results support the estimate that 1–2% of genes expressed in the CNS may be subject to allelic exclusion, and demonstrate that the group includes genes implicated in major disorders of the CNS as well as neurodevelopment

    Genome-Wide Association between Branch Point Properties and Alternative Splicing

    The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models

    Silencing, Positive Selection and Parallel Evolution: Busy History of Primate Cytochromes c

    Cytochrome c (cyt c) participates in two crucial cellular processes, energy production and apoptosis, and unsurprisingly is a highly conserved protein. However, previous studies have reported for the primate lineage (i) loss of the paralogous testis isoform, (ii) an acceleration and then a deceleration of the amino acid replacement rate of the cyt c somatic isoform, and (iii) atypical biochemical behavior of human cyt c. To gain insight into the cause of these major evolutionary events, we have retraced the history of cyt c loci among primates. For testis cyt c, all primate sequences examined carry the same nonsense mutation, which suggests that silencing occurred before the primates diversified. For somatic cyt c, maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses yielded the same tree topology. The evolutionary analyses show that a fast accumulation of non-synonymous mutations (suggesting positive selection) occurred specifically on the anthropoid lineage root and then continued in parallel on the early catarrhini and platyrrhini stems. Analysis of evolutionary changes using the 3D structure suggests they are focused on the respiratory chain rather than on apoptosis or other cyt c functions. In agreement with previous biochemical studies, our results suggest that silencing of the cyt c testis isoform could be linked with the decrease of primate reproduction rate. Finally, the evolution of cyt c in the two sister anthropoid groups leads us to propose that somatic cyt c evolution may be related both to COX evolution and to the convergent brain and body mass enlargement in these two anthropoid clades

    Integrated Expression Profiling and Genome-Wide Analysis of ChREBP Targets Reveals the Dual Role for ChREBP in Glucose-Regulated Gene Expression

    The carbohydrate response element binding protein (ChREBP), a basic helix-loop-helix/leucine zipper transcription factor, plays a critical role in the control of lipogenesis in the liver. To identify the direct targets of ChREBP on a genome-wide scale and provide more insight into the mechanism by which ChREBP regulates glucose-responsive gene expression, we performed chromatin immunoprecipitation-sequencing and gene expression analysis. We identified 1153 ChREBP binding sites and 783 target genes using the chromatin from HepG2, a human hepatocellular carcinoma cell line. A motif search revealed a refined consensus sequence (CABGTG-nnCnG-nGnSTG) to better represent critical elements of a functional ChREBP binding sequence. Gene ontology analysis shows that ChREBP target genes are particularly associated with lipid, fatty acid and steroid metabolism. In addition, other functional gene clusters related to transport, development and cell motility are significantly enriched. Gene set enrichment analysis reveals that ChREBP target genes are highly correlated with genes regulated by high glucose, providing a functional relevance to the genome-wide binding study. Furthermore, we have demonstrated that ChREBP may function as a transcriptional repressor as well as an activator

    Linking the Epigenome to the Genome: Correlation of Different Features to DNA Methylation of CpG Islands

    DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. Still today, the whole process of how DNA methyltransferases determine which region should be methylated is not completely revealed. There are many hypotheses of which genomic features are correlated to the epigenome that have not yet been evaluated. Furthermore, many explorative approaches of measuring DNA methylation are limited to a subset of the genome and thus, cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features to DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HEPG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation and ten repetitions. With these models, we extended the existing dataset to the whole genome and thus, predicted the methylation landscape for the given cell types. The method used for these predictions is also validated on another external whole-genome dataset. Our results reveal features correlated to DNA methylation and confirm or disprove various hypotheses of DNA methylation related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. Furthermore, the method has been validated on a genome-wide dataset from the ENCODE consortium. The developed software, as well as the predicted datasets and a web-service to compare methylation states of CpG islands are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/
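
    The abstract reports both accuracy and the Matthews correlation coefficient (MCC) for its binary classifiers. As a reminder of how the two differ, here is a minimal sketch that computes each from confusion-matrix counts. The counts in the example are hypothetical and are not the paper's data; returning 0.0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any marginal total is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion matrix for a methylated/unmethylated classifier.
example_accuracy = accuracy(tp=45, tn=46, fp=4, fn=5)
example_mcc = matthews_corrcoef(tp=45, tn=46, fp=4, fn=5)
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unevenly represented.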