Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences
Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis
Oscillating Evolution of a Mammalian Locus with Overlapping Reading Frames: An XLαs/ALEX Relay
XLαs and ALEX are structurally unrelated mammalian proteins translated from alternative overlapping reading frames of a single transcript. Not only are they encoded by the same locus, but a specific XLαs/ALEX interaction is essential for G-protein signaling in neuroendocrine cells. A disruption of this interaction leads to abnormal human phenotypes, including mental retardation and growth deficiency. The region of overlap between the two reading frames evolves at a remarkable speed: the divergence between human and mouse ALEX polypeptides makes them virtually unalignable. To trace the evolution of this puzzling locus, we sequenced it in apes, Old World monkeys, and a New World monkey. We show that the overlap between the two reading frames and the physical interaction between the two proteins force the locus to evolve in an unprecedented way. Namely, to maintain two overlapping protein-coding regions the locus is forced to have high GC content, which significantly elevates its intrinsic evolutionary rate. However, the two encoded proteins cannot afford to change too quickly relative to each other as this may impair their interaction and lead to severe physiological consequences. As a result XLαs and ALEX evolve in an oscillating fashion constantly balancing the rates of amino acid replacements. This is the first example of a rapidly evolving locus encoding interacting proteins via overlapping reading frames, with a possible link to the origin of species-specific neurological differences
Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network
BACKGROUND: While gene duplication is known to be one of the most common mechanisms of genome evolution, the fates of genes after duplication are still being debated. In particular, it is presently unknown whether most duplicate genes preserve (or subdivide) the functions of the parental gene or acquire new functions. One aspect of gene function, namely the expression profile in a gene coexpression network, has been largely unexplored for duplicate genes. RESULTS: Here we build a human gene coexpression network using human tissue-specific microarray data and investigate the divergence of duplicate genes in it. The topology of this network is scale-free. Interestingly, our analysis indicates that duplicate genes rapidly lose shared coexpressed partners: approximately 50 million years after duplication, the two duplicate genes in a pair have only a slightly higher number of shared partners than two random singletons. We also show that duplicate gene pairs quickly acquire new coexpressed partners: the average number of partners for a duplicate gene pair is significantly greater than that for a singleton (the latter number can be used as a proxy for the number of partners of a parental singleton gene before duplication). The divergence in gene expression between two duplicates in a pair occurs asymmetrically: one gene usually has more partners than the other. The network is resilient to both random and degree-based in silico removal of either singletons or duplicate genes. In contrast, the network is especially vulnerable to the removal of highly connected genes when duplicate genes and singletons are considered together. CONCLUSION: Duplicate genes rapidly diverge in their expression profiles in the network and play a similar role in maintaining network robustness as compared with singletons.
Web-based visual analysis for high-throughput genomics
BACKGROUND: Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. RESULTS: We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. CONCLUSIONS: Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments
Bottleneck and selection in the germline and maternal age influence transmission of mitochondrial DNA in human pedigrees.
Heteroplasmy, the presence of multiple mitochondrial DNA (mtDNA) haplotypes in an individual, can lead to numerous mitochondrial diseases. The presentation of such diseases depends on the frequency of the heteroplasmic variant in tissues, which, in turn, depends on the dynamics of mtDNA transmissions during germline and somatic development. Thus, understanding and predicting these dynamics between generations and within individuals is medically relevant. Here, we study patterns of heteroplasmy in 2 tissues from each of 345 humans in 96 multigenerational families, each with at least 2 siblings (a total of 249 mother-child transmissions). This experimental design has allowed us to estimate the timing of mtDNA mutations, drift, and selection with unprecedented precision. Our results are remarkably concordant between 2 complementary population-genetic approaches. We find evidence for a severe germline bottleneck (7-10 mtDNA segregating units) that occurs independently in different oocyte lineages from the same mother, while somatic bottlenecks are less severe. We demonstrate that divergence between mother and offspring increases with the mother's age at childbirth, likely due to continued drift of heteroplasmy frequencies in oocytes under meiotic arrest. We show that this period is also accompanied by mutation accumulation leading to more de novo mutations in children born to older mothers. We show that heteroplasmic variants at intermediate frequencies can segregate for many generations in the human population, despite the strong germline bottleneck. We show that selection acts during germline development to keep the frequency of putatively deleterious variants from rising. Our findings have important applications for clinical genetics and genetic counseling
A First Look at ARFome: Dual-Coding Genes in Mammalian Genomes
Coding of multiple proteins by overlapping reading frames is not a feature one would associate with eukaryotic genes. Indeed, codependency between codons of overlapping protein-coding regions imposes a unique set of evolutionary constraints, making it a costly arrangement. Yet in cases of tightly coexpressed interacting proteins, dual coding may be advantageous. Here we show that although dual coding is nearly impossible by chance, a number of human transcripts contain overlapping coding regions. Using newly developed statistical techniques, we identified 40 candidate genes with evolutionarily conserved overlapping coding regions. Because our approach is conservative, we expect mammals to possess more dual-coding genes. Our results emphasize that the skepticism surrounding eukaryotic dual coding is unwarranted: rather than being artifacts, overlapping reading frames are often hallmarks of fascinating biology
Family reunion via error correction: an efficient analysis of duplex sequencing data
Background
Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.
Results
In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correcting such errors allows “reuniting” these reads with their respective families, increasing the output of the method and making it more cost-effective.
Conclusions
We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo
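The idea of "reuniting" singleton reads with their families can be illustrated with a minimal sketch. This is not Du Novo's actual algorithm: the function names, the Hamming-distance criterion, the `max_dist` threshold, and the toy tags below are all assumptions made for illustration. The sketch assigns a singleton tag to a family only when exactly one family tag lies within the allowed number of mismatches.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tag_counts, max_dist=1):
    """Merge singleton tags into a nearby family (hypothetical helper).

    tag_counts maps a tag sequence to the number of reads carrying it.
    A singleton is merged only if exactly one family tag (count > 1) of the
    same length is within max_dist mismatches; ambiguous or distant
    singletons are returned as unassigned.
    """
    families = {tag: n for tag, n in tag_counts.items() if n > 1}
    merged = dict(families)
    unassigned = []
    for tag, n in tag_counts.items():
        if n > 1:
            continue
        candidates = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(fam, tag) <= max_dist
        ]
        if len(candidates) == 1:
            merged[candidates[0]] += n
        else:
            unassigned.append(tag)
    return merged, unassigned


# Toy data (invented): one singleton differs from a family tag by one base.
counts = {"ACGTACGT": 5, "ACGTACGA": 1, "TTTTCCCC": 3, "GGGGAAAA": 1}
merged, unassigned = reunite_singletons(counts)
print(merged)
print(unassigned)
```

The pairwise comparison here is quadratic in the number of tags, which is fine for a toy example but would need an indexed or alignment-based approach on real data.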