2,487 research outputs found

    Shape-based peak identification for ChIP-Seq

    We present a new algorithm for the identification of bound regions from ChIP-seq experiments. Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We demonstrate the accuracy of our method on existing datasets, and we show that it can discover previously missed regions and can more clearly discriminate between multiple binding events. The software T-PIC (Tree shape Peak Identification for ChIP-Seq) is available at http://math.berkeley.edu/~vhower/tpic.html
    Comment: 12 pages, 6 figures
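
The notion of persistence mentioned in this abstract can be illustrated on a one-dimensional coverage signal. The sketch below is not T-PIC's tree-shape statistic; it is a minimal, generic computation of peak persistence, and the function name `peak_persistence` and the toy coverage values are assumptions made for illustration. Positions are added from highest to lowest coverage, and when two components merge, the one with the lower peak "dies"; its persistence is its peak height minus the coverage level at the merge.

```python
def peak_persistence(coverage):
    """Return (index, height, death_level, persistence) for each local maximum,
    sorted by decreasing persistence. Illustrative sketch only."""
    n = len(coverage)
    parent = [None] * n   # None means "position not added yet"
    peak_of = {}          # component root -> index of its highest point
    peaks = []

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep from the highest coverage value down to the lowest.
    for i in sorted(range(n), key=lambda k: -coverage[k]):
        parent[i] = i
        peak_of[i] = i
        for j in (i - 1, i + 1):
            if not (0 <= j < n) or parent[j] is None:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # The component with the lower peak dies at the current level.
            if coverage[peak_of[ri]] > coverage[peak_of[rj]]:
                high, low = ri, rj
            else:
                high, low = rj, ri
            dying = peak_of[low]
            if dying != i:  # skip the trivial case of the point just added
                peaks.append((dying, coverage[dying], coverage[i],
                              coverage[dying] - coverage[i]))
            parent[low] = high

    if n:
        # The global maximum never merges into a higher peak; by convention
        # here it "dies" at the minimum coverage value.
        top = peak_of[find(0)]
        floor = min(coverage)
        peaks.append((top, coverage[top], floor, coverage[top] - floor))
    return sorted(peaks, key=lambda p: -p[3])


toy_coverage = [0, 3, 1, 6, 4, 5, 0]   # hypothetical read depths
for index, height, death, persistence in peak_persistence(toy_coverage):
    print(index, height, death, persistence)
```

On this toy signal the bump at position 5 has persistence 1 and would be discarded by any threshold above that, while the peaks at positions 3 and 1 survive a threshold of 2. How a threshold would be chosen statistically is beyond this sketch.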

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART
    Comment: 34 pages, 3 figures, glossary of XRate model terminology

    GIVE: portable genome browsers for personal websites.

    The growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copy-pasted into a user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/

    Establishment of a normal medakafish spermatogonial cell line capable of sperm production in vitro

    Spermatogonia are the male germ stem cells that continuously produce sperm for the next generation. Spermatogenesis is a complicated process that proceeds through mitotic phase of stem cell renewal and differentiation, meiotic phase, and postmeiotic phase of spermiogenesis. Full recapitulation of spermatogenesis in vitro has been impossible, as generation of normal spermatogonial stem cell lines without immortalization and production of motile sperm from these cells after long-term culture have not been achieved. Here we report the derivation of a normal spermatogonial cell line from a mature medakafish testis without immortalization. After 140 passages during 2 years of culture, this cell line retains stable but growth factor-dependent proliferation, a diploid karyotype, and the phenotype and gene expression pattern of spermatogonial stem cells. Furthermore, we show that this cell line can undergo meiosis and spermiogenesis to generate motile sperm. Therefore, the ability of continuous proliferation and sperm production in culture is an intrinsic property of medaka spermatogonial stem cells, and immortalization apparently is not necessary to derive male germ cell cultures. Our findings and cell line will offer a unique opportunity to study and recapitulate spermatogenesis in vitro and to develop approaches for germ-line transmission.

    Caenorhabditis elegans Operons Contain a Higher Proportion of Genes with Multiple Transcripts and Use 3′ Splice Sites Differentially

    RNA splicing generates multiple transcript isoforms from a single gene and enhances the complexity of eukaryotic gene expression. In some eukaryotes, operons exist as an ancient regulatory mechanism of gene expression that requires strict positional and regulatory relationships among their genes. It remains unknown whether operonic genes generate transcript isoforms in the same manner as non-operonic genes, whose expression is less likely to be limited by their positions and relationships with surrounding genes. We analyzed the number of transcript isoforms of Caenorhabditis elegans operonic genes and found that C. elegans operons contain a much higher proportion of genes with multiple transcript isoforms than non-operonic genes do. For genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. C. elegans operonic genes also show a different preference among the 20 most common 3′ splice sites compared to non-operonic genes. Our analyses suggest that C. elegans operons enhance expression complexity by increasing the proportion of genes that express multiple transcript isoforms and maintain splicing efficiency by differential use of common 3′ splice sites.

    Multi-level evidence of an allelic hierarchy of USH2A variants in hearing, auditory processing and speech/language outcomes.

    Language development builds upon a complex network of interacting subservient systems. It therefore follows that variations in, and subclinical disruptions of, these systems may have secondary effects on emergent language. In this paper, we consider the relationship between genetic variants, hearing, auditory processing and language development. We employ whole genome sequencing in a discovery family to target association and gene x environment interaction analyses in two large population cohorts: the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK10K. These investigations indicate that USH2A variants are associated with altered low-frequency sound perception which, in turn, increases the risk of developmental language disorder. We further show that Ush2a heterozygote mice have low-level hearing impairments, persistent higher-order acoustic processing deficits and altered vocalizations. These findings provide new insights into the complexity of genetic mechanisms serving language development and disorders and the relationships between developmental auditory and neural systems.

    Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads

    Cheap high-throughput DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data in which the reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data, and the hints can be used for more computationally-demanding work. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references known to the server. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients, one of them running in a web browser, in order to demonstrate that gigabytes of raw sequencing reads of unknown origin could be identified without the need to transfer a very large volume of data, and on modestly powered computing devices. Web access is available at http://tapir.cbs.dtu.dk. The source code for a Python command-line client, a server, and supplementary data is available at http://bit.ly/1aURxkc
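
The client/server exchange described in this abstract can be sketched in a few lines. The abstract does not say how the server indexes references, so everything below is an assumption for illustration: a plain k-mer lookup table stands in for the index, and the names `build_index`, `sample_reads`, `query`, the value of `K`, and the toy sequences are all hypothetical. The point is only that the client ships a small sample of reads and gets back a ranked list of candidate reference names.

```python
import random
from collections import Counter, defaultdict

K = 8  # assumed k-mer length; not taken from the paper


def kmers(seq, k=K):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references):
    """Server side: map each k-mer to the set of reference names containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq):
            index[km].add(name)
    return index


def sample_reads(reads, n, seed=0):
    """Client side: pick a small random subset of reads to send."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))


def query(index, reads, top=3):
    """Server side: count, per reference, how many reads share a k-mer with it."""
    votes = Counter()
    for read in reads:
        hits = set()
        for km in kmers(read):
            hits |= index.get(km, set())
        votes.update(hits)
    return votes.most_common(top)


# Toy data (hypothetical sequences, far shorter than real genomes).
references = {
    "refA": "ACGTTGCAAGGCTTACCGATAGGCTA",
    "refB": "TTTTGGGGCCCCAAAATTTTGGGGCC",
}
index = build_index(references)
reads = [references["refA"][0:12], references["refA"][10:24]]
print(query(index, sample_reads(reads, 2)))
```

Only the sampled reads and the short list of hits cross the network in this picture; retrieving the matching reference sequences and aligning against them would happen afterwards on the client.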

    Perturbation with Intrabodies Reveals That Calpain Cleavage Is Required for Degradation of Huntingtin Exon 1

    Background: Proteolytic processing of mutant huntingtin (mHtt), the protein that causes Huntington's disease (HD), is critical for mHtt toxicity and disease progression. mHtt contains several caspase and calpain cleavage sites that generate N-terminal fragments that are more toxic than full-length mHtt. Further processing is then required for the degradation of these fragments, which in turn, reduces toxicity. This unknown, secondary degradative process represents a promising therapeutic target for HD. Methodology/Principal Findings: We have used intrabodies, intracellularly expressed antibody fragments, to gain insight into the mechanism of mutant huntingtin exon 1 (mHDx-1) clearance. Happ1, an intrabody recognizing the proline-rich region of mHDx-1, reduces the level of soluble mHDx-1 by increasing clearance. While proteasome and macroautophagy inhibitors reduce turnover of mHDx-1, Happ1 is still able to reduce mHDx-1 under these conditions, indicating Happ1-accelerated mHDx-1 clearance does not rely on these processes. In contrast, a calpain inhibitor or an inhibitor of lysosomal pH blocks Happ1-mediated acceleration of mHDx-1 clearance. These results suggest that mHDx-1 is cleaved by calpain, likely followed by lysosomal degradation, and that this process regulates the turnover rate of mHDx-1. Sequence analysis identifies amino acid (AA) 15 as a potential calpain cleavage site. Calpain cleavage of recombinant mHDx-1 in vitro yields fragments of sizes corresponding to this prediction. Moreover, when the site is blocked by binding of another intrabody, V_L12.3, turnover of soluble mHDx-1 in living cells is blocked. Conclusions/Significance: These results indicate that calpain-mediated removal of the 15 N-terminal AAs is required for the degradation of mHDx-1, a finding that may have therapeutic implications.

    Clustering exact matches of pairwise sequence alignments by weighted linear regression

    Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless. Results: We have developed an algorithm that uses the coordinates of all the exact matches or high similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest. Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression for the alignment seeds. It was experimentally shown that the gain in execution time can be outstanding without compromising the accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
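
The idea of fitting a weighted line through match coordinates can be sketched as follows. This is not the authors' algorithm: the weighting by match length, the residual tolerance `tol`, the rule of dropping the single worst-fitting match per round, and the names `weighted_line_fit` and `locate_contig` are all assumptions chosen to keep the example small. Each match is a hypothetical `(contig_pos, ref_pos, length)` triple.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    if sxx == 0:
        raise ValueError("all matches share one contig position; cannot fit a line")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def locate_contig(matches, tol=50.0):
    """Estimate where a contig lands on the reference.

    matches: list of (contig_pos, ref_pos, length) tuples.
    Repeatedly fits a length-weighted line and discards the worst-fitting
    match until every remaining match lies within `tol` of the line.
    Returns (ref_start, ref_end, kept_matches).
    """
    kept = list(matches)
    while len(kept) > 2:
        xs = [m[0] for m in kept]
        ys = [m[1] for m in kept]
        ws = [m[2] for m in kept]   # longer matches pull the line harder
        slope, intercept = weighted_line_fit(xs, ys, ws)
        residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
        worst = max(range(len(kept)), key=residuals.__getitem__)
        if residuals[worst] <= tol:
            break
        del kept[worst]
    ref_start = min(r for _, r, _ in kept)
    ref_end = max(r + length for _, r, length in kept)
    return ref_start, ref_end, kept


# Three matches on the diagonal ref = contig + 1000, plus one stray repeat hit.
matches = [(0, 1000, 200), (300, 1300, 150), (600, 1600, 250), (100, 5000, 20)]
ref_start, ref_end, kept = locate_contig(matches)
print(ref_start, ref_end, len(kept))
```

Dropping one match per round is deliberately conservative: a single far-off hit can tilt the first fit enough that a fixed residual cutoff would also discard good matches, so the sketch refits after each removal instead.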

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoters. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.