155 research outputs found
Gene finding in novel genomes
BACKGROUND: Computational gene prediction continues to be an important problem, especially for genomes with little experimental data. RESULTS: I introduce the SNAP gene finder, which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation and that the resulting parameters can be highly accurate. CONCLUSION: Since gene prediction is sensitive to species-specific parameters, every genome needs a dedicated gene finder.
Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.
Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology, and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R^2 = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
GC skew is a conserved property of unmethylated CpG island promoters across vertebrates.
GC skew is a measure of the strand asymmetry in the distribution of guanines and cytosines. GC skew favors R-loops, a type of three-stranded nucleic acid structure that forms upon annealing of an RNA strand to one strand of DNA, creating a persistent RNA:DNA hybrid. Previous studies show that GC skew is prevalent at thousands of human CpG island (CGI) promoters and transcription termination regions, which correspond to hotspots of R-loop formation. Here, we investigated the conservation of GC skew patterns in 60 sequenced chordate genomes. We report that GC skew is a conserved sequence characteristic of the CGI promoter class in vertebrates. Furthermore, we reveal that promoter GC skew peaks at the exon 1/intron 1 junction and that it is highly correlated with gene age and CGI promoter strength. Our data also show that GC skew is predictive of unmethylated CGI promoters in a range of vertebrate species and that it imparts significant DNA hypomethylation for promoters with intermediate CpG densities. Finally, we observed that terminal GC skew is conserved for a subset of vertebrate genes that tend to be located significantly closer to their downstream neighbors, consistent with a role for R-loop formation in transcription termination.
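As a rough illustration of the measure described in this abstract, GC skew is conventionally computed as (G − C) / (G + C) over a sequence or a sliding window. The sketch below assumes that conventional definition; the function names, window size, and step are illustrative choices and are not taken from the paper.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a sequence; 0.0 if it contains no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    total = g + c
    return (g - c) / total if total else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 50):
    """Return (start, skew) pairs for windows along seq.

    The window and step defaults are arbitrary illustrative values.
    A sequence shorter than one window yields a single partial window.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]
```

Positive values indicate an excess of G over C on the strand examined, negative values the reverse.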
Evidence for a DNA-Based Mechanism of Intron-Mediated Enhancement
Many introns significantly increase gene expression through a process termed intron-mediated enhancement (IME). Introns exist in the transcribed DNA and the nascent RNA, and could affect expression from either location. To determine which is more relevant to IME, hybrid introns were constructed that contain sequences from stimulating Arabidopsis thaliana introns either in their normal orientation or as the reverse complement. Both ends of each intron are from the non-stimulatory COR15a intron in their normal orientation to allow splicing. The inversions create major alterations to the sequence of the transcribed RNA with relatively minor changes to the DNA structure. Introns containing portions of either the UBQ10 or ATPK1 intron increased expression to a similar degree regardless of orientation. Also, computational predictions of IME improve when both intron strands are considered. These findings are more consistent with models of IME that act at the level of DNA rather than RNA.
Assessing the gene space in draft genomes
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs) that are extremely highly conserved and that we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
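The two kinds of metric mentioned in this abstract can be sketched briefly. The code below assumes the usual definition of N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly) and treats the CEG metric as a simple mapped fraction. The function names and the set-based input are hypothetical and do not come from the paper or its software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def ceg_fraction(mapped_ids, core_ids):
    """Return the fraction of core gene identifiers found among mapped ones.

    Both arguments are iterables of gene identifiers (a hypothetical input
    format chosen for this sketch).
    """
    core = set(core_ids)
    if not core:
        raise ValueError("core gene set is empty")
    return len(core & set(mapped_ids)) / len(core)
```

An assembly can have a high N50 and still miss many core genes, which is why the abstract presents the two measures as complementary.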
SAMSA: a comprehensive metatranscriptome analysis pipeline
Background: Although metatranscriptomics, the study of diverse microbial population activity based on RNA-seq data, is rapidly growing in popularity, there are limited options for biologists to analyze this type of data. Current approaches for processing metatranscriptomes rely on restricted databases and a dedicated computing cluster, or on metagenome-based approaches that have not been fully evaluated for processing metatranscriptomic datasets. We created a new bioinformatics pipeline, designed specifically for metatranscriptome dataset analysis, which runs in conjunction with Metagenome-RAST (MG-RAST) servers. Designed for use by researchers with relatively little bioinformatics experience, SAMSA offers a breakdown of metatranscriptome transcription activity levels by organism or transcript function, and is fully open source. We used this new tool to evaluate best practices for sequencing stool metatranscriptomes. Results: Working with the MG-RAST annotation server, we constructed the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) software package, a complete pipeline for the analysis of gut microbiome data. SAMSA can summarize and evaluate raw annotation results, identifying abundant species and significant functional differences between metatranscriptomes. Using pilot data and simulated subsets, we determined experimental requirements for fecal gut metatranscriptomes. Sequences need to be either long reads (longer than 100 bp) or joined paired-end reads. Each sample needs 40-50 million raw sequences, which can be expected to yield the 5-10 million annotated reads necessary for accurate abundance measures. We also demonstrated that ribosomal RNA depletion does not equally deplete ribosomes from all species within a sample, and remaining rRNA sequences should be discarded. Using publicly available metatranscriptome data in which rRNA was not depleted, we were able to demonstrate that overall organism transcriptional activity can be measured using mRNA counts. We were also able to detect significant differences between control and experimental groups in both organism transcriptional activity and specific cellular functions. Conclusions: By making this new pipeline publicly available, we have created a powerful new tool for metatranscriptomics research, offering a new method for greater insight into the activity of diverse microbial communities. We further recommend that stool metatranscriptomes be ribodepleted and sequenced in a 100 bp paired-end format with a minimum of 40 million reads per sample.
Myc and Miz-1 have coordinate genomic functions including targeting Hox genes in human embryonic stem cells
Background: A proposed role for Myc in maintaining mouse embryonic stem (ES) cell pluripotency is transcriptional repression of key differentiation-promoting genes, but detail of the mechanism has remained an important open topic. Results: To test the hypothesis that the zinc finger protein Miz-1 plays a central role, in the present work we conducted chromatin immunoprecipitation/microarray (ChIP-chip) analysis of Myc and Miz-1 in human ES cells, finding homeobox (Hox) genes as the most significant functional class of Miz-1 direct targets. Miz-1 differentiation-associated target genes specifically lack acetylated lysine 9 and trimethylated lysine 4 of histone H3 (AcH3K9 and H3K4me3) histone marks, consistent with a repressed transcriptional state. Almost 30% of Miz-1 targets are also bound by Myc, and these cobound genes are mostly factors that promote differentiation, including Hox genes. Knockdown of Myc increased expression of differentiation genes directly bound by Myc and Miz-1, while a subset of the same genes is downregulated by Miz-1 loss-of-function. Myc and Miz-1 proteins interact with each other and associate with several corepressor factors in ES cells, suggesting a mechanism of repression of differentiation genes. Conclusions: Taken together, our data indicate that Miz-1 and Myc maintain human ES cell pluripotency by coordinately suppressing differentiation genes, particularly Hox genes. These data also support a new model of how Myc and Miz-1 function on chromatin.
- …