Detection of microRNAs in color space
Motivation: Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Results: Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. Availability and implementation: A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
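The color-space idea and the sequential trimming strategy described in this abstract can be illustrated with a small sketch. It assumes the standard SOLiD two-base encoding, in which the color of a dinucleotide equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). The function names, the exact-match search, and the `min_len` parameter are hypothetical choices for illustration; this is not the SeqTrimMap script, which relies on a real short-read mapper.

```python
# Illustrative sketch only: names and the toy exact-match "mapper" are assumptions.
# 2-bit codes chosen so that the color of a dinucleotide is the XOR of its bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return one color (0-3) for each pair of adjacent bases in seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the nucleotide sequence from a known first base and its colors."""
    bits = BASE_TO_BITS[first_base]
    out = [first_base]
    for color in colors:
        bits ^= color  # each color tells us how to step to the next base
        out.append(BITS_TO_BASE[bits])
    return "".join(out)


def sequential_trim_map(read_colors, reference, min_len=3):
    """Toy sequential trimming: drop one color at a time from the 3' end
    until the remaining colors match the reference exactly.

    Returns (position, matched_length) in color coordinates, or None.
    """
    ref_colors = "".join(str(c) for c in encode_colors(reference))
    query = "".join(str(c) for c in read_colors)
    while len(query) >= min_len:
        pos = ref_colors.find(query)
        if pos != -1:
            return pos, len(query)
        query = query[:-1]  # trim the 3' end, where linker and errors accumulate
    return None


if __name__ == "__main__":
    colors = encode_colors("ATGGC")
    print(colors)
    print(decode_colors("A", colors))
    # The same colors decode to a different sequence if the first base is unknown.
    print(decode_colors("C", colors))
    # A read carrying two trailing linker colors still maps after trimming.
    print(sequential_trim_map(colors + [2, 2], "ACGTTGCATGGCA"))
```

Because decoding depends on the first base, the same colors yield a different nucleotide sequence when that base is wrong or missing, which is why the loss of the first nucleotide matters when mapping in color space.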
Neutral genomic microevolution of a recently emerged pathogen, Salmonella enterica serovar Agona
Salmonella enterica serovar Agona has caused multiple food-borne outbreaks of gastroenteritis since it was first isolated in
1952. We analyzed the genomes of 73 isolates from global sources, comparing five distinct outbreaks with sporadic
infections as well as food contamination and the environment. Agona consists of three lineages with minimal mutational
diversity: only 846 single nucleotide polymorphisms (SNPs) have accumulated in the non-repetitive, core genome since
Agona evolved in 1932 and subsequently underwent a major population expansion in the 1960s. Homologous
recombination with other serovars of S. enterica imported 42 recombinational tracts (360 kb) in 5/143 nodes within the
genealogy, which resulted in 3,164 additional SNPs. In contrast to this paucity of genetic diversity, Agona is highly diverse
according to pulsed-field gel electrophoresis (PFGE), which is used to assign isolates to outbreaks. PFGE diversity reflects a
highly dynamic accessory genome associated with the gain or loss (indels) of 51 bacteriophages, 10 plasmids, and 6
integrative conjugational elements (ICE/IMEs), but did not correlate uniquely with outbreaks. Unlike the core genome, indels
occurred repeatedly in independent nodes (homoplasies), resulting in inaccurate PFGE genealogies. The accessory genome
contained only a few cargo genes relevant to infection, other than antibiotic resistance. Thus, most of the genetic diversity
within this recently emerged pathogen reflects changes in the accessory genome, or is due to recombination, but these
changes seemed to reflect neutral processes rather than Darwinian selection. Each outbreak was caused by an independent
clade, without universal, outbreak-associated genomic features, and none of the variable genes in the pan-genome seemed
to be associated with an ability to cause outbreaks.
Avoiding chromosome pathology when replication forks collide
This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2013 Macmillan Publishers Limited. Chromosome duplication normally initiates through the assembly of replication fork complexes at defined origins [1, 2]. DNA synthesis by any one fork is thought to cease when it meets another travelling in the opposite direction, at which stage the replication machinery may simply dissociate before the nascent strands are finally ligated. But what actually happens is not clear. Here we present evidence consistent with the idea that every fork collision has the potential to threaten genomic integrity. In Escherichia coli this threat is kept at bay by RecG DNA translocase [3] and by single-strand DNA exonucleases. Without RecG, replication initiates where forks meet through a replisome assembly mechanism normally associated with fork repair, replication restart and recombination [4, 5], establishing new forks with the potential to sustain cell growth and division without an active origin. This potential is realized when roadblocks to fork progression are reduced or eliminated. It relies on the chromosome being circular, reinforcing the idea that replication initiation is triggered repeatedly by fork collision. The results reported raise the question of whether replication fork collisions have pathogenic potential for organisms that exploit several origins to replicate each chromosome. Funding: The MRC, the Leverhulme Trust, and the BBSRC.
Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence
Background: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results: An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion: An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at http://avena.pw.usda.gov/wheatD/agsnp.shtml.