Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement
Multiple genome alignment remains a challenging problem. Effects of
recombination including rearrangement, segmental duplication, gain, and loss
can create a mosaic pattern of homology even among closely related organisms.
We describe a method to align two or more genomes that have undergone
large-scale recombination, particularly genomes that have undergone substantial
amounts of gene gain and loss (gene flux). The method utilizes a novel
alignment objective score, referred to as a sum-of-pairs breakpoint score. We
also apply a probabilistic alignment filtering method to remove erroneous
alignments of unrelated sequences, which are commonly observed in other genome
alignment methods. We describe new metrics for quantifying genome alignment
accuracy which measure the quality of rearrangement breakpoint predictions and
indel predictions. The progressive genome alignment algorithm demonstrates
markedly improved accuracy over previous approaches in situations where genomes
have undergone realistic amounts of genome rearrangement, gene gain, loss, and
duplication. We apply the progressive genome alignment algorithm to a set of 23
completely sequenced genomes from the genera Escherichia, Shigella, and
Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content
conserved among all taxa and total unique content of 15.2 Mbp. We document
substantial population-level variability among these organisms driven by
homologous recombination, gene gain, and gene loss. Free, open-source software
implementing the described genome alignment approach is available from
http://gel.ahabs.wisc.edu/mauve.
Comment: Revision dated June 19, 200
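The abstract names the sum-of-pairs breakpoint score but does not define it. The sketch below is a simplified, hypothetical illustration and not progressiveMauve's exact objective. It represents each genome as a signed order of conserved blocks. For every pair of genomes, it counts breakpoints using only the blocks that pair shares. It then sums per-pair scores of the form "shared block weight minus a breakpoint penalty". `block_weight` and `breakpoint_penalty` are assumed inputs, and chromosomes are treated as linear for simplicity.

```python
from itertools import combinations

def adjacencies(order):
    """Signed adjacencies of a block order; (a, b) and (-b, -a) denote the same adjacency."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj

def breakpoints(order_a, order_b):
    """Count adjacent block pairs in order_b that are not adjacent in order_a.
    Both orders are assumed to contain the same set of blocks."""
    adj_a = adjacencies(order_a)
    return sum((x, y) not in adj_a for x, y in zip(order_b, order_b[1:]))

def project(order, keep):
    """Drop blocks absent from `keep`, preserving order and orientation."""
    return [b for b in order if abs(b) in keep]

def sum_of_pairs_breakpoint_score(genomes, block_weight, breakpoint_penalty):
    """Hypothetical objective: over all genome pairs, add the weight of the shared
    blocks and subtract a penalty for each breakpoint between their shared blocks."""
    total = 0.0
    for a, b in combinations(genomes, 2):
        shared = {abs(x) for x in a} & {abs(x) for x in b}
        pa, pb = project(a, shared), project(b, shared)
        total += sum(block_weight[x] for x in shared)
        total -= breakpoint_penalty * breakpoints(pa, pb)
    return total
```

Each comparison is restricted to the blocks a pair actually shares. As a result, a block gained or lost in one genome in this sketch does not create spurious breakpoints in pairs that never contained it.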
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations between two
proteins that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.
Comment: The 9th International Workshop on Algorithms in Bioinformatics
(WABI), Philadelphia, United States (2009)
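The graph idea can be made concrete with a small sketch. For a given protein, it builds a DAG whose source-to-sink paths spell every DNA sequence that encodes the protein under the standard genetic code. It then computes the smallest edit distance between any path of one such graph and any path of another. This is a simplification under stated assumptions. Unit-cost edits stand in for the paper's evolutionary scoring system. Because indels act at the nucleotide level, however, a one-base insertion or deletion (a frameshift) still costs a single edit. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

def count_back_translations(protein):
    """Number of distinct DNA sequences encoding `protein` (product of codon counts)."""
    n = 1
    for aa in protein:
        n *= len(CODONS[aa])
    return n

def back_translation_graph(protein):
    """Return (bases, preds, sinks) for a DAG whose source-to-sink paths spell every
    DNA sequence encoding `protein`. Node 0 is an empty source; nodes are created
    in topological order, so every predecessor index is smaller than its node."""
    bases, preds = [None], [[]]
    frontier = [0]  # last nodes of the previous codon layer
    for aa in protein:
        ends = []
        for codon in CODONS[aa]:
            prev = frontier
            for base in codon:
                bases.append(base)
                preds.append(list(prev))
                prev = [len(bases) - 1]
            ends.append(prev[0])
        frontier = ends
    return bases, preds, frontier

def graph_edit_distance(protein1, protein2):
    """Minimum unit-cost edit distance between any back-translation of protein1
    and any back-translation of protein2 (dynamic programming over both DAGs)."""
    b1, pr1, s1 = back_translation_graph(protein1)
    b2, pr2, s2 = back_translation_graph(protein2)
    INF = float("inf")
    D = [[INF] * len(b2) for _ in b1]
    D[0][0] = 0
    for u in range(len(b1)):
        for v in range(len(b2)):
            if u == 0 and v == 0:
                continue
            best = INF
            if u > 0:  # base at u left unmatched (gap in the other sequence)
                for pu in pr1[u]:
                    best = min(best, D[pu][v] + 1)
            if v > 0:  # base at v left unmatched
                for pv in pr2[v]:
                    best = min(best, D[u][pv] + 1)
            if u > 0 and v > 0:  # match or mismatch of the two bases
                cost = 0 if b1[u] == b2[v] else 1
                for pu in pr1[u]:
                    for pv in pr2[v]:
                        best = min(best, D[pu][pv] + cost)
            D[u][v] = best
    return min(D[u][v] for u in s1 for v in s2)
```

The graph holds at most 18 nodes per residue (six codons of three bases each). By contrast, the number of encoded DNA sequences returned by `count_back_translations` grows multiplicatively with protein length. The dynamic program therefore never enumerates those sequences explicitly.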
Computational Identification of Four Spliceosomal snRNAs from the Deep-Branching Eukaryote Giardia intestinalis
Funding: Marsden Fund New Zealand; Allan Wilson Centre. The funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.
The processing of RNAs by other RNAs is widespread in eukaryotes, but it is not clear to what extent it is ancestral to eukaryotes. Here
we focus on pre-mRNA splicing, one of the most important RNA-processing mechanisms in eukaryotes. In most eukaryotes
splicing is predominantly catalysed by the major spliceosome complex, which consists of five uridine-rich small nuclear
RNAs (U-snRNAs) and over 200 proteins in humans. Three major spliceosomal introns have been found experimentally in
Giardia; one Giardia U-snRNA (U5) and a number of spliceosomal proteins have also been identified. However, because of
the low sequence similarity between the Giardia ncRNAs and those of other eukaryotes, the other U-snRNAs of Giardia had
not been found. Using two computational methods, candidates for Giardia U1, U2, U4 and U6 snRNAs were identified in this
study and shown by RT-PCR to be expressed. We found that identifying a U2 candidate helped identify U6 and U4 based on
interactions between them. Secondary structural modelling of the Giardia U-snRNA candidates revealed typical features of
eukaryotic U-snRNAs. We demonstrate a successful approach for combining computational and experimental methods to
identify expected ncRNAs in a highly divergent protist genome. Our findings reinforce the conclusion that spliceosomal
small nuclear RNAs existed in the last common ancestor of eukaryotes.
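The abstract notes that finding a U2 candidate helped locate U6 and U4 through the interactions between them. A minimal sketch of that idea scans a candidate sequence for the window that pairs best, antiparallel, with a known segment. It assumes a simple ungapped helix and counts Watson-Crick pairs plus G·U wobble pairs. The function names and example sequences are hypothetical. Real snRNA searches rely on richer structural and energy models.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pair_count(segment, window):
    """Base pairs formed when `window` pairs antiparallel with `segment`:
    segment[i] pairs with window[len(window) - 1 - i]."""
    return sum((x, y) in PAIRS for x, y in zip(segment, reversed(window)))

def best_pairing_window(segment, candidate):
    """Return (start, pairs) for the candidate window that pairs best with `segment`;
    (-1, -1) if the candidate is shorter than the segment."""
    k = len(segment)
    best = (-1, -1)
    for start in range(len(candidate) - k + 1):
        score = pair_count(segment, candidate[start:start + k])
        if score > best[1]:
            best = (start, score)
    return best
```

For example, `best_pairing_window("GGAC", "AAAGUCCAAA")` finds the perfectly complementary window `GUCC` at offset 3.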
Kevlar: A Mapping-Free Framework for Accurate Discovery of De Novo Variants.
De novo genetic variants are an important source of causative variation in complex genetic disorders. Many methods for variant discovery rely on mapping reads to a reference genome and therefore detect numerous inherited variants that are irrelevant to the phenotype of interest. To distinguish between inherited and de novo variation, sequencing of families (parents and siblings) is commonly pursued. However, standard mapping-based approaches tend to have a high false-discovery rate for de novo variant prediction. Kevlar is a mapping-free method for de novo variant discovery, based on direct comparison of sequences between related individuals. Kevlar identifies high-abundance k-mers unique to the individual of interest. Reads containing these k-mers are partitioned into disjoint sets by shared k-mer content for variant calling, and preliminary variant predictions are sorted using a probabilistic score. We evaluated Kevlar on simulated and real datasets, demonstrating its ability to detect both de novo single-nucleotide variants and indels with high accuracy.
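The first two steps described above can be sketched as a toy: selecting abundant k-mers found only in the proband, then partitioning reads that share them. The sketch uses exact counting with Python dictionaries. The k-mer size, the abundance thresholds `min_case` and `max_ctrl`, and all function names are assumptions for illustration, not Kevlar's parameters or API. Parent reads are pooled as controls, and reverse-complement k-mers are ignored for brevity.

```python
from collections import Counter

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def count_kmers(reads, k):
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def novel_kmers(proband_reads, parent_reads, k, min_case=5, max_ctrl=0):
    """k-mers seen at least `min_case` times in the proband and at most `max_ctrl`
    times in the pooled parental reads (hypothetical thresholds)."""
    case = count_kmers(proband_reads, k)
    ctrl = count_kmers(parent_reads, k)
    return {km for km, c in case.items() if c >= min_case and ctrl[km] <= max_ctrl}

def partition_reads(reads, novel, k):
    """Group read indices into disjoint sets connected by shared novel k-mers (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}  # first read seen containing each novel k-mer
    for idx, read in enumerate(reads):
        hits = [km for km in kmers(read, k) if km in novel]
        if not hits:
            continue
        parent.setdefault(idx, idx)
        for km in hits:
            if km in owner:
                ra, rb = find(idx), find(owner[km])
                if ra != rb:
                    parent[ra] = rb
            else:
                owner[km] = idx
    groups = {}
    for idx in parent:
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

Each resulting group would then go to variant calling. That calling step and the probabilistic scoring of predictions are not modeled here.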
Species-level functional profiling of metagenomes and metatranscriptomes.
Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
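The tiered strategy works as a cascade: only reads left unexplained by one tier are passed to the next. The sketch below is a toy model of that flow, not HUMAnN2's code. It assumes the species-detection step has already happened. The two search tiers are stand-in lookup tables (`nucleotide_hits` and `translated_hits`, both hypothetical). Abundance is reported as reads per kilobase of gene family length, stratified by contributing species or "unclassified".

```python
from collections import defaultdict

def tiered_profile(reads, nucleotide_hits, translated_hits, family_length_bp):
    """Return {(gene_family, species): reads per kilobase}.
    nucleotide_hits: read -> (gene_family, species) from pangenome alignment (tier 1).
    translated_hits: read -> gene_family from translated search (tier 2)."""
    counts = defaultdict(float)
    unclassified = []
    # Tier 1: nucleotide search against pangenomes of species detected in the sample.
    for read in reads:
        hit = nucleotide_hits.get(read)
        if hit is None:
            unclassified.append(read)
        else:
            family, species = hit
            counts[(family, species)] += 1
    # Tier 2: translated search only for reads that tier 1 could not place.
    for read in unclassified:
        family = translated_hits.get(read)
        if family is not None:
            counts[(family, "unclassified")] += 1
    # Normalize by gene family length to reads per kilobase (RPK).
    return {key: n / (family_length_bp[key[0]] / 1000) for key, n in counts.items()}

def community_totals(profile):
    """Collapse species strata into one community-level value per gene family."""
    totals = defaultdict(float)
    for (family, _species), rpk in profile.items():
        totals[family] += rpk
    return dict(totals)
```

In this toy model, summing the strata of each gene family recovers a community-level total. The more expensive translated search runs only on the leftover reads.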