RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison
Many algorithms for sequence analysis rely on word matching or word
statistics. Often, these approaches can be improved if binary patterns
representing match and don't-care positions are used as a filter, such that
only those positions of words are considered that correspond to the match
positions of the patterns. The performance of these approaches, however,
depends on the underlying patterns. Herein, we show that the overlap complexity
of a pattern set that was introduced by Ilie and Ilie is closely related to the
variance of the number of matches between two evolutionarily related sequences
with respect to this pattern set. We propose a modified hill-climbing algorithm
to optimize pattern sets for database searching, read mapping and
alignment-free sequence comparison of nucleic-acid sequences; our
implementation of this algorithm is called rasbhari. Depending on the
application at hand, rasbhari can either minimize the overlap complexity of
pattern sets, maximize their sensitivity in database searching or minimize the
variance of the number of pattern-based matches in alignment-free sequence
comparison. We show that, for database searching, rasbhari generates pattern
sets with slightly higher sensitivity than existing approaches. In our Spaced
Words approach to alignment-free sequence comparison, pattern sets calculated
with rasbhari led to more accurate estimates of phylogenetic distances than the
randomly generated pattern sets that we previously used. Finally, we used
rasbhari to generate patterns for short read classification with CLARK-S. Here
too, the sensitivity of the results could be improved, compared to the default
patterns of the program. We integrated rasbhari into Spaced Words; the source
code of rasbhari is freely available at http://rasbhari.gobics.de
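
The filtering idea described in this abstract (comparing words only at the match positions of a binary pattern) can be illustrated with a minimal Python sketch. This is not rasbhari's code and does not reproduce its hill-climbing optimization; the function names `spaced_word` and `count_spaced_matches`, the pattern encoding as a string of '1' (match) and '0' (don't-care), and the toy sequences are all assumptions made for illustration.

```python
def spaced_word(seq, start, pattern):
    """Return the characters of seq at the match ('1') positions of
    pattern when the pattern is placed at offset start."""
    return "".join(seq[start + i] for i, bit in enumerate(pattern) if bit == "1")


def count_spaced_matches(seq_a, seq_b, pattern):
    """Count position pairs (i, j) at which seq_a and seq_b agree on
    every match position of the pattern (don't-care positions are ignored)."""
    span = len(pattern)
    # Index every spaced word of seq_b by how often it occurs.
    words_b = {}
    for j in range(len(seq_b) - span + 1):
        word = spaced_word(seq_b, j, pattern)
        words_b[word] = words_b.get(word, 0) + 1
    # Look up every spaced word of seq_a in that index.
    return sum(
        words_b.get(spaced_word(seq_a, i, pattern), 0)
        for i in range(len(seq_a) - span + 1)
    )


# Hypothetical toy sequences that differ at one position (index 2).
a = "ACGTAC"
b = "ACTTAC"
spaced = count_spaced_matches(a, b, "1101")      # third position is don't-care
contiguous = count_spaced_matches(a, b, "1111")  # ordinary 4-mers
```

With these toy inputs the spaced pattern still finds a match across the single mismatch, while the contiguous word of the same length does not, which is the kind of behaviour that makes the choice of patterns matter.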
Distributions associated with general runs and patterns in hidden Markov models
This paper gives a method for computing distributions associated with
patterns in the state sequence of a hidden Markov model, conditional on
observing all or part of the observation sequence. Probabilities are computed
for very general classes of patterns (competing patterns and generalized later
patterns), and thus, the theory includes as special cases results for a large
class of problems that have wide application. The unobserved state sequence is
assumed to be Markovian with a general order of dependence. An auxiliary Markov
chain is associated with the state sequence and is used to simplify the
computations. Two examples are given to illustrate the use of the methodology.
The first application mainly illustrates the basic steps in applying
the theory; the second is a more detailed application to DNA sequences and
shows that the methods can be adapted to include restrictions related to
biological knowledge.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS125 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
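
The role of an auxiliary Markov chain can be sketched in a deliberately simplified setting: a plain first-order Markov chain with no observation sequence, and a single pattern (a run of a given length). This is not the paper's method, which conditions on observations of a hidden Markov model and covers far more general patterns. The function name `prob_run` and its parameters are assumptions made for illustration.

```python
def prob_run(init, trans, target, k, n):
    """Probability that a run of at least k consecutive `target` states
    occurs among the first n states of a first-order Markov chain.

    init  : initial state distribution (list of probabilities)
    trans : transition matrix, trans[s][t] = P(next = t | current = s)
    """
    num_states = len(init)
    # Auxiliary chain: each state is (current state, current run length),
    # with one absorbing state ("done") reached once the run length hits k.
    dist = {}
    done = 0.0
    for s, p in enumerate(init):
        run = 1 if s == target else 0
        if run >= k:
            done += p
        else:
            dist[(s, run)] = dist.get((s, run), 0.0) + p
    for _ in range(n - 1):
        new_dist = {}
        for (s, run), p in dist.items():
            for t in range(num_states):
                q = p * trans[s][t]
                if q == 0.0:
                    continue
                new_run = run + 1 if t == target else 0
                if new_run >= k:
                    done += q  # absorbed: the pattern has occurred
                else:
                    new_dist[(t, new_run)] = new_dist.get((t, new_run), 0.0) + q
        dist = new_dist
    return done


# Hypothetical check: fair, independent coin tosses written as a Markov chain.
fair = [[0.5, 0.5], [0.5, 0.5]]
p = prob_run([0.5, 0.5], fair, target=1, k=2, n=3)
```

Tracking the run length inside the chain's state turns a question about a pattern into a question about absorption, which can be answered by propagating the distribution forward step by step.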
Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA
Background: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can help uncover the genetic basis of phenotypic diversity.
Methodology/Main Findings: Genome-wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ~2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found, and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar.
Conclusions/Significance: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection.
Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri.
Tibetan frogs, Nanorana parkeri, are differentiated genetically but not morphologically along geographical and elevational gradients in a challenging environment, presenting a unique opportunity to investigate processes leading to speciation. Analyses of whole genomes of 63 frogs reveal population structuring and historical demography, characterized by highly restricted gene flow in a narrow geographic zone lying between matrilines West (W) and East (E). A population found only along a single tributary of the Yalu Zangbu River has the mitogenome only of E, whereas nuclear genes of W comprise 89-95% of the nuclear genome. Selection accounts for 579 broadly scattered, highly divergent regions (HDRs) of the genome, which involve 365 genes. These genes fall into 51 gene ontology (GO) functional classes, 14 of which are likely to be important in driving reproductive isolation. GO enrichment analyses of E reveal many overrepresented functional categories associated with adaptation to high elevations, including blood circulation, response to hypoxia, and UV radiation. Four genes, including DNAJC8 in the brain, TNNC1 and ADORA1 in the heart, and LAMB3 in the lung, differ in levels of expression between low- and high-elevation populations. High-altitude adaptation plays an important role in maintaining and driving continuing divergence and reproductive isolation. Use of total genomes enabled recognition of selection and adaptation in and between populations, as well as documentation of evolution along a stepped cline toward speciation.
Molecular evolution of candidate male reproductive genes in the brown algal model Ectocarpus
Background: Evolutionary studies of genes that mediate recognition between sperm and egg contribute to our understanding of reproductive isolation and speciation. Surface receptors involved in fertilization are targets of sexual selection, reinforcement, and other evolutionary forces including positive selection. This observation was made across different lineages of the eukaryotic tree from land plants to mammals, and is particularly evident in free-spawning animals. Here we use the brown algal model species Ectocarpus (Phaeophyceae) to investigate the evolution of candidate gamete recognition proteins in a distant major phylogenetic group of eukaryotes.
Results: Male gamete specific genes were identified by comparing transcriptome data covering different stages of the Ectocarpus life cycle and screened for characteristics expected from gamete recognition receptors. Selected genes were sequenced in a representative number of strains from distant geographical locations and varying stages of reproductive isolation, to search for signatures of adaptive evolution. One of the genes (Esi0130_0068) showed evidence of selective pressure. Interestingly, that gene displayed domain similarities to the receptor for egg jelly (REJ) protein involved in sperm-egg recognition in sea urchins.
Conclusions: We have identified a male gamete specific gene with similarity to known gamete recognition receptors and signatures of adaptation. Altogether, this gene could contribute to gamete interaction during reproduction as well as reproductive isolation in Ectocarpus and is therefore a good candidate for further functional evaluation.
Long-term balancing selection drives evolution of immunity genes in Capsella.
Genetic drift is expected to remove polymorphism from populations over long periods of time, with the rate of polymorphism loss being accelerated when species experience strong reductions in population size. Adaptive forces that maintain genetic variation in populations, or balancing selection, might counteract this process. To understand the extent to which natural selection can drive the retention of genetic diversity, we document genomic variability after two parallel species-wide bottlenecks in the genus Capsella. We find that ancestral variation preferentially persists at immunity-related loci, and that the same collection of alleles has been maintained in different lineages that have been separated for several million years. By reconstructing the evolution of the disease-related locus MLO2b, we find that divergence between ancient haplotypes can be obscured by reference-based re-sequencing methods, and that trans-specific alleles can encode substantially diverged protein sequences. Our data point to long-term balancing selection as an important factor shaping the genetics of immune systems in plants and as the predominant driver of genomic variability after a population bottleneck.
Behavioral, morphological, and genomic analyses of population structure in brood parasitic indigobirds (Vidua spp.)
The African indigobirds (Vidua spp.) are exceptional among avian brood parasites in that mimicry of host vocalizations plays an integral role in their social behaviors and evolutionary history. Young indigobirds imprint on the vocalizations of their hosts during development, adult males include mimicry of these vocalizations in their own repertoire, and adult females use these songs to choose both their mates and the nests they parasitize. Imprinting on the host during development therefore results in assortative mating and host fidelity, but also provides a mechanism for rapid, sympatric speciation via host shift. Host shifts require some degree of host infidelity, however, and the same behavioral mechanisms may thus lead to hybridization if eggs are laid in the nest of a host species already "occupied" by another indigobird species. Thus, it is not clear if the morphological and genetic similarity of most indigobird species is due to recent common ancestry or ongoing hybridization. I addressed this uncertainty by studying indigobirds in East Africa, a region that was colonized by West African ancestors in the late Pleistocene and is currently home to four indigobird species. I analyzed variation among species in: 1) the responses of territorial males to playbacks of conspecific and heterospecific vocalizations; 2) temporal and frequency traits of chatter calls and complex non-mimicry songs; 3) morphological characters; and 4) genomic polymorphisms. The playback experiment shows that host mimicry is an important cue in species recognition, and suggests that it may contribute to species cohesion when juveniles or adults disperse beyond the boundaries of their dialect neighborhood. Analyses of both non-mimetic vocalizations and morphological characters (i.e., plumage color and body size) reveal that they are shaped by divergence among species as well as local ecology. Analyses of thousands of "double-digest" restriction site-associated DNA (ddRAD) loci scattered across the genome indicate that both species identity and geographic divergence contribute to population structure. Taken together, the results show that the tempo of speciation and morphological divergence among indigobirds associated with different hosts is likely variable, depending on geographic context, and the breeding ecology and morphology of alternative hosts.