
    RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de
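    The pattern-based filtering described in this abstract can be illustrated with a minimal sketch. It assumes a pattern is written as a string of '1' (match position) and '0' (don't-care position); the function names `spaced_words` and `count_spaced_matches` are hypothetical and are not taken from rasbhari or Spaced Words. The sketch only counts pattern-based matches between two sequences; it does not optimize patterns.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Return the spaced words of seq with respect to a binary pattern.

    pattern is a string of '1' (match position) and '0' (don't-care
    position). Only the characters at match positions are kept.
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    return [
        "".join(seq[start + i] for i in match_pos)
        for start in range(len(seq) - len(pattern) + 1)
    ]


def count_spaced_matches(seq_a, seq_b, pattern):
    """Count position pairs (i, j) whose windows agree at all match positions."""
    counts_a = Counter(spaced_words(seq_a, pattern))
    counts_b = Counter(spaced_words(seq_b, pattern))
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    a, b = "ACGTAC", "ATGTTC"
    # The contiguous word "111" finds no match between a and b, while the
    # spaced pattern "101" tolerates a mismatch at its don't-care position.
    print(count_spaced_matches(a, b, "111"))
    print(count_spaced_matches(a, b, "101"))
```

    With these two toy sequences the contiguous pattern yields no matches, whereas the spaced pattern still finds windows that agree at its match positions, which is why the choice of pattern affects sensitivity.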

    Distributions associated with general runs and patterns in hidden Markov models

    This paper gives a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence. Probabilities are computed for very general classes of patterns (competing patterns and generalized later patterns), and thus, the theory includes as special cases results for a large class of problems that have wide application. The unobserved state sequence is assumed to be Markovian with a general order of dependence. An auxiliary Markov chain is associated with the state sequence and is used to simplify the computations. Two examples are given to illustrate the use of the methodology. Whereas the first application mainly illustrates the basic steps in applying the theory, the second is a more detailed application to DNA sequences, and shows that the methods can be adapted to include restrictions related to biological knowledge. Comment: Published at http://dx.doi.org/10.1214/07-AOAS125 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
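    The auxiliary-chain idea can be sketched on a much simpler problem than the one the paper treats. The example below assumes a fully observed two-state, first-order Markov chain (not a hidden Markov model, and with no conditioning on observations) and a single pattern, a run of k consecutive 1s. The function name `prob_run_by_time` and its parameters are hypothetical. The auxiliary chain tracks the current run length of 1s and absorbs once the run reaches length k.

```python
def prob_run_by_time(n, k, init, trans):
    """P(a run of k consecutive 1s occurs in X_1..X_n).

    init[a]     = P(X_1 = a) for a in {0, 1}
    trans[a][b] = P(X_{t+1} = b | X_t = a)
    Requires n >= 1 and k >= 1.
    """
    # Auxiliary state r = current run length of 1s (r = 0 means the last
    # symbol was 0); state k is absorbing and means "pattern has occurred".
    dist = [0.0] * (k + 1)
    dist[0] += init[0]
    dist[1] += init[1]  # if k == 1 this is already the absorbing state

    for _ in range(n - 1):
        new = [0.0] * (k + 1)
        new[k] = dist[k]  # absorbed mass stays absorbed
        for r in range(k):
            p = dist[r]
            if p == 0.0:
                continue
            cur = 0 if r == 0 else 1  # symbol implied by the run length
            new[0] += p * trans[cur][0]      # next symbol 0 resets the run
            new[r + 1] += p * trans[cur][1]  # next symbol 1 extends the run
        dist = new
    return dist[k]


if __name__ == "__main__":
    init = [0.5, 0.5]
    trans = [[0.5, 0.5], [0.5, 0.5]]
    print(prob_run_by_time(3, 2, init, trans))
```

    Because the auxiliary chain has only k + 1 states, the cost grows linearly in n rather than exponentially, as it would if all 2^n state sequences were enumerated.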

    Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA

    Background: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can help uncover the genetic basis of phenotypic diversity. Methodology/Main Findings: Genome-wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ~2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. Conclusions/Significance: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection

    Molecular evolution of candidate male reproductive genes in the brown algal model Ectocarpus

    Background: Evolutionary studies of genes that mediate recognition between sperm and egg contribute to our understanding of reproductive isolation and speciation. Surface receptors involved in fertilization are targets of sexual selection, reinforcement, and other evolutionary forces including positive selection. This observation was made across different lineages of the eukaryotic tree from land plants to mammals, and is particularly evident in free-spawning animals. Here we use the brown algal model species Ectocarpus (Phaeophyceae) to investigate the evolution of candidate gamete recognition proteins in a distant major phylogenetic group of eukaryotes. Results: Male gamete specific genes were identified by comparing transcriptome data covering different stages of the Ectocarpus life cycle and screened for characteristics expected from gamete recognition receptors. Selected genes were sequenced in a representative number of strains from distant geographical locations and varying stages of reproductive isolation, to search for signatures of adaptive evolution. One of the genes (Esi0130_0068) showed evidence of selective pressure. Interestingly, that gene displayed domain similarities to the receptor for egg jelly (REJ) protein involved in sperm-egg recognition in sea urchins. Conclusions: We have identified a male gamete specific gene with similarity to known gamete recognition receptors and signatures of adaptation. Altogether, this gene could contribute to gamete interaction during reproduction as well as reproductive isolation in Ectocarpus and is therefore a good candidate for further functional evaluation

    Behavioral, morphological, and genomic analyses of population structure in brood parasitic indigobirds (Vidua spp.)

    The African indigobirds (Vidua spp.) are exceptional among avian brood parasites in that mimicry of host vocalizations plays an integral role in their social behaviors and evolutionary history. Young indigobirds imprint on the vocalizations of their hosts during development, adult males include mimicry of these vocalizations in their own repertoire, and adult females use these songs to choose both their mates and the nests they parasitize. Imprinting on the host during development therefore results in assortative mating and host fidelity, but also provides a mechanism for rapid, sympatric speciation via host shift. Host shifts require some degree of host infidelity, however, and the same behavioral mechanisms may thus lead to hybridization if eggs are laid in the nest of a host species already "occupied" by another indigobird species. Thus, it is not clear if the morphological and genetic similarity of most indigobird species is due to recent common ancestry or ongoing hybridization. I addressed this uncertainty by studying indigobirds in East Africa, a region that was colonized by West African ancestors in the late Pleistocene and is currently home to four indigobird species. I analyzed variation among species in: 1) the responses of territorial males to playbacks of conspecific and heterospecific vocalizations; 2) temporal and frequency traits of chatter calls and complex non-mimicry songs; 3) morphological characters; and 4) genomic polymorphisms. The playback experiment shows that host mimicry is an important cue in species recognition, and suggests that it may contribute to species cohesion when juveniles or adults disperse beyond the boundaries of their dialect neighborhood. Analyses of both non-mimetic vocalizations and morphological characters (i.e., plumage color and body size) reveal that they are shaped by divergence among species as well as local ecology. Analyses of thousands of "double-digest" restriction site-associated DNA (ddRAD) loci scattered across the genome indicate that both species identity and geographic divergence contribute to population structure. Taken together, the results show that the tempo of speciation and morphological divergence among indigobirds associated with different hosts is likely variable, depending on geographic context, and the breeding ecology and morphology of alternative hosts