    Logarithmic gap costs decrease alignment accuracy

    BACKGROUND: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Because affine gap costs are quick and efficient and are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs. RESULTS: A group of simulated sequence pairs was globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing the resulting alignments to the actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently, a model was developed to explain the results. CONCLUSION: Contrary to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence in the biological relevance of existing alignment algorithms.
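    To make the comparison concrete, here is a minimal sketch (not the paper's code) of global alignment under an arbitrary gap cost function. It assumes matches cost 0, mismatches cost a hypothetical 1.0, and illustrative parameter values a, b, c. It also assumes the log-affine form G(k) = a + bk + c ln k, which the abstract names without defining. Because a general G(k) must try every gap length explicitly, this naive dynamic program runs in cubic time, whereas affine costs admit the quadratic-time Gotoh algorithm. That difference is the speed advantage the abstract refers to.

```python
import math
from typing import Callable


# Hypothetical parameter values, chosen only for illustration.
def affine(k: int, a: float = 2.0, b: float = 1.0) -> float:
    """Affine gap cost G(k) = a + b*k."""
    return a + b * k


def logarithmic(k: int, a: float = 2.0, c: float = 1.5) -> float:
    """Logarithmic gap cost G(k) = a + c*ln(k)."""
    return a + c * math.log(k)


def log_affine(k: int, a: float = 2.0, b: float = 1.0, c: float = 1.0) -> float:
    """Assumed log-affine form G(k) = a + b*k + c*ln(k)."""
    return a + b * k + c * math.log(k)


def global_align_cost(x: str, y: str, gap: Callable[[int], float],
                      mismatch: float = 1.0) -> float:
    """Minimum global alignment cost of x and y with an arbitrary gap cost.

    Matches cost 0 and mismatches cost `mismatch`. Every gap length k is
    tried explicitly, so the run time is O(len(x) * len(y) * (len(x) + len(y))).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:  # align x[i-1] with y[j-1]
                best = D[i - 1][j - 1] + (0.0 if x[i - 1] == y[j - 1] else mismatch)
            for k in range(1, j + 1):  # y[j-k:j] faces a gap of length k
                best = min(best, D[i][j - k] + gap(k))
            for k in range(1, i + 1):  # x[i-k:i] faces a gap of length k
                best = min(best, D[i - k][j] + gap(k))
            D[i][j] = best
    return D[n][m]
```

    For example, `global_align_cost("ACGT", "AT", affine)` returns 4.0: one gap of length 2 (cost a + 2b) with A and T matched. Swapping in `logarithmic` or `log_affine` changes only the gap function passed in.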

    A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing

    We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole-genome assembly, genome-genome alignment, and annotation. In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. In genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, which we then used to estimate the phylogeny of placental mammals. We recovered phylogenies from multiple datasets that were consistent with previous conflicting estimates of the relationships among mammals. SISRS is open source and freely available at https://github.com/rachelss/SISRS. Comment: 12 pages plus36 figures, 1 supplementary table, 3 supplementary figures
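    The abstract does not say how SISRS scores phylogenetic signal. As a generic illustration only, not SISRS's implementation, the sketch below flags parsimony-informative columns in a set of aligned sequences. These are columns with at least two nucleotide states that each occur in at least two taxa. Ambiguous calls such as N are ignored, and the function names, taxon names, and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_parsimony_informative(column: Sequence[str]) -> bool:
    """True if at least two nucleotide states each appear in at least two taxa."""
    counts = Counter(base for base in column if base in "ACGT")  # ignore N, gaps, etc.
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(alignment: Dict[str, str]) -> List[int]:
    """0-based positions of parsimony-informative columns in an ungapped alignment."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [i for i in range(length)
            if is_parsimony_informative([s[i] for s in seqs])]


if __name__ == "__main__":
    toy = {"human": "ACGTA", "chimp": "ACGTT", "gorilla": "ATGCT", "orangutan": "ATGCA"}
    print(informative_sites(toy))
```

    On the toy alignment this reports columns 1, 3, and 4. Column 0 is invariant, so it carries no signal for choosing among trees.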

    The multiple personalities of Watson and Crick strands

    Background: In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review the history of the terminology and suggest retaining a single sense that is currently the most useful and consistent. Proposal: The Saccharomyces Genome Database defines the Watson strand as the strand which has its 5'-end at the short-arm telomere and the Crick strand as its complement. The Watson strand is always used as the reference strand in their database. Using this as the basis of our standard, we recommend that Watson and Crick strand terminology be used only in the context of genomics. When possible, the centromere or other genomic feature should be used as a reference point, dividing the chromosome into two arms of unequal lengths. Under our proposal, the Watson strand is standardized as the strand whose 5'-end is on the short arm of the chromosome, and the Crick strand as the one whose 5'-end is on the long arm. Furthermore, the Watson strand should be retained as the reference (plus) strand in a genomic database. This usage not only makes the determination of Watson and Crick unambiguous, but also allows unambiguous selection of reference strands for genomics. Reviewers: This article was reviewed by John M. Logsdon, Igor B. Rogozin (nominated by Andrey Rzhetsky), and William Martin.
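    Under the proposal, the label depends only on which arm holds the strand's 5'-end. A minimal sketch, assuming 0-based coordinates along a chromosome of known length and a known centromere position (both hypothetical inputs), maps a database strand ("+" meaning increasing coordinates) to Watson or Crick. The function names are ours. In a database that follows the proposal, coordinates would start at the short-arm telomere, so `watson_is_forward` would always return True there.

```python
def watson_is_forward(chrom_length: int, centromere: int) -> bool:
    """True if the forward (increasing-coordinate) strand is the Watson strand.

    The forward strand has its 5'-end at coordinate 0, so it is the Watson
    strand exactly when the arm left of the centromere is the short arm.
    """
    left_arm = centromere
    right_arm = chrom_length - centromere
    if left_arm == right_arm:
        raise ValueError("arms of equal length: Watson/Crick undefined by this rule")
    return left_arm < right_arm


def strand_name(strand: str, chrom_length: int, centromere: int) -> str:
    """Translate a '+'/'-' database strand into Watson/Crick under the proposal."""
    forward_is_watson = watson_is_forward(chrom_length, centromere)
    if strand == "+":
        return "Watson" if forward_is_watson else "Crick"
    if strand == "-":
        return "Crick" if forward_is_watson else "Watson"
    raise ValueError(f"unknown strand {strand!r}")
```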

    The effect of the dispersal kernel on isolation-by-distance in a continuous population

    Under models of isolation-by-distance, population structure is determined by the probability of identity-by-descent between pairs of genes according to the geographic distance between them. Well-established analytical results indicate that the relationship between geographical and genetic distance depends mostly on the neighborhood size of the population, $N_b = 4\pi\sigma^2 D_e$, which represents a standardized measure of dispersal. To test this prediction, we model local dispersal of haploid individuals on a two-dimensional torus using four dispersal kernels: Rayleigh, exponential, half-normal, and triangular. When neighborhood size is held constant, the distributions produce similar patterns of isolation-by-distance, confirming predictions. Considering this, we propose that the triangular distribution is the appropriate null distribution for isolation-by-distance studies. Under the triangular distribution, dispersal is uniform within an area of $4\pi\sigma^2$ (i.e., the neighborhood area), which suggests that the common description of neighborhood size as a measure of a local panmictic population is valid for popular families of dispersal distributions. We further show how to draw from the triangular distribution efficiently and argue that it should be used in other studies in which computational efficiency is important. Comment: 18 pages (main); 4 pages (supplementary)
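    A uniform draw over a disk of area $4\pi\sigma^2$ has radius $R = 2\sigma$. Its distance from the origin has the triangular density $2r/R^2$ on $[0, R]$, and its per-axis variance is $R^2/4 = \sigma^2$. One cheap way to sample that density, assumed here rather than taken from the paper, is $r = R \max(U_1, U_2)$ with two independent uniforms, combined with a uniform angle. The function names and the torus dimensions are illustrative.

```python
import math
import random
from typing import Tuple


def draw_triangular_dispersal(sigma: float, rng: random.Random) -> Tuple[float, float]:
    """Displacement uniform over a disk of area 4*pi*sigma^2 (radius 2*sigma)."""
    radius = 2.0 * sigma
    # The max of two U(0,1) draws has density 2u on [0, 1], i.e. a triangular distance.
    r = radius * max(rng.random(), rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


def move_on_torus(x: float, y: float, sigma: float, width: float, height: float,
                  rng: random.Random) -> Tuple[float, float]:
    """Disperse one individual and wrap its position around a width x height torus."""
    dx, dy = draw_triangular_dispersal(sigma, rng)
    return (x + dx) % width, (y + dy) % height
```

    Taking the maximum of two uniforms avoids a square root per draw (the inverse-CDF alternative is $r = R\sqrt{U}$). Both give the same distribution.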

    A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. We examine this approach in detail through simulations and note areas for method improvement. Applying this method to data from members of a well-defined nuclear family with accurate pedigree information sets the stage for the most direct estimates of the human mutation rate to date.
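    The paper's pedigree model is richer than can be shown here, since it also separates germline from somatic mutation and models sequencing error. As a simplified sketch of the core idea only, the function below computes the posterior probability of a de novo mutation at one biallelic site in a parent-offspring trio. It assumes Hardy-Weinberg priors for the parents with alternate-allele frequency `p`, a per-transmission mutation probability `mu`, and per-individual genotype likelihoods supplied by some upstream caller. All names and default values are hypothetical.

```python
from itertools import product
from typing import Dict

GENOTYPES = (0, 1, 2)  # number of alternate alleles at a biallelic site


def hwe_prior(p: float) -> Dict[int, float]:
    """Hardy-Weinberg genotype prior for founders with alt-allele frequency p."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def transmit_prob(genotype: int, allele: int) -> float:
    """Probability that a parent with `genotype` transmits `allele` (0 = ref, 1 = alt)."""
    p_alt = genotype / 2
    return p_alt if allele == 1 else 1 - p_alt


def de_novo_posterior(lik_mother: Dict[int, float], lik_father: Dict[int, float],
                      lik_child: Dict[int, float], p: float = 0.001,
                      mu: float = 1e-8) -> float:
    """P(at least one transmitted allele mutated | read data) for a trio."""
    prior = hwe_prior(p)
    total = 0.0
    with_mutation = 0.0
    for gm, gf in product(GENOTYPES, GENOTYPES):
        w_parents = prior[gm] * prior[gf] * lik_mother[gm] * lik_father[gf]
        for am, af in product((0, 1), repeat=2):  # alleles before mutation
            w_tx = transmit_prob(gm, am) * transmit_prob(gf, af)
            for mut_m, mut_f in product((0, 1), repeat=2):  # 1 = allele flipped
                w_mut = (mu if mut_m else 1 - mu) * (mu if mut_f else 1 - mu)
                child = (am ^ mut_m) + (af ^ mut_f)
                w = w_parents * w_tx * w_mut * lik_child[child]
                total += w
                if mut_m or mut_f:
                    with_mutation += w
    return with_mutation / total
```

    A child confidently called heterozygous between two confidently homozygous-reference parents is a Mendelian violation, and this model then assigns most of the posterior mass to a de novo mutation. The alternative explanations (a miscalled child or a miscalled parent) are weighed by the same likelihoods.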

    Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions

    Motivation: Accurate identification of genotypes is an essential part of the analysis of genomic data, including identifying sequence polymorphisms, linking mutations with disease, and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number variation, paralogous sequences, library preparation, sequencing error, and reference-mapping biases, among others. Results: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases, the best model consisted of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed, with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy-number variants and low-complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. Availability and Implementation: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). Supplementary information: Supplementary data are available at Bioinformatics online.
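    A Dirichlet-multinomial with concentration parameters $\alpha_k$ (and $A = \sum_k \alpha_k$) assigns counts $x_k$ summing to $n$ the probability $\frac{n!\,\Gamma(A)}{\Gamma(n+A)} \prod_k \frac{\Gamma(x_k+\alpha_k)}{x_k!\,\Gamma(\alpha_k)}$. The sketch below evaluates this in log space. It then uses a two-component mixture to compute how strongly a site's read counts support the major component, which is the quantity one would threshold to drop poorly fitting sites. It uses raw $\alpha$ parameters for (reference, alternate) counts at a heterozygous site rather than the paper's own parameterization. The component values, weights, and threshold are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def dm_logpmf(counts: Sequence[int], alphas: Sequence[float]) -> float:
    """Log probability of `counts` under a Dirichlet-multinomial(n, alphas)."""
    n = sum(counts)
    a_total = sum(alphas)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return (log_coef + math.lgamma(a_total) - math.lgamma(n + a_total)
            + sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alphas)))


def major_component_posterior(counts: Sequence[int],
                              components: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> float:
    """Posterior probability that `counts` came from components[0] (the major one)."""
    logs = [math.log(w) + dm_logpmf(counts, a) for w, a in zip(weights, components)]
    top = max(logs)
    probs = [math.exp(lp - top) for lp in logs]  # subtract max for numerical stability
    return probs[0] / sum(probs)


# Hypothetical components for (reference, alternate) counts at heterozygous sites.
MAJOR: Tuple[float, float] = (500.0, 500.0)  # near-binomial, p = 0.5, no reference bias
MINOR: Tuple[float, float] = (3.0, 1.0)      # overdispersed and reference-biased
WEIGHTS: Tuple[float, float] = (0.9, 0.1)


def keep_site(counts: Sequence[int], threshold: float = 0.5) -> bool:
    """Keep a site only if it more likely fits the major component."""
    return major_component_posterior(counts, (MAJOR, MINOR), WEIGHTS) >= threshold
```

    With these made-up components, balanced counts such as (15, 15) give a major-component posterior close to 1. Strongly skewed counts such as (40, 2) give a value close to 0, so that site would be filtered.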

    Coevolution of the Tlx homeobox gene with medusa development (Cnidaria: Medusozoa)

    Cnidarians display a wide diversity of life cycles. Among the main cnidarian clades, only Medusozoa possesses a swimming life cycle stage called the medusa, alternating with a benthic polyp stage. The medusa stage was repeatedly lost during medusozoan evolution, notably in the most diverse medusozoan class, Hydrozoa. Here, we show that the presence of the homeobox gene Tlx in Cnidaria is correlated with the presence of the medusa stage, the gene having been lost in clades that ancestrally lack a medusa (anthozoans, endocnidozoans) and in medusozoans that secondarily lost the medusa stage. Our characterization of Tlx expression indicates an upregulation of Tlx during medusa development in three distantly related medusozoans, and spatially restricted expression patterns in developing medusae of two distantly related species, the hydrozoan Podocoryna carnea and the scyphozoan Pelagia noctiluca. These results suggest that Tlx plays a key role in medusa development and that the loss of this gene is likely linked to the repeated loss of the medusa life cycle stage in the evolution of Hydrozoa.

    'Colour and communion': Exploring the influences of visual art-making as a leisure activity on older women's subjective well-being

    This is the post-print version of the final paper published in Journal of Aging Studies. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2009 Elsevier B.V. Research into the subjective experience of art-making for older people is limited and has focused mostly on professional artists rather than amateurs. This study examined older women's motives for visual art-making. Thirty-two participants aged 60 to 86 years were interviewed. Twelve lived with chronic illness; twenty reported good health. Nearly all had taken up art after retirement; two had since become professional artists. Participants described their art-making as enriching their mental life, promoting enjoyment of the sensuality of colour and texture, and offering new challenges, opportunities for playful experimentation, and fresh ambitions. Art also afforded participants valued connections with the world outside the home and immediate family. It encouraged attention to the aesthetics of the physical environment, preserved equal-status relationships, and created opportunities for validation. Art-making protected the women's identities, helping them to resist the stereotypes and exclusions commonly encountered in later life.

    A phylogenomic approach reveals a low somatic mutation rate in a long-lived plant

    Somatic mutations can have important effects on the life history, ecology, and evolution of plants, but the rate at which they accumulate is poorly understood and difficult to measure directly. Here, we develop a method to measure somatic mutations in individual plants and use it to estimate the somatic mutation rate in a large, long-lived, phenotypically mosaic Eucalyptus melliodora tree. Despite being 100 times larger than Arabidopsis, this tree has a per-generation mutation rate only ten times greater, which suggests that this species may have evolved mechanisms to reduce the mutation rate per unit of growth. This adds to a growing body of evidence on the correlated evolutionary shifts in mutation rate and life history in plants.
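    The comparison can be written as a back-of-envelope ratio. This rests on a simplifying assumption of ours, not the paper's model: the per-generation rate equals a rate per unit of growth multiplied by the growth per generation, with size standing in for growth.

```latex
\mu_{\text{gen}} = \mu_{\text{growth}} \times \text{size}
\;\Longrightarrow\;
\frac{\mu_{\text{growth}}^{\text{tree}}}{\mu_{\text{growth}}^{\text{Arabidopsis}}}
= \frac{\mu_{\text{gen}}^{\text{tree}} / \mu_{\text{gen}}^{\text{Arabidopsis}}}
       {\text{size}^{\text{tree}} / \text{size}^{\text{Arabidopsis}}}
\approx \frac{10}{100} = \frac{1}{10}
```

    Under that assumption, the tree accumulates roughly ten times fewer mutations per unit of growth than Arabidopsis.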