    Targeted Assembly of Short Sequence Reads

    As next-generation sequencing (NGS) output continues to increase, analysis is becoming a significant bottleneck. However, when information is required only for specific sequence variants, it is not necessary to assemble or align whole-genome data sets in their entirety. Instead, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by considering only reads within the sequence space of target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of a variant in the interrogated data set is revealed by a successful assembly outcome. TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR is useful for finding or confirming genomic mutations, polymorphisms, fusions, and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest, and TASR is a fast, flexible, and easy-to-use tool for targeted assembly.
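
    A minimal sketch of the read-recruitment step described above, assuming reads and target are plain uppercase DNA strings and that "short words" means all k-mers of one fixed length k. The function names and the default k=15 are illustrative assumptions, not TASR's actual interface, and the subsequent stringent assembly of recruited reads into a consensus is not shown.

```python
def kmers(seq, k):
    """All distinct substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def recruit_reads(target, reads, k=15):
    """Keep reads that share at least one exact k-mer with the target on either strand.

    Hypothetical helper: k and the two-strand lookup are assumptions for illustration.
    """
    index = kmers(target, k) | kmers(revcomp(target), k)
    return [r for r in reads if not index.isdisjoint(kmers(r, k))]

target = "AAACCCGGGTTTACGT"
reads = ["CCCGGGTT", "CGTAAACC", "TTTTTTTTTT"]
# "CGTAAACC" matches only the reverse-complement strand of the target.
print(recruit_reads(target, reads, k=6))  # ['CCCGGGTT', 'CGTAAACC']
```

    Indexing both strands of the target, rather than reverse-complementing every read, keeps the per-read work to a single k-mer scan, which matters when the read set is far larger than the target.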

    Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

    Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; accordingly, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. MEA-based estimators are designed to maximize the expected accuracy of the base pairs and have achieved the highest level of accuracy. These methods, however, do not give a single best prediction of the structure; instead, they employ parameters that control the trade-off between sensitivity and positive predictive value (PPV). It is unclear which parameter value should be used, and even a well-trained default value does not, in general, give the best result for each RNA sequence under popular accuracy measures. Results: Instead of the expected values of popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation of the expected accuracy in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that secondary structures well balanced between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. Conclusions: This study provides not only a method for predicting secondary structures that balance sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures, including MCC and F-score.
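
    The combination of a pseudo-expected accuracy with the γ-centroid estimator can be sketched in Python. This is a hedged illustration, not the authors' implementation. It assumes the base-pairing probabilities are already available as a dictionary; it plugs their sums into the F-score formula (expected true positives = sum of p_ij over predicted pairs); it computes each γ-centroid structure with a Nussinov-style dynamic program that maximizes the sum of (γ+1)p_ij − 1 over predicted pairs; and it keeps the candidate, over a hypothetical grid of γ values, with the highest pseudo-expected F-score. The γ scan stands in for the stochastic sampling mentioned above, and no minimum hairpin-loop length is enforced.

```python
def gamma_centroid(bpp, n, gamma):
    """Pairs (i, j) maximising sum((gamma + 1) * p_ij - 1); bpp maps 0-based (i, j), i < j, to p_ij."""
    def w(i, j):
        return (gamma + 1) * bpp.get((i, j), 0.0) - 1

    def inner(M, i, j):
        return M[i + 1][j - 1] if i + 1 <= j - 1 else 0.0

    M = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(M[i + 1][j], M[i][j - 1])          # i or j left unpaired
            if w(i, j) > 0:                                # pair i with j
                best = max(best, inner(M, i, j) + w(i, j))
            for k in range(i + 1, j):                      # bifurcation
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best

    # Traceback: re-evaluate the same expressions, so float equality is exact.
    pairs = set()
    stack = [(0, n - 1)] if n > 1 else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == M[i + 1][j]:
            stack.append((i + 1, j))
        elif M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
        elif w(i, j) > 0 and M[i][j] == inner(M, i, j) + w(i, j):
            pairs.add((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return pairs

def pseudo_expected_f(pred, bpp):
    """F-score with TP, FP, FN replaced by their expectations under bpp."""
    tp = sum(bpp.get(p, 0.0) for p in pred)   # expected true positives
    fp = len(pred) - tp                        # expected false positives
    fn = sum(bpp.values()) - tp                # expected false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_gamma_centroid(bpp, n, gammas=(0.5, 1, 2, 4, 8, 16, 32)):
    """Hypothetical selection rule: the gamma-centroid structure with the best pseudo-expected F."""
    candidates = [gamma_centroid(bpp, n, g) for g in gammas]
    return max(candidates, key=lambda s: pseudo_expected_f(s, bpp))

bpp = {(0, 5): 0.9, (1, 4): 0.8, (0, 3): 0.05}
print(best_gamma_centroid(bpp, 6))
```

    Scoring each candidate costs only a sum over its pairs, so evaluating many γ values adds little to the cubic cost of the dynamic programs themselves.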

    Early readmission and length of hospitalization practices in the Dialysis Outcomes and Practice Patterns Study (DOPPS)

    Background: Rising hospital care costs have created pressure to shorten hospital stays and emphasize outpatient care. This study tests the hypothesis that shorter median length of stay (LOS) as a dialysis facility practice is associated with higher rates of early readmission. Methods: Readmission within 30 days of each hospitalization was evaluated for participants in the Dialysis Outcomes and Practice Patterns Study, an observational study of randomly selected hemodialysis patients in the United States (142 facilities, 5095 patients with hospitalizations), five European countries (101 facilities, 2281 patients with hospitalizations), and Japan (58 facilities, 883 patients with hospitalizations). Associations between median facility LOS (estimated from all hospitalizations at the facility and interpreted as a dialysis facility practice pattern) and odds of readmission were assessed using logistic regression, adjusted for patient characteristics and the LOS of each index hospitalization. Results: Risk of readmission was directly and significantly associated with LOS of the index hospitalization (adjusted odds ratio [AOR] 1.005 per day in median facility LOS, p = 0.007) and inversely associated with median facility LOS (AOR = 0.974 per day, p = 0.016). This latter association was strongest for US hemodialysis centers (AOR = 0.954 per day, p = 0.015). Conclusions: Dialysis facilities with shorter median hospital LOS for their patients have higher odds of readmission, particularly in the United States, where there is greater pressure to shorten LOS. The determinants and consequences of practices related to hospital LOS for hemodialysis patients should be further studied.
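
    Because the results are reported as odds ratios per day from a logistic regression, the log-odds are assumed to be linear in days, so per-day ratios compound multiplicatively over several days. The short sketch below shows that arithmetic and how an odds ratio shifts a baseline probability. The function names are hypothetical, and the 30% baseline risk in the example is an invented illustration value, not a DOPPS result.

```python
def scale_odds_ratio(aor_per_day, days):
    """Odds ratio for a difference of `days`, assuming log-odds linear in days."""
    return aor_per_day ** days

def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)

# Facility median LOS five days shorter (days = -5), using AOR 0.974 per day.
or_shorter = scale_odds_ratio(0.974, -5)
print(round(or_shorter, 2))  # 1.14
# Hypothetical 30% baseline readmission risk, shifted by that odds ratio.
print(round(apply_odds_ratio(0.30, or_shorter), 3))
```

    Under this reading, a facility whose median LOS is five days shorter would carry roughly 14% higher adjusted odds of readmission; note that an odds ratio of 1.14 raises the probability by less than 14% unless the baseline risk is small.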

    PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

    Background: Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted increasing attention as recent studies have revealed the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms cannot efficiently handle multiple sequences, progressive schemes are mostly used to reduce the complexity. However, this approach tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we recently proposed PicXAA, which constructs an accurate alignment in a non-progressive fashion. Results: Here, we propose PicXAA-R, an extension of PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently captures both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information from all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions: Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate alignments, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at http://www.ece.tamu.edu/~bjyoon/picxaa/.
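
    A simplified sketch of two ingredients named above, under stated assumptions. The first is a ProbCons-style probabilistic consistency transformation, which re-estimates each pairwise posterior match matrix P_xy as the average over all sequences z of P_xz·P_zy, with P_xx taken as the identity. The second is a greedy step that accepts base-to-base matches in decreasing order of posterior probability while keeping the alignment order-consistent. PicXAA-R's actual transformations also act on base-pairing probabilities, and its greedy construction works on a multiple-alignment graph; this sketch covers only pairwise base-alignment posteriors, and the 0.01 threshold is a hypothetical value.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    cols = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(cols)]
            for i in range(len(A))]

def consistency_transform(P, lengths):
    """P[(x, y)]: len(x)-by-len(y) posterior match matrix for every ordered pair x != y."""
    N = len(lengths)
    full = dict(P)
    for x in range(N):
        full[(x, x)] = identity(lengths[x])  # a sequence is trivially aligned to itself
    new = {}
    for x in range(N):
        for y in range(N):
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(N):  # route evidence through every sequence z
                prod = matmul(full[(x, z)], full[(z, y)])
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j] / N
            new[(x, y)] = acc
    return new

def greedy_pairwise(Pxy, threshold=0.01):
    """Accept matches (i, j) by decreasing probability if they keep the alignment order-consistent."""
    cands = sorted(((p, i, j) for i, row in enumerate(Pxy)
                    for j, p in enumerate(row) if p > threshold), reverse=True)
    aligned = []
    for _, i, j in cands:
        # No reuse of a position, and no crossing with an accepted match.
        if all(i != a and j != b and (i < a) == (j < b) for a, b in aligned):
            aligned.append((i, j))
    return sorted(aligned)

P = {(0, 1): [[0.9, 0.05], [0.1, 0.8]], (1, 0): [[0.9, 0.1], [0.05, 0.8]]}
print(greedy_pairwise(consistency_transform(P, [2, 2])[(0, 1)]))
```

    With only two sequences the transformation leaves the posteriors unchanged, since the z = x and z = y terms each reproduce P_xy; the extra information comes from third sequences.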

    Seasonal and Long-Term Changes in Relative Abundance of Bull Sharks from a Tourist Shark Feeding Site in Fiji

    Shark tourism has become increasingly popular but remains controversial because of major concerns arising from the need for tour operators to use bait or chum to reliably attract sharks. We used direct underwater sampling to document changes in the relative abundance of bull sharks (Carcharhinus leucas) at the Shark Reef Marine Reserve, a shark feeding site in Fiji, and the reproductive cycle of the species in Fijian waters. Between 2003 and 2009, the total number of C. leucas counted each day ranged from 0 to 40. Whereas the number of C. leucas counted at the feeding site increased over the years, shark numbers decreased over the course of each calendar year, with the fewest animals counted in November. Externally visible reproductive status information indicates that the species' seasonal departure from the feeding site may be related to reproductive activity.

    Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs

    The computational detection of regulatory elements in DNA is a difficult but important problem that affects our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to use cross-species conservation for this task have been hampered both by evolutionary changes in functional sites and by the poor performance of general-purpose alignment programs on non-coding sequence. We describe a new and flexible framework for modeling binding-site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models that explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework both for the alignment of regulatory regions and for the inference of precise binding-site locations within those regions. Because the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of the number of species and sequence length, and it can produce alignments and binding-site predictions with accuracy rivaling or exceeding that of current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data, and we apply a specific model to study Drosophila enhancers in as many as ten related genomes in the presence of binding-site gain and loss. Different models and modeling assumptions can be easily specified, thus providing a valuable tool for exploring biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
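
    The phylogenetic pair HMMs described above build on the standard pair HMM for alignment. As a minimal sketch of that underlying formalism only, the function below runs the forward algorithm of the textbook three-state pair HMM (match, insert-in-x, insert-in-y), summing the probability of two sequences over all their alignments. The parameter values, the uniform mismatch model, and the absence of an explicit end state are illustrative assumptions; the binding-site gain and loss states and the phylogeny of the published framework are not modeled.

```python
def pair_hmm_forward(x, y, delta=0.2, eps=0.3, p_match=0.9, alphabet="ACGT"):
    """Total probability of x and y under a 3-state pair HMM (no end state)."""
    k = len(alphabet)
    q = 1.0 / k  # emission probability of a residue in a gap state

    def pm(a, b):
        # Joint emission in the match state: identical pairs share p_match,
        # mismatching pairs share the remaining 1 - p_match.
        return p_match / k if a == b else (1 - p_match) / (k * (k - 1))

    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # the begin state is treated like a match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # M -> M: 1 - 2*delta; X or Y -> M: 1 - eps
                fM[i][j] = pm(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta) * fM[i - 1][j - 1]
                    + (1 - eps) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:  # x[i-1] aligned to a gap: open (delta) or extend (eps)
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:  # y[j-1] aligned to a gap
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return fM[n][m] + fX[n][m] + fY[n][m]

print(pair_hmm_forward("ACGT", "ACGT") > pair_hmm_forward("ACGT", "TGCA"))
```

    Because the forward value sums over alignments rather than picking one, the same recursions extended with posterior decoding give per-column reliability, which is the property a joint alignment and binding-site model can exploit.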

    Responsibility Ascriptions in Technology Development and Engineering: Three Perspectives

    In recent decades, increasing attention has been paid to the topic of responsibility in technology development and engineering. The discussion of this topic is often guided by questions of liability and blameworthiness. Recent discussions in engineering ethics call for a reconsideration of the traditional quest for responsibility: some authors argue that, rather than on alleged wrongdoing and blame, the focus should shift to more socially responsible engineering. The present paper explores different approaches to responsibility in order to determine which one is most appropriate for engineering and technology development. Using the example of the development of a new sewage water treatment technology, the paper shows how different approaches to ascribing responsibilities have different implications for engineering practice in general, and for R&D or technological design in particular. It was found that there is a tension between the demands that follow from these different approaches, most notably between efficacy and fairness. Although the consequentialist approach, with its efficacy criterion, turned out to be the most powerful, it was also shown that the fairness of responsibility ascriptions should somehow be taken into account. It is proposed to look for alternative, more procedural ways to address the fairness of responsibility ascriptions.

    Histone Deacetylase Inhibitors Globally Enhance H3/H4 Tail Acetylation Without Affecting H3 Lysine 56 Acetylation

    Histone deacetylase inhibitors (HDACi) represent a promising avenue for cancer therapy. We applied mass spectrometry (MS) to determine the impact of clinically relevant HDACi on global levels of histone acetylation. Intact histone profiling revealed that the HDACi SAHA and MS-275 globally increased histone H3 and H4 acetylation in both normal diploid fibroblasts and transformed human cells. Histone H3 lysine 56 acetylation (H3K56ac) has recently elicited much interest and controversy because of its potential as a diagnostic and prognostic marker for a broad range of cancers. Using quantitative MS, we demonstrate that H3K56ac is much less abundant in human cells than previously reported. Unexpectedly, in contrast to H3/H4 N-terminal tail acetylation, H3K56ac did not increase in response to inhibitors of any class of HDACs. In addition, we demonstrate that antibodies raised against H3K56ac peptides cross-react with H3 N-terminal tail acetylation sites that share sequence similarity with the residues flanking H3K56.

    Trait Variation in Yeast Is Defined by Population History

    A fundamental goal in biology is to achieve a mechanistic understanding of how and to what extent ecological variation imposes selection for distinct traits and favors the fixation of specific genetic variants. Key to such an understanding is the detailed mapping of the natural genomic and phenomic space and a bridging of the gap that separates these worlds. Here we chart a high-resolution map of natural trait variation in one of the most important genetic model organisms, the budding yeast Saccharomyces cerevisiae, and its closest wild relatives, and we trace the genetic basis and timing of major phenotype-changing events in its recent history. We show that natural trait variation in S. cerevisiae exceeds that of its relatives, despite limited genetic variation, and follows the population history rather than the source environment. In particular, the West African population is phenotypically unique, with an extreme abundance of low-performance alleles, notably a premature translational termination signal in GAL3 that causes an inability to utilize galactose. Our observations suggest that many S. cerevisiae traits may be the consequence of genetic drift rather than selection, in line with the assumption that natural yeast lineages are remnants of recent population bottlenecks. Disconcertingly, the universal type strain S288C was found to be highly atypical, highlighting the danger of extrapolating gene-trait connections obtained in mosaic, lab-domesticated lineages to the species as a whole. Overall, this study represents a step towards an in-depth understanding of the causal relationship between co-variation in ecology, selection pressure, natural traits, molecular mechanism, and alleles in a key model organism.

    Parameters for accurate genome alignment

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and we suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
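
    The X-drop parameter mentioned above controls when an alignment extension is abandoned. A minimal, ungapped sketch of the idea, assuming simple match/mismatch scores rather than LAST's actual scoring schemes: extension proceeds one position at a time and stops once the running score falls more than xdrop below the best score seen so far, returning that best score and the extension length at which it was reached. The function name and default scores are hypothetical.

```python
def xdrop_extend(a, b, i, j, match=1, mismatch=-1, xdrop=3):
    """Ungapped rightward extension from a[i], b[j] with an X-drop stopping rule.

    Returns (best_score, best_length). Hypothetical helper for illustration.
    """
    score = best = 0
    best_len = 0
    k = 0
    while i + k < len(a) and j + k < len(b):
        score += match if a[i + k] == b[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k      # new best: remember where it ended
        elif best - score > xdrop:
            break                          # fell too far below the best: give up
    return best, best_len

a, b = "GGGGCCCGGGGGG", "GGGGAAAGGGGGG"
print(xdrop_extend(a, b, 0, 0, xdrop=2))  # (4, 4): stops inside the mismatch run
print(xdrop_extend(a, b, 0, 0, xdrop=5))  # (7, 13): crosses it and keeps extending
```

    A larger xdrop lets extension cross poor stretches, which can rescue a genuine alignment as in this example but can equally carry it into unrelated sequence, one way to see why higher X-drop values are not always better.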