31 research outputs found

    Improving RNA-Seq Precision with MapAl

    With currently available RNA-Seq pipelines, expression estimates for most genes are very noisy. Here we introduce MapAl, a tool for RNA-Seq expression profiling that builds on the established programs Bowtie and Cufflinks. In the post-processing of RNA-Seq reads, it incorporates gene models already at the stage of read alignment, consistently increasing the number of reliably measured known transcripts by 50%. Adding genes identified de novo then allows a reliable assessment of double the total number of transcripts compared to other available pipelines. This substantial improvement is of general relevance: measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not.

    An analysis of single amino acid repeats as use case for application specific background models

    Background: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions.
    Results: Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis.
    Conclusions: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.

    The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report

    The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative comprised of experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. Although continual measures occur for temperature, air pressure, weather, and human activity, including longitudinal, cross-kingdom ecosystem dynamics can alter and improve the design of cities. The MetaSUB Consortium is aiding these efforts by developing and testing metagenomic methods and standards, including optimized methods for sample collection, DNA/RNA isolation, taxa characterization, and data visualization. The data produced by the consortium can aid city planners, public health officials, and architectural designers. In addition, the study will continue to lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Finally, we note that engineered metagenomic ecosystems can help enable more responsive, safer, and quantified cities.

    Single amino acid repeats in signal peptides

    There has been an increasing interest in single amino acid repeats ever since it was shown that these are the cause of a variety of diseases. Although a systematic study of single amino acid repeats is challenging, they have subsequently been implicated in a number of functional roles. In general surveys, leucine runs were among the most frequent. In the present study, we present a detailed investigation of repeats in signal peptides of secreted and type I membrane proteins in comparison with their mature parts. We focus on eukaryotic species because single amino acid repeats are generally rather rare in archaea and bacteria. Our analysis of over 100 species shows that repeats of leucine (but not of other hydrophobic amino acids) are over-represented in signal peptides. This trend is most pronounced in higher eukaryotes, particularly in mammals. In the human proteome, although less than one-fifth of all proteins have a signal peptide, approximately two-thirds of all leucine repeats are located in these transient regions. Signal peptides are cleaved early from the growing polypeptide chain and then degraded rapidly. This may explain why leucine repeats, which can be toxic, are tolerated at such high frequencies. The substantial fraction of proteins affected by the strong enrichment of repeats in these transient segments highlights the bias that they can introduce for systematic analyses of protein sequences. In contrast to a general lack of conservation of single amino acid repeats, leucine repeats were found to be more conserved than the remaining signal peptide regions, indicating that they may have an as yet unknown functional role.

    Managing and Optimizing Bioinformatics Workflows for Data Analysis in Clouds (J Grid Computing, DOI 10.1007/s10723-013-9260-9)

    The rapid advancements in recent years of high-throughput technologies in the life sciences are facilitating the generation and storage of huge amounts of data in different databases. Despite significant developments in computing capacity and performance, an analysis of these large-scale data in a search for biomedically relevant patterns remains a challenging task. Scientific workflow applications are deemed to support data mining in more complex scenarios that include many data sources and computational tools.