54 research outputs found

    Improving RNA-Seq Precision with MapAl

    With currently available RNA-Seq pipelines, expression estimates for most genes are very noisy. Here we introduce MapAl, a tool for RNA-Seq expression profiling that builds on the established programs Bowtie and Cufflinks. In the post-processing of RNA-Seq reads, it incorporates gene models already at the stage of read alignment, consistently increasing the number of reliably measured known transcripts by 50%. Adding genes identified de novo then allows a reliable assessment of double the total number of transcripts compared to other available pipelines. This substantial improvement is of general relevance: measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, regardless of whether the experimental design incorporates replicates.

    Experiences with workflows for automating data-intensive bioinformatics

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks at large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

    An analysis of single amino acid repeats as use case for application specific background models

    Background: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results: Traditional Markov chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multivariate trends well, despite their low dimensionality and in contrast even to higher-order Markov predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine repeats in signal peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. Conclusions: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.

    Identification of co-expression gene networks, regulatory genes and pathways for obesity based on adipose tissue RNA Sequencing in a porcine model

    Background: Obesity is a complex metabolic condition strongly associated with various diseases, such as type 2 diabetes, with major public health and economic implications. Obesity is the result of environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identification of co-expressed and regulatory genes in RNA extracted from relevant tissues representing lean and obese individuals provides an entry point for the identification of genes and pathways of importance to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study in depth the organ-level transcriptomic regulation of obesity, which is unfeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulation of obesity using RNA Sequencing based systems biology approaches in a porcine model. Methods: We selected 36 animals for RNA Sequencing from a previously created F2 pig population, representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. Results: WGCNA revealed five modules which were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways that shed light on the association between obesity and other diseases, such as osteoporosis (osteoclast differentiation, P = 1.4E-7) and immune-related complications (e.g. natural killer cell mediated cytotoxicity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5). Lemon-Tree identified three potential regulator genes, using confidence scores, for the WGCNA module associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores 95.30, 62.28, and 34.58, respectively). Moreover, detection of differentially connected genes identified various genes previously associated with obesity in humans and rodents, e.g. CSF1R and MARC2. Conclusions: To our knowledge, this is the first study to apply systems biology approaches using porcine adipose tissue RNA-Sequencing data in a genetically characterized porcine model for obesity. We revealed complex networks, pathways, and candidate and regulatory genes related to obesity, confirming the complexity of obesity and its association with immune-related disorders and osteoporosis.

    The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report

    The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative comprising experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. Although temperature, air pressure, weather, and human activity are measured continually, including longitudinal, cross-kingdom ecosystem dynamics can also alter and improve the design of cities. The MetaSUB Consortium is aiding these efforts by developing and testing metagenomic methods and standards, including optimized methods for sample collection, DNA/RNA isolation, taxa characterization, and data visualization. The data produced by the consortium can aid city planners, public health officials, and architectural designers. In addition, the study will continue to lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Finally, we note that engineered metagenomic ecosystems can help enable more responsive, safer, and quantified cities.

    Assessment of hydromorphological conditions of rivers in urbanised catchments with the River Habitat Survey method

    A number of methods are used to assess the hydromorphological conditions of streams. One of them is the River Habitat Survey (RHS), a method for evaluating flowing waters that provides a detailed description of river hydromorphological conditions based on recording features of the river valley environment. To test the applicability of the RHS method to urban rivers, selected sections of four rivers flowing through the central part of the Katowice conurbation were surveyed. The results vary widely: differences appeared both between the rivers studied and between adjacent sections of the same stream. Apart from the varied catchment conditions of Silesian cities, this variation can be attributed to the small number of factors that strongly influence the final values of the habitat naturalness and habitat modification indices. It is advisable to try the Urban River Survey (URS) method, a modified version of the RHS method adapted to urban conditions.

    Managing and Optimizing Bioinformatics Workflows for Data Analysis in Clouds (J Grid Computing, DOI 10.1007/s10723-013-9260-9)

    The rapid advancement of high-throughput technologies in the life sciences in recent years is facilitating the generation and storage of huge amounts of data in different databases. Despite significant developments in computing capacity and performance, analysing these large-scale data in search of biomedically relevant patterns remains a challenging task. Scientific workflow applications are deemed to support data mining in more complex scenarios that include many data sources and computational tools.