36 research outputs found

    AdapterRemoval: easy cleaning of next generation sequencing reads

    BACKGROUND: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. FINDINGS: We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, combines paired reads if they overlap, and can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence at both the 5’ and 3’ ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. CONCLUSIONS: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data
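
The idea of locating and removing a partial or complete adapter at the 3’ end of a read can be illustrated with a minimal sketch. This is not AdapterRemoval's actual alignment algorithm; the function name `trim_3prime_adapter`, the parameters `min_overlap` and `max_mismatch_rate`, and the example sequences are all assumptions made for illustration. The sketch slides the adapter along the read and cuts at the first position where the remaining read matches the start of the adapter within a mismatch budget.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3, max_mismatch_rate=0.1):
    """Return `read` with a full or partial 3' adapter removed (illustrative only)."""
    for start in range(len(read)):
        # Number of bases that the read suffix and the adapter prefix share.
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # remaining suffix is too short to call an adapter
        suffix = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(suffix, adapter) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]  # keep only the bases before the adapter
    return read  # no adapter found


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # hypothetical adapter sequence
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", adapter))
    print(trim_3prime_adapter("ACGTACGTACGT", adapter))
```

Scanning from the left means the longest candidate overlap is tried first. A production tool would also weigh base qualities and handle paired reads, which this sketch leaves out.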

    SNPest: a probabilistic graphical model for estimating genotypes

    BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underlying DNA sequences that gave rise to the sequenced fragments. For diploid organisms, the genotyper should be able to predict both alleles in the individual. Variations between the individual and the population can then be analyzed by looking for SNPs (single nucleotide polymorphisms) in order to investigate diseases or phenotypic features. To perform robust and high confidence genotyping and SNP calling, methods are needed that take the technology specific limitations into account and can model different sources of error. As an example, ancient DNA poses special challenges as the data is often shallow and subject to errors induced by post mortem damage. FINDINGS: We present a novel approach to the genotyping problem where a probabilistic framework describing the process from sampling to sequencing is implemented as a graphical model. This makes it possible to model technology specific errors and other sources of variation that can affect the result. The inferred genotype is given a posterior probability to signify the confidence in the result. SNPest has already been used to genotype large scale projects such as the first ancient human genome published in 2010. CONCLUSIONS: We compare the performance of SNPest to a number of other widely used genotypers on both real and simulated data, covering both haploid and diploid genomes. We investigate the effects of read depth, of removing adapters before mapping and genotyping, of using different mapping tools, and of using the correct model in the genotyping process. We show that the performance of SNPest is comparable to existing methods, and we also illustrate cases where SNPest has an advantage over other methods, e.g. when dealing with simulated ancient DNA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-698) contains supplementary material, which is available to authorized users
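
To make the notion of a genotype with a posterior probability concrete, here is a deliberately simplified sketch. It is not SNPest's graphical model: it assumes a single site, independent reads, one uniform sequencing error rate, and a flat prior over the ten unordered diploid genotypes. The function name `genotype_posteriors` and the parameter `error_rate` are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_posteriors(bases, error_rate=0.01):
    """Posterior over diploid genotypes at one site under a flat prior (toy model)."""

    def base_likelihood(observed, allele):
        # A read shows the true allele unless a sequencing error occurred;
        # errors are assumed equally likely to produce any of the other 3 bases.
        return 1 - error_rate if observed == allele else error_rate / 3

    likelihoods = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        # Each read is assumed to sample either allele with probability 1/2.
        likelihoods[a1 + a2] = prod(
            0.5 * base_likelihood(b, a1) + 0.5 * base_likelihood(b, a2)
            for b in bases
        )
    total = sum(likelihoods.values())
    return {genotype: value / total for genotype, value in likelihoods.items()}


if __name__ == "__main__":
    posteriors = genotype_posteriors("AAAAGGGG")
    best = max(posteriors, key=posteriors.get)
    print(best, round(posteriors[best], 4))
```

A fuller model of the kind the abstract describes would add further error sources, such as post mortem damage in ancient DNA, as extra terms in the likelihood.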

    AdapterRemoval v2: rapid adapter trimming, identification, and read merging

    BACKGROUND: As high-throughput sequencing platforms produce longer and longer reads, sequences generated from short inserts, such as those obtained from fossil and degraded material, are increasingly expected to contain adapter sequences. Efficient adapter trimming algorithms are also needed to process the growing amount of data generated per sequencing run. FINDINGS: We introduce AdapterRemoval v2, a major revision of AdapterRemoval v1, which introduces (i) striking improvements in throughput, through the use of single instruction, multiple data (SIMD; SSE1 and SSE2) instructions and multi-threading support, (ii) the ability to handle datasets containing reads or read-pairs with different adapters or adapter pairs, (iii) simultaneous demultiplexing and adapter trimming, (iv) the ability to reconstruct adapter sequences from paired-end reads for poorly documented data sets, and (v) native gzip and bzip2 support. CONCLUSIONS: We show that AdapterRemoval v2 compares favorably with existing tools, while offering superior throughput to most alternatives examined here, both for single and multi-threaded operations. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-1900-2) contains supplementary material, which is available to authorized users

    Above and belowground community strategies respond to different global change drivers

    Environmental changes alter the diversity and structure of communities. By shifting the range of species traits that will be successful under new conditions, environmental drivers can also dramatically impact ecosystem functioning and resilience. Above and belowground communities jointly regulate whole-ecosystem processes and responses to change, yet they are frequently studied separately. To determine whether these communities respond similarly to environmental changes, we measured taxonomic and trait-based responses of plant and soil microbial communities to four years of experimental warming and nitrogen deposition in a temperate grassland. Plant diversity responded strongly to N addition, whereas soil microbial communities responded primarily to warming, likely via an associated decrease in soil moisture. These above and belowground changes were associated with selection for more resource-conservative plant and microbe growth strategies, which reduced community functional diversity. Functional characteristics of plant and soil microbial communities were weakly correlated (P = 0.07) under control conditions, but not when above or belowground communities were altered by either global change driver. These results highlight the potential for global change drivers operating simultaneously to have asynchronous impacts on above and belowground components of ecosystems. Assessment of a single ecosystem component may therefore greatly underestimate the whole-system impact of global environmental changes

    Rfam: updates to the RNA families database

    Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/

    MASTR: multiple alignment and structure prediction of non-coding RNAs using simulated annealing.

    Motivation: As more non-coding RNAs are discovered, the importance of methods for RNA analysis increases. Since the structure of ncRNA is intimately tied to the function of the molecule, programs for RNA structure prediction are necessary tools in this growing field of research. Furthermore, it is known that RNA structure is often evolutionarily more conserved than sequence. However, few existing methods are capable of simultaneously considering multiple sequence alignment and structure prediction. Results: We present a novel solution to the problem of simultaneous structure prediction and multiple alignment of RNA sequences. Using Markov chain Monte Carlo in a simulated annealing framework, the algorithm MASTR (Multiple Alignment of STructural RNAs) iteratively improves both sequence alignment and structure prediction for a set of RNA sequences. This is done by minimizing a combined cost function that considers sequence conservation, covariation and basepairing probabilities. The results show that the method is very competitive with similar programs available today, both in terms of accuracy and computational efficiency
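
The general pattern of minimizing a cost function with Markov chain Monte Carlo moves inside a simulated annealing schedule can be sketched generically. This is not MASTR's implementation: the function name `simulated_annealing`, the geometric cooling schedule, the parameter defaults, and the toy integer cost function are assumptions chosen only to show the Metropolis acceptance rule. In MASTR the state would be an alignment plus a structure, and the cost would combine sequence conservation, covariation and basepairing probabilities.

```python
import math
import random


def simulated_annealing(cost, neighbor, start, steps=2000,
                        t_start=10.0, t_end=0.01, seed=0):
    """Minimize `cost` with Metropolis moves under geometric cooling (generic sketch)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        fraction = i / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse states with
        # probability exp(-delta / T), which shrinks as T cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


if __name__ == "__main__":
    # Toy problem: find the integer minimizing (x - 3)^2 by +/-1 steps.
    result = simulated_annealing(
        cost=lambda x: (x - 3) ** 2,
        neighbor=lambda x, rng: x + rng.choice((-1, 1)),
        start=0,
    )
    print(result)
```

Tracking the best state separately from the current one means that an uphill move accepted late in the run cannot discard a good solution already found.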