    Removing duplicate reads using graphics processing units

    Background: During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, there are some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not support paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis. Results: In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions: Due to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of amount of duplicate reads
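The prefix-suffix comparison described in this abstract can be illustrated with a small CPU-only sketch. It is not the GPU-DupRemoval implementation: the function names, the default `prefix_len` and `max_mismatches` values, and the choice of Hamming distance for the suffix comparison are all assumptions made for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads, prefix_len=4, max_mismatches=1):
    """Keep one representative per group of (nearly) identical reads.

    Hypothetical sketch: reads sharing an identical prefix form a cluster
    of potential duplicates; within a cluster, a read is dropped when its
    suffix differs from an already kept read by at most `max_mismatches`.
    """
    # Step 1: cluster reads by identical prefix.
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)

    # Step 2: compare suffixes inside each cluster.
    kept = []
    for cluster in clusters.values():
        representatives = []
        for read in cluster:
            suffix = read[prefix_len:]
            is_duplicate = any(
                len(rep) == len(read)
                and hamming(rep[prefix_len:], suffix) <= max_mismatches
                for rep in representatives
            )
            if not is_duplicate:
                representatives.append(read)
        kept.extend(representatives)
    return kept


reads = ["ACGTAAAA", "ACGTAAAT", "ACGTCCCC", "TTTTAAAA", "ACGTAAAA"]
unique_reads = remove_duplicates(reads)
```

In this sketch a sequencing error inside the prefix places a read in a different cluster, so such near-duplicates go undetected; only mismatches in the suffix are tolerated.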

    G-CNV: A GPU-based tool for preparing data to detect CNVs with read-depth methods

    Copy number variations (CNVs) are the most prevalent types of structural variations (SVs) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SV and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) have been increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth-based methods detect CNVs by analyzing genomic regions whose read depth differs significantly from that of the other regions. The analysis pipeline of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV region identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals
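The last two operations listed in this abstract, building the read-depth signal and normalizing it, can be sketched as follows. This is a minimal CPU illustration and not G-CNV's code: the fixed-size windows, the use of read start positions, the median-based normalization, and all function names are assumptions, since the abstract does not specify how the signal is built or normalized.

```python
from statistics import median


def read_depth_signal(start_positions, region_length, window_size):
    """Count mapped reads per fixed-size window (hypothetical binning).

    `start_positions` are 0-based mapping positions of reads in a region
    of `region_length` bases; positions outside the region are ignored.
    """
    n_windows = (region_length + window_size - 1) // window_size
    depth = [0] * n_windows
    for pos in start_positions:
        if 0 <= pos < region_length:
            depth[pos // window_size] += 1
    return depth


def normalize_by_median(depth):
    """Scale the signal so that the median window has value 1.0.

    Median scaling is an assumption chosen for simplicity.
    """
    m = median(depth)
    if m == 0:
        raise ValueError("median read depth is zero; cannot normalize")
    return [d / m for d in depth]


positions = [0, 5, 12, 15, 18, 19, 25, 31]
depth = read_depth_signal(positions, region_length=40, window_size=10)
normalized = normalize_by_median(depth)
```

Windows whose normalized value departs markedly from 1.0 would be the candidates that a downstream read-depth method examines for copy number changes.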

    BITS 2015: The annual meeting of the Italian Society of Bioinformatics

    This preface introduces the content of the BioMed Central journal Supplements related to the BITS 2015 meeting, held in Milan, Italy, from the 3rd to the 5th of June, 2015

    The effects of indigenous microorganisms and water treatment with ion exchange resin on Cu-Ni flotation performance

    Mineral processing utilizes large amounts of water and aims to reduce water consumption by recirculation and closing the water loops. This results in accumulation of chemical and biological contaminants in process water that may have adverse outcomes on the process performance. To optimize water quality suitable for each process step and plant, knowledge of both chemical and biological effects is needed, as well as techniques to best remove the contaminants. This study focused on the consequences of microorganisms, enriched from the actual process earlier, on the flotation performance in the multi-metal Kevitsa mine in Northern Finland and the applicability of ion exchange (IX) for the removal of dissolved sulfur species and microorganisms from water. The increase of microbial load from the original 10⁶ to the added 10⁷ 16S rRNA copies mL⁻¹ positively affected the flotation selectivity, especially in the case of nickel. Two tested water types, process water (PW) and final tailings water (FT), behaved slightly differently. In the Cu flotation phase, added microorganisms did not affect the Cu recovery of FT but significantly decreased the recovery of Cu in PW. With equal Cu grade, the recovery was up to approximately 25 percentage points lower. However, added microorganisms in both water types notably decreased the recovery of Ni in Cu concentrate (18 to 37 percentage points). At the same time, the amount of Ni recovered in the Ni concentrate increased by 18 to 33 percentage points with added microorganisms. Visually, the froth layer was higher and more stable in the Ni flotation in experiments with added microorganisms compared to experiments without added microorganisms. The concentrations of dissolved sulfate and thiosulfate ions were low in the studied waters compared to operations treating massive sulfide ores and did not significantly affect the flotation performance. For this reason, the IX water treatment was not required for these ions. However, the IX treatment proved to be effective in removing both sulfur species and microorganisms. The use of dissolved air flotation (DAF) was a successful pretreatment for ion exchange in removal of microorganisms. However, microorganisms are not usually taken into consideration when process performance or water cleaning techniques are designed, and optimization could generally result in an even better outcome

    Chitin mixed in potting soil alters lettuce growth, the survival of zoonotic bacteria on the leaves and associated rhizosphere microbiology

    Chitin is a promising soil amendment for improving soil quality, plant growth, and plant resilience. The objectives of this study were twofold. First, to study the effect of chitin mixed in potting soil on lettuce growth and on the survival of two zoonotic bacterial pathogens, Escherichia coli O157:H7 and Salmonella enterica, on the lettuce leaves. Second, to assess the related changes in the microbial lettuce rhizosphere, using phospholipid fatty acid (PLFA) analysis and amplicon sequencing of a bacterial 16S rRNA gene fragment and the fungal ITS2. As a result of chitin addition, lettuce fresh yield weight was significantly increased. S. enterica survival in the lettuce phyllosphere was significantly reduced. The E. coli O157:H7 survival was also lowered, but not significantly. Moreover, significant changes were observed in the bacterial and fungal community of the lettuce rhizosphere. PLFA analysis showed a significant increase in fungal and bacterial biomass. Amplicon sequencing showed no increase in fungal and bacterial biodiversity, but relative abundances of the bacterial phyla Acidobacteria, Verrucomicrobia, Actinobacteria, Bacteroidetes, and Proteobacteria and the fungal phyla Ascomycota, Basidiomycota, and Zygomycota were significantly changed. More specifically, a more than 10-fold increase was observed for operational taxonomic units belonging to the bacterial genera Cellvibrio, Pedobacter, Dyadobacter, and Streptomyces and to the fungal genera Lecanicillium and Mortierella. These genera include several species previously reported to be involved in biocontrol, plant growth promotion, the nitrogen cycle and chitin degradation. These results enhance the understanding of the response of the rhizosphere microbiome to chitin amendment. Moreover, this is the first study to investigate the use of soil amendments to control the survival of S. enterica on plant leaves

    Accelerating K-mer Frequency Counting with GPU and Non-Volatile Memory

    The emergence of Next Generation Sequencing (NGS) platforms has increased the throughput of genomic sequencing and in turn the amount of data that needs to be processed, requiring highly efficient computation for its analysis. In this context, modern architectures including accelerators and non-volatile memory (NVM) are essential to enable the mass exploitation of these bioinformatics workloads. This paper presents a redesign of the main component of a state-of-the-art reference-free method for variant calling, SMUFIN, which has been adapted to make the most of GPUs and NVM devices. SMUFIN relies on counting the frequency of k-mers (substrings of length k) in DNA sequences, which also constitutes a well-known problem for many bioinformatics workloads, such as genome assembly. We propose techniques to improve the efficiency of k-mer counting and to scale up workloads like SMUFIN that used to require 16 nodes of Marenostrum 3 to a single machine with a GPU and NVM drives. Results show that although the single machine is not able to improve the time to solution of 16 nodes, its CPU time is 7.5x shorter than the aggregate CPU time of the 16 nodes, with a reduction in energy consumption of 5.5x. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We are also grateful to SanDisk for lending the FusionIO cards and to Nvidia, who donated the Tesla K40c. Peer reviewed. Postprint (author's final draft)
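The k-mer frequency counting that this abstract refers to can be shown with a short CPU-only sketch. It is not SMUFIN's GPU/NVM design: the function name `count_kmers` and the decision to skip k-mers containing characters other than A, C, G, and T are assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def count_kmers(sequences, k):
    """Count every substring of length k across the given DNA sequences.

    Hypothetical helper: k-mers containing any character outside
    A/C/G/T (for example the ambiguity code N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    return counts


kmer_counts = count_kmers(["ACGTAC", "CGTN"], k=3)
```

A sequence of length n contributes at most n - k + 1 k-mers, so the table of counts grows quickly with input size, which is why memory capacity and bandwidth matter for this workload.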