
    Benchtop sequencing on benchtop computers

    Next Generation Sequencing (NGS) is a powerful tool for gaining new insights in molecular biology. With the introduction of the first benchtop NGS machines (e.g. Ion Torrent, MiSeq), this technology became even more versatile in its applications, and the amount of data produced in a short time is ever increasing. The demand for new and more efficient sequence analysis tools grows at the same rate as the throughput of sequencing technologies. New methods and algorithms not only need to be more efficient but also need to account for higher genetic variability between the sequenced and annotated data. To obtain reliable results, the errors and limitations of NGS technologies must also be investigated. Furthermore, methods need to be able to cope with contamination in the data. In this thesis we present methods and algorithms for NGS analysis. First, we present a fast and precise method to align NGS reads to a reference genome. This method, called NextGenMap, was designed to work with data from Illumina, 454 and Ion Torrent technologies, and is easily extendable to upcoming technologies. We use pairwise sequence alignment in combination with an exact-match filter to maximize the number of correctly mapped reads. To reduce runtime (mapping a 16x coverage human genome data set within hours), we developed an optimized banded pairwise alignment algorithm for NGS data. We implemented this algorithm using high-performance programming interfaces for central processing units (SSE, Streaming SIMD Extensions, and OpenCL) as well as for graphics processing units (OpenCL and CUDA). Thus, NextGenMap can make maximal use of the available hardware, whether a high-end compute cluster, a standard desktop computer, or even a laptop. We demonstrated the advantages of NextGenMap over other mapping methods on real and simulated data and showed that NextGenMap outperforms current methods with respect to the number of correctly mapped reads. The second part of the thesis is an analysis of the limitations and errors of Ion Torrent and MiSeq. Sequencing errors were defined as the percentage of mismatches, insertions and deletions per position, given a semi-global alignment between read and reference sequence. We measured a mean error rate of 0.8% for MiSeq and 1.5% for Ion Torrent. Moreover, for both technologies we identified a non-uniform distribution of errors and, even more severely, of the corresponding nucleotide frequencies at positions that differ in the alignment. This is an important result since it reveals that some differences (e.g. mismatches) are more likely to occur than others and thus lead to a biased analysis. When looking at the distribution of reads across the sample carrier of the sequencing machine, we discovered a clustering of reads that differ strongly (> 30%) from the reference sequence. This is unexpected, since reads with a high difference are believed to originate either from contamination or from errors in library preparation, and should therefore be uniformly distributed across the sample carrier. Finally, we present a method called DeFenSe (Detection of Falsely Aligned Sequences) to detect and reduce contamination in NGS data. DeFenSe computes a pairwise alignment score threshold based on the alignment of randomly sampled reads to the reference genome; this threshold is then used to filter the mapped reads. Applied in combination with two widely used mapping programs to real data, it reduced contamination by up to 99.8%. In contrast to previous methods, DeFenSe works independently of the number of differences between the reference and the targeted genome. Moreover, DeFenSe neither relies on ad hoc decisions such as identity or mapping quality thresholds, nor does it require prior knowledge of the sequenced organism. Together, these methods may make it possible to transfer knowledge from model to non-model organisms using NGS, and they enable the study of biological mechanisms even in highly polymorphic regions.
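    Two of the ideas above lend themselves to a compact illustration: the banded pairwise alignment that keeps NextGenMap fast, and the DeFenSe strategy of deriving a filter threshold from randomly sampled reads. The sketch below is ours, not code from the thesis; the scoring parameters, band width, and the uniform-random null model are illustrative assumptions.

```python
import random

def banded_semiglobal_score(read, ref, band=16, match=2, mismatch=-3, gap=-5):
    """Banded semi-global alignment score (sketch).

    Semi-global: leading/trailing gaps in the reference are free, so the
    read must be consumed entirely. Only DP cells within +/- `band` of the
    main diagonal are filled, which assumes read and reference windows of
    similar length.
    """
    n, m = len(read), len(ref)
    NEG = float("-inf")
    prev = [0.0] * (m + 1)                 # row 0: free leading reference gap
    for i in range(1, n + 1):
        cur = [NEG] * (m + 1)
        if i <= band:
            cur[0] = gap * i               # gaps inside the read are penalized
        for j in range(max(1, i - band), min(m, i + band) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(s for s in prev if s > NEG)  # free trailing reference gap

def null_score_threshold(ref, read_length, n_samples=300, quantile=0.99):
    """DeFenSe-style idea (sketch): score randomly generated reads against
    the reference and take a high quantile of that null distribution as the
    cutoff for filtering mapped reads."""
    scores = sorted(
        banded_semiglobal_score("".join(random.choices("ACGT", k=read_length)), ref)
        for _ in range(n_samples))
    return scores[int(quantile * (n_samples - 1))]

# Toy usage: a read taken verbatim from the reference scores above the cutoff.
random.seed(1)
ref = "".join(random.choices("ACGT", k=120))
threshold = null_score_threshold(ref, read_length=100)
print(banded_semiglobal_score(ref[10:110], ref) > threshold)   # expected: True
```

    Restricting the dynamic program to a diagonal band trades sensitivity to large indels for a runtime proportional to band width times read length, which is what makes such aligners practical on benchtop hardware.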

    Use of Photogrammetry for Non-Disturbance Underwater Survey: An Analysis of In Situ Stone Anchors

    Stone anchors comprise a significant portion of observable underwater cultural heritage in the Mediterranean and provide evidence for trade networks as early as the Bronze Age. Full documentation of these anchors, however, often requires their removal from their underwater environment, especially to calculate mass. We offer a methodology for using photogrammetry to record stone anchors still in situ and to calculate their approximate mass. We compare measurements derived using measuring tapes with those derived using two different software programs for photogrammetric analysis, PhotoModeler Scanner (Eos Systems, Inc.) and PhotoScan Pro (Agisoft). First, we analyze stone anchors that had previously been removed from the underwater environment to establish a reference methodology. Next, we implement this methodology in an underwater survey off the southern coastline of Cyprus. Linear measurements from both programs correlate closely with those attained via measuring tape. The resulting volume estimates for anchors in situ and on land are slightly greater with the photogrammetric methodology than the reference volumes obtained by water displacement. Overall, as an analytical tool, this methodology generates detailed surface information in minimal time underwater and preserves data for future analysis without necessitating the removal of the anchor from its underwater environment.
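    The mass step is simple once photogrammetry has produced a watertight mesh: multiply the mesh volume by an assumed stone density. A minimal sketch (ours; the density value is an assumption, not a figure from the study):

```python
def estimate_anchor_mass(volume_m3, density_kg_m3=2500.0):
    """Approximate mass from a photogrammetric mesh volume.

    density_kg_m3 is an assumed bulk density for stone (limestone and
    sandstone are roughly 2300-2700 kg/m^3); a material-specific value
    would be needed for a real anchor.
    """
    return volume_m3 * density_kg_m3

# e.g. a 0.04 m^3 anchor at an assumed 2500 kg/m^3 weighs about 100 kg
print(f"{estimate_anchor_mass(0.04):.0f} kg")
```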

    Teaser: Individualized benchmarking and optimization of read mapping results for NGS data

    Mapping reads to a genome remains challenging, especially for non-model organisms with lower-quality assemblies or for organisms with higher mutation rates. While most research has focused on speeding up the mapping process, little attention has been paid to optimizing the choice of mapper and parameters for a user's dataset. Here, we present Teaser, a software tool that assists in these choices through rapid automated benchmarking of different mappers and parameter settings on individualized data. Within minutes, Teaser completes a quantitative evaluation of an ensemble of mapping algorithms and parameters. We use Teaser to demonstrate how Bowtie2 can be optimized for different data.
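    The core loop of such a benchmark is easy to picture. The sketch below is ours, not Teaser's code: it assumes a prebuilt Bowtie2 index (ref_index) and simulated reads (simulated_reads.fq) whose names encode their true origin as chrom:pos, a made-up convention; the swept presets are Bowtie2's real end-to-end sensitivity presets.

```python
import shlex
import subprocess

PRESETS = ["--very-fast", "--fast", "--sensitive", "--very-sensitive"]

def fraction_correct(sam_path, tolerance=20):
    """Fraction of primary alignments landing within `tolerance` bp of the
    true position encoded in the read name as 'chrom:pos' (assumed here)."""
    total = correct = 0
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue
            name, flag, rname, pos = line.split("\t")[:4]
            if int(flag) & 0x900:          # skip secondary/supplementary
                continue
            total += 1
            true_rname, true_pos = name.rsplit(":", 1)
            if rname == true_rname and abs(int(pos) - int(true_pos)) <= tolerance:
                correct += 1
    return correct / total if total else 0.0

for preset in PRESETS:
    out = f"mapped_{preset.lstrip('-')}.sam"
    cmd = f"bowtie2 {preset} -x ref_index -U simulated_reads.fq -S {out}"
    subprocess.run(shlex.split(cmd), check=True)
    print(preset, f"{100 * fraction_correct(out):.2f}% correctly mapped")
```

    Scoring against the known simulated origin, rather than the raw mapping rate, is what lets such a benchmark rank settings by correctness instead of rewarding a mapper that places every read somewhere.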

    Structural variant calling: the long and the short of it.

    Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.

    The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms

    In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation such as SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.

    Recovery of molybdenum, chromium, tungsten, copper, silver, and zinc from industrial waste waters using zero-valent iron and tailored beneficiation processes

    Zero-valent iron (ZVI) has been used for water treatment for more than 160 years. However, passivation of its surface has often been a problem, which could only recently be tackled by the innovative Ferrodecont process using a fluidized bed reactor. In this study, pilot-scale experiments for the removal of Mo, Cr, W, Cu, Ag and Zn from two industrial waste water samples, and lab-scale experiments for the beneficiation of the abrasion products, are presented to integrate the Ferrodecont process into a complete recycling process chain. First, 38.5% of the Cu was removed from sample A, yielding abrasion products containing 33.1 wt% Cu as metallic copper (Cu) and various Cu compounds. The treatment of sample B removed 99.8% of the Mo, yielding abrasion products containing 17.8 wt% Mo as amorphous phases or adsorbed species. Thermal treatment (1300 °C) of abrasion product A indicated a reduction of delafossite to metallic Cu according to differential scanning calorimetry (DSC), thermogravimetry (TG) and X-ray diffraction (XRD); the Cu was then successfully separated from the magnetic iron phases. Hydrometallurgical treatment (1.5 M NaOH, 3 d, liquid:solid ratio (L:S) = 15:1) of sample B yielded aqueous extracts with Mo concentrations of 5820 to 6300 mg/L. In conclusion, this corresponds to an up to 53-fold enrichment of Mo over the entire process chain.
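    A quick mass-balance check of the leaching step (our arithmetic, not the paper's, assuming L:S = 15:1 means 15 L of leachate per kg of solid): the 17.8 wt% Mo content of the abrasion product caps the achievable extract concentration, and the measured 5820 to 6300 mg/L then implies that roughly half of the Mo was leached.

```latex
c_{\max} = \frac{0.178\ \mathrm{kg\ Mo/kg\ solid}}{15\ \mathrm{L/kg\ solid}}
         \approx 11.9\ \mathrm{g/L},
\qquad
\frac{5820\ \text{to}\ 6300\ \mathrm{mg/L}}{11\,900\ \mathrm{mg/L}}
         \approx 49\text{--}53\,\%
```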