4 research outputs found

    Third-generation sequencing data analytics on mobile devices: cache-oblivious and out-of-core approaches as a proof of concept

    Abstract: Mobile (third-generation) sequencing technologies, including Oxford Nanopore's MinION and SmidgION, have the benefit of outputting long sequence reads (up to hundreds of thousands of bases) in a portable manner. These sequencing devices fit in the palm of a hand and require only a USB outlet. Unfortunately, the development of data analysis tools for these technologies is at a nascent stage, which impedes the portability of these devices. The objective of this work is to introduce an out-of-core approach to port Nanopore analytics to mobile devices such as tablets or smartphones, which are often used in extreme experimental settings that call for special ergonomics and ease of sterilization. In this paper, we present a serial k-mer parser/counter for FAST5 files and a de Bruijn graph construction method that can run on a hand-held device. To accomplish this portability, we develop novel cache-oblivious data structures and out-of-core chunked processing methods. Our toolset, which we refer to as the Nanopore Portable Analytics Library (NanoPAL), was implemented in ISO C++14 and compiled for Android devices. Using MinION data (Zaire Ebolavirus species and others), we evaluate the time required to parse the files and build the de Bruijn graph with respect to file size and RAM allocation. These metrics were compared to those of minimap/miniasm. On an LG Nexus 5 with 2 GB of RAM, 2 MB of L2 cache and 16 GB of storage, the out-of-core NanoPAL is able to process FAST5 files at about 30 minutes per 0.5 GB, creating sorted k-mer and de Bruijn graph files. The recompiled minimap/miniasm tool cannot complete processing of FAST5 files larger than 170 MB. In conjunction with base calling/error correction, and with the addition of assembly procedures downstream, NanoPAL can be effectively used to perform analyses of MinION/SmidgION data locally on a mobile device.
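The out-of-core, chunked k-mer counting described in the abstract above can be illustrated with a minimal sketch. This is not NanoPAL's code (the abstract says the toolset is written in C++14 and does not give its algorithm in detail); it is a generic external-sort approach in Python, assuming reads arrive as plain strings rather than FAST5 records. The names `write_sorted_runs`, `count_kmers_out_of_core` and `max_kmers_in_memory` are hypothetical.

```python
import heapq
import itertools
import os
import tempfile


def kmers(read, k):
    """Yield every length-k substring of one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def write_sorted_runs(reads, k, max_kmers_in_memory, tmp_dir):
    """Spill sorted runs of at most max_kmers_in_memory k-mers to disk."""
    run_paths, buffer = [], []

    def flush():
        if not buffer:
            return
        buffer.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(kmer + "\n" for kmer in buffer)
        run_paths.append(path)
        buffer.clear()

    for read in reads:
        for kmer in kmers(read, k):
            buffer.append(kmer)
            if len(buffer) >= max_kmers_in_memory:
                flush()
    flush()
    return run_paths


def count_kmers_out_of_core(reads, k, max_kmers_in_memory=1_000_000):
    """Return sorted (k-mer, count) pairs; only one chunk is buffered at a time."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = write_sorted_runs(reads, k, max_kmers_in_memory, tmp_dir)
        handles = [open(path) for path in run_paths]
        try:
            # k-way merge of the sorted runs; equal k-mers become adjacent.
            merged = heapq.merge(
                *((line.rstrip("\n") for line in h) for h in handles)
            )
            # For brevity the result is collected in a list; a real tool
            # would stream it to an output file instead.
            return [
                (kmer, sum(1 for _ in group))
                for kmer, group in itertools.groupby(merged)
            ]
        finally:
            for h in handles:
                h.close()
```

For example, `count_kmers_out_of_core(["ACGTAC", "CGTA"], k=3, max_kmers_in_memory=2)` spills three two-element runs to disk and merges them into a single sorted count list. Only the in-memory buffer is bounded by `max_kmers_in_memory`; the merge step reads each run sequentially.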

    Efficient data structures for mobile de novo genome assembly by third-generation sequencing

    Abstract: Mobile/portable (third-generation) sequencing technologies, including Oxford Nanopore's MinION and SmidgION, are once again revolutionizing biomedical sciences, as high-throughput sequencing did before them. They combine an increase in sequence length (up to hundreds of thousands of bases) with extreme portability. While a sequencer now fits in the palm of a hand and needs only a USB outlet or a mobile phone/tablet to work, the data analysis phases still depend on an available Internet connection and cloud computing. This hampers the portability paradigm, especially if the technology is used in resource-limited settings or remote areas with limited connectivity. In this work, we introduce efficient data structures that enable portable data analytics for third-generation sequencing. Specifically, we show how sequence overlap graphs (fixed-length k-mers, with an extension to variable lengths) can be built and stored on a mobile phone, thereby allowing the execution of de novo genome assembly algorithms (along with ad hoc strategies for error correction) without the need to transfer data over the Internet or to run the analysis on a desktop.
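As a rough illustration of the fixed-length k-mer graph mentioned in the abstract above, the sketch below builds a textbook de Bruijn graph in Python: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is an assumption-laden toy, not the data structure from the paper, which the abstract does not specify; the function name `de_bruijn_graph` is hypothetical, and the variable-length extension and error correction are not covered.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with edge multiplicities."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge: prefix of the k-mer -> suffix of the k-mer.
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {node: dict(successors) for node, successors in graph.items()}
```

Calling `de_bruijn_graph(["ACGTAC"], k=3)` yields a four-node cycle `AC -> CG -> GT -> TA -> AC`, each edge seen once. An in-memory dictionary like this would not fit large data sets on a phone, which is the limitation that compact or on-disk graph representations are meant to address.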

    Visual programming for next-generation sequencing data analytics

    Background: High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer. Text: Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal development environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the running of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. Conclusion: In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.