    Analyzing large-scale DNA Sequences on Multi-core Architectures

    Rapid analysis of DNA sequences is important for preventing the evolution of viruses and bacteria at an early phase, for early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and for DNA forensics. However, real-world DNA sequences may comprise several gigabytes, and DNA analysis demands adequate computational resources to complete within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on finite automata and is suitable for analyzing very large DNA segments. We evaluate our approach on real-world DNA segments of mouse (2.7 GB), cat (2.4 GB), dog (2.4 GB), chicken (1 GB), human (3.2 GB) and turkey (0.2 GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel approach that uses the RE2 library. Comment: The 18th IEEE International Conference on Computational Science and Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201
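
    The abstract above does not spell out how the automaton is built or how the input is split, so the following is only a minimal sketch of the general idea, under stated assumptions: a KMP-style deterministic finite automaton for a single pattern, and fixed-size chunks that each re-read the previous `len(pattern) - 1` characters so that matches crossing a chunk boundary are counted exactly once. The names `build_dfa`, `count_matches` and `parallel_count` are hypothetical, and Python threads are used only to show the decomposition (they will not reproduce the reported speed-ups).

```python
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"


def build_dfa(pattern):
    """Build a KMP-style DFA: dfa[state][symbol] -> next state.

    State k means "the last k symbols read equal pattern[:k]";
    state len(pattern) is the accepting state.
    """
    m = len(pattern)
    dfa = [dict.fromkeys(ALPHABET, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    fallback = 0  # state reached after reading pattern[1:state]
    for state in range(1, m + 1):
        # On a mismatch, behave like the fallback state would.
        for symbol in ALPHABET:
            dfa[state][symbol] = dfa[fallback][symbol]
        if state < m:
            dfa[state][pattern[state]] = state + 1
            fallback = dfa[fallback][pattern[state]]
    return dfa


def count_matches(dfa, text, accept):
    """Count (possibly overlapping) matches by running the DFA over text."""
    state = 0
    count = 0
    for symbol in text:
        # Symbols outside the alphabet (e.g. 'N') reset the automaton.
        state = dfa[state].get(symbol, 0)
        if state == accept:
            count += 1
    return count


def parallel_count(pattern, text, chunk_size, workers=4):
    """Count matches chunk by chunk; each chunk owns matches ending inside it."""
    dfa = build_dfa(pattern)
    m = len(pattern)

    def scan(start):
        # Re-read m - 1 symbols before the chunk so boundary matches are seen.
        lo = max(0, start - (m - 1))
        return count_matches(dfa, text[lo:start + chunk_size], m)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan, range(0, len(text), chunk_size)))


if __name__ == "__main__":
    print(parallel_count("AAAA", "AAAAAAA", chunk_size=3))  # prints 4
```

    Because every slice starts from state 0 and begins `m - 1` symbols early, any match found in a slice necessarily ends inside that chunk, which is why no match is counted twice.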

    Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

    The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the volume of biological data is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in the life sciences. As a result, both biologists and computer scientists face the challenge of gaining profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are needed, as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.

    An energy‐aware performance analysis of SWIMM: Smith–Waterman implementation on Intel's Multicore and Manycore architectures

    Alignment is essential in many areas such as biological, chemical and criminal forensics. The well‐known Smith–Waterman (SW) algorithm is able to retrieve the optimal local alignment with quadratic time and space complexity. There are several implementations that take advantage of parallel computing platforms, such as manycores, FPGAs or GPUs, in order to reduce the alignment effort. In this research, we adapt, develop and tune an SW implementation named SWIMM on a heterogeneous platform based on Intel's Xeon and Xeon Phi coprocessor. SWIMM is a free tool available in a public git repository, https://github.com/enzorucci/SWIMM. We efficiently exploit data and thread‐level parallelism, reaching up to 380 GCUPS on the heterogeneous architecture, 350 GCUPS for the isolated Xeon and 50 GCUPS on the Xeon Phi. Although the heterogeneous implementation obtains the best performance, it is also the most energy‐demanding. We also present a trade‐off analysis between performance and power consumption. The greenest configuration is based on an isolated multicore system that exploits the AVX2 instruction set architecture, reaching 1.5 GCUPS/Watt. Facultad de Informátic

    Smith-Waterman algorithm on heterogeneous systems: A case study

    The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. However, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many applications. Some heuristics are possible but at the expense of losing sensitivity. Fortunately, previous research has shown that new computing platforms such as GPUs and FPGAs are able to accelerate SW and achieve impressive speedups. In this paper we have explored SW acceleration on a heterogeneous platform equipped with an Intel Xeon Phi coprocessor. Our evaluation, using the well-known Swiss-Prot database as a benchmark, has shown that a hybrid CPU-Phi heterogeneous system is able to achieve competitive performance (62.6 GCUPS), even with moderate low-level optimisations. Facultad de Informátic
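
    For readers unfamiliar with the recurrence behind these figures, here is a minimal sketch of Smith-Waterman scoring together with the GCUPS metric (giga cell updates per second, i.e. dynamic-programming cells filled per second divided by 10^9). It is not the implementation evaluated in any of the papers listed here: the linear gap penalty, the scoring values and the function names `smith_waterman_score` and `gcups` are assumptions chosen for illustration. Only the best score is computed, which lets the sketch keep two rows of the matrix instead of the full quadratic table needed for a traceback.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty assumed)."""
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one query/target comparison."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # prints 8
    print(gcups(1000, 1_000_000, 1.0))           # prints 1.0
```

    The inner loop touches every cell of a `len(query) x len(target)` matrix once, which is where both the quadratic running time and the cell count used by GCUPS come from.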

    Accelerating Smith-Waterman Alignment of Long DNA Sequences with OpenCL on FPGA

    With the growing importance of parallel architectures such as GPUs or Xeon Phi accelerators, the scientific community has developed efficient solutions in the bioinformatics field. In this context, FPGAs begin to stand out as high performance devices with moderate power consumption. This paper presents and evaluates a parallel strategy for the well-known Smith-Waterman algorithm using OpenCL on Intel/Altera’s FPGA for long DNA sequences. We efficiently exploit data and pipeline parallelism on an Intel/Altera Stratix V FPGA, reaching up to 114 GCUPS with a power requirement of less than 25 watts. Published in the Lecture Notes in Computer Science book series (LNCS, vol. 10209). Facultad de Informátic

    State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms

    Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding, and the situation is getting worse due to the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We give a survey of the state-of-the-art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work and results of each implementation. Additionally, as energy efficiency is becoming more important every day, we also survey works on performance and power consumption. Finally, we give our view on the future of Smith–Waterman protein searches, considering the next generations of hardware architectures and their upcoming technologies. Instituto de Investigación en Informática, Universidad Complutense de Madri

    Accelerating the pace of protein functional annotation with Intel Xeon Phi coprocessors

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite
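
    As a quick consistency check on the figures quoted in this abstract, speedup can be taken as serial wall-clock time divided by parallel wall-clock time. The short sketch below recomputes it from the three reported run times; the helper name `speedup` and the constant names are illustrative. Since the hours are given to one decimal place, the ratios are expected to agree with the stated 11.8 and 17.6 only approximately.

```python
def speedup(baseline_hours, parallel_hours):
    """Speedup = time of the baseline run / time of the faster run."""
    return baseline_hours / parallel_hours


# Run times quoted in the abstract for 501 benchmarking proteins.
SERIAL_HOURS = 36.8        # serial version
CPU_ONLY_HOURS = 3.1       # parallel, without the accelerator
CPU_PLUS_PHI_HOURS = 2.1   # parallel, CPU and Xeon Phi together

cpu_speedup = speedup(SERIAL_HOURS, CPU_ONLY_HOURS)
hybrid_speedup = speedup(SERIAL_HOURS, CPU_PLUS_PHI_HOURS)

print(f"{cpu_speedup:.2f} {hybrid_speedup:.2f}")  # prints 11.87 17.52
```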