1,796 research outputs found
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Scientific problems that depend on processing large amounts of data require
overcoming challenges in multiple areas: managing large-scale data
distribution, co-placement and scheduling of data with compute resources, and
storing and transferring large volumes of data. We analyze the ecosystems of
the two prominent paradigms for data-intensive applications, hereafter referred
to as the high-performance computing and the Apache-Hadoop paradigm. We propose
a basis, common terminology and functional factors upon which to analyze the
two approaches of both paradigms. We discuss the concept of "Big Data Ogres"
and their facets as means of understanding and characterizing the most common
application workloads found across the two paradigms. We then discuss the
salient features of the two paradigms, and compare and contrast the two
approaches. Specifically, we examine common implementation/approaches of these
paradigms, shed light upon the reasons for their current "architecture" and
discuss some typical workloads that utilize them. In spite of the significant
software distinctions, we believe there is architectural similarity. We discuss
the potential integration of different implementations, across the different
levels and components. Our comparison progresses from a fully qualitative
examination of the two paradigms, to a semi-quantitative methodology. We use a
simple and broadly used Ogre (K-means clustering), characterize its performance
on a range of representative platforms, covering several implementations from
both paradigms. Our experiments provide an insight into the relative strengths
of the two paradigms. We propose that the set of Ogres will serve as a
benchmark to evaluate the two paradigms along different dimensions.Comment: 8 pages, 2 figure
Accelerated iterative image reconstruction for cone-beam computed tomography through Big Data frameworks
One of the latest trends in Computed Tomography (CT) is the reduction of the radiation dose delivered to patients through the decrease of the amount of acquired data. This reduction results in artifacts in the final images if conventional reconstruction methods are used, making it advisable to employ iterative algorithms to enhance image quality. Most approaches are built around two main operators, backprojection and projection, which are computationally expensive. In this work, we present an implementation of those operators for iterative reconstruction methods exploiting the Big Data paradigm. We define an architecture based on Apache Spark that supports both Graphical Processing Units (GPU) and CPU-based architectures. The aforementioned are parallelized using a partitioning scheme based on the division of the volume and irregular data structures in order to reduce the cost of communication and computation of the final images. Our solution accelerates the execution of the two most computational expensive components with Apache Spark, improving the programming experience of new iterative reconstruction algorithms and the maintainability of the source code increasing the level of abstraction for non-experienced high performance programmers. Through an experimental evaluation, we show that we can obtain results up to 10 faster for projection and 21 faster for backprojection when using a GPU-based cluster compared to a traditional multi-core version. Although a linear speed up was not reached, the proposed approach can be a good alternative for porting previous medical image reconstruction applications already implemented in C/C++ or even with CUDA or OpenCL programming models. Our solution enables the automatic detection of the GPU devices and execution on CPU and GPU tasks at the same time under the same system, using all the available resources.This work was supported by the NIH, United States under Grant R01-HL-098686 and Grant U01 EB018753, the Spanish Ministerio de Economia y Competitividad (projects TEC2013-47270-R, RTC-2014-3028 and TIN2016-79637-P), the Spanish Ministerio de Educacion (grant FPU14/03875), the Spanish Ministerio de Ciencia, Innovacion y Universidades (Instituto de Salud Carlos III, project DTS17/00122; Agencia Estatal de Investigacion, project DPI2016-79075-R-AEI/FEDER, UE), co-funded by European Regional Development Fund (ERDF), ‘‘A way of making Europe’’. The CNIC is supported by the Ministerio de Ciencia, Spain, Innovacion y Universidades, Spain and the Pro CNIC Foundation, Spain, and is a Severo Ochoa Center of Excellence, Spain (SEV-2015-0505). Finally, this research was partially supported by Madrid regional Government, Spain under the grant ’’Convergencia Big data-Hpc: de los sensores a las Aplicaciones. (CABAHLA-CM)’’. Ref: S2018/TCS-4423
- …