1,991 research outputs found
Astronomy in the Cloud: Using MapReduce for Image Coaddition
In the coming decade, astronomical surveys of the sky will generate tens of
terabytes of images and detect hundreds of millions of sources every night. The
study of these sources will involve computation challenges such as anomaly
detection and classification, and moving object tracking. Since such studies
benefit from the highest quality data, methods such as image coaddition
(stacking) will be a critical preprocessing step prior to scientific
investigation. With a requirement that these images be analyzed on a nightly
basis to identify moving sources or transient objects, these data streams
present many computational challenges. Given the quantity of data involved, the
computational load of these problems can only be addressed by distributing the
workload over a large number of nodes. However, the high data throughput
demanded by these applications may present scalability challenges for certain
storage architectures. One scalable data-processing method that has emerged in
recent years is MapReduce, and in this paper we focus on its popular
open-source implementation called Hadoop. In the Hadoop framework, the data is
partitioned among storage attached directly to worker nodes, and the processing
workload is scheduled in parallel on the nodes that contain the required input
data. A further motivation for using Hadoop is that it allows us to exploit
cloud computing resources, e.g., Amazon's EC2. We report on our experience
implementing a scalable image-processing pipeline for the SDSS imaging database
using Hadoop. This multi-terabyte imaging dataset provides a good testbed for
algorithm development since its scope and structure approximate future surveys.
First, we describe MapReduce and how we adapted image coaddition to the
MapReduce framework. Then we describe a number of optimizations to our basic
approach and report experimental results comparing their performance.Comment: 31 pages, 11 figures, 2 table
QuPARA: Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce
Stochastic simulation techniques are used for portfolio risk analysis. Risk
portfolios may consist of thousands of reinsurance contracts covering millions
of insured locations. To quantify risk each portfolio must be evaluated in up
to a million simulation trials, each capturing a different possible sequence of
catastrophic events over the course of a contractual year. In this paper, we
explore the design of a flexible framework for portfolio risk analysis that
facilitates answering a rich variety of catastrophic risk queries. Rather than
aggregating simulation data in order to produce a small set of high-level risk
metrics efficiently (as is often done in production risk management systems),
the focus here is on allowing the user to pose queries on unaggregated or
partially aggregated data. The goal is to provide a flexible framework that can
be used by analysts to answer a wide variety of unanticipated but natural ad
hoc queries. Such detailed queries can help actuaries or underwriters to better
understand the multiple dimensions (e.g., spatial correlation, seasonality,
peril features, construction features, and financial terms) that can impact
portfolio risk. We implemented a prototype system, called QuPARA (Query-Driven
Large-Scale Portfolio Aggregate Risk Analysis), using Hadoop, which is Apache's
implementation of the MapReduce paradigm. This allows the user to take
advantage of large parallel compute servers in order to answer ad hoc risk
analysis queries efficiently even on very large data sets typically encountered
in practice. We describe the design and implementation of QuPARA and present
experimental results that demonstrate its feasibility. A full portfolio risk
analysis run consisting of a 1,000,000 trial simulation, with 1,000 events per
trial, and 3,200 risk transfer contracts can be completed on a 16-node Hadoop
cluster in just over 20 minutes.Comment: 9 pages, IEEE International Conference on Big Data (BigData), Santa
Clara, USA, 201
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelmingly increasing amount of stored data has spurred researchers
seeking different methods in order to optimally take advantage of it which
mostly have faced a response time problem as a result of this enormous size of
data. Most of solutions have suggested materialization as a favourite solution.
However, such a solution cannot attain Real- Time answers anyhow. In this paper
we propose a framework illustrating the barriers and suggested solutions in the
way of achieving Real-Time OLAP answers that are significantly used in decision
support systems and data warehouses
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
- โฆ