17,382 research outputs found
21st Century Simulation: Exploiting High Performance Computing and Data Analysis
This paper identifies, defines, and analyzes the limitations imposed on Modeling and Simulation by outmoded
paradigms in computer utilization and data analysis. The authors then discuss two emerging capabilities to
overcome these limitations: High Performance Parallel Computing and Advanced Data Analysis. First, parallel
computing, in supercomputers and Linux clusters, has proven effective by providing users an advantage in
computing power. This has been characterized as a ten-year lead over the use of single-processor computers.
Second, advanced data analysis techniques are both necessitated and enabled by this leap in computing power.
JFCOM's JESPP project is one of the few simulation initiatives to effectively embrace these concepts. The
challenges facing the defense analyst today have grown to include the need to consider operations among non-combatant
populations, to focus on impacts to civilian infrastructure, to differentiate combatants from non-combatants,
and to understand non-linear, asymmetric warfare. These requirements stretch both current
computational techniques and data analysis methodologies. In this paper, documented examples and potential
solutions will be advanced. The authors discuss the paths to successful implementation based on their experience.
Reviewed technologies include parallel computing, cluster computing, grid computing, data logging, OpsResearch,
database advances, data mining, evolutionary computing, genetic algorithms, and Monte Carlo sensitivity analyses.
The modeling and simulation community has significant potential to provide more opportunities for training and
analysis. Simulations must include increasingly sophisticated environments, better emulations of foes, and more
realistic civilian populations. Overcoming the implementation challenges will produce dramatically better insights,
for trainees and analysts. High Performance Parallel Computing and Advanced Data Analysis promise increased
understanding of future vulnerabilities to help avoid unneeded mission failures and unacceptable personnel losses.
The authors set forth road maps for rapid prototyping and adoption of advanced capabilities. They discuss the
beneficial impact of embracing these technologies, as well as risk mitigation required to ensure success
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Integrating R and Hadoop for Big Data Analysis
Analyzing and working with big data could be very diffi cult using classical
means like relational database management systems or desktop software packages
for statistics and visualization. Instead, big data requires large clusters
with hundreds or even thousands of computing nodes. Offi cial statistics is
increasingly considering big data for deriving new statistics because big data
sources could produce more relevant and timely statistics than traditional
sources. One of the software tools successfully and wide spread used for
storage and processing of big data sets on clusters of commodity hardware is
Hadoop. Hadoop framework contains libraries, a distributed fi le-system (HDFS),
a resource-management platform and implements a version of the MapReduce
programming model for large scale data processing. In this paper we investigate
the possibilities of integrating Hadoop with R which is a popular software used
for statistical computing and data visualization. We present three ways of
integrating them: R with Streaming, Rhipe and RHadoop and we emphasize the
advantages and disadvantages of each solution.Comment: Romanian Statistical Review no. 2 / 201
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a
portable way to program high-performance parallel computers. This success has
occurred in spite of the view of many that message passing is difficult and
that other approaches, including automatic parallelization and directive-based
parallelism, are easier to use. This paper argues that MPI has succeeded
because it addresses all of the important issues in providing a parallel
programming model.Comment: 12 pages, 1 figur
- …