ALOJA: A benchmarking and predictive platform for big data performance analysis
The main goals of the ALOJA research project, from BSC-MSR, are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about systems' cost-performance.
This article describes the evolution of the project's focus and research
lines from over a year of continuously benchmarking Hadoop under different
configuration and deployment options, presents results, and discusses
the motivation, both technical and market-based, for such changes.
During this time, ALOJA's target has evolved from a previous low-level
profiling of the Hadoop runtime, passing through extensive benchmarking
and evaluation of a large body of results via aggregation, to currently
leveraging Predictive Analytics (PA) techniques. Modeling benchmark
executions allows us to estimate the results of new or untested configurations
or hardware set-ups automatically, by applying learning techniques to
past observations, saving benchmarking time and costs.
This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).
Peer Reviewed. Postprint (author's final draft)
Scratchpad Sharing in GPUs
GPGPU applications exploit on-chip scratchpad memory available in the
Graphics Processing Units (GPUs) to improve performance. The amount of
thread-level parallelism present in the GPU is limited by the number of resident
threads, which in turn depends on the availability of scratchpad memory in its
streaming multiprocessor (SM). Since the scratchpad memory is allocated at
thread block granularity, part of the memory may remain unutilized. In this
paper, we propose architectural and compiler optimizations to improve the
scratchpad utilization. Our approach, Scratchpad Sharing, addresses scratchpad
under-utilization by launching additional thread blocks in each SM. These
thread blocks use unutilized scratchpad and also share scratchpad with other
resident blocks. To improve the performance of scratchpad sharing, we propose
Owner Warp First (OWF) scheduling that schedules warps from the additional
thread blocks effectively. The performance of this approach, however, is
limited by the availability of the shared part of scratchpad.
We propose compiler optimizations to improve the availability of shared
scratchpad. We describe a scratchpad allocation scheme that helps in allocating
scratchpad variables such that shared scratchpad is accessed for a short
duration. We introduce a new instruction, relssp, that, when executed, releases
the shared scratchpad. Finally, we describe an analysis for optimal placement
of relssp instructions such that shared scratchpad is released as early as
possible.
We implemented the hardware changes using the GPGPU-Sim simulator and
implemented the compiler optimizations in the Ocelot framework. We evaluated the
effectiveness of our approach on 19 kernels from 3 benchmark suites: CUDA-SDK,
GPGPU-Sim, and Rodinia. The kernels that underutilize scratchpad memory show an
average improvement of 19% and a maximum improvement of 92.17% compared to the
baseline approach.
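The under-utilization that this abstract describes follows from simple arithmetic: because scratchpad is allocated per whole thread block, any remainder smaller than one block's request stays idle. The sketch below only illustrates that arithmetic. The function names, the 16 KB capacity, the 7 KB request, and the limit of 8 resident blocks are hypothetical values chosen for illustration, not figures from the paper.

```python
def resident_blocks(scratchpad_per_sm: int, scratchpad_per_block: int, max_blocks: int) -> int:
    """Thread blocks that fit in one SM when scratchpad is allocated per whole block."""
    if scratchpad_per_block == 0:
        # A kernel that uses no scratchpad is limited only by the block cap.
        return max_blocks
    return min(max_blocks, scratchpad_per_sm // scratchpad_per_block)


def unused_scratchpad(scratchpad_per_sm: int, scratchpad_per_block: int, max_blocks: int) -> int:
    """Bytes of scratchpad left idle after placing as many whole blocks as possible."""
    blocks = resident_blocks(scratchpad_per_sm, scratchpad_per_block, max_blocks)
    return scratchpad_per_sm - blocks * scratchpad_per_block


# Hypothetical numbers: 16 KB of scratchpad per SM, 7 KB requested per block,
# and at most 8 resident blocks per SM.
blocks = resident_blocks(16 * 1024, 7 * 1024, 8)
idle = unused_scratchpad(16 * 1024, 7 * 1024, 8)
print(blocks, idle)  # 2 blocks fit; 2048 bytes stay unused
```

In this hypothetical case the idle 2 KB cannot host a third block on its own, which is the kind of gap the abstract's additional, scratchpad-sharing thread blocks are meant to exploit.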
Multidimensional Scaling with Regional Restrictions for Facet Theory: An Application to Levi's Political Protest Data
Multidimensional scaling (MDS) is often used for the analysis of correlation matrices of items generated by a facet theory design. The emphasis of the analysis is on regional hypotheses on the location of the items in the MDS solution. An important regional hypothesis is the axial constraint, where the items from different levels of a facet are assumed to be located in different parallel slices. The simplest approach is to do an MDS and draw the parallel lines separating the slices as well as possible by hand. Alternatively, Borg and Shye (1995) proposed automating the second step. Borg and Groenen (1997, 2005) proposed a simultaneous approach for ordered facets when the number of MDS dimensions equals the number of facets. In this paper, we propose a new algorithm that estimates an MDS solution subject to axial constraints without the restriction that the number of facets equals the number of dimensions. The algorithm is based on the constrained iterative majorization of De Leeuw and Heiser (1980) with special constraints. This algorithm is applied to Levi's (1983) data on political protests.
Keywords: Axial Partitioning; Constrained Estimation; Facet Theory; Iterative Majorization; Multidimensional Scaling; Regional Restrictions
Automated Generation of Geometric Theorems from Images of Diagrams
We propose an approach to generate geometric theorems from electronic images
of diagrams automatically. The approach uses Hough transform techniques
to recognize geometric objects and their labels, and numeric
verification to mine basic geometric relations. Candidate propositions are
generated from the retrieved information by using six strategies and geometric
theorems are obtained from the candidates via algebraic computation.
Experiments with a preliminary implementation illustrate the effectiveness and
efficiency of the proposed approach for generating nontrivial theorems from
images of diagrams. This work demonstrates the feasibility of automated
discovery of profound geometric knowledge from simple image data and has
potential applications in geometric knowledge management and education.
Comment: 31 pages. Submitted to Annals of Mathematics and Artificial
Intelligence (special issue on Geometric Reasoning)
REPP-H: runtime estimation of power and performance on heterogeneous data centers
Modern data centers increasingly demand improved performance with minimal power consumption. Managing the power and performance requirements of the applications is challenging because these data centers, incidentally or intentionally, have to deal with server architecture heterogeneity [19], [22]. One critical challenge that data centers have to face is how to manage system power and performance given the different application behavior across multiple different architectures.
This work has been supported by the EU FP7 program (Mont-Blanc 2, ICT-610402), by the Ministerio de Economia (CAP-VII, TIN2015-65316-P), and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051).
The material herein is based in part upon work supported by the US NSF, grant numbers ACI-1535232 and CNS-1305220.
Peer Reviewed. Postprint (author's final draft)
A Computational Study of Genetic Crossover Operators for Multi-Objective Vehicle Routing Problem with Soft Time Windows
The article describes an investigation of the effectiveness of genetic
algorithms for multi-objective combinatorial optimization (MOCO) by presenting
an application to the vehicle routing problem with soft time windows. The work
is motivated by the question of whether and how the problem structure influences
the effectiveness of different configurations of the genetic algorithm.
Computational results are presented for different classes of vehicle routing
problems, varying in their coverage with time windows, time window size,
distribution and number of customers. The results are compared with a simple
but effective local search approach for multi-objective combinatorial
optimization problems.
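The abstract does not say which crossover operators were compared. As a rough illustration of what a crossover operator for routing does, the sketch below implements a simplified variant of order crossover, a commonly used permutation operator, on a customer visiting sequence. The function name `order_crossover`, the integer customer encoding, and the left-to-right fill order are assumptions made for this example, not the study's method.

```python
import random


def order_crossover(parent_a, parent_b, rng=random):
    """Simplified order crossover on two permutations of the same customers.

    A random slice of parent_a is copied into the child at the same positions;
    the remaining positions are filled, left to right, with the customers not
    yet present, in the order they appear in parent_b.
    """
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n), 2))  # two distinct cut points, i < j
    child = [None] * n
    child[i:j + 1] = parent_a[i:j + 1]
    kept = set(parent_a[i:j + 1])
    fill = [c for c in parent_b if c not in kept]
    open_positions = [k for k in range(n) if k < i or k > j]
    for k, customer in zip(open_positions, fill):
        child[k] = customer
    return child


# Hypothetical visiting orders for seven customers.
a = [1, 2, 3, 4, 5, 6, 7]
b = [7, 5, 3, 1, 6, 4, 2]
child = order_crossover(a, b, random.Random(0))
print(child)
```

Whatever cut points are drawn, the child remains a valid permutation, so every customer is still visited exactly once. Time-window violations would then be scored by the objective functions rather than repaired by the operator.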