3,328 research outputs found
Domain knowledge specification for energy tuning
To overcome the challenges of energy consumption of HPC systems, the European Union Horizon 2020 READEX (Runtime Exploitation of Application Dynamism for Energy-efficient Exascale computing) project uses an online auto-tuning approach to improve energy efficiency of HPC applications. The READEX methodology pre-computes optimal system configurations at design-time, such as the CPU frequency, for instances of program regions and switches at runtime to the configuration given in the tuning model when the region is executed. READEX goes beyond previous approaches by exploiting dynamic changes of a region's characteristics by leveraging region and characteristic specific system configurations. While the tool suite supports an automatic approach, specifying domain knowledge such as the structure and characteristics of the application and application tuning parameters can significantly help to create a more refined tuning model. This paper presents the means available for an application expert to provide domain knowledge and presents tuning results for some benchmarks.Web of Science316art. no. E465
Simplifying the Development, Use and Sustainability of HPC Software
Developing software to undertake complex, compute-intensive scientific
processes requires a challenging combination of both specialist domain
knowledge and software development skills to convert this knowledge into
efficient code. As computational platforms become increasingly heterogeneous
and newer types of platform such as Infrastructure-as-a-Service (IaaS) cloud
computing become more widely accepted for HPC computations, scientists require
more support from computer scientists and resource providers to develop
efficient code and make optimal use of the resources available to them. As part
of the libhpc stage 1 and 2 projects we are developing a framework to provide a
richer means of job specification and efficient execution of complex scientific
software on heterogeneous infrastructure. The use of such frameworks has
implications for the sustainability of scientific software. In this paper we
set out our developing understanding of these challenges based on work carried
out in the libhpc project.Comment: 4 page position paper, submission to WSSSPE13 worksho
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System
Incorporating speed probability distribution to the computation of the route
planning in car navigation systems guarantees more accurate and precise
responses. In this paper, we propose a novel approach for dynamically selecting
the number of samples used for the Monte Carlo simulation to solve the
Probabilistic Time-Dependent Routing (PTDR) problem, thus improving the
computation efficiency. The proposed method is used to determine in a proactive
manner the number of simulations to be done to extract the travel-time
estimation for each specific request while respecting an error threshold as
output quality level. The methodology requires a reduced effort on the
application development side. We adopted an aspect-oriented programming
language (LARA) together with a flexible dynamic autotuning library (mARGOt)
respectively to instrument the code and to take tuning decisions on the number
of samples improving the execution efficiency. Experimental results demonstrate
that the proposed adaptive approach saves a large fraction of simulations
(between 36% and 81%) with respect to a static approach while considering
different traffic situations, paths and error requirements. Given the
negligible runtime overhead of the proposed approach, it results in an
execution-time speedup between 1.5x and 5.1x. This speedup is reflected at
infrastructure-level in terms of a reduction of around 36% of the computing
resources needed to support the whole navigation pipeline
ALOJA: A benchmarking and predictive platform for big data performance analysis
The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectivenessof Big Data deployments. The development of the project over its first year, has resulted in a open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system's cost-performance1.
This article describes the evolution of the project's focus and research
lines from over a year of continuously benchmarking Hadoop under dif-
ferent configuration and deployments options, presents results, and dis
cusses the motivation both technical and market-based of such changes.
During this time, ALOJA's target has evolved from a previous low-level
profiling of Hadoop runtime, passing through extensive benchmarking
and evaluation of a large body of results via aggregation, to currently
leveraging Predictive Analytics (PA) techniques. Modeling benchmark
executions allow us to estimate the results of new or untested configu-
rations or hardware set-ups automatically, by learning techniques from
past observations saving in benchmarking time and costs.This work is partially supported the BSC-Microsoft Research Centre, the Span-
ish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization. This article is categorized under: Algorithmic Development > Statistics Technologies > Machine Learning Technologies > Prediction
- …