ASlib: A Benchmark Library for Algorithm Selection
The task of algorithm selection involves choosing an algorithm from a set of
algorithms on a per-instance basis in order to exploit the varying performance
of algorithms over a set of instances. The algorithm selection problem is
attracting increasing attention from researchers and practitioners in AI. Years
of fruitful applications in a number of domains have resulted in a large amount
of data, but the community lacks a standard format or repository for this data.
This situation makes it difficult to share and compare different approaches
effectively, as is done in other, more established fields. It also
unnecessarily hinders new researchers who want to work in this area. To address
this problem, we introduce a standardized format for representing algorithm
selection scenarios and a repository that contains a growing number of data
sets from the literature. Our format has been designed to be able to express a
wide variety of different scenarios. Demonstrating the breadth and power of our
platform, we describe a set of example experiments that build and evaluate
algorithm selection models through a common interface. The results display the
potential of algorithm selection to achieve significant performance
improvements across a broad range of problems and algorithms.
Comment: Accepted for publication in the Artificial Intelligence Journal
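The per-instance selection idea described above can be sketched in a few lines. Everything below is an illustrative toy: the instance features, runtime models, and algorithm names are invented, whereas an actual ASlib scenario would supply real feature and performance data.

```python
# Toy per-instance algorithm selection: pick, for each instance, the
# algorithm with the lowest (predicted) runtime. Here the "predictions"
# are synthetic runtime models; ASlib scenarios provide measured data.
import random

random.seed(0)

# Two hypothetical algorithms whose runtimes depend differently on
# two invented instance features, "size" and "density".
def runtime_a(size, density):
    return 2.0 * size + 0.5 * density

def runtime_b(size, density):
    return 0.5 * size + 3.0 * density

ALGORITHMS = {"A": runtime_a, "B": runtime_b}

def select_algorithm(size, density):
    """Choose the algorithm with the lowest predicted runtime for this instance."""
    return min(ALGORITHMS, key=lambda name: ALGORITHMS[name](size, density))

# On a mixed workload, a per-instance selector can beat any single algorithm.
instances = [(random.uniform(1, 10), random.uniform(1, 10)) for _ in range(100)]
selector_cost = sum(min(f(s, d) for f in ALGORITHMS.values()) for s, d in instances)
best_single = min(sum(f(s, d) for s, d in instances) for f in ALGORITHMS.values())
```

By construction, the per-instance selector's total cost can never exceed that of the best single algorithm, which is the gap ("virtual best" versus "single best") that algorithm selection models try to close.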
Enabling trade-offs between accuracy and computational cost: Adaptive algorithms to reduce time to clinical insight
The efficacy of drug treatments depends on how tightly small molecules bind to their target proteins. Quantifying the strength of these interactions (the so-called 'binding affinity') is a grand challenge of computational chemistry, surmounting which could revolutionize drug design and provide the platform for patient-specific medicine. Recently, evidence from blind challenge predictions and retrospective validation studies has suggested that molecular dynamics (MD) can now achieve useful predictive accuracy (1 kcal/mol). This accuracy is sufficient to greatly accelerate hit-to-lead and lead optimization. Translating these advances in predictive accuracy into clinical and/or industrial decision making requires that binding free energy results be turned around on reduced timescales without loss of accuracy. This demands advances in algorithms, scalable software systems, and intelligent and efficient utilization of supercomputing resources. This work is motivated by the real-world problem of providing insight from drug candidate data on as short a time scale as possible. Specifically, we reproduce results from a collaborative project between UCL and GlaxoSmithKline to study a congeneric series of drug candidates binding to the BRD4 protein, inhibitors of which have shown promising preclinical efficacy in pathologies ranging from cancer to inflammation. We demonstrate the use of a framework called HTBAC, designed to support the aforementioned requirements of accurate and rapid binding affinity calculations. HTBAC facilitates the execution of the large numbers of simulations required while supporting the adaptive execution of algorithms. Furthermore, HTBAC enables the selection of simulation parameters at runtime, which can, in principle, optimize the use of computational resources while producing results within a target uncertainty.
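The adaptive idea behind this kind of framework can be illustrated with a simple loop: spend additional simulation replicas on the candidates whose affinity estimate is still the least converged, until every estimate meets a target uncertainty. All names and numbers below are invented stand-ins, not HTBAC's actual API or workflow.

```python
# Sketch of adaptive replica allocation across drug candidates.
# run_replica() stands in for one MD free-energy calculation (kcal/mol).
import random

random.seed(7)

def run_replica(true_affinity):
    """Stand-in for one MD replica: a noisy estimate of the true affinity."""
    return true_affinity + random.gauss(0.0, 1.0)

def uncertainty(samples):
    """Standard error of the mean across replicas."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return (var / n) ** 0.5

def adaptive_campaign(candidates, target=0.25, budget=500):
    # Start every candidate with two replicas so uncertainty is defined.
    results = {c: [run_replica(t), run_replica(t)] for c, t in candidates.items()}
    spent = 2 * len(candidates)
    while spent < budget:
        # Direct the next replica at the least-converged candidate.
        worst = max(results, key=lambda c: uncertainty(results[c]))
        if uncertainty(results[worst]) <= target:
            break  # every candidate now meets the target uncertainty
        results[worst].append(run_replica(candidates[worst]))
        spent += 1
    return {c: sum(v) / len(v) for c, v in results.items()}, spent

estimates, used = adaptive_campaign({"lig1": -9.0, "lig2": -7.5, "lig3": -8.2})
```

The design choice being illustrated is runtime adaptivity: rather than fixing the number of replicas per candidate in advance, the campaign reallocates compute to wherever the statistical uncertainty is largest.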
From Facility to Application Sensor Data: Modular, Continuous and Holistic Monitoring with DCDB
Today's HPC installations are highly-complex systems, and their complexity
will only increase as we move to exascale and beyond. At each layer, from
facilities to systems, from runtimes to applications, a wide range of tuning
decisions must be made in order to achieve efficient operation. This, however,
requires systematic and continuous monitoring of system and user data. While
many insular solutions exist, a system for holistic and facility-wide
monitoring is still lacking in the current HPC ecosystem. In this paper we
introduce DCDB, a comprehensive monitoring system capable of integrating data
from all system levels. It is designed as a modular and highly-scalable
framework based on a plugin infrastructure. All monitored data is aggregated at
a distributed noSQL data store for analysis and cross-system correlation. We
demonstrate the performance and scalability of DCDB, and describe two use cases
in the areas of energy management and characterization.
Comment: Accepted at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 201
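The plugin-based collection scheme described above can be sketched compactly. The real system is far richer (and aggregates into a distributed noSQL store rather than an in-memory dict); the plugin names, readings, and key layout below are invented for illustration.

```python
# Minimal plugin-style monitoring loop: each plugin reports named readings
# for one system layer, and a collector aggregates them under
# "<plugin>/<sensor>" keys in a shared time-series store.
import time

class SensorPlugin:
    """Base class: each plugin reports a dict of named readings."""
    name = "base"

    def read(self):
        raise NotImplementedError

class CpuPlugin(SensorPlugin):
    name = "cpu"

    def read(self):
        return {"load": 0.42}  # stand-in for a real /proc read

class PowerPlugin(SensorPlugin):
    name = "power"

    def read(self):
        return {"watts": 180.0}  # stand-in for a facility-level power meter

def collect(plugins, store):
    """Append timestamped readings from every plugin into the store."""
    ts = time.time()
    for p in plugins:
        for key, value in p.read().items():
            store.setdefault(f"{p.name}/{key}", []).append((ts, value))
    return store

store = collect([CpuPlugin(), PowerPlugin()], {})
```

The point of the plugin abstraction is that facility-, system-, and application-level sensors all flow through the same interface, which is what makes cross-system correlation over one store possible.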
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization. This article is categorized under: Algorithmic Development > Statistics; Technologies > Machine Learning; Technologies > Prediction
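Of the methods surveyed, random search is the simplest to show concretely. In the sketch below, the objective is a synthetic stand-in for a resampling-based validation error of a real learner; the search space and hyperparameter names are invented for illustration.

```python
# Random-search HPO: sample configurations uniformly from the search space
# and keep the one with the lowest (estimated) validation error.
import random

random.seed(1)

def validation_error(lr, reg):
    """Synthetic objective with a known optimum at lr=0.1, reg=0.01.
    In practice this would be a cross-validated error estimate."""
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

def random_search(n_trials):
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {"lr": random.uniform(0.0, 1.0), "reg": random.uniform(0.0, 0.1)}
        err = validation_error(cfg["lr"], cfg["reg"])
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

cfg, err = random_search(500)
```

Each trial is independent, so random search parallelizes trivially; the more advanced methods the survey covers (Bayesian optimization, Hyperband, racing) spend the same budget more cleverly by using earlier results to guide later trials.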
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling for quality control during manufacturing processes; in traffic and logistics for smart cities; and for mobile communications.
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, via summarization and clustering, to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how to enhance scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.
Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach
Improving survey specifications are causing an exponential rise in pulsar
candidate numbers and data volumes. We study the candidate filters used to
mitigate these problems during the past fifty years. We find that some existing
methods, such as applying constraints on the total number of candidates
collected per observation, may have detrimental effects on the success of
pulsar searches. Those methods immune to such effects are found to be
ill-equipped to deal with the problems associated with increasing data volumes
and candidate numbers, motivating the development of new approaches. We
therefore present a new method designed for on-line operation. It selects
promising candidates using a purpose-built tree-based machine learning
classifier, the Gaussian Hellinger Very Fast Decision Tree (GH-VFDT), and a new
set of features for describing candidates. The features have been chosen so as
to i) maximise the separation between candidates arising from noise and those
of probable astrophysical origin, and ii) be as survey-independent as possible.
Using these features our new approach can process millions of candidates in
seconds (~1 million every 15 seconds), with high levels of pulsar recall
(90%+). This technique is therefore applicable to the large volumes of data
expected to be produced by the Square Kilometre Array (SKA). Use of this
approach has assisted in the discovery of 20 new pulsars in data obtained
during the LOFAR Tied-Array All-Sky Survey (LOTAAS).
Comment: Accepted for publication in MNRAS, 20 pages, 8 figures. See
http://www.jb.man.ac.uk/pulsar/Surveys.html for survey data, and
https://dx.doi.org/10.6084/m9.figshare.3080389.v1 for our data
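The separation goal stated in (i) can be made concrete via the Hellinger distance, the measure underlying the GH-VFDT's split criterion, which has a closed form for two univariate Gaussians. The sketch below illustrates only that measure (the means and variances are invented); it is not the full streaming classifier.

```python
# Squared Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2):
#   H^2 = 1 - sqrt(2*s1*s2 / (s1^2 + s2^2))
#             * exp(-(mu1 - mu2)^2 / (4*(s1^2 + s2^2)))
# It lies in [0, 1]: near 0 for overlapping distributions, near 1 for
# well-separated ones.
import math

def hellinger_gaussian(mu1, s1, mu2, s2):
    """Squared Hellinger distance between two univariate Gaussians."""
    term = math.sqrt(2 * s1 * s2 / (s1**2 + s2**2))
    expo = math.exp(-0.25 * (mu1 - mu2) ** 2 / (s1**2 + s2**2))
    return 1.0 - term * expo

# A good candidate feature pushes the noise and pulsar class-conditional
# distributions apart, driving the distance toward 1.
noise_like = hellinger_gaussian(0.0, 1.0, 0.1, 1.0)   # near-identical classes
separating = hellinger_gaussian(0.0, 1.0, 8.0, 1.0)   # well-separated classes
```

Because this distance stays informative even when one class (real pulsars) is vastly rarer than the other (noise), it is a natural split criterion for heavily imbalanced candidate streams.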