Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made.
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data).
GeNN: a code generation framework for accelerated brain simulations
Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open-source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that a 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance-based Hodgkin-Huxley neurons, but that for other models the speedup can differ.
GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials,
Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/
Readiness of Quantum Optimization Machines for Industrial Applications
There have been multiple attempts to demonstrate that quantum annealing and,
in particular, quantum annealing on quantum annealing machines, has the
potential to outperform current classical optimization algorithms implemented
on CMOS technologies. The benchmarking of these devices has been controversial.
Initially, random spin-glass problems were used; however, these were quickly
shown to be not well suited to detect any quantum speedup. Subsequently,
benchmarking shifted to carefully crafted synthetic problems designed to
highlight the quantum nature of the hardware while (often) ensuring that
classical optimization techniques do not perform well on them. Even worse, to
date a true sign of improved scaling with the number of problem variables
remains elusive when compared to classical optimization techniques. Here, we
analyze the readiness of quantum annealing machines for real-world application
problems. These are typically not random and have an underlying structure that
is hard to capture in synthetic benchmarks, thus posing unexpected challenges
for optimization techniques, both classical and quantum alike. We present a
comprehensive computational scaling analysis of fault diagnosis in digital
circuits, considering architectures beyond D-Wave quantum annealers. We find
that the instances generated from real data in multiplier circuits are harder
than other representative random spin-glass benchmarks with a comparable number
of variables. Although our results show that transverse-field quantum annealing
is outperformed by state-of-the-art classical optimization algorithms, these
benchmark instances are hard and small in the size of the input, therefore
representing the first industrial application ideally suited for testing
near-term quantum annealers and other quantum algorithmic strategies for
optimization problems.
Comment: 22 pages, 12 figures. Content updated according to Phys. Rev. Applied
versio