62,245 research outputs found
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.Comment: ISCA 202
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performances, and data
scientists spend considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML -- ML community usually focuses on developing ML
algorithms that are robust to some particular noise types of certain
distributions, while database (DB) community has been mostly studying the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers.Comment: published in ICDE 202
Neural Networks for Modeling and Control of Particle Accelerators
We describe some of the challenges of particle accelerator control, highlight
recent advances in neural network techniques, discuss some promising avenues
for incorporating neural networks into particle accelerator control systems,
and describe a neural network-based control system that is being developed for
resonance control of an RF electron gun at the Fermilab Accelerator Science and
Technology (FAST) facility, including initial experimental results from a
benchmark controller.Comment: 21 p
PoseTrack: A Benchmark for Human Pose Estimation and Tracking
Human poses and motions are important cues for analysis of videos with people
and there is strong evidence that representations based on body pose are highly
effective for a variety of tasks such as activity recognition, content
retrieval and social signal processing. In this work, we aim to further advance
the state of the art by establishing "PoseTrack", a new large-scale benchmark
for video-based human pose estimation and articulated tracking, and bringing
together the community of researchers working on visual human analysis. The
benchmark encompasses three competition tracks focusing on i) single-frame
multi-person pose estimation, ii) multi-person pose estimation in videos, and
iii) multi-person articulated tracking. To facilitate the benchmark and
challenge we collect, annotate and release a new %large-scale benchmark dataset
that features videos with multiple people labeled with person tracks and
articulated pose. A centralized evaluation server is provided to allow
participants to evaluate on a held-out test set. We envision that the proposed
benchmark will stimulate productive research both by providing a large and
representative training dataset as well as providing a platform to objectively
evaluate and compare the proposed methods. The benchmark is freely accessible
at https://posetrack.net.Comment: www.posetrack.ne
- …