74,279 research outputs found
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Dat
Devito: Towards a generic Finite Difference DSL using Symbolic Python
Domain specific languages (DSL) have been used in a variety of fields to
express complex scientific problems in a concise manner and provide automated
performance optimization for a range of computational architectures. As such
DSLs provide a powerful mechanism to speed up scientific Python computation
that goes beyond traditional vectorization and pre-compilation approaches,
while allowing domain scientists to build applications within the comforts of
the Python software ecosystem. In this paper we present Devito, a new finite
difference DSL that provides optimized stencil computation from high-level
problem specifications based on symbolic Python expressions. We demonstrate
Devito's symbolic API and performance advantages over traditional Python
acceleration methods before highlighting its use in the scientific context of
seismic inversion problems.Comment: pyHPC 2016 conference submissio
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
In this paper we address the problem of learning robust cross-domain
representations for sketch-based image retrieval (SBIR). While most SBIR
approaches focus on extracting low- and mid-level descriptors for direct
feature matching, recent works have shown the benefit of learning coupled
feature representations to describe data from two related sources. However,
cross-domain representation learning methods are typically cast into non-convex
minimization problems that are difficult to optimize, leading to unsatisfactory
performance. Inspired by self-paced learning, a learning methodology designed
to overcome convergence issues related to local optima by exploiting the
samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced
partial curriculum learning (CPPCL) framework. Compared with existing
self-paced learning methods which only consider a single modality and cannot
deal with prior knowledge, CPPCL is specifically designed to assess the
learning pace by jointly handling data from dual sources and modality-specific
prior information provided in the form of partial curricula. Additionally,
thanks to the learned dictionaries, we demonstrate that the proposed CPPCL
embeds robust coupled representations for SBIR. Our approach is extensively
evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary
SBIR and TU-Berlin Extension datasets), showing superior performance over
competing SBIR methods
autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components
Approximate computing is an emerging paradigm for developing highly
energy-efficient computing systems such as various accelerators. In the
literature, many libraries of elementary approximate circuits have already been
proposed to simplify the design process of approximate accelerators. Because
these libraries contain from tens to thousands of approximate implementations
for a single arithmetic operation it is intractable to find an optimal
combination of approximate circuits in the library even for an application
consisting of a few operations. An open problem is "how to effectively combine
circuits from these libraries to construct complex approximate accelerators".
This paper proposes a novel methodology for searching, selecting and combining
the most suitable approximate circuits from a set of available libraries to
generate an approximate accelerator for a given application. To enable fast
design space generation and exploration, the methodology utilizes machine
learning techniques to create computational models estimating the overall
quality of processing and hardware cost without performing full synthesis at
the accelerator level. Using the methodology, we construct hundreds of
approximate accelerators (for a Sobel edge detector) showing different but
relevant tradeoffs between the quality of processing and hardware cost and
identify a corresponding Pareto-frontier. Furthermore, when searching for
approximate implementations of a generic Gaussian filter consisting of 17
arithmetic operations, the proposed approach allows us to identify
approximately highly important implementations from possible
solutions in a few hours, while the exhaustive search would take four months on
a high-end processor.Comment: Accepted for publication at the Design Automation Conference 2019
(DAC'19), Las Vegas, Nevada, US
Optical Synoptic Telescopes: New Science Frontiers
Over the past decade, sky surveys such as the Sloan Digital Sky Survey have
proven the power of large data sets for answering fundamental astrophysical
questions. This observational progress, based on a synergy of advances in
telescope construction, detectors, and information technology, has had a
dramatic impact on nearly all fields of astronomy, and areas of fundamental
physics. The next-generation instruments, and the surveys that will be made
with them, will maintain this revolutionary progress. The hardware and
computational technical challenges and the exciting science opportunities are
attracting scientists and engineers from astronomy, optics, low-light-level
detectors, high-energy physics, statistics, and computer science. The history
of astronomy has taught us repeatedly that there are surprises whenever we view
the sky in a new way. This will be particularly true of discoveries emerging
from a new generation of sky surveys. Imaging data from large ground-based
active optics telescopes with sufficient etendue can address many scientific
missions simultaneously. These new investigations will rely on the statistical
precision obtainable with billions of objects. For the first time, the full sky
will be surveyed deep and fast, opening a new window on a universe of faint
moving and distant exploding objects as well as unraveling the mystery of dark
energy.Comment: 12 pages, 7 figure
- …