2,947 research outputs found
Physical Representation-based Predicate Optimization for a Visual Analytics Database
Querying the content of images, video, and other non-textual data sources
requires expensive content extraction methods. Modern extraction techniques are
based on deep convolutional neural networks (CNNs) and can classify objects
within images with astounding accuracy. Unfortunately, these methods are slow:
processing a single image can take about 10 milliseconds on modern GPU-based
hardware. As massive video libraries become ubiquitous, running a content-based
query over millions of video frames is prohibitive.
One promising approach to reduce the runtime cost of queries of visual
content is to use a hierarchical model, such as a cascade, where simple cases
are handled by an inexpensive classifier. Prior work has sought to design
cascades that optimize the computational cost of inference by, for example,
using smaller CNNs. However, we observe that there are critical factors besides
the inference time that dramatically impact the overall query time. Notably, by
treating the physical representation of the input image as part of our query
optimization---that is, by including image transforms, such as resolution
scaling or color-depth reduction, within the cascade---we can optimize data
handling costs and enable drastically more efficient classifier cascades.
In this paper, we propose Tahoma, which generates and evaluates many
potential classifier cascades that jointly optimize the CNN architecture and
input data representation. Our experiments on a subset of ImageNet show that
Tahoma's input transformations speed up cascades by up to 35 times. We also
find up to a 98x speedup over the ResNet50 classifier with no loss in accuracy,
and a 280x speedup if some accuracy is sacrificed.Comment: Camera-ready version of the paper submitted to ICDE 2019, In
Proceedings of the 35th IEEE International Conference on Data Engineering
(ICDE 2019
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for the new types of data sources, query
languages, and approaches to query processing and optimization.Comment: SIGMOD'1
Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures
Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, that are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a two-to-three order of magnitude speed up is gained over the standard shape context classifier, with minimal losses in classification accuracy.National Science Foundation (IIS-0308213, IIS-0329009, EIA-0202067); Office of Naval Research (N00014-03-1-0108
- …