Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied to a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emergent
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that, compared to a pure stream processing system,
AIDR can achieve higher data classification accuracy, while, compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual effort.
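The abstract does not spell out how AIDR splits work between machine and crowd. The following is a minimal sketch of one common hybrid pattern, assuming a classifier that returns a (label, confidence) pair and a fixed confidence cut-off. `HybridRouter`, `toy_classifier`, the `threshold` value and the label names are all hypothetical and are not AIDR's actual design.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

@dataclass
class HybridRouter:
    """Send confident items to the machine path and uncertain ones to the crowd."""
    classify: Callable[[str], Tuple[str, float]]  # hypothetical: returns (label, confidence)
    threshold: float = 0.8                        # hypothetical confidence cut-off
    machine_labels: List[Tuple[str, str]] = field(default_factory=list)
    crowd_queue: List[str] = field(default_factory=list)

    def process(self, stream: Iterable[str]) -> None:
        for item in stream:
            label, confidence = self.classify(item)
            if confidence >= self.threshold:
                # Confident predictions are accepted automatically at stream speed.
                self.machine_labels.append((item, label))
            else:
                # Uncertain items are queued for human workers to label.
                self.crowd_queue.append(item)

def toy_classifier(message: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained message classifier."""
    if "flood" in message:
        return "infrastructure_damage", 0.95
    return "other", 0.5

router = HybridRouter(classify=toy_classifier)
router.process(["flood closes bridge", "thoughts and prayers", "flood warning issued"])
print(len(router.machine_labels), len(router.crowd_queue))  # 2 1
```

Only the one low-confidence message reaches human workers, which illustrates how such a design can reduce manual effort. A natural extension, not shown here, would feed crowd labels back to retrain the classifier.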
Physical Representation-based Predicate Optimization for a Visual Analytics Database
Querying the content of images, video, and other non-textual data sources
requires expensive content extraction methods. Modern extraction techniques are
based on deep convolutional neural networks (CNNs) and can classify objects
within images with astounding accuracy. Unfortunately, these methods are slow:
processing a single image can take about 10 milliseconds on modern GPU-based
hardware. As massive video libraries become ubiquitous, running a content-based
query over millions of video frames becomes prohibitively expensive.
One promising approach to reduce the runtime cost of queries of visual
content is to use a hierarchical model, such as a cascade, where simple cases
are handled by an inexpensive classifier. Prior work has sought to design
cascades that optimize the computational cost of inference by, for example,
using smaller CNNs. However, we observe that there are critical factors besides
the inference time that dramatically impact the overall query time. Notably, by
treating the physical representation of the input image as part of our query
optimization (that is, by including image transforms such as resolution
scaling or color-depth reduction within the cascade), we can optimize data
handling costs and enable drastically more efficient classifier cascades.
In this paper, we propose Tahoma, which generates and evaluates many
potential classifier cascades that jointly optimize the CNN architecture and
input data representation. Our experiments on a subset of ImageNet show that
Tahoma's input transformations speed up cascades by up to 35 times. We also
find up to a 98x speedup over the ResNet50 classifier with no loss in accuracy,
and a 280x speedup if some accuracy is sacrificed.
Comment: Camera-ready version of the paper submitted to ICDE 2019. In
Proceedings of the 35th IEEE International Conference on Data Engineering
(ICDE 2019)
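To make the cascade idea concrete, here is a minimal two-stage sketch in which the cheap stage sees a lower-resolution copy of the input and only uncertain cases reach the expensive model. It assumes toy grayscale images stored as nested lists, resolution scaling by pixel striding, and fixed `low`/`high` confidence thresholds. The function names, thresholds and toy models are hypothetical; Tahoma itself generates and evaluates many CNN-based cascades and also considers transforms such as color-depth reduction.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities

def downscale(img: Image, factor: int) -> Image:
    """Lower the resolution by keeping every `factor`-th pixel in each dimension."""
    return [row[::factor] for row in img[::factor]]

def cascade_predict(
    images: Sequence[Image],
    cheap_model: Callable[[Image], float],     # hypothetical: returns P(positive)
    expensive_model: Callable[[Image], bool],  # hypothetical full-cost reference model
    factor: int = 4,
    low: float = 0.1,
    high: float = 0.9,
) -> Tuple[List[bool], int]:
    """Two-stage cascade; returns predictions and how many images were escalated."""
    predictions: List[bool] = []
    escalated = 0
    for img in images:
        # Stage 1: the cheap model only ever sees the transformed (smaller) input.
        p = cheap_model(downscale(img, factor))
        if p >= high:
            predictions.append(True)
        elif p <= low:
            predictions.append(False)
        else:
            # Stage 2: uncertain cases fall through to the expensive model at full resolution.
            escalated += 1
            predictions.append(expensive_model(img))
    return predictions, escalated

def mean_intensity(img: Image) -> float:
    pixels = [v for row in img for v in row]
    return sum(pixels) / len(pixels)

bright = [[1.0] * 8 for _ in range(8)]
dark = [[0.0] * 8 for _ in range(8)]
grey = [[0.5] * 8 for _ in range(8)]

preds, escalated = cascade_predict(
    [bright, dark, grey],
    cheap_model=mean_intensity,
    expensive_model=lambda img: mean_intensity(img) > 0.4,
)
print(preds, escalated)  # [True, False, True] 1
```

Here the cheap stage reads 4 pixels instead of 64 per image, and only one of three images pays the full-resolution cost. This is the kind of data-handling saving the abstract attributes to treating the physical representation as part of the optimization.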
Visual Analysis of Spatio-Temporal Event Predictions: Investigating the Spread Dynamics of Invasive Species
Invasive species are a major cause of ecological damage and commercial
losses. One invasive species currently spreading in North America and Europe is the vinegar
fly Drosophila suzukii. Unlike other Drosophila species, it infests healthy, non-rotting
fruit and is therefore of concern to fruit growers such as vintners.
Consequently, large amounts of data about infestations have been collected in
recent years. However, there is a lack of interactive methods to investigate
this data. We employ ensemble-based classification to predict areas susceptible
to infestation by D. suzukii and bring them into a spatio-temporal context
using maps and glyph-based visualizations. Following the information-seeking
mantra, we present Drosophigator, a visual analysis system for spatio-temporal
event prediction that enables investigation of the spread dynamics of invasive
species. We demonstrate the usefulness of this approach in two use cases.
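The abstract mentions ensemble-based classification without naming the ensemble type. The sketch below assumes the simplest variant, a vote over member classifiers, and scores each (region, week) cell by the fraction of members predicting infestation. A score of this kind could drive a map colour or glyph. The members, the feature names (`temp_c`, `humidity`, `ripe_fruit`) and their thresholds are hypothetical and carry no biological claim.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Member = Callable[[Features], bool]
Cell = Tuple[str, int]  # (region id, week number)

def susceptibility(members: Sequence[Member], x: Features) -> float:
    """Fraction of ensemble members that predict infestation (0.0 to 1.0)."""
    return sum(1 for member in members if member(x)) / len(members)

def predict_map(
    members: Sequence[Member],
    grid: Dict[Cell, Features],
    threshold: float = 0.5,
) -> Tuple[Dict[Cell, float], Dict[Cell, bool]]:
    """Score every cell; with the default threshold a strict majority marks it susceptible."""
    scores = {cell: susceptibility(members, x) for cell, x in grid.items()}
    predicted = {cell: score > threshold for cell, score in scores.items()}
    return scores, predicted

# Hypothetical weak members: simple threshold rules on hypothetical features.
members: List[Member] = [
    lambda x: x["temp_c"] > 15.0,
    lambda x: x["humidity"] > 0.6,
    lambda x: x["ripe_fruit"] > 0.5,
]
grid = {
    ("cell_A", 1): {"temp_c": 20.0, "humidity": 0.7, "ripe_fruit": 0.8},
    ("cell_B", 1): {"temp_c": 10.0, "humidity": 0.7, "ripe_fruit": 0.2},
}
scores, predicted = predict_map(members, grid)
print(scores, predicted)
```

Keeping the vote fraction rather than only the binary decision leaves the visualization a graded quantity to encode, for example in glyph size or opacity.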
Application of multiobjective genetic programming to the design of robot failure recognition systems
We present an evolutionary approach using multiobjective genetic programming (MOGP) to derive optimal feature extraction preprocessing stages for robot failure detection. This data-driven machine learning method is compared with both conventional (non-evolutionary) classifiers and a set of domain-dependent feature extraction methods. We conclude that MOGP is an effective and practical design method for failure recognition systems, with enhanced recognition accuracy over conventional classifiers and independent of domain knowledge.
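Multiobjective methods such as MOGP keep the candidates that are not Pareto-dominated rather than a single "best" one. The sketch below shows only that selection step, under the assumption that every objective is minimised. The candidate names and the two objectives (classification error and feature-extractor size) are hypothetical, and the genetic programming itself (tree variation and breeding) is not shown.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # every objective is to be minimised

def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population: Sequence[Tuple[str, Objectives]]) -> List[str]:
    """Names of candidates not dominated by any other candidate."""
    return [
        name
        for name, objs in population
        if not any(dominates(other, objs) for _, other in population)
    ]

# Hypothetical candidates scored on (classification error, feature-extractor size).
candidates = [
    ("fe1", (0.10, 12.0)),
    ("fe2", (0.08, 30.0)),
    ("fe3", (0.12, 15.0)),
    ("fe4", (0.08, 25.0)),
]
print(pareto_front(candidates))  # ['fe1', 'fe4']
```

`fe2` is dominated by `fe4` (same error, smaller size) and `fe3` by `fe1` (lower error and smaller size), so the front keeps the accuracy/complexity trade-off between `fe1` and `fe4`. Comparing a candidate with itself is harmless because `dominates` requires a strict improvement.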
An Overview on Application of Machine Learning Techniques in Optical Networks
Today's telecommunication networks have become sources of enormous amounts of
widely heterogeneous data. This information can be retrieved from network
traffic traces, network alarms, signal quality indicators, users' behavioral
data, etc. Advanced mathematical tools are required to extract meaningful
information from these data and to make decisions about the proper
functioning of the networks. Among these
mathematical tools, Machine Learning (ML) is regarded as one of the most
promising methodological approaches to perform network-data analysis and enable
automated network self-configuration and fault management. The adoption of ML
techniques in the field of optical communication networks is motivated by the
unprecedented growth of network complexity faced by optical networks in the
last few years. This increase in complexity is due to the introduction of a huge
number of adjustable and interdependent system parameters (e.g., routing
configurations, modulation format, symbol rate, coding schemes, etc.) that are
enabled by the usage of coherent transmission/reception technologies, advanced
digital signal processing and compensation of nonlinear effects in optical
fiber propagation. In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and survey relevant
literature dealing with the topic, and we also provide an introductory tutorial
on ML for researchers and practitioners interested in this field. Although a
good number of research papers have recently appeared, the application of ML to
optical networks is still in its infancy. To stimulate further work in this
area, we conclude the paper by proposing new possible research directions.
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since
the class boundary must be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, based on the availability of training data, the algorithms
used, and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
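To illustrate what "defining the class boundary using only the positive class" means in practice, here is a deliberately simple one-class classifier. It fits the centroid of the positive training data and accepts any point within the largest training distance from it. This is loosely in the spirit of hypersphere-based OCC methods such as SVDD but performs no optimisation. The class name and the `slack` parameter are hypothetical, and the sketch is not taken from any specific technique in the review.

```python
import math
from typing import List, Sequence

class CentroidOneClass:
    """Accept a point if it lies within the training radius around the positive-class centroid."""

    def __init__(self, slack: float = 1.0) -> None:
        self.slack = slack  # hypothetical multiplier: >1 widens, <1 tightens the boundary
        self.centroid: List[float] = []
        self.radius = 0.0

    def fit(self, positives: Sequence[Sequence[float]]) -> "CentroidOneClass":
        # Only positive examples are used: no negative class is needed to place the boundary.
        n, dim = len(positives), len(positives[0])
        self.centroid = [sum(p[i] for p in positives) / n for i in range(dim)]
        self.radius = self.slack * max(math.dist(p, self.centroid) for p in positives)
        return self

    def predict(self, x: Sequence[float]) -> bool:
        return math.dist(x, self.centroid) <= self.radius

train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
model = CentroidOneClass().fit(train)
print(model.predict([0.5, 0.5]), model.predict([3.0, 3.0]))  # True False
```

Because the radius comes only from positive data, any point outside it, including an outlier or novelty never seen during training, is rejected. This is the basic behaviour that OCC methods for outlier/novelty detection rely on. `math.dist` requires Python 3.8 or later.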