1,125 research outputs found
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Submodular maximization is a general optimization problem with a wide range
of applications in machine learning (e.g., active learning, clustering, and
feature selection). In large-scale optimization, the parallel running time of
an algorithm is governed by its adaptivity, which measures the number of
sequential rounds needed if the algorithm can execute polynomially-many
independent oracle queries in parallel. While low adaptivity is ideal, it is
not sufficient for an algorithm to be efficient in practice---there are many
applications of distributed submodular optimization where the number of
function evaluations becomes prohibitively expensive. Motivated by these
applications, we study the adaptivity and query complexity of submodular
maximization. In this paper, we give the first constant-factor approximation
algorithm for maximizing a non-monotone submodular function subject to a
cardinality constraint that runs in adaptive rounds and makes
oracle queries in expectation. In our empirical study, we use
three real-world applications to compare our algorithm with several benchmarks
for non-monotone submodular maximization. The results demonstrate that our
algorithm finds competitive solutions using significantly fewer rounds and
queries.Comment: 12 pages, 8 figure
Contextual Anomaly Detection Framework for Big Sensor Data
Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e. the data itself, without concern for the context of the data. As data becomes more complex it is increasingly important to bias anomaly detection techniques for the context, whether it is spatial, temporal, or semantic. The work proposed in this thesis outlines a contextual anomaly detection framework for use in Big sensor Data systems. The framework uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context-aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. The contextual anomaly detection framework is evaluated with respect to two different Big sensor Data data sets; one for electrical sensors, and another for temperature sensors within a building
Somoclu: An Efficient Parallel Library for Self-Organizing Maps
Somoclu is a massively parallel tool for training self-organizing maps on
large data sets written in C++. It builds on OpenMP for multicore execution,
and on MPI for distributing the workload across the nodes in a cluster. It is
also able to boost training by using CUDA if graphics processing units are
available. A sparse kernel is included, which is useful for high-dimensional
but sparse data, such as the vector spaces common in text mining workflows.
Python, R and MATLAB interfaces facilitate interactive use. Apart from fast
execution, memory use is highly optimized, enabling training large emergent
maps even on a single computer.Comment: 26 pages, 9 figures. The code is available at
https://peterwittek.github.io/somoclu
- …