Search CORE

1,125 research outputs found

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity

Author: Fahrbach Matthew
Mirrokni Vahab
Zadimoghaddam Morteza
Publication venue
Publication date: 28/05/2019
Field of study

Submodular maximization is a general optimization problem with a wide range of applications in machine learning (e.g., active learning, clustering, and feature selection). In large-scale optimization, the parallel running time of an algorithm is governed by its adaptivity, which measures the number of sequential rounds needed if the algorithm can execute polynomially-many independent oracle queries in parallel. While low adaptivity is ideal, it is not sufficient for an algorithm to be efficient in practice---there are many applications of distributed submodular optimization where the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of submodular maximization. In this paper, we give the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint

k

that runs in

O(\log(n))

adaptive rounds and makes

O(n \log(k))

oracle queries in expectation. In our empirical study, we use three real-world applications to compare our algorithm with several benchmarks for non-monotone submodular maximization. The results demonstrate that our algorithm finds competitive solutions using significantly fewer rounds and queries.Comment: 12 pages, 8 figure

arXiv.org e-Print Archive

Contextual Anomaly Detection Framework for Big Sensor Data

Author: Hayes Michael
Publication venue: Scholarship@Western
Publication date: 16/04/2014
Field of study

Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e. the data itself, without concern for the context of the data. As data becomes more complex it is increasingly important to bias anomaly detection techniques for the context, whether it is spatial, temporal, or semantic. The work proposed in this thesis outlines a contextual anomaly detection framework for use in Big sensor Data systems. The framework uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context-aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. The contextual anomaly detection framework is evaluated with respect to two different Big sensor Data data sets; one for electrical sensors, and another for temperature sensors within a building

Scholarship@Western

Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Author: Gao Shi Chao
Lim Ik Soo
Wittek Peter
Zhao Li
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 01/06/2017
Field of study

Somoclu is a massively parallel tool for training self-organizing maps on large data sets written in C++. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer.Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu

arXiv.org e-Print Archive

Directory of Open Access Journals

Journal of Statistical Software

Bangor University Research Portal