Search CORE

34,208 research outputs found

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007
Field of study

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

University of Twente Research Information

An efficient parallel immersed boundary algorithm using a pseudo-compressible fluid solver

Author: Stockie John M.
Wiens Jeffrey K.
Publication venue: 'Elsevier BV'
Publication date: 19/05/2014
Field of study

We propose an efficient algorithm for the immersed boundary method on distributed-memory architectures, with the computational complexity of a completely explicit method and excellent parallel scaling. The algorithm utilizes the pseudo-compressibility method recently proposed by Guermond and Minev [Comptes Rendus Mathematique, 348:581-585, 2010] that uses a directional splitting strategy to discretize the incompressible Navier-Stokes equations, thereby reducing the linear systems to a series of one-dimensional tridiagonal systems. We perform numerical simulations of several fluid-structure interaction problems in two and three dimensions and study the accuracy and convergence rates of the proposed algorithm. For these problems, we compare the proposed algorithm against other second-order projection-based fluid solvers. Lastly, the strong and weak scaling properties of the proposed algorithm are investigated

arXiv.org e-Print Archive

CiteSeerX

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Author: Chakrabarti K.
Daniel Lemire
Geffner S.
Gray J.
Lemire D.
Li B.-C.
Moerkotte G.
Owen Kaser
Schmidt R. R.
Scott D.
Vitter J. S.
Zhou F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2008
Field of study

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization

arXiv.org e-Print Archive

R-libre

Crossref

The IBMAP approach for Markov networks structure learning

Author: Alejandro Edera
C Aliferis
C Aliferis
D Koller
D Margaritis
DM Chickering
F Bromberg
F Bromberg
Facundo Bromberg
Federico Schlüter
J Pearl
M Mitchell
MJ Wainwright
P Larraṅaga
P Ravikumar
P Spirtes
R Santana
S Della Pietra
S Shakya
SL Lauritzen
TM Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/02/2014
Field of study

In this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem called IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC algorithm, designed for avoiding important limitations of existing independence-based algorithms. These algorithms proceed by performing statistical independence tests on data, trusting completely the outcome of each test. In practice tests may be incorrect, resulting in potential cascading errors and the consequent reduction in the quality of the structures learned. IBMAP contemplates this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm outperforms significantly the current independence-based algorithms, in terms of data efficiency and quality of learned structures, with equivalent computational complexities. We also show the performance of IBMAP-HC in a real-world application of knowledge discovery: EDAs, which are evolutionary algorithms that use structure learning on each generation for modeling the distribution of populations. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve the convergence to the optimum

arXiv.org e-Print Archive

Crossref

CONICET Digital

Semi-supervised cross-entropy clustering with information bottleneck constraint

Author: Geiger Bernhard C.
Śmieja Marek
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering

arXiv.org e-Print Archive

Jagiellonian Univeristy Repository

Color Filtering Localization for Three-Dimensional Underwater Acoustic Sensor Networks

Author: Chang Shuai
Chen Jiaxing
Gao Han
Liu Zhihua
Wang Wuling
Publication venue
Publication date: 30/12/2014
Field of study

Accurate localization for mobile nodes has been an important and fundamental problem in underwater acoustic sensor networks (UASNs). The detection information returned from a mobile node is meaningful only if its location is known. In this paper, we propose two localization algorithms based on color filtering technology called PCFL and ACFL. PCFL and ACFL aim at collaboratively accomplishing accurate localization of underwater mobile nodes with minimum energy expenditure. They both adopt the overlapping signal region of task anchors which can communicate with the mobile node directly as the current sampling area. PCFL employs the projected distances between each of the task projections and the mobile node, while ACFL adopts the direct distance between each of the task anchors and the mobile node. Also the proportion factor of distance is proposed to weight the RGB values. By comparing the nearness degrees of the RGB sequences between the samples and the mobile node, samples can be filtered out. And the normalized nearness degrees are considered as the weighted standards to calculate coordinates of the mobile nodes. The simulation results show that the proposed methods have excellent localization performance and can timely localize the mobile node. The average localization error of PCFL can decline by about 30.4% than the AFLA method.Comment: 18 pages, 11 figures, 2 table

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Scaling Reinforcement Learning Paradigms for Motor Control

Author: Peters J.
Schaal S.
Vijayakumar S.
Publication venue
Publication date: 01/01/2003
Field of study

Reinforcement learning offers a general framework to explain reward related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high dimensional movement systems and mainly operate in discrete, low dimensional domains like game-playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we look at promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm which takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are not likely to scale into high-dimensional domains as they are problematic when used with function approximation a must when dealing with continuous domains. We adopt the path of direct policy gradient based policy improvements since they avoid the problems of unstabilizing dynamics encountered in traditional value iteration based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid notor control, we demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. Based on this, it is proved that Kakades average natural policy gradient is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum in Riemannian space of the cost function. The algorithm outperforms nonnatural policy gradients by far in a cart-pole balancing evaluation, and offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems. Keywords: Reinforcement learning, neurodynamic programming, actorcritic methods, policy gradient methods, natural policy gradien

Edinburgh Research Explorer

MPG.PuRe