34,208 research outputs found
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
An efficient parallel immersed boundary algorithm using a pseudo-compressible fluid solver
We propose an efficient algorithm for the immersed boundary method on
distributed-memory architectures, with the computational complexity of a
completely explicit method and excellent parallel scaling. The algorithm
utilizes the pseudo-compressibility method recently proposed by Guermond and
Minev [Comptes Rendus Mathematique, 348:581-585, 2010] that uses a directional
splitting strategy to discretize the incompressible Navier-Stokes equations,
thereby reducing the linear systems to a series of one-dimensional tridiagonal
systems. We perform numerical simulations of several fluid-structure
interaction problems in two and three dimensions and study the accuracy and
convergence rates of the proposed algorithm. For these problems, we compare the
proposed algorithm against other second-order projection-based fluid solvers.
Lastly, the strong and weak scaling properties of the proposed algorithm are
investigated
Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
Local moments are used for local regression, to compute statistical measures
such as sums, averages, and standard deviations, and to approximate probability
distributions. We consider the case where the data source is a very large I/O
array of size n and we want to compute the first N local moments, for some
constant N. Without precomputation, this requires O(n) time. We develop a
sequence of algorithms of increasing sophistication that use precomputation and
additional buffer space to speed up queries. The simpler algorithms partition
the I/O array into consecutive ranges called bins, and they are applicable not
only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM,
etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A
more sophisticated approach uses hierarchical buffering and has a logarithmic
time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b.
Using Overlapped Bin Buffering, we show that only a single buffer is needed, as
with wavelet-based algorithms, but using much less storage. Applications exist
in multidimensional and statistical databases over massive data sets,
interactive image processing, and visualization
The IBMAP approach for Markov networks structure learning
In this work we consider the problem of learning the structure of Markov
networks from data. We present an approach for tackling this problem called
IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC
algorithm, designed for avoiding important limitations of existing
independence-based algorithms. These algorithms proceed by performing
statistical independence tests on data, trusting completely the outcome of each
test. In practice tests may be incorrect, resulting in potential cascading
errors and the consequent reduction in the quality of the structures learned.
IBMAP contemplates this uncertainty in the outcome of the tests through a
probabilistic maximum-a-posteriori approach. The approach is instantiated in
the IBMAP-HC algorithm, a structure selection strategy that performs a
polynomial heuristic local search in the space of possible structures. We
present an extensive empirical evaluation on synthetic and real data, showing
that our algorithm outperforms significantly the current independence-based
algorithms, in terms of data efficiency and quality of learned structures, with
equivalent computational complexities. We also show the performance of IBMAP-HC
in a real-world application of knowledge discovery: EDAs, which are
evolutionary algorithms that use structure learning on each generation for
modeling the distribution of populations. The experiments show that when
IBMAP-HC is used to learn the structure, EDAs improve the convergence to the
optimum
Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades between
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering
Color Filtering Localization for Three-Dimensional Underwater Acoustic Sensor Networks
Accurate localization for mobile nodes has been an important and fundamental
problem in underwater acoustic sensor networks (UASNs). The detection
information returned from a mobile node is meaningful only if its location is
known. In this paper, we propose two localization algorithms based on color
filtering technology called PCFL and ACFL. PCFL and ACFL aim at collaboratively
accomplishing accurate localization of underwater mobile nodes with minimum
energy expenditure. They both adopt the overlapping signal region of task
anchors which can communicate with the mobile node directly as the current
sampling area. PCFL employs the projected distances between each of the task
projections and the mobile node, while ACFL adopts the direct distance between
each of the task anchors and the mobile node. Also the proportion factor of
distance is proposed to weight the RGB values. By comparing the nearness
degrees of the RGB sequences between the samples and the mobile node, samples
can be filtered out. And the normalized nearness degrees are considered as the
weighted standards to calculate coordinates of the mobile nodes. The simulation
results show that the proposed methods have excellent localization performance
and can timely localize the mobile node. The average localization error of PCFL
can decline by about 30.4% than the AFLA method.Comment: 18 pages, 11 figures, 2 table
Scaling Reinforcement Learning Paradigms for Motor Control
Reinforcement learning offers a general framework to explain reward related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high dimensional movement systems and mainly operate in discrete, low dimensional domains like game-playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we look at promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm which takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are not likely to scale into high-dimensional domains as they are problematic when used with function approximation a must when dealing with continuous domains. We adopt the path of direct policy gradient based policy improvements since they avoid the problems of unstabilizing dynamics encountered in traditional value iteration based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid notor control, we demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. Based on this, it is proved that Kakades average natural policy gradient is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum in Riemannian space of the cost function. The algorithm outperforms nonnatural policy gradients by far in a cart-pole balancing evaluation, and offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems. Keywords: Reinforcement learning, neurodynamic programming, actorcritic methods, policy gradient methods, natural policy gradien
- …