Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
In this paper, we identify a new task for studying the outlying degree (OD) of
high-dimensional data: finding the subspaces (subsets of features) in which
given points are outliers, called their outlying subspaces. Since
state-of-the-art outlier detection techniques fail to handle this new problem,
we propose a novel detection algorithm, High-Dimension Outlying subspace
Detection (HighDOD), to detect the outlying subspaces of high-dimensional data
efficiently. The intuitive idea of HighDOD is to measure the OD of a point as
the sum of distances between the point and its k nearest neighbors. Two
heuristic pruning strategies are proposed to speed up the subspace search, and
an efficient dynamic subspace search method with a sample-based learning
process has been implemented. Experimental results show that HighDOD is
efficient and outperforms other search alternatives such as the naive
top-down, bottom-up and random search methods, and that existing outlier
detection methods cannot fulfill this new task effectively.
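The OD measure described in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance restricted to the chosen features and a query point that is not itself a row of the data. The function names, the threshold parameter and the exhaustive search are hypothetical stand-ins; in particular, the loop below is the naive enumeration of all subspaces, not HighDOD's pruned dynamic search.

```python
from itertools import combinations

import numpy as np


def outlying_degree(data, point, subspace, k):
    """Sum of distances from `point` to its k nearest neighbors in `subspace`.

    `subspace` is a sequence of feature indices; Euclidean distance is an
    assumption, and `point` is assumed not to be a row of `data`.
    """
    cols = list(subspace)
    diffs = data[:, cols] - point[cols]
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return float(np.sort(dists)[:k].sum())


def outlying_subspaces(data, point, k, threshold):
    """Naive exhaustive search (hypothetical baseline, no pruning).

    Returns every non-empty feature subset whose OD reaches `threshold`.
    """
    n_features = data.shape[1]
    found = []
    for size in range(1, n_features + 1):
        for subspace in combinations(range(n_features), size):
            if outlying_degree(data, point, subspace, k) >= threshold:
                found.append(subspace)
    return found


# Four points on the unit square; the query is ordinary in feature 0
# but far away in feature 1.
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
point = np.array([0.5, 10.0])
print(outlying_degree(data, point, [0], k=2))  # 1.0
print(outlying_degree(data, point, [1], k=2))  # 18.0
print(outlying_subspaces(data, point, k=2, threshold=5.0))
```

The exhaustive loop visits all 2^d - 1 subspaces, which is the cost that pruning strategies of the kind mentioned in the abstract are meant to avoid.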
Randomized hybrid linear modeling by local best-fit flats
The hybrid linear modeling problem is to identify a set of d-dimensional
affine sets in a D-dimensional Euclidean space. It arises, for example, in
object tracking and structure from motion. The hybrid linear model can be
considered as the second simplest (behind linear) manifold model of data. In
this paper we will present a very simple geometric method for hybrid linear
modeling based on selecting a set of local best fit flats that minimize a
global l1 error measure. The size of the local neighborhoods is determined
automatically by the Jones' l2 beta numbers; it is proven under certain
geometric conditions that good local neighborhoods exist and are found by our
method. We also demonstrate how to use this algorithm for fast determination of
the number of affine subspaces. We give extensive experimental evidence
demonstrating the state-of-the-art accuracy and speed of the algorithm on
synthetic and real hybrid linear data. Comment: To appear in the proceedings of CVPR 201
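As a rough illustration of the building block named in this abstract, the sketch below fits a single d-dimensional flat to a set of points and scores it with an l1-type error. It assumes the best-fit flat is the least-squares (PCA) flat and that the l1 error is the sum of point-to-flat distances. The function names are hypothetical, and the neighborhood-size selection by beta numbers is not reproduced here.

```python
import numpy as np


def best_fit_flat(points, d):
    """Least-squares d-dimensional flat through `points` (PCA via SVD).

    Returns the centroid and a (d, D) matrix of orthonormal directions.
    """
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[:d]


def distances_to_flat(points, center, basis):
    """Euclidean distance of each point to the flat (center, basis)."""
    centered = points - center
    residual = centered - centered @ basis.T @ basis
    return np.linalg.norm(residual, axis=1)


def l1_error(points, center, basis):
    """Sum of point-to-flat distances (assumed form of the l1 error)."""
    return float(distances_to_flat(points, center, basis).sum())


# Points on the line y = 2x are fitted with essentially zero error.
line = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
center, basis = best_fit_flat(line, d=1)
print(l1_error(line, center, basis))
```

In a full method of this kind, one such flat would be fitted per local neighborhood and the candidates compared by their error over the whole data set.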
Fitness landscape of the cellular automata majority problem: View from the Olympus
In this paper we study cellular automata (CAs) that perform the computational
Majority task. This task is a good example of the phenomenon of emergence
in complex systems. We take an interest in the reasons that make this
particular fitness landscape a difficult one. The first goal is to study the
landscape as such, and thus it is ideally independent from the actual
heuristics used to search the space. However, a second goal is to understand
the features a good search technique for this particular problem space should
possess. We statistically quantify in various ways the degree of difficulty of
searching this landscape. Due to neutrality, investigations based on sampling
techniques on the whole landscape are difficult to conduct. So, we go exploring
the landscape from the top. Although it has been proved that no CA can perform
the task perfectly, several efficient CAs for this task have been found.
Exploiting similarities between these CAs and symmetries in the landscape, we
define the Olympus landscape which is regarded as the "heavenly home" of the
best local optima known (blok). Then we measure several properties of this
subspace. Although it is easier to find relevant CAs in this subspace than in
the overall landscape, there are structural reasons that prevent a searcher
from finding overfitted CAs in the Olympus. Finally, we study dynamics and
performance of genetic algorithms on the Olympus in order to confirm our
analysis and to find efficient CAs for the Majority problem with low
computational cost.
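To make the fitness evaluation behind such a landscape concrete, here is a minimal sketch of scoring one CA rule on the Majority task. It assumes a one-dimensional binary CA with periodic boundaries, a rule stored as a lookup table indexed by the neighborhood bits, and fitness defined as the fraction of random initial configurations classified correctly. The radius, lattice size, step limit and all function names are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def step(config, rule, radius):
    """Apply one synchronous update of a 1D binary CA with periodic boundary.

    `rule` is a lookup table of length 2 ** (2 * radius + 1); the leftmost
    neighbor is the most significant bit of the table index.
    """
    index = np.zeros(len(config), dtype=np.int64)
    for offset in range(-radius, radius + 1):
        # np.roll(config, -offset)[i] equals config[(i + offset) % n]
        index = (index << 1) | np.roll(config, -offset)
    return rule[index]


def classifies_correctly(config, rule, radius, max_steps):
    """True if the CA ends in all 1s (majority of 1s) or all 0s (otherwise)."""
    target = int(config.sum() * 2 > len(config))
    for _ in range(max_steps):
        config = step(config, rule, radius)
    return bool(np.all(config == target))


def fitness(rule, radius, n_cells, n_samples, max_steps, rng):
    """Fraction of uniformly random initial configurations classified correctly."""
    hits = 0
    for _ in range(n_samples):
        config = rng.integers(0, 2, size=n_cells)
        hits += classifies_correctly(config, rule, radius, max_steps)
    return hits / n_samples


def local_majority_rule(radius):
    """Baseline table: each cell adopts the majority of its own neighborhood."""
    size = 2 ** (2 * radius + 1)
    return np.array([int(bin(i).count("1") > radius) for i in range(size)])


rng = np.random.default_rng(0)
rule = local_majority_rule(radius=3)
print(fitness(rule, radius=3, n_cells=149, n_samples=100, max_steps=300, rng=rng))
```

A search heuristic such as a genetic algorithm would treat the rule table as the genotype and a sampled estimate like this one as its fitness; an odd lattice size avoids ties in the initial density.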
Masking Strategies for Image Manifolds
We consider the problem of selecting an optimal mask for an image manifold,
i.e., choosing a subset of the pixels of the image that preserves the
manifold's geometric structure present in the original data. Such masking
implements a form of compressive sensing through emerging imaging sensor
platforms for which the power expense grows with the number of pixels acquired.
Our goal is for the manifold learned from masked images to resemble its full
image counterpart as closely as possible. More precisely, we show that one can
indeed accurately learn an image manifold without having to consider a large
majority of the image pixels. In doing so, we consider two masking methods that
preserve the local and global geometric structure of the manifold,
respectively. In each case, the process of finding the optimal masking pattern
can be cast as a binary integer program, which is computationally expensive but
can be approximated by a fast greedy algorithm. Numerical experiments show that
the relevant manifold structure is preserved through the data-dependent masking
process, even for modest mask sizes.
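The greedy approximation mentioned in this abstract can be sketched in simplified form. The objective used below, the l1 gap between masked and full squared pairwise distances, is a stand-in chosen for illustration; it is an assumption and not the paper's binary integer program. The function names are hypothetical, and images are assumed to be flattened into the rows of a float array.

```python
import numpy as np


def pairwise_pixel_sq_diffs(images):
    """Per-pixel squared differences for every unordered pair of images.

    Returns an array of shape (n_pairs, n_pixels).
    """
    rows, cols = np.triu_indices(len(images), k=1)
    return (images[rows] - images[cols]) ** 2


def greedy_mask(images, mask_size):
    """Pick `mask_size` pixels one at a time (stand-in objective, see lead-in).

    At each step the pixel added is the one that leaves the smallest total
    gap between masked and full squared pairwise distances.
    """
    sq = pairwise_pixel_sq_diffs(images)
    full = sq.sum(axis=1)
    partial = np.zeros_like(full)
    chosen = []
    for _ in range(mask_size):
        best_pixel, best_cost = None, np.inf
        for pixel in range(images.shape[1]):
            if pixel in chosen:
                continue
            cost = np.abs(partial + sq[:, pixel] - full).sum()
            if cost < best_cost:
                best_pixel, best_cost = pixel, cost
        chosen.append(best_pixel)
        partial = partial + sq[:, best_pixel]
    return sorted(chosen)


# Three 4-pixel "images" in which only pixels 0 and 2 ever change.
images = np.array([[0.0, 0.0, 0.0, 0.0],
                   [1.0, 0.0, 3.0, 0.0],
                   [2.0, 0.0, 0.0, 0.0]])
print(greedy_mask(images, mask_size=2))  # [0, 2]
```

Each step costs one pass over the remaining pixels, which is the kind of saving a greedy scheme offers over solving an integer program exactly.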
Computation of multiple eigenvalues and generalized eigenvectors for matrices dependent on parameters
The paper develops Newton's method of finding multiple eigenvalues with one
Jordan block and corresponding generalized eigenvectors for matrices dependent
on parameters. It computes the nearest value of a parameter vector with a
matrix having a multiple eigenvalue of given multiplicity. The method also
works in the whole matrix space (in the absence of parameters). The approach is
based on the versal deformation theory for matrices. Numerical examples are
given. The implementation of the method in MATLAB code is available. Comment: 19 pages, 3 figures
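A small worked example may help fix the terminology of this abstract. The one-parameter matrix below is a hypothetical illustration, not one of the paper's numerical examples; it shows a double eigenvalue with a single Jordan block together with its eigenvector and generalized eigenvector.

```latex
A(p) = \begin{pmatrix} 0 & 1 \\ p & 0 \end{pmatrix}, \qquad
\det\bigl(A(p) - \lambda I\bigr) = \lambda^2 - p .
% The eigenvalues \pm\sqrt{p} merge at p = 0 into the double eigenvalue \lambda_0 = 0.
A(0)\,u_0 = \lambda_0 u_0, \qquad
A(0)\,u_1 = \lambda_0 u_1 + u_0, \qquad
u_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
u_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
```

Here p = 0 is the parameter value nearest to any starting guess at which a double eigenvalue occurs, which is the kind of point an iterative method of this type is meant to locate.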