Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data: finding the subspaces (subsets of features) in which given points are outliers, called their outlying subspaces. Because state-of-the-art outlier detection techniques cannot handle this new problem, we propose a novel detection algorithm, High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is to measure the OD of a point as the sum of distances between the point and its k nearest neighbors. Two heuristic pruning strategies are proposed to enable fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other search alternatives such as naive top-down, bottom-up and random search, and that existing outlier detection methods cannot perform this new task effectively.
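The OD measure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `outlying_degree` and `outlying_subspaces`, the use of Euclidean distance, and the fixed threshold used to call a subspace "outlying" are all hypothetical choices made for this example. The exhaustive enumeration corresponds to the naive search that HighDOD is designed to beat; it includes none of the paper's pruning strategies or its sample-based learning.

```python
import math
from itertools import combinations

def outlying_degree(idx, data, subspace, k):
    """Sum of distances from data[idx] to its k nearest neighbours,
    using only the feature indices listed in `subspace` (Euclidean, assumed)."""
    p = data[idx]
    dists = sorted(
        math.sqrt(sum((p[f] - q[f]) ** 2 for f in subspace))
        for j, q in enumerate(data)
        if j != idx  # a point is not its own neighbour
    )
    return sum(dists[:k])

def outlying_subspaces(idx, data, k, threshold):
    """Naive exhaustive search with a hypothetical fixed threshold: return every
    non-empty feature subset in which data[idx] has OD >= threshold."""
    n_features = len(data[idx])
    result = []
    for size in range(1, n_features + 1):
        for subspace in combinations(range(n_features), size):
            if outlying_degree(idx, data, subspace, k) >= threshold:
                result.append(subspace)
    return result

data = [(0, 0), (1, 0), (0, 1), (10, 0)]
print(outlying_subspaces(3, data, k=1, threshold=5))  # → [(0,), (0, 1)]
```

The point (10, 0) is outlying in feature 0 and in the full space, but not in feature 1 alone. Because adding a feature can only increase each Euclidean distance, the sum of the k smallest distances never decreases as a subspace grows; a monotonicity property of this kind is what makes pruning of the subspace lattice possible, although the specific heuristics HighDOD uses are those described in the paper.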
Orbitopal Fixing
The topic of this paper is integer programming models in which a subset of
0/1-variables encodes a partitioning of a set of objects into disjoint subsets.
Such models can be surprisingly hard to solve by branch-and-cut algorithms if
the order of the subsets of the partition is irrelevant, since this kind of
symmetry unnecessarily blows up the search tree. We present a general tool,
called orbitopal fixing, for enhancing the capabilities of branch-and-cut
algorithms in solving such symmetric integer programming models. We devise a
linear-time algorithm that, applied at each node of the search tree, removes
redundant parts of the tree produced by the above-mentioned symmetry. The
method relies on certain polyhedra, called orbitopes, which have been
introduced by Kaibel and Pfetsch (Math. Programm. A, 114 (2008), 1-36). It
does not, however, explicitly add inequalities to the model. Instead, it uses
certain fixing rules for variables. We demonstrate the computational power of
orbitopal fixing using the example of a graph partitioning problem.
Comment: 22 pages, revised and extended version of a previous version that has
appeared under the same title in Proc. IPCO 200
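The symmetry that orbitopal fixing exploits can be made concrete with a small Python sketch. It illustrates the problem under stated assumptions and is not the paper's linear-time fixing algorithm: an assignment is encoded as a list whose entry i is the subset index of object i (so x[i][j] = 1 exactly when assignment[i] == j), and one representative per partition is chosen by relabeling subsets in order of first appearance. The names `canonical_labels`, `is_canonical` and `count_assignments` are hypothetical.

```python
from itertools import product

def canonical_labels(assignment):
    """Relabel subsets in order of first appearance, so the first object
    always lands in subset 0, the next new subset becomes 1, and so on."""
    relabel = {}
    for s in assignment:
        if s not in relabel:
            relabel[s] = len(relabel)
    return [relabel[s] for s in assignment]

def is_canonical(assignment):
    """True if the assignment already is the chosen representative."""
    return canonical_labels(assignment) == list(assignment)

def count_assignments(n, m):
    """Count all labeled assignments of n objects to m subsets, and how
    many of them are canonical (one per distinct partition)."""
    total = canonical = 0
    for a in product(range(m), repeat=n):
        total += 1
        if is_canonical(a):
            canonical += 1
    return total, canonical

print(count_assignments(3, 3))  # → (27, 5)
```

For n = 3 objects and m = 3 subsets, 27 labeled assignments describe only 5 distinct partitions, so a search tree that ignores the symmetry may explore the same partition several times. One direct consequence of this choice of representative is that object i can only use subsets 0..i, so x[i][j] can be fixed to 0 whenever j > i; orbitopal fixing, as the abstract describes, goes further by applying fixing rules at every node of the tree.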
Using rule extraction to improve the comprehensibility of predictive models
Whereas newer machine learning techniques, like artificial neural networks and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is hindered by their lack of explanation capability, which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying 'black box' models. These algorithms have a dual goal: to mimic the behavior of the black box as closely as possible while ensuring that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classified and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given. The report concludes by pointing out some general shortcomings of existing techniques and opportunities for future research.
Keywords: Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time
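The dual goal of fidelity and comprehensibility can be illustrated with a toy extractor in Python. Everything here is hypothetical and chosen for illustration: `black_box` stands in for an opaque model such as a neural network, and `extract_stump` only queries that model and searches for the single most faithful rule of the form IF x[f] > t THEN 1 ELSE 0. Fidelity is measured as the fraction of query points on which the rule agrees with the black box; this is one common way to quantify how closely an extracted description mimics the model, not necessarily the report's formal definition.

```python
from itertools import product

def black_box(x):
    """Hypothetical opaque model: we may query it but not inspect it."""
    return 1 if 4 * x[0] + x[1] > 25 else 0

def extract_stump(samples, model):
    """Query the model on `samples`, then return the rule (feature, threshold)
    for 'IF x[f] > t THEN 1 ELSE 0' with the highest fidelity, plus that fidelity."""
    labels = [model(x) for x in samples]
    best_rule, best_fidelity = None, -1.0
    for f in range(len(samples[0])):
        for t in sorted({x[f] for x in samples}):
            # Fidelity: fraction of samples where the rule matches the black box.
            agree = sum((1 if x[f] > t else 0) == y for x, y in zip(samples, labels))
            fidelity = agree / len(samples)
            if fidelity > best_fidelity:
                best_rule, best_fidelity = (f, t), fidelity
    return best_rule, best_fidelity

samples = list(product(range(11), repeat=2))  # 11-by-11 grid of query points
rule, fidelity = extract_stump(samples, black_box)
print(rule)  # → (0, 5)
```

On this grid the search selects IF x[0] > 5 THEN 1, which agrees with the black box on 113 of the 121 query points. The one-condition rule is easy to read, but the 8 disagreements show the accuracy cost of that comprehensibility; richer rule languages would raise fidelity at the expense of readability.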
Pruning Attributes From Data Cubes with Diamond Dicing
Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or, separately, the most profitable products, but simultaneous sets of stores and products that fulfill some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC.
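The interaction between dimensions can be seen in a simple iterative-pruning sketch, similar in spirit to computing a k-core of a graph. The Python code is illustrative only and rests on assumptions the abstract does not state: a two-dimensional fact table of (store, product, profit) rows, non-negative measures, and a hypothetical constraint that every remaining store and every remaining product must have a total profit of at least a given threshold. The function name `diamond_dice` is hypothetical, and this in-memory loop is not the authors' implementation for fact tables with 100 million rows.

```python
from collections import defaultdict

def diamond_dice(facts, k_store, k_product):
    """facts: list of (store, product, measure) rows, measures assumed >= 0.
    Repeatedly drop rows whose store total is below k_store or whose
    product total is below k_product, until no row is removed."""
    current = list(facts)
    while True:
        store_total = defaultdict(float)
        product_total = defaultdict(float)
        for store, product, measure in current:
            store_total[store] += measure
            product_total[product] += measure
        kept = [
            (store, product, measure)
            for store, product, measure in current
            if store_total[store] >= k_store and product_total[product] >= k_product
        ]
        if len(kept) == len(current):  # fixpoint: every slice meets its threshold
            return kept
        current = kept

facts = [("s1", "p1", 10), ("s1", "p2", 5), ("s2", "p1", 3), ("s3", "p3", 20)]
print(diamond_dice(facts, 12, 12))  # → [('s3', 'p3', 20)]
```

With a threshold of 12 in both dimensions, the first pass removes store s2 and product p2; that lowers the totals of s1 and p1 to 10, so a second pass removes them as well. Pruning one dimension can invalidate slices in another, which is why a single pass per dimension is not enough.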
A Novel Method for the Absolute Pose Problem with Pairwise Constraints
Absolute pose estimation is a fundamental problem in computer vision. As a
typical parameter estimation problem, it inevitably has to cope with
outlier-contaminated data. Conventionally, for a fixed dimensionality d and
N measurements, a robust estimation problem cannot be solved faster than
O(N^d). Furthermore, it is almost impossible to remove d from the exponent of
the runtime of a globally optimal algorithm.
However, absolute pose estimation is a geometric parameter estimation problem,
and thus has special constraints. In this paper, we consider pairwise
constraints and propose a globally optimal algorithm for solving the absolute
pose estimation problem. The proposed algorithm has linear complexity in the
number of correspondences at a given outlier ratio. Concretely, we first
decouple the rotation and the translation subproblems by utilizing the pairwise
constraints, and then we solve the rotation subproblem using the
branch-and-bound algorithm. Lastly, we estimate the translation based on the
known rotation by using another branch-and-bound algorithm. The advantages of
our method are demonstrated via thorough testing on both synthetic and
real-world data.
Comment: 10 pages, 7 figures
- …