A survey of outlier detection methodologies

Author: Austin J.
Hodge V.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

CiteSeerX

Crossref

White Rose Research Online

Optimally fast incremental Manhattan plane embedding and planar tight span construction

Author: Eppstein David
Publication date: 01/01/2011

We describe a data structure, a rectangular complex, that can be used to represent hyperconvex metric spaces that have the same topology (although not necessarily the same distance function) as subsets of the plane. We show how to use this data structure to construct the tight span of a metric space given as an n x n distance matrix, when the tight span is homeomorphic to a subset of the plane, in time O(n^2), and to add a single point to a planar tight span in time O(n). As an application of this construction, we show how to test whether a given finite metric space embeds isometrically into the Manhattan plane in time O(n^2), and add a single point to the space and re-test whether it has such an embedding in time O(n). Comment: 39 pages, 15 figures

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Journal of Computational Geometry (JoCG - Carleton University, Computational Geometry Lab)

An O(n^{2.75}) algorithm for online topological ordering

Author: Ajwani Deepak
Friedrich Tobias
Meyer Ulrich
Publication date: 01/01/2006

We present a simple algorithm which maintains the topological order of a directed acyclic graph with n nodes under an online edge insertion sequence in O(n^{2.75}) time, independent of the number of edges m inserted. For dense DAGs, this is an improvement over the previous best result of O(min(m^{3/2} log(n), m^{3/2} + n^2 log(n))) by Katriel and Bodlaender. We also provide an empirical comparison of our algorithm with other algorithms for online topological sorting. Our implementation outperforms them on certain hard instances while it is still competitive on random edge insertion sequences leading to complete DAGs. Comment: 20 pages, long version of SWAT'06 paper
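To make the problem setting concrete, here is a minimal Python sketch of online topological ordering. It is not the O(n^{2.75}) algorithm from this abstract. It uses a simpler affected-region repair in the style of Pearce and Kelly, and the class and method names are hypothetical.

```python
class OnlineTopoOrder:
    """Maintain a topological order of a DAG while edges are inserted one by one."""

    def __init__(self, n):
        self.succ = [set() for _ in range(n)]
        self.pred = [set() for _ in range(n)]
        self.pos = list(range(n))  # pos[v] = index of node v in the current order

    def _reach(self, start, adj, in_region):
        # Depth-first search that only visits nodes inside the affected region.
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and in_region(y):
                    seen.add(y)
                    stack.append(y)
        return seen

    def add_edge(self, u, v):
        if u == v:
            raise ValueError("edge would create a cycle")
        lb, ub = self.pos[v], self.pos[u]
        if lb < ub:  # v currently precedes u, so the order must be repaired
            fwd = self._reach(v, self.succ, lambda w: self.pos[w] <= ub)
            if u in fwd:
                raise ValueError("edge would create a cycle")
            bwd = self._reach(u, self.pred, lambda w: self.pos[w] >= lb)
            # Ancestors of u go first, then descendants of v, each group keeping
            # its relative order; together they reuse the same pool of positions.
            nodes = (sorted(bwd, key=self.pos.__getitem__)
                     + sorted(fwd, key=self.pos.__getitem__))
            slots = sorted(self.pos[w] for w in nodes)
            for w, s in zip(nodes, slots):
                self.pos[w] = s
        self.succ[u].add(v)
        self.pred[v].add(u)

    def order(self):
        return sorted(range(len(self.pos)), key=self.pos.__getitem__)


dag = OnlineTopoOrder(4)
for edge in [(2, 1), (3, 2), (1, 0)]:
    dag.add_edge(*edge)
print(dag.order())
```

Only nodes whose positions lie between the two endpoints of a violating edge are moved, so insertions that already agree with the current order cost constant time.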

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

Search Through Systematic Set Enumeration

Author: Rymon Ron
Publication venue: ScholarlyCommons
Publication date: 01/08/1992

In many problem domains, solutions take the form of unordered sets. We present the Set-Enumerations (SE)-tree - a vehicle for representing sets and/or enumerating them in a best-first fashion. We demonstrate its usefulness as the basis for a unifying search-based framework for domains where minimal (maximal) elements of a power set are targeted, where minimal (maximal) partial instantiations of a set of variables are sought, or where a composite decision is not dependent on the order in which its primitive component-decisions are taken. Particular instantiations of SE-tree-based algorithms for some AI problem domains are used to demonstrate the general features of the approach. These algorithms are compared theoretically and empirically with current algorithms.
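The idea of enumerating sets systematically and best-first can be sketched as follows. This is an illustrative reading of the abstract, not the paper's own code. It assumes a fixed ordering of the items, a hypothetical function name, and a cost function that never decreases when an item is added.

```python
import heapq
from itertools import count


def se_tree_best_first(items, cost, limit):
    """Return up to `limit` subsets of `items`, cheapest first.

    Each tree node is a subset; its children add one item that comes later
    in the fixed ordering, so every subset is generated exactly once.
    Assumption: cost(child) >= cost(parent), which makes the output sorted.
    """
    tie = count()  # tie-breaker so the heap never compares subsets directly
    heap = [(cost(()), next(tie), (), 0)]
    result = []
    while heap and len(result) < limit:
        _, _, subset, next_idx = heapq.heappop(heap)
        result.append(subset)
        for i in range(next_idx, len(items)):
            child = subset + (items[i],)
            heapq.heappush(heap, (cost(child), next(tie), child, i + 1))
    return result


# Toy usage: subsets of three weights, ordered by total weight.
for s in se_tree_best_first((1, 2, 4), sum, limit=8):
    print(s, sum(s))
```

Restricting each node's children to later items is what removes the dependence on the order in which component decisions are taken: the sets {a, b} and {b, a} map to a single node.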

ScholarlyCommons@Penn

On dualization in products of forests, in

Author: E. Boros
H. Mannila
J. C. Bioch
K. Makino
M. L. Fredman
R. Agrawal
R. C. Read
T. Eiter
Publication date: 01/01/2002

Let P = P1 × ... × Pn be the product of n partially ordered sets, each with an acyclic precedence graph in which either the in-degree or the out-degree of each element is bounded. Given a subset A ⊆ P, it is shown that the set of maximal independent elements of A in P can be incrementally generated in quasi-polynomial time. We discuss some applications in data mining related to this dualization problem.

CiteSeerX

Crossref

MPG.PuRe

Connectivity in the Presence of an Opponent

Author: Khoussainov Bakh
Liang Zihui
Takisaka Toru
Xiao Mingyu
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Efficient Loop Detection in Forwarding Networks and Representing Atoms in a Field of Sets

Author: Boufkhad Yacine
Linguaglossa Leonardo
Mathieu Fabien
Perino Diego
Viennot Laurent
Publication date: 05/09/2018

The problem of detecting loops in a forwarding network is known to be NP-complete when general rules such as wildcard expressions are used. Yet, network analyzer tools such as Netplumber (Kazemian et al., NSDI'13) or Veriflow (Khurshid et al., NSDI'13) efficiently solve this problem in networks with thousands of forwarding rules. In this paper, we complement such experimental validation of practical heuristics with the first provably efficient algorithm in the context of general rules. Our main tool is a canonical representation of the atoms (i.e. the minimal non-empty sets) of the field of sets generated by a collection of sets. This tool is particularly suited when the intersection of two sets can be efficiently computed and represented. In the case of forwarding networks, each forwarding rule is associated with the set of packet headers it matches. The atoms then correspond to classes of headers with the same behavior in the network. We propose an algorithm for atom computation and provide the first polynomial time algorithm for loop detection in terms of the number of classes (which can be exponential in general). This contrasts with previous methods that can be exponential, even in simple cases with a linear number of classes. Second, we introduce a notion of network dimension captured by the overlapping degree of forwarding rules. The values of this measure appear to be very low in practice, and constant overlapping degree ensures a polynomial number of header classes. Forwarding loop detection is thus polynomial in forwarding networks with constant overlapping degree.
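The notion of atoms can be illustrated on a small explicit universe. The sketch below is a naive partition refinement over enumerated elements, with a hypothetical function name and toy data. The paper's canonical representation is designed to avoid this kind of explicit enumeration of headers, so treat the sketch only as a definition made executable.

```python
def atoms(universe, sets):
    """Return the atoms of the field of sets generated by `sets` over `universe`.

    Naive approach: start from one block and split every block by each
    generating set, keeping only the non-empty pieces.
    """
    parts = [frozenset(universe)] if universe else []
    for s in sets:
        s = frozenset(s)
        refined = []
        for p in parts:
            inside, outside = p & s, p - s
            if inside:
                refined.append(inside)
            if outside:
                refined.append(outside)
        parts = refined
    return parts


# Toy usage: 3-bit "headers" 0..7 and two overlapping rules (assumed data).
rule_a = {0, 1, 2, 3}
rule_b = {2, 3, 4, 5}
for atom in atoms(range(8), [rule_a, rule_b]):
    print(sorted(atom))
```

Each resulting atom is a class of headers matched by exactly the same rules, which is why per-atom analysis suffices for questions such as loop detection.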

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

A higher-dimensional homologically persistent skeleton

Author: Kalisnik Sara
Kurlin Vitaliy
Lesnik Davorin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Real data is often given as a point cloud, i.e. a finite set of points with pairwise distances between them. An important problem is to detect the topological shape of data, for example, to approximate a point cloud by a low-dimensional non-linear subspace such as an embedded graph or a simplicial complex. Classical clustering methods and principal component analysis work well when data points split into good clusters or lie near linear subspaces of a Euclidean space. Methods from topological data analysis in general metric spaces detect more complicated patterns such as holes and voids that persist for a large interval in a 1-parameter family of shapes associated to a cloud. These features can be visualized in the form of a 1-dimensional homologically persistent skeleton, which optimally extends a minimum spanning tree of a point cloud to a graph with cycles. We generalize this skeleton to higher dimensions and prove its optimality among all complexes that preserve topological features of data at any scale.

University of Liverpool Repository

MPG.PuRe