1,967 research outputs found
HOS-Miner: a system for detecting outlying subspaces of high-dimensional data
[Abstract]: We identify a new and interesting high-dimensional outlier detection problem in this paper, that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its Outlying Subspaces. In this paper, we propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces), that utilizes a sample-based learning process to effectively identify the outlying subspaces of a given point.
Outlier Detection from Network Data with Subnetwork Interpretation
Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of subnetwork, is also equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and high-dimensional settings, but also discovers highly relevant and interpretable local subnetworks, further enhancing our understanding of anomalous networks.
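The L1-norm shrinkage mentioned above can be illustrated generically. The snippet below is a minimal ISTA (proximal gradient) solver for L1-penalized least squares, showing the shrinkage mechanism that drives sparse selection; it is a generic sketch, not the paper's network-regression formulation, and the function name and parameters are illustrative.

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, lr=0.05, iters=300):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 via ISTA:
    a gradient step on the squared loss followed by
    soft-thresholding, which shrinks small coefficients to zero."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
        # soft-threshold: the proximal operator of the L1 norm
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```

Only features that genuinely explain the response keep nonzero weights; in the paper's setting, the analogous sparsity pattern (combined with topology constraints) is what picks out a discriminative subnetwork.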
Detecting outlying subspaces for high-dimensional data: a heuristic search approach
[Abstract]: In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subsets of features) in which given points are outliers, and propose a novel detection algorithm, called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods. Points in these sparse subspaces are assumed to be the outliers. While knowing which data points are the outliers can be useful, in many applications it is more important to identify the subspaces in which a given point is an outlier, which motivates the proposal of a new technique in this paper to handle this new task.
Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features)
in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this
new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of
high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods, and that the existing outlier detection methods cannot fulfill this new task effectively.
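The outlying degree (OD) described above, the sum of distances from a point to its k nearest neighbors within a candidate subspace, can be sketched directly. This is a minimal illustration of the scoring measure only, not of HighDOD's pruning or subspace search; the function name and parameters are illustrative.

```python
import numpy as np

def outlying_degree(X, i, dims, k=5):
    """OD of point i within the subspace spanned by feature
    indices `dims`: the sum of Euclidean distances from the
    point to its k nearest neighbors in that subspace."""
    sub = X[:, dims]
    d = np.linalg.norm(sub - sub[i], axis=1)
    d = np.delete(d, i)            # exclude the point itself
    return float(np.sort(d)[:k].sum())
```

A point whose OD in some subspace greatly exceeds that of its peers is an outlier there even if it looks ordinary in the full feature space, which is what makes the subspace search worthwhile.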
Empirical performance analysis of two algorithms for mining intentional knowledge of distance-based outliers
This thesis presents an empirical analysis of two algorithms, Uplattice and Jumplattice, for mining intentional knowledge of distance-based outliers [19]. These algorithms detect the strongest and weak outliers among the identified points. Finding outliers is an important task required in major applications such as credit-card fraud detection and NHL statistical studies. Datasets of varying sizes have been tested to analyze the empirical behavior of these two algorithms, and effective data structures have been used to gain memory efficiency. The two algorithms provide intentional knowledge of the detected outliers, which explains why an identified outlier is exceptional. This knowledge helps the user assess the validity of outliers and hence provides an improved understanding of the data.
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers does not receive much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
through providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts where outliers are located, as well as the relation between
outliers and contexts, are usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
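One of the three aspects above, the attributes that contribute to the abnormality, can be approximated by a simple per-attribute deviation of the outlier from its local context. The sketch below is a crude z-score proxy for that idea, not COIN's actual procedure; the function name and inputs are illustrative.

```python
import numpy as np

def attribute_contributions(context, x):
    """Per-attribute deviation of outlier x from its contextual
    neighborhood `context` (rows = nearby normal instances):
    larger values flag attributes driving the abnormality."""
    mu = context.mean(axis=0)
    sd = context.std(axis=0) + 1e-12   # avoid division by zero
    return np.abs(x - mu) / sd
```

Ranking attributes by this score gives a rough "which features make this point abnormal here" explanation, relative to the outlier's neighborhood rather than the global distribution.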
Homophily Outlier Detection in Non-IID Categorical Data
Most existing outlier detection methods assume that the outlier factors
(i.e., outlierness scoring measures) of data entities (e.g., feature values and
data objects) are Independent and Identically Distributed (IID). This
assumption does not hold in real-world applications where the outlierness of
different entities is dependent on each other and/or taken from different
probability distributions (non-IID). This may lead to the failure of detecting
important outliers that are too subtle to be identified without considering the
non-IID nature. The issue is even intensified in more challenging contexts,
e.g., high-dimensional data with many noisy features. This work introduces a
novel outlier detection framework and its two instances to identify outliers in
categorical data by capturing non-IID outlier factors. Our approach first
defines and incorporates distribution-sensitive outlier factors and their
interdependence into a value-value graph-based representation. It then models
an outlierness propagation process in the value graph to learn the outlierness
of feature values. The learned value outlierness allows for either direct
outlier detection or outlying feature selection. The graph representation and
mining approach is employed here to well capture the rich non-IID
characteristics. Our empirical results on 15 real-world data sets with
different levels of data complexities show that (i) the proposed outlier
detection methods significantly outperform five state-of-the-art methods at the
95%/99% confidence level, achieving 10%-28% AUC improvement on the 10 most
complex data sets; and (ii) the proposed feature selection methods
significantly outperform three competing methods in enabling subsequent outlier
detection of two different existing detectors. Comment: To appear in Data Mining and Knowledge Discovery Journal.
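The outlierness-propagation step described above can be illustrated with a generic score-propagation iteration on a value-value graph: each value's score is repeatedly blended with its neighbors' scores, personalized-PageRank style. This is a simplified sketch of the general idea, not the paper's two concrete instances; the adjacency matrix, damping factor, and initial scores are illustrative.

```python
import numpy as np

def propagate_outlierness(A, init, alpha=0.85, iters=100):
    """Propagate outlierness over a value-value graph.
    A[i, j] > 0 if values i and j co-occur; `init` holds initial
    per-value outlierness (e.g. frequency-based). Each iteration
    mixes neighbor scores (row-normalized A) with the prior."""
    W = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    s = init.astype(float)
    for _ in range(iters):
        s = alpha * (W @ s) + (1 - alpha) * init
    return s
```

After convergence, a value's score reflects not only its own rarity but also how outlying its co-occurring values are, which is one way to capture the non-IID dependence between outlier factors.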
ALMA and Herschel Observations of the Prototype Dusty and Polluted White Dwarf G29-38
ALMA Cycle 0 and Herschel PACS observations are reported for the prototype,
nearest, and brightest example of a dusty and polluted white dwarf, G29-38.
These long-wavelength programs attempted to detect an outlying parent
population of bodies at 1-100 AU, from which originates the disrupted
planetesimal debris that is observed within 0.01 AU and which exhibits L_IR/L =
0.039. No associated emission sources were detected in any of the data down to
L_IR/L ~ 1e-4, generally ruling out cold dust masses greater than 1e24 - 1e25 g
for reasonable grain sizes and properties in orbital regions corresponding to
evolved versions of both asteroid and Kuiper belt analogs. Overall, these null
detections are consistent with models of long-term collisional evolution in
planetesimal disks, and the source regions for the disrupted parent bodies at
stars like G29-38 may only be salient in exceptional circumstances, such as a
recent instability. A larger sample of polluted white dwarfs, targeted with the
full ALMA array, has the potential to unambiguously identify the parent
source(s) of their planetary debris. Comment: 8 pages, 5 figures and 1 table. Accepted to MNRAS.