Steganographer Identification
Conventional steganalysis detects the presence of steganography within single
objects. In the real world, we may face a more complex scenario in which one or
more of multiple users, called actors, are guilty of using steganography; this is
typically defined as the Steganographer Identification Problem (SIP). One might
use the conventional steganalysis algorithms to separate stego objects from
cover objects and then identify the guilty actors. However, the guilty actors
may be lost among a number of false alarms. To deal with the SIP, most
state-of-the-art methods use unsupervised learning-based approaches. In their
solutions, each actor holds multiple digital objects, from which a set of
feature vectors can be extracted. The well-defined distances between these
feature sets are determined to measure the similarity between the corresponding
actors. By applying clustering or outlier detection, the most suspicious
actor(s) will be judged as the steganographer(s). Although the SIP needs further
study, existing works identify the steganographer(s) well
when non-adaptive steganographic embedding is applied. In this chapter, we
will present foundational concepts and review advanced methodologies in SIP.
This chapter is self-contained and intended as a tutorial introducing the SIP
in the context of media steganography. Comment: A tutorial with 30 pages
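The unsupervised pipeline described in this abstract (per-actor feature sets, a distance between sets, then outlier ranking) can be sketched as follows. This is only a minimal illustration under stated assumptions: the set distance `mmd_linear` (squared distance between mean feature vectors), the scoring rule (mean distance to all other actors), and the synthetic Gaussian features are hypothetical choices, not the methods reviewed in the chapter.

```python
import numpy as np

def mmd_linear(a, b):
    """Squared distance between the mean feature vectors of two actors.

    This is the maximum mean discrepancy with a linear kernel, used here
    only as an illustrative distance between feature sets (assumption).
    """
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(diff @ diff)

def rank_actors(feature_sets):
    """Rank actors from most to least suspicious.

    feature_sets: list of arrays, one per actor, each of shape
    (n_objects, n_features). An actor's score is its mean distance
    to every other actor (assumed scoring rule).
    """
    n = len(feature_sets)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = mmd_linear(feature_sets[i], feature_sets[j])
    scores = dist.sum(axis=1) / (n - 1)
    return np.argsort(-scores), scores

# Toy data: six actors with 50 objects each; actor 3 has shifted features.
rng = np.random.default_rng(0)
actors = [rng.normal(0.0, 1.0, size=(50, 8)) for _ in range(6)]
actors[3] = actors[3] + 1.0
order, scores = rank_actors(actors)
print("most suspicious actor:", order[0])
```

Ranking actors rather than classifying single objects is what lets this kind of approach avoid accumulating per-object false alarms.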
A Parametric Framework for the Comparison of Methods of Very Robust Regression
There are several methods for obtaining very robust estimates of regression
parameters that asymptotically resist 50% of outliers in the data. Differences
in the behaviour of these algorithms depend on the distance between the
regression data and the outliers. We introduce a parameter $\lambda$ that
defines a parametric path in the space of models and enables us to study, in a
systematic way, the properties of estimators as the groups of data move from
being far apart to close together. We examine, as a function of $\lambda$, the
variance and squared bias of five estimators and we also consider their power
when used in the detection of outliers. This systematic approach provides tools
for gaining knowledge and better understanding of the properties of robust
estimators. Comment: Published at http://dx.doi.org/10.1214/13-STS437 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
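To make the idea of studying estimators as the outliers move away from the regression data concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the variable `lam` plays the role of the separation parameter, the data-generating model is invented, and least median of squares (fitted by random two-point subsets) stands in for the very robust estimators compared in the paper.

```python
import numpy as np

def simulate(lam, n=100, frac_out=0.3, seed=0):
    """Line y = 1 + 2x with noise; a cluster of outliers at high x is
    shifted down by `lam` (hypothetical separation parameter)."""
    rng = np.random.default_rng(seed)
    n_out = int(frac_out * n)
    x = np.concatenate([rng.uniform(0, 10, n - n_out),
                        rng.uniform(8, 10, n_out)])
    y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)
    y[n - n_out:] -= lam
    return x, y

def ols_slope(x, y):
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def lms_slope(x, y, n_trials=500, seed=1):
    """Least median of squares via random two-point fits (simple sketch)."""
    rng = np.random.default_rng(seed)
    best_slope, best_crit = np.nan, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line, skip
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)
        if crit < best_crit:
            best_slope, best_crit = b, crit
    return best_slope

for lam in (0.0, 2.0, 5.0, 10.0):
    x, y = simulate(lam)
    print(f"lam={lam:5.1f}  OLS slope={ols_slope(x, y):.2f}  "
          f"LMS slope={lms_slope(x, y):.2f}")
```

Repeating such a run over many seeds and values of `lam` would give Monte Carlo estimates of variance and squared bias along the path, which is the kind of comparison the abstract describes.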
A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection
approach with the local kernel density estimation (KDE). A Relative
Density-based Outlier Score (RDOS) is introduced to measure the local
outlierness of objects, in which the density distribution at the location of an
object is estimated with a local KDE method based on extended nearest neighbors
of the object. Instead of using only nearest neighbors, we further consider
reverse nearest neighbors and shared nearest neighbors of an object for density
distribution estimation. Some theoretical properties of the proposed RDOS
including its expected value and false alarm probability are derived. A
comprehensive experimental study on both synthetic and real-life data sets
demonstrates that our approach is more effective than state-of-the-art outlier
detection methods. Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letters
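A relative density score of this kind can be sketched in a few lines. The code below is a simplified reading of the abstract, not the authors' implementation: it assumes a Gaussian kernel with a fixed bandwidth `h`, builds the extended neighborhood as the union of k nearest, reverse nearest, and shared nearest neighbors, and scores each object by the mean density of its neighbors divided by its own density.

```python
import numpy as np

def rdos(X, k=5, h=1.0):
    """Relative density outlier score (simplified sketch).

    Scores well above 1 indicate objects in regions sparser than
    those of their extended neighbors.
    """
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    # k nearest neighbors, excluding the point itself.
    sq_no_self = sq.copy()
    np.fill_diagonal(sq_no_self, np.inf)
    knn_idx = np.argsort(sq_no_self, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), knn_idx.ravel()] = True  # A[i, j]: j in kNN(i)

    # Reverse neighbors are A.T; shared neighbors have overlapping kNN sets.
    Ai = A.astype(int)
    shared = (Ai @ Ai.T) > 0
    M = A | A.T | shared
    np.fill_diagonal(M, False)
    size = M.sum(axis=1)

    # Local Gaussian KDE over the extended neighborhood plus the point itself.
    K = np.exp(-sq / (2.0 * h ** 2)) / ((2.0 * np.pi) ** (d / 2) * h ** d)
    dens = (np.diag(K) + (K * M).sum(axis=1)) / (size + 1)

    # Mean neighbor density relative to the point's own density.
    return (M @ dens) / (size * dens)

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(size=(40, 2)), [[10.0, 10.0]]])
scores = rdos(pts, k=5, h=2.0)
print("highest score at index:", int(np.argmax(scores)))
```

The pairwise distance matrix makes this quadratic in the number of objects, so it is suitable only for small illustrative data sets.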
A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data
Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments of concepts of distance or nearest neighbor deteriorate in high-dimensional data. Following up on the work of Kriegel et al. (KDD ’08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in a parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estimation algorithm. The empirical experiments on synthetic and real-world data sets demonstrate that our approach is efficient and scalable to very large high-dimensional data sets.
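For intuition about angle-based scoring, the sketch below computes a naive, exact variance of angles for every point. It is a simplified, unweighted variant written for illustration (the function name `angle_variance` is hypothetical); it takes cubic time overall and does not implement the random projection approximation the abstract proposes. It assumes all points are distinct.

```python
import numpy as np

def angle_variance(X):
    """Variance of the angles each point forms with all pairs of other points.

    A low variance suggests an outlier: seen from a point far outside the
    data, the other points all lie in roughly the same direction.
    Cost is O(n^2) per point, so O(n^3) in total.
    """
    n = len(X)
    pair_idx = np.triu_indices(n - 1, k=1)  # each unordered pair once
    scores = np.empty(n)
    for p in range(n):
        diff = np.delete(X, p, axis=0) - X[p]
        unit = diff / np.linalg.norm(diff, axis=1)[:, None]
        cos = np.clip(unit @ unit.T, -1.0, 1.0)
        scores[p] = np.var(np.arccos(cos[pair_idx]))
    return scores

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(size=(30, 3)), [[10.0, 10.0, 10.0]]])
var = angle_variance(pts)
print("lowest angle variance at index:", int(np.argmin(var)))
```

The cubic cost of this exact computation is the bottleneck that motivates a near-linear approximation.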
Finding an unknown number of multivariate outliers
We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples
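The mechanics of a forward search with Mahalanobis distances can be sketched as below. This is a bare-bones illustration under assumptions: the starting subset (points closest to the coordinate-wise median, scaled by the MAD) and the default initial size are hypothetical choices, and the distributional envelopes that the paper derives for its test statistic are not implemented.

```python
import numpy as np

def mahalanobis(X, subset):
    """Mahalanobis distances of all rows of X, using the mean and
    covariance estimated from the rows in `subset` only."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    d2 = (diff.T * np.linalg.solve(cov, diff.T)).sum(axis=0)
    return np.sqrt(d2)

def forward_search(X, m0=None):
    """Grow a subset one unit at a time, ordered by Mahalanobis distance.

    Returns (min_outside, subsets): for each subset size m = m0..n-1,
    the minimum distance among units outside the subset, and the subset.
    """
    n, p = X.shape
    m0 = m0 or 2 * p + 1
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    subset = np.argsort((((X - med) / mad) ** 2).sum(axis=1))[:m0]
    min_outside, subsets = [], []
    for m in range(m0, n):
        d = mahalanobis(X, subset)
        outside = np.setdiff1d(np.arange(n), subset)
        min_outside.append(d[outside].min())
        subsets.append(np.sort(subset))
        subset = np.argsort(d)[: m + 1]  # units may enter or leave
    return np.array(min_outside), subsets

rng = np.random.default_rng(0)
good = rng.normal(size=(50, 2))
bad = np.array([10.0, 10.0]) + 0.3 * rng.normal(size=(5, 2))
X = np.vstack([good, bad])
traj, subsets = forward_search(X, m0=5)
print("min distance outside the subset of size 50:", round(float(traj[45]), 2))
```

In this toy run the five contaminating points should enter the subset last, so the monitored minimum distance rises sharply once only they remain outside; a formal test would compare that trajectory against reference bounds.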