26,731 research outputs found
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers does not receive much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
through providing intrinsic reasons why the certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts where outliers locate, as well as the relation between
outliers and contexts, are usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches
Unsupervised routine discovery in egocentric photo-streams
The routine of a person is defined by the occurrence of activities throughout
different days, and can directly affect the person's health. In this work, we
address the recognition of routine related days. To do so, we rely on
egocentric images, which are recorded by a wearable camera and allow to monitor
the life of the user from a first-person view perspective. We propose an
unsupervised model that identifies routine related days, following an outlier
detection approach. We test the proposed framework over a total of 72 days in
the form of photo-streams covering around 2 weeks of the life of 5 different
camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted
F-Score for all the users. Thus, we show that our framework is able to
recognise routine related days and opens the door to the understanding of the
behaviour of people
Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
In recent years, there have been many practical applications of anomaly
detection such as in predictive maintenance, detection of credit fraud, network
intrusion, and system failure. The goal of anomaly detection is to identify in
the test data anomalous behaviors that are either rare or unseen in the
training data. This is a common goal in predictive maintenance, which aims to
forecast the imminent faults of an appliance given abundant samples of normal
behaviors. Local outlier factor (LOF) is one of the state-of-the-art models
used for anomaly detection, but the predictive performance of LOF depends
greatly on the selection of hyperparameters. In this paper, we propose a novel,
heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model
that uses the proposed method shows good predictive performance in both
simulations and real data sets.Comment: 15 pages, 5 figure
Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text
Real world multimedia data is often composed of multiple modalities such as
an image or a video with associated text (e.g. captions, user comments, etc.)
and metadata. Such multimodal data packages are prone to manipulations, where a
subset of these modalities can be altered to misrepresent or repurpose data
packages, with possible malicious intent. It is, therefore, important to
develop methods to assess or verify the integrity of these multimedia packages.
Using computer vision and natural language processing methods to directly
compare the image (or video) and the associated caption to verify the integrity
of a media package is only possible for a limited set of objects and scenes. In
this paper, we present a novel deep learning-based approach for assessing the
semantic integrity of multimedia packages containing images and captions, using
a reference set of multimedia packages. We construct a joint embedding of
images and captions with deep multimodal representation learning on the
reference dataset in a framework that also provides image-caption consistency
scores (ICCSs). The integrity of query media packages is assessed as the
inlierness of the query ICCSs with respect to the reference dataset. We present
the MultimodAl Information Manipulation dataset (MAIM), a new dataset of media
packages from Flickr, which we make available to the research community. We use
both the newly created dataset as well as Flickr30K and MS COCO datasets to
quantitatively evaluate our proposed approach. The reference dataset does not
contain unmanipulated versions of tampered query packages. Our method is able
to achieve F1 scores of 0.75, 0.89 and 0.94 on MAIM, Flickr30K and MS COCO,
respectively, for detecting semantically incoherent media packages.Comment: *Ayush Jaiswal and Ekraam Sabir contributed equally to the work in
this pape
- …