Spatially-Aware Autoencoders for Detecting Contextual Anomalies in Geo-Distributed Data
The huge amount of data generated by sensor networks enables many potential analyses. However, one important limiting factor for the analyses of sensor data is the possible presence of anomalies, which may affect the validity of any conclusion we could draw. This aspect motivates the adoption of a preliminary anomaly detection method. Existing methods usually do not consider the spatial nature of data generated by sensor networks. Properly modeling the spatial nature of the data, by explicitly considering spatial autocorrelation phenomena, has the potential to highlight the degree of agreement or disagreement of multiple sensor measurements located in different geographical positions. The intuition is that one could improve anomaly detection performance by considering the spatial context. In this paper, we propose a spatially-aware anomaly detection method based on a stacked auto-encoder architecture. Specifically, the proposed architecture includes a specific encoding stage that models the spatial autocorrelation in data observed at different locations. Finally, a distance-based approach leverages the embedding features returned by the auto-encoder to identify possible anomalies. Our experimental evaluation on real-world geo-distributed data collected from renewable energy plants shows the effectiveness of the proposed method, also when compared to state-of-the-art anomaly detection methods
Guest Editorial Special Issue on Recent Advances in Theory, Methodology, and Applications of Imbalanced Learning
Imbalanced learning is a challenging task in machine learning, faced by practitioners, and intensively investigated by researchers from a wide range of communities. However, as pointed out in the book titled “Imbalanced Learning: Foundations, Algorithms, and Applications” and collectively authored by experts in the field, many if not most of the approaches to imbalanced learning are heuristic and ad hoc in nature, hence leaving many questions unanswered. To fill this gap, the aim of this Special Issue is to collect recent research works that focus on the theory, methodology, and applications of imbalanced learning. After carefully reviewing a large number of submissions, we selected 15 works to be included in this Special Issue. These works can be roughly categorized into three types: deep-learning-based methods (6), methods based on other machine-learning paradigms (7), and empirical comparative studies (2).
ECHAD: Embedding-Based Change Detection from Multivariate Time Series in Smart Grids
Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including possible changes and evolutions in energy consumption and production, which must be taken into account in order to properly regulate the energy distribution. In this context, machine learning methods can be fruitfully adopted to support the analysis and to predict the behavior of smart grids, by exploiting the large amount of streaming data generated by sensor networks. In this article, we propose a novel change detection method, called ECHAD (Embedding-based CHAnge Detection), that leverages embedding techniques, one-class learning, and a dynamic detection approach that incrementally updates the learned model to reflect the new data distribution. Our experiments show that ECHAD achieves optimal performance on synthetic data representing challenging scenarios. Moreover, a qualitative analysis of the results obtained on real data from a real power grid reveals the quality of ECHAD's change detection. Specifically, a comparison with state-of-the-art approaches shows that ECHAD identifies additional relevant changes that competitors do not detect, while avoiding false positive detections.
Index of balanced accuracy: a performance measure for skewed class distributions
This paper introduces a new metric, named Index of Balanced Accuracy, for evaluating learning processes in two-class imbalanced domains. The method combines an unbiased index of overall accuracy with a measure of how dominant the class with the highest individual accuracy rate is. Some theoretical examples are presented to illustrate the benefits of the new metric over other well-known performance measures. Finally, a number of experiments demonstrate the consistency and validity of the evaluation method proposed here.
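The abstract does not state the formula, so the following is only a minimal sketch under an assumption: it uses the commonly cited form of the metric, IBA = (1 + alpha * (TPR - TNR)) * TPR * TNR, where TPR * TNR (the squared geometric mean) plays the role of the unbiased accuracy index and TPR - TNR is the dominance term. The function name, the argument names, and the default `alpha=0.1` are illustrative choices, not taken from the listing.

```python
def index_of_balanced_accuracy(tp, fn, tn, fp, alpha=0.1):
    """Sketch of the Index of Balanced Accuracy for a two-class problem.

    Assumed form: (1 + alpha * dominance) * TPR * TNR, with
    dominance = TPR - TNR. The default alpha is an illustrative choice.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one sample")
    tpr = tp / (tp + fn)      # accuracy on the positive class
    tnr = tn / (tn + fp)      # accuracy on the negative class
    dominance = tpr - tnr     # lies in [-1, 1]; 0 means balanced per-class accuracy
    return (1 + alpha * dominance) * tpr * tnr


if __name__ == "__main__":
    # TPR = 0.9, TNR = 0.5, dominance = 0.4
    print(index_of_balanced_accuracy(tp=90, fn=10, tn=50, fp=50))
```

With `alpha=0` the dominance term vanishes and the value reduces to TPR * TNR; a larger `alpha` gives more weight to which class is recognized better.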
A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation
Aircraft engine manufacturers collect large amounts of engine-related data
during flights. These data are used to detect anomalies in the engines in order
to help companies optimize their maintenance costs. This article introduces and
studies a generic methodology that allows one to build automatic early signs of
anomaly detection in a way that is understandable by human operators who make
the final maintenance decision. The main idea of the method is to generate a
very large number of binary indicators based on parametric anomaly scores
designed by experts, complemented by simple aggregations of those scores. The
best indicators are selected via a classical forward scheme, leading to a much
reduced number of indicators that are tuned to a data set. We illustrate the
interest of the method on simulated data which contain realistic early signs of
anomalies. Comment: Proceedings of the 14th Industrial Conference, ICDM 2014, St.
Petersburg : Russian Federation (2014
Class Balanced Similarity-Based Instance Transfer Learning for Botnet Family Classification
The use of Transfer Learning algorithms for enhancing the performance of machine learning algorithms has gained attention over the last decade. In this paper we introduce an extension and evaluation of our novel approach Similarity Based Instance Transfer Learning (SBIT). The extended version is denoted Class Balanced SBIT (or CB-SBIT for short) because it ensures the dataset resulting after instance transfer does not contain class imbalance. We compare the performance of CB-SBIT against the original SBIT algorithm. In addition, we compare its performance against that of the classical Synthetic Minority Over-sampling Technique (SMOTE) using network traffic data. We also compare the performance of CB-SBIT against the performance of the open source transfer learning algorithm TransferBoost using text data. Our results show that CB-SBIT outperforms the original SBIT and SMOTE using varying sizes of network traffic data but falls short when compared to TransferBoost using text data
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data
With the availability of big medical image data, the selection of an adequate
training set is becoming more important to address the heterogeneity of
different datasets. Simply including all the data not only incurs high
processing costs but can even harm the prediction. We formulate the smart and
efficient selection of a training dataset from big medical image data as a
multi-armed bandit problem, solved by Thompson sampling. Our method assumes
that image features are not available at the time of the selection of the
samples, and therefore relies only on meta information associated with the
images. Our strategy simultaneously exploits data sources with high chances of
yielding useful samples and explores new data regions. For our evaluation, we
focus on the application of estimating the age from a brain MRI. Our results on
7,250 subjects from 10 datasets show that our approach leads to higher accuracy
while only requiring a fraction of the training data. Comment: MICCAI 2017 Proceeding
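As a rough illustration of the idea of treating data sources as bandit arms solved by Thompson sampling, here is a minimal Beta-Bernoulli sketch. It is not the authors' implementation: the binary "useful sample" reward, the uniform Beta(1, 1) prior, the simulated usefulness rates, and all function names are assumptions made only for this example.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and pick the largest."""
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior (assumed)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate drawing training samples from several hypothetical data sources.

    true_rates[i] is the (unknown to the agent) probability that a sample
    from source i turns out to be useful; it exists only for the simulation.
    Returns how many times each source was chosen.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:  # simulated binary reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8], rounds=2000))
```

Sampling from the posterior rather than always taking the current best estimate is what balances the two behaviors the abstract mentions: sources that look promising are drawn from often, while uncertain sources still get occasional tries.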
A Decision Tree Approach to Predicting Recidivism in Domestic Violence
Domestic violence (DV) is a global social and public health issue that is
highly gendered. Being able to accurately predict DV recidivism, i.e.,
re-offending of a previously convicted offender, can speed up and improve risk
assessment procedures for police and front-line agencies, better protect
victims of DV, and potentially prevent future re-occurrences of DV. Previous
work in DV recidivism has employed different classification techniques,
including decision tree (DT) induction and logistic regression, where the main
focus was on achieving high prediction accuracy. As a result, even the diagrams
of trained DTs were often too difficult to interpret due to their size and
complexity, making decision-making challenging. Given there is often a
trade-off between model accuracy and interpretability, in this work our aim is
to employ DT induction to obtain both interpretable trees as well as high
prediction accuracy. Specifically, we implement and evaluate different
approaches to deal with class imbalance as well as feature selection. Compared
to previous work in DV recidivism prediction that employed logistic regression,
our approach can achieve comparable area under the ROC curve results by using
only 3 of 11 available features and generating understandable decision trees
that contain only 4 leaf nodes. Comment: 12 pages; Accepted at The 2018 Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD
Visual Compositional Learning for Human-Object Interaction Detection
Human-Object interaction (HOI) detection aims to localize and infer
relationships between humans and objects in an image. It is challenging because
an enormous number of possible combinations of object and verb types forms a
long-tail distribution. We devise a deep Visual Compositional Learning (VCL)
framework, which is a simple yet efficient framework to effectively address
this problem. VCL first decomposes an HOI representation into object and verb
specific features, and then composes new interaction samples in the feature
space via stitching the decomposed features. The integration of decomposition
and composition enables VCL to share object and verb features among different
HOI samples and images, and to generate new interaction samples and new types
of HOI, and thus largely alleviates the long-tail distribution problem and
benefits low-shot or zero-shot HOI detection. Extensive experiments demonstrate
that the proposed VCL can effectively improve the generalization of HOI
detection on HICO-DET and V-COCO and outperforms the recent state-of-the-art
methods on HICO-DET. Code is available at https://github.com/zhihou7/VCL. Comment: Accepted in ECCV202
Sit-to-Stand Analysis in the Wild using Silhouettes for Longitudinal Health Monitoring
We present the first fully automated Sit-to-Stand or Stand-to-Sit (StS)
analysis framework for long-term monitoring of patients in free-living
environments using video silhouettes. Our method adopts a coarse-to-fine time
localisation approach, where a deep learning classifier identifies possible StS
sequences from silhouettes, and a smart peak detection stage provides fine
localisation based on 3D bounding boxes. We tested our method on data from real
homes of participants and monitored patients undergoing total hip or knee
replacement. Our results show 94.4% overall accuracy in the coarse localisation
and an error of 0.026 m/s in the speed of ascent measurement, highlighting
important trends in the recuperation of patients who underwent surgery