Calibration of One-Class SVM for MV set estimation
A general approach for anomaly detection or novelty detection consists in
estimating high density regions or Minimum Volume (MV) sets. The One-Class
Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating
such regions from high dimensional data. Yet it suffers from practical
limitations. When applied to a limited number of samples it can lead to poor
performance even when picking the best hyperparameters. Moreover the solution
of OCSVM is very sensitive to the selection of hyperparameters which makes it
hard to optimize in an unsupervised setting. We present a new approach to
estimate MV sets using the OCSVM with a different choice of the parameter
controlling the proportion of outliers. The solution function of the OCSVM is
learnt on a training set and the desired probability mass is obtained by
adjusting the offset on a test set to prevent overfitting. Models learnt on
different train/test splits are then aggregated to reduce the variance induced
by such random splits. Our approach makes it possible to tune the
hyperparameters automatically and obtain nested set estimates. Experimental
results show that our approach outperforms the standard OCSVM formulation while
suffering less from the curse of dimensionality than kernel density estimates.
Results on actual data sets are also presented.
Comment: IEEE DSAA'2015, Oct 2015, Paris, France
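A minimal sketch of the calibration scheme described in the abstract above, assuming scikit-learn's `OneClassSVM` as the base estimator. The function name `calibrated_ocsvm_predict` and all parameter values are illustrative, not taken from the paper.

```python
# Hedged sketch: learn the OCSVM decision function on a train split, re-set
# its offset on a held-out split so that a target probability mass lies above
# the threshold, and aggregate over random splits to reduce variance.
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import OneClassSVM

def calibrated_ocsvm_predict(X, X_new, mass=0.95, n_splits=10, seed=0):
    """Flag points of X_new inside an estimated minimum-volume set of mass `mass`."""
    votes = np.zeros(len(X_new))
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.3, random_state=seed)
    for train_idx, test_idx in splitter.split(X):
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5)
        model.fit(X[train_idx])
        # Adjust the offset on held-out data: the threshold is the empirical
        # (1 - mass)-quantile of held-out scores, so roughly a fraction `mass`
        # of the probability mass scores above it (prevents overfitting the
        # offset to the training set).
        scores_test = model.decision_function(X[test_idx])
        threshold = np.quantile(scores_test, 1.0 - mass)
        votes += (model.decision_function(X_new) >= threshold).astype(float)
    # Majority vote across splits: True = inside the estimated MV set.
    return votes / n_splits >= 0.5

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])  # one typical point, one far outlier
inside = calibrated_ocsvm_predict(X, X_new)
```

Because the threshold is a quantile of the held-out scores, raising `mass` yields nested set estimates, matching the nesting property mentioned in the abstract.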
How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?
When sufficient labeled data are available, classical criteria based on
Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be
used to compare the performance of unsupervised anomaly detection algorithms.
However, in many situations, few or no data are labeled. This calls for
alternative criteria one can compute on non-labeled data. In this paper, two
criteria that do not require labels are empirically shown to discriminate
accurately (w.r.t. ROC or PR based criteria) between algorithms. These criteria
are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which
generally cannot be estimated well in high dimensions. A methodology based on
feature sub-sampling and aggregating is also described and tested, extending
the use of these criteria to high-dimensional datasets and solving major
drawbacks inherent to standard EM and MV curves.
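A hedged sketch of the label-free Mass-Volume criterion described above, together with the feature sub-sampling idea that keeps volume estimation in low dimension (the Excess-Mass variant is omitted for brevity). The function names `mv_criterion` and `mv_criterion_subsampled` are illustrative, not the authors' code.

```python
# Sketch of the MV criterion: for each target mass alpha, threshold the score
# on the data and estimate the Lebesgue volume of the resulting level set by
# Monte Carlo with uniform samples. Lower area under the MV curve = sharper
# anomaly score = better, without any labels.
import numpy as np
from sklearn.ensemble import IsolationForest

def mv_criterion(scores_data, scores_uniform, box_volume, alphas):
    """Area under the Mass-Volume curve (trapezoidal rule)."""
    volumes = []
    for alpha in alphas:
        t = np.quantile(scores_data, 1.0 - alpha)       # mass alpha above t
        volumes.append(box_volume * np.mean(scores_uniform >= t))
    vols = np.array(volumes)
    return float(np.sum(0.5 * (vols[1:] + vols[:-1]) * np.diff(alphas)))

def mv_criterion_subsampled(fit_score, X, d_sub=2, n_draws=20, n_mc=20000, seed=0):
    """Average the MV criterion over random feature subsets of size `d_sub`,
    so volumes are only ever estimated in low dimension."""
    rng = np.random.default_rng(seed)
    alphas = np.linspace(0.05, 0.95, 19)
    crits = []
    for _ in range(n_draws):
        feats = rng.choice(X.shape[1], size=d_sub, replace=False)
        Xs = X[:, feats]
        lo, hi = Xs.min(axis=0), Xs.max(axis=0)
        U = rng.uniform(lo, hi, size=(n_mc, d_sub))      # uniform over the box
        score = fit_score(Xs)                            # returns a scoring callable
        crits.append(mv_criterion(score(Xs), score(U), np.prod(hi - lo), alphas))
    return float(np.mean(crits))

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
fit_iforest = lambda Z: IsolationForest(random_state=0).fit(Z).score_samples
crit = mv_criterion_subsampled(fit_iforest, X)
```

Comparing `crit` across several candidate scorers (Isolation Forest, OCSVM, ...) then ranks them without labels, which is the use case the paper targets.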
A machine learning framework for gait classification using inertial sensors: Application to elderly, post-stroke and Huntington's disease patients
Machine learning methods have been widely used for gait assessment through the estimation of spatio-temporal parameters. As a further step, the objective of this work is to propose and validate a general probabilistic modeling approach for the classification of different pathological gaits. Specifically, the presented methodology was tested on gait data recorded from two pathological populations (Huntington's disease and post-stroke subjects) and healthy elderly controls, using inertial measurement units placed at the shank and waist. By extracting features from group-specific Hidden Markov Models (HMMs) and from signal information in the time and frequency domains, a Support Vector Machine (SVM) classifier was designed and validated. After leave-one-subject-out cross-validation and majority voting, 90.5% of subjects were assigned to the correct group. The long-term goal is gait assessment in everyday life, enabling early detection of gait alterations.
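A minimal sketch of the validation scheme described above: an SVM evaluated with leave-one-subject-out cross-validation and per-subject majority voting. The features here are synthetic placeholders; in the paper they come from group-specific HMMs plus time- and frequency-domain descriptors of the inertial signals, which this sketch omits.

```python
# Sketch, assuming stride-level feature vectors per subject. All sizes and the
# synthetic class shift are illustrative, not from the paper.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, strides_per_subject = 12, 20
# Three classes: 0 = healthy elderly, 1 = post-stroke, 2 = Huntington's.
subj_labels = np.repeat([0, 1, 2], n_subjects // 3)
groups = np.repeat(np.arange(n_subjects), strides_per_subject)
y = subj_labels[groups]
# Synthetic stride-level features whose mean shifts with the class.
X = rng.normal(size=(len(y), 8)) + y[:, None] * 0.8

correct = 0
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X[train_idx], y[train_idx])            # train on all other subjects
    pred = clf.predict(X[test_idx])
    vote = np.bincount(pred).argmax()              # majority vote over strides
    correct += int(vote == y[test_idx][0])
subject_accuracy = correct / n_subjects
```

Grouping folds by subject (rather than by stride) is what prevents strides of a test subject from leaking into training, which would otherwise inflate the accuracy.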
A chemometric survey about the ability of voltammetry to discriminate pharmaceutical products from the evolution of signals as a function of pH.
Many pharmaceutical products are electroactive and, therefore, can be determined by voltammetry. However, most of these substances produce signals in the same region of oxidative potentials, which makes it difficult to identify them. In this work, chemometric tools are applied to extract characteristic information not only from the peak potential of differential pulse voltammograms (DPV), but also from their evolution as a function of pH. The chemometric approach is based on principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA), yielding promising results for the future discrimination of pharmaceutical products in water samples.
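A hedged sketch of the chemometric idea above: each sample is represented by its DPV curves recorded at several pH values, concatenated into one feature vector, reduced by PCA, and classified with an SVM (the SVM-DA step; PLS-DA is omitted for brevity). The data are synthetic Gaussian peaks standing in for real voltammograms.

```python
# Two hypothetical drugs whose oxidation peaks nearly coincide at low pH but
# whose peak positions drift with pH at different rates -- the pH evolution is
# what the chemometric treatment exploits. Everything here is illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_potentials, n_ph = 100, 5                     # potential grid x pH levels

def simulate_dpv(drift, n=30):
    """Synthetic DPV set: a Gaussian peak whose position shifts with pH at
    rate `drift` (V per pH unit), plus noise; flattened per sample."""
    v = np.linspace(0, 1, n_potentials)
    curves = []
    for _ in range(n):
        sample = [np.exp(-((v - (0.4 + drift * ph)) ** 2) / 0.005)
                  + rng.normal(scale=0.05, size=n_potentials)
                  for ph in range(n_ph)]
        curves.append(np.concatenate(sample))   # stack the pH series
    return np.array(curves)

X = np.vstack([simulate_dpv(0.03), simulate_dpv(0.05)])
y = np.array([0] * 30 + [1] * 30)

clf = make_pipeline(PCA(n_components=5), SVC(kernel="linear"))
clf.fit(X, y)
train_accuracy = clf.score(X, y)
```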
Optimal sensor placement for classifier-based leak localization in drinking water networks
© 2016 IEEE. This paper presents a sensor placement method for classifier-based leak localization in Water Distribution Networks. The proposed approach applies a Genetic Algorithm to decide which sensors are to be used by a classifier based on the k-Nearest Neighbor approach. The sensors are placed optimally so as to maximize the accuracy of the leak localization. The results are illustrated through application to the Hanoi District Metered Area and compared with those obtained by an Exhaustive Search Algorithm, as well as with a previous optimal sensor placement method.
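A hedged sketch of the approach above: a genetic algorithm searches for the subset of sensor locations that maximizes the accuracy of a k-NN leak classifier. The network, leak residuals, and GA parameters are all synthetic placeholders (the paper uses the Hanoi DMA and a hydraulic model).

```python
# GA over fixed-size sensor subsets; fitness = cross-validated k-NN
# leak-localization accuracy on the selected sensor columns.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_nodes, n_leaks, n_scenarios = 20, 8, 15
# Residual matrix: rows = leak scenarios, columns = candidate sensor nodes.
signatures = rng.normal(size=(n_leaks, n_nodes))
X = np.repeat(signatures, n_scenarios, axis=0) \
    + rng.normal(scale=0.6, size=(n_leaks * n_scenarios, n_nodes))
y = np.repeat(np.arange(n_leaks), n_scenarios)

def fitness(mask):
    """Localization accuracy using only the sensors selected by `mask`."""
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def random_mask(k):
    mask = np.zeros(n_nodes, dtype=bool)
    mask[rng.choice(n_nodes, size=k, replace=False)] = True
    return mask

def repair(mask, k):
    """Force exactly k selected sensors after crossover/mutation."""
    on, off = np.flatnonzero(mask), np.flatnonzero(~mask)
    while len(on) > k:
        i = rng.integers(len(on)); mask[on[i]] = False; on = np.delete(on, i)
    while len(on) < k:
        i = rng.integers(len(off)); mask[off[i]] = True
        on = np.append(on, off[i]); off = np.delete(off, i)
    return mask

k, pop_size, n_gen = 5, 16, 10
pop = [random_mask(k) for _ in range(pop_size)]
for _ in range(n_gen):
    elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
    children = []
    for _ in range(pop_size - len(elite)):
        a, b = rng.choice(len(elite), size=2, replace=False)
        child = np.where(rng.random(n_nodes) < 0.5, elite[a], elite[b])  # uniform crossover
        child[rng.integers(n_nodes)] ^= True                             # point mutation
        children.append(repair(child, k))
    pop = elite + children
best = max(pop, key=fitness)
best_accuracy = fitness(best)
```

The GA evaluates far fewer subsets than exhaustive search (here, at most a few hundred of the 15 504 possible 5-of-20 placements), which is the practical point of the paper's comparison.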
Real-time human ambulation, activity, and physiological monitoring:taxonomy of issues, techniques, applications, challenges and limitations
Automated methods of real-time, unobtrusive, human ambulation, activity, and wellness monitoring and data analysis using various algorithmic techniques have been subjects of intense research. The general aim is to devise effective means of addressing the demands of assisted living, rehabilitation, and clinical observation and assessment through sensor-based monitoring. The research studies have resulted in a large amount of literature. This paper presents a holistic articulation of the research studies and offers comprehensive insights along four main axes: distribution of existing studies; monitoring device framework and sensor types; data collection, processing and analysis; and applications, limitations and challenges. The aim is to present a systematic and most complete study of literature in the area in order to identify research gaps and prioritize future research directions
Robust asset allocation under model ambiguity
A decision maker facing a decision problem often considers several models to represent the outcomes of the decision variable at hand. More often than not, the decision maker does not fully trust any of those models and hence displays ambiguity aversion, also called model uncertainty aversion.
This PhD thesis focuses on the specific case of the asset allocation problem under ambiguity faced by financial investors. The aim is not to find an optimal solution for the investor, but rather to develop a general methodology, applicable in particular to the asset allocation problem, that allows the investor to obtain a tractable, easy-to-compute solution that takes ambiguity into account.
This PhD thesis is structured as follows. First, some classical and widely used models for asset returns are presented. It is shown that the performance of asset portfolios built using those single models is very volatile: no model consistently outperforms the others over the period considered, which gives empirical evidence that no single model can be fully trusted over the long run and that several models are needed to achieve the best possible asset allocation. Classical portfolio theory must therefore be adapted to take ambiguity, or model uncertainty, into account. Many authors attempted early on to include ambiguity aversion in the asset allocation problem, and the literature is reviewed to outline the main models proposed. However, those models often lack flexibility and tractability: the search for an optimal solution under ambiguity aversion is often hard to apply in practice to the large-dimensional problems faced by modern financial investors. This motivates the introduction of a novel methodology that is easily applicable, robust, flexible and tractable. The Ambiguity Robust Adjustment (ARA) methodology is presented theoretically and then tested on a large empirical data set. Several forms of the ARA are considered and tested, and empirical evidence shows that the ARA methodology substantially improves portfolio performance.
Through the specific illustration of the asset allocation problem in finance, this PhD thesis proposes a new general methodology that will hopefully help decision makers solve a wide range of problems under ambiguity.
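The abstract does not spell out the ARA methodology itself, so the sketch below only illustrates the idea that motivates it: when no single return model can be trusted, combine the portfolios implied by several candidate models rather than committing to one. This is a generic model-combination illustration with made-up data, not the ARA of the thesis.

```python
# Minimum-variance weights computed under each candidate covariance model,
# then averaged with equal ambiguity weights. All numbers are synthetic.
import numpy as np

def min_variance_weights(cov):
    """Unconstrained minimum-variance portfolio: w proportional to
    inverse(cov) @ ones, normalized to sum to 1."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

rng = np.random.default_rng(0)
n_assets, n_obs = 4, 250
returns = rng.normal(0.0004, 0.01, size=(n_obs, n_assets))

# Two candidate covariance models: the raw sample estimate and a
# diagonally-shrunk estimate, standing in for "several models".
sample_cov = np.cov(returns, rowvar=False)
shrunk_cov = 0.5 * sample_cov + 0.5 * np.diag(np.diag(sample_cov))
models = [sample_cov, shrunk_cov]

# Equal-weight combination across models (uniform ambiguity weights).
w_combined = np.mean([min_variance_weights(c) for c in models], axis=0)
```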