
    Calibration of One-Class SVM for MV set estimation

    A general approach to anomaly detection or novelty detection consists of estimating high-density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high-dimensional data, yet it suffers from practical limitations. When applied to a limited number of samples, it can lead to poor performance even when picking the best hyperparameters. Moreover, the solution of the OCSVM is very sensitive to the selection of hyperparameters, which makes it hard to optimize in an unsupervised setting. We present a new approach to estimating MV sets with the OCSVM, using a different choice of the parameter controlling the proportion of outliers. The solution function of the OCSVM is learnt on a training set, and the desired probability mass is obtained by adjusting the offset on a test set to prevent overfitting. Models learnt on different train/test splits are then aggregated to reduce the variance induced by such random splits. Our approach makes it possible to tune the hyperparameters automatically and to obtain nested set estimates. Experimental results show that our approach outperforms the standard OCSVM formulation while suffering less from the curse of dimensionality than kernel density estimates. Results on real data sets are also presented. Comment: IEEE DSAA'2015, Oct 2015, Paris, France.
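
    As a hedged illustration of the calibration idea, the sketch below fits scikit-learn's OneClassSVM on one half of the data and re-sets the decision offset to the (1 - alpha) quantile of the scores on the held-out half, then averages over random splits. The split ratio, kernel settings, and function names are illustrative assumptions, not the paper's exact protocol.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.svm import OneClassSVM

        def calibrated_ocsvm_scores(X, alpha=0.95, n_splits=10, seed=0):
            """Score points so that roughly a mass alpha scores >= 0.

            Per split: fit an OCSVM on the train half, then shift its
            decision function so that a fraction alpha of the *held-out*
            scores is non-negative; averaging over splits reduces the
            variance induced by the random split.
            """
            rng = np.random.RandomState(seed)
            scores = np.zeros(len(X))
            for _ in range(n_splits):
                train_idx, test_idx = train_test_split(
                    np.arange(len(X)), test_size=0.5, random_state=rng)
                model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5)
                model.fit(X[train_idx])
                raw = model.decision_function(X)
                # Calibrated offset: the (1 - alpha) quantile of the
                # held-out scores becomes the new zero level.
                offset = np.quantile(raw[test_idx], 1.0 - alpha)
                scores += raw - offset
            return scores / n_splits  # score >= 0: inside the MV-set estimate

    Because only the offset varies with alpha, thresholding the averaged score at different levels yields the nested set estimates mentioned in the abstract.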

    How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?

    When sufficient labeled data are available, classical criteria based on Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be used to compare the performance of unsupervised anomaly detection algorithms. However, in many situations, few or no data are labeled. This calls for alternative criteria that can be computed on non-labeled data. In this paper, two criteria that do not require labels are empirically shown to discriminate accurately (w.r.t. ROC- or PR-based criteria) between algorithms. These criteria are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which generally cannot be well estimated in high dimension. A methodology based on feature sub-sampling and aggregating is also described and tested, extending the use of these criteria to high-dimensional datasets and solving major drawbacks inherent to standard EM and MV curves.
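
    The Mass-Volume criterion can be estimated without labels by Monte Carlo integration. The minimal sketch below assumes `score` is any fitted anomaly scoring function (higher = more normal); the uniform bounding-box sampling is exactly the step that degrades in high dimension and motivates the paper's feature sub-sampling.

        import numpy as np

        def mass_volume_curve(score, X, n_mc=100_000, n_alphas=19, seed=0):
            """Estimate the Mass-Volume curve of a scoring function.

            For each target mass alpha, find the score threshold that
            captures that mass of the data, then estimate the Lebesgue
            volume of the resulting level set by uniform sampling over
            the data's bounding box.
            """
            rng = np.random.default_rng(seed)
            lo, hi = X.min(axis=0), X.max(axis=0)
            U = rng.uniform(lo, hi, size=(n_mc, X.shape[1]))  # uniform on box
            box_volume = float(np.prod(hi - lo))
            s_data, s_unif = score(X), score(U)
            alphas = np.linspace(0.05, 0.95, n_alphas)
            volumes = [box_volume * np.mean(s_unif >= np.quantile(s_data, 1 - a))
                       for a in alphas]
            return alphas, np.array(volumes)  # lower curve = better detector

    A detector with a lower curve (smaller volume at equal mass) is preferred. In the paper's high-dimensional variant, the same estimate would be computed on many small random feature subsets and the resulting curves aggregated.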

    A machine learning framework for gait classification using inertial sensors: Application to elderly, post-stroke and Huntington’s disease patients

    Machine learning methods have been widely used for gait assessment through the estimation of spatio-temporal parameters. As a further step, the objective of this work is to propose and validate a general probabilistic modeling approach for the classification of different pathological gaits. Specifically, the presented methodology was tested on gait data recorded from two pathological populations (Huntington’s disease and post-stroke subjects) and healthy elderly controls, using data from inertial measurement units placed at the shank and waist. By extracting features from group-specific Hidden Markov Models (HMMs) and from signal information in the time and frequency domains, a Support Vector Machine (SVM) classifier was designed and validated. 90.5% of subjects were assigned to the correct group after leave-one-subject-out cross-validation and majority voting. The long-term goal is gait assessment in everyday life, enabling early detection of gait alterations.
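
    A hedged sketch of the validation scheme follows (not the paper's full pipeline: the HMM-based feature extraction is omitted, and the window-level features X, labels y and subject IDs are assumed given).

        import numpy as np
        from sklearn.model_selection import LeaveOneGroupOut
        from sklearn.svm import SVC

        def loso_majority_vote(X, y, subjects):
            """Leave-one-subject-out CV with per-subject majority voting.

            X: window-level feature vectors; y: group label per window;
            subjects: subject ID per window (all windows of one subject
            share one true label).
            """
            correct = 0
            for train_idx, test_idx in LeaveOneGroupOut().split(X, y, subjects):
                clf = SVC(kernel="rbf", C=1.0, gamma="scale")
                clf.fit(X[train_idx], y[train_idx])
                votes = clf.predict(X[test_idx])       # one vote per window
                labels, counts = np.unique(votes, return_counts=True)
                predicted = labels[np.argmax(counts)]  # majority vote
                correct += int(predicted == y[test_idx][0])
            return correct / len(np.unique(subjects))  # subject-level accuracy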

    A chemometric survey about the ability of voltammetry to discriminate pharmaceutical products from the evolution of signals as a function of pH

    Many pharmaceutical products are electroactive and can therefore be determined by voltammetry. However, most of these substances produce signals in the same region of oxidative potentials, which makes it difficult to identify them. In this work, chemometric tools are applied to extract characteristic information not only from the peak potential of differential pulse voltammograms (DPV), but also from their evolution as a function of pH. The chemometric approach is based on principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA), yielding promising results for the future discrimination of pharmaceutical products in water samples.
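
    As a minimal sketch of one of the three chemometric recipes named above (PCA scores fed to an SVM), the pipeline below assumes each sample's pH-resolved voltammograms are flattened into a single feature vector; the data layout and parameter values are illustrative assumptions.

        from sklearn.decomposition import PCA
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Assumed layout: one sample = DPV currents on a fixed potential
        # grid, recorded at several pH values and flattened into one
        # vector; labels identify the pharmaceutical compound.
        def svm_da_pipeline(n_components=10):
            """PCA scores fed to an SVM: a common chemometric recipe."""
            return make_pipeline(
                StandardScaler(),                  # autoscale each variable
                PCA(n_components=n_components),    # compress correlated currents
                SVC(kernel="rbf", gamma="scale"),  # SVM discriminant analysis
            )

    Fitting is then one call, e.g. svm_da_pipeline().fit(X_train, compound_labels); PLS-DA would replace the PCA+SVM tail with a supervised projection.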

    Optimal sensor placement for classifier-based leak localization in drinking water networks

    This paper presents a sensor placement method for classifier-based leak localization in Water Distribution Networks. The proposed approach consists of applying a Genetic Algorithm to decide which sensors are to be used by a classifier based on the k-Nearest Neighbor approach. The sensors are placed optimally so as to maximize the accuracy of the leak localization. The results are illustrated by applying the method to the Hanoi District Metered Area and are compared with those obtained by an Exhaustive Search Algorithm. A comparison with the results of a previous optimal sensor placement method is provided as well.
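
    A hedged sketch of the approach follows. The GA operators, population settings and function names are illustrative assumptions, and the hydraulic simulation that would produce the residual matrix R (pressure residual of each leak scenario at each candidate node) is taken as given.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        def ga_sensor_placement(R, leak_labels, n_sensors,
                                pop_size=30, gens=40, seed=0):
            """Evolve subsets of n_sensors nodes (n_sensors < R.shape[1])
            to maximize kNN leak-localization accuracy."""
            rng = np.random.default_rng(seed)
            n_nodes = R.shape[1]

            def fitness(sensors):  # re-evaluated each call; cache for real use
                knn = KNeighborsClassifier(n_neighbors=3)
                return cross_val_score(knn, R[:, sensors], leak_labels, cv=3).mean()

            population = [rng.choice(n_nodes, n_sensors, replace=False)
                          for _ in range(pop_size)]
            for _ in range(gens):
                ranked = sorted(population, key=fitness, reverse=True)
                elite = ranked[: pop_size // 2]                # selection
                children = []
                while len(children) < pop_size - len(elite):
                    a, b = rng.choice(len(elite), size=2, replace=False)
                    pool = np.union1d(elite[a], elite[b])      # crossover
                    child = rng.choice(pool, n_sensors, replace=False)
                    if rng.random() < 0.2:                     # mutation
                        fresh = rng.choice(np.setdiff1d(np.arange(n_nodes), child))
                        child[rng.integers(n_sensors)] = fresh
                    children.append(child)
                population = elite + children
            return max(population, key=fitness)

    The Exhaustive Search Algorithm mentioned above would instead evaluate the same kNN fitness on every possible sensor subset, which is only feasible for small networks.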

    Real-time human ambulation, activity, and physiological monitoring: taxonomy of issues, techniques, applications, challenges and limitations

    Automated methods of real-time, unobtrusive human ambulation, activity, and wellness monitoring, together with data analysis using various algorithmic techniques, have been the subject of intense research. The general aim is to devise effective means of addressing the demands of assisted living, rehabilitation, and clinical observation and assessment through sensor-based monitoring. These research studies have produced a large body of literature. This paper presents a holistic articulation of the research studies and offers comprehensive insights along four main axes: distribution of existing studies; monitoring device framework and sensor types; data collection, processing and analysis; and applications, limitations and challenges. The aim is to present a systematic and comprehensive study of the literature in the area in order to identify research gaps and prioritize future research directions.

    Robust asset allocation under model ambiguity

    A decision maker, when facing a decision problem, often considers several models to represent the outcomes of the decision variable under consideration. More often than not, the decision maker does not fully trust any of those models and hence displays ambiguity, or model-uncertainty, aversion. This PhD thesis focuses on the specific case of the asset allocation problem under ambiguity faced by financial investors. The aim is not to find an optimal solution for the investor, but rather to develop a general methodology that can be applied in particular to the asset allocation problem and that allows the investor to find a tractable, easy-to-compute solution to this problem while taking ambiguity into account. The thesis is structured as follows. First, some classical and widely used models for asset returns are presented. It is shown that the performance of asset portfolios built using these individual models is very volatile: no model performs consistently better than the others over the period considered, which gives empirical evidence that no model can be fully trusted over the long run and that several models are needed to achieve the best possible asset allocation. The classical portfolio theory must therefore be adapted to take ambiguity, or model uncertainty, into account. Many authors have previously attempted to include ambiguity aversion in the asset allocation problem; the literature is reviewed to outline the main models proposed. However, these models often lack flexibility and tractability, and the search for an optimal solution under ambiguity aversion is often difficult to apply in practice to high-dimensional problems such as those faced by modern financial investors. This motivates a novel methodology that is easily applicable, robust, flexible and tractable. The Ambiguity Robust Adjustment (ARA) methodology is presented theoretically and then tested on a large empirical data set. Several forms of the ARA are considered and tested. Empirical evidence demonstrates that the ARA methodology substantially improves portfolio performance. Through the specific illustration of the asset allocation problem in finance, this thesis proposes a new general methodology that will hopefully help decision makers solve numerous different problems under ambiguity.
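
    The abstract does not give the ARA formula, so the sketch below is only a hedged illustration of the underlying problem it addresses: the optimal portfolio differs across candidate models, and the simplest ambiguity-aware baseline blends the per-model solutions rather than trusting any single one. The function names are hypothetical, and this is explicitly not the thesis's ARA method.

        import numpy as np

        def min_variance_weights(cov):
            """Closed-form minimum-variance portfolio for one covariance model."""
            w = np.linalg.solve(cov, np.ones(cov.shape[0]))
            return w / w.sum()

        def multi_model_weights(cov_models):
            """Equal-weight blend of the per-model optimal portfolios, so
            that no single untrusted model dictates the allocation. A
            naive baseline, not the ARA methodology."""
            ws = np.array([min_variance_weights(c) for c in cov_models])
            return ws.mean(axis=0)

    Comparing the rows of ws across models makes the model-ambiguity problem visible: the more the per-model portfolios disagree, the more an investor stands to lose by committing to one model.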