1,621 research outputs found

    Subspace Support Vector Data Description and Extensions

    Machine learning deals with discovering the knowledge that governs the learning process. The science of machine learning helps create techniques that enhance the capabilities of a system through the use of data. Typical machine learning techniques identify or predict different patterns in the data. In classification tasks, a machine learning model is trained using some training data to identify the unknown function that maps the input data to the output labels. The classification task gets challenging if the data from some categories are either unavailable or so diverse that they cannot be modelled statistically. For example, to train a model for anomaly detection, it is usually challenging to collect anomalous data for training, but the normal data is available in abundance. In such cases, it is possible to use One-Class Classification (OCC) techniques where the model is trained by using data only from one class. OCC algorithms are practical in situations where it is vital to identify one of the categories, but the examples from that specific category are scarce. Numerous OCC techniques have been proposed in the literature that model the data in the given feature space; however, such data can be high-dimensional or may not provide discriminative information for classification. In order to avoid the curse of dimensionality, standard dimensionality reduction techniques are commonly used as a preprocessing step in many machine learning algorithms. Principal Component Analysis (PCA) is an example of a widely used algorithm to transform data into a subspace suitable for the task at hand while maintaining the meaningful features of a given dataset. This thesis provides a new paradigm that jointly optimizes a subspace and data description for one-class classification via Support Vector Data Description (SVDD). 
We initiated the idea of subspace learning for one-class classification by proposing a novel Subspace Support Vector Data Description (SSVDD) method, which was further extended to Ellipsoidal Subspace Support Vector Data Description (ESSVDD). ESSVDD generalizes the hypersphere-based SSVDD by using an ellipsoidal data description, and it converges faster than SSVDD. It is important to train a joint model for multimodal data when data is collected from multiple sources. Therefore, we also proposed a multimodal approach, namely Multimodal Subspace Support Vector Data Description (MSSVDD), for transforming the data from multiple modalities to a common shared space for OCC. An important contribution of this thesis is to provide a framework unifying the subspace learning methods for SVDD. The proposed Graph-Embedded Subspace Support Vector Data Description (GESSVDD) framework helps reveal novel insights into the previously proposed methods and allows the derivation of novel variants that incorporate different optimization goals. The main focus of the thesis is on generic novel methods which can be adapted to different application domains. We experimented with standard datasets from different domains such as robotics, healthcare, and economics and achieved better performance than competing methods in most of the cases. We also proposed a taxa identification framework for rare benthic macroinvertebrates. Benthic macroinvertebrate taxa distribution is typically very imbalanced. The numbers of training images for the rarest classes are too low for properly training deep learning-based methods, while these rarest classes can be central in biodiversity monitoring. We show that the classic one-class classifiers in general, and the proposed methods in particular, can enhance the classification performance of a deep neural network on imbalanced datasets.
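The idea of describing one class by an enclosing region in a lower-dimensional subspace can be illustrated with a minimal sketch. It is a deliberate simplification, not the SSVDD optimisation from the abstract: the projection matrix `Q` is fixed by hand rather than learned, the centre is the plain mean rather than the SVDD solution, and the names `project`, `fit_hypersphere`, `is_target`, and the parameter `keep` are hypothetical.

```python
import math

def project(x, Q):
    """Map a D-dimensional point x to d dimensions with a d x D matrix Q."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]

def fit_hypersphere(points, Q, keep=0.9):
    """Fit a centre and radius in the subspace defined by Q.

    Assumption: the centre is the mean of the projected points and the
    radius is chosen so that a fraction `keep` of them falls inside.
    """
    z = [project(p, Q) for p in points]
    d = len(z[0])
    centre = [sum(p[i] for p in z) / len(z) for i in range(d)]
    dists = sorted(math.dist(p, centre) for p in z)
    idx = min(len(dists) - 1, max(0, math.ceil(keep * len(dists)) - 1))
    return centre, dists[idx]

def is_target(x, Q, centre, radius):
    """A point is accepted as the target class if it lies inside the sphere."""
    return math.dist(project(x, Q), centre) <= radius

# Toy data: the third feature is noisy and uninformative, so Q drops it.
Q = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
normal = [[0.1, 0.0, 5.0], [-0.1, 0.1, -3.0], [0.0, -0.1, 9.0], [0.05, 0.05, 0.0]]
centre, radius = fit_hypersphere(normal, Q, keep=1.0)
```

In this toy setup a test point is judged only on the two retained features, which is the practical appeal of describing the data in a subspace rather than in the full feature space.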

    Exponentially convergent data assimilation algorithm for Navier-Stokes equations

    The paper presents a new state estimation algorithm for a bilinear equation representing the Fourier-Galerkin (FG) approximation of the Navier-Stokes (NS) equations on a torus in R^2. This state equation is subject to uncertain but bounded noise in the input (Kolmogorov forcing) and initial conditions, and its output is incomplete and contains bounded noise. The algorithm designs a time-dependent gain such that the estimation error converges to zero exponentially. The sufficient conditions for the existence of the gain are formulated in the form of algebraic Riccati equations. To demonstrate the results, we apply the proposed algorithm to the reconstruction of a chaotic fluid flow from incomplete and noisy data.
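The role of an estimator gain acting on incomplete output can be seen in a much simpler setting than the paper's. The sketch below is an assumption-laden stand-in: a noise-free, linear, two-state system with a constant Luenberger gain, not the bilinear Fourier-Galerkin system or the Riccati-based time-dependent gain described above. The function `simulate` and the gain values are hypothetical choices that place both error eigenvalues at 0.5.

```python
def simulate(steps=40, gain=(1.0, 0.25)):
    """Estimate (position, velocity) while observing position only.

    True system:  x1 <- x1 + x2,  x2 <- x2,  output y = x1.
    Observer:     same model plus gain * (y - estimated y).
    The error then evolves as e <- [[1 - g1, 1], [-g2, 1]] e.
    """
    x = [0.0, 1.0]     # true state
    xh = [5.0, -2.0]   # deliberately wrong initial estimate
    errors = []
    for _ in range(steps):
        innovation = x[0] - xh[0]          # only the first state is measured
        x = [x[0] + x[1], x[1]]
        xh = [xh[0] + xh[1] + gain[0] * innovation,
              xh[1] + gain[1] * innovation]
        errors.append(max(abs(x[0] - xh[0]), abs(x[1] - xh[1])))
    return errors

errors = simulate()
```

Although the velocity is never measured, its estimate is corrected through the position innovation, and the recorded error shrinks geometrically because the error matrix has spectral radius 0.5 for this gain.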

    Kernel Ellipsoidal Trimming

    Ellipsoid estimation is an issue of primary importance in many practical areas such as control, system identification, visual/audio tracking, experimental design, data mining, robust statistics and novelty/outlier detection. This paper presents a new method of kernel information matrix ellipsoid estimation (KIMEE) that finds an ellipsoid in a kernel-defined feature space based on a centered information matrix. Although the method is very general and can be applied to many of the aforementioned problems, the main focus in this paper is the problem of novelty or outlier detection associated with fault detection. A simple iterative algorithm based on Titterington's minimum volume ellipsoid method is proposed for practical implementation. The KIMEE method demonstrates very good performance on a set of real-life and simulated datasets compared with support vector machine methods.
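As a rough illustration of the iterative idea, the sketch below applies the multiplicative weight update commonly attributed to Titterington to 2-D points in the input space. It is not KIMEE: there is no kernel, the ellipse is centred at the origin, and the helper names (`weighted_scatter`, `inverse_2x2`, `quad_form`, `mve_weights`) are hypothetical.

```python
def weighted_scatter(points, w):
    """Return (a, b, c) for the symmetric matrix [[a, b], [b, c]] = sum w_i x_i x_i^T."""
    a = sum(wi * x * x for wi, (x, y) in zip(w, points))
    b = sum(wi * x * y for wi, (x, y) in zip(w, points))
    c = sum(wi * y * y for wi, (x, y) in zip(w, points))
    return a, b, c

def inverse_2x2(m):
    """Invert a symmetric 2x2 matrix given as (a, b, c)."""
    a, b, c = m
    det = a * c - b * b
    return c / det, -b / det, a / det

def quad_form(m_inv, p):
    """Compute p^T M^{-1} p for a symmetric 2x2 inverse given as (a, b, c)."""
    a, b, c = m_inv
    x, y = p
    return a * x * x + 2.0 * b * x * y + c * y * y

def mve_weights(points, iters=200):
    """Reweight points by w_i <- w_i * (x_i^T M^{-1} x_i) / dim, with dim = 2.

    The enclosing ellipse is {x : x^T M^{-1} x <= 2} at convergence.
    """
    dim = 2
    w = [1.0 / len(points)] * len(points)
    for _ in range(iters):
        m_inv = inverse_2x2(weighted_scatter(points, w))
        w = [wi * quad_form(m_inv, p) / dim for wi, p in zip(w, points)]
    return w, inverse_2x2(weighted_scatter(points, w))

# Four boundary points and one interior point.
pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.1, 0.1)]
weights, m_inv = mve_weights(pts)
```

Points that end up with non-negligible weight are the ones touching the ellipse, while interior points are driven towards zero weight; a novelty detector would flag a new point whose quadratic form exceeds the boundary value.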

    Credit Card Fraud Detection with Subspace Learning-based One-Class Classification

    In an increasingly digitalized commerce landscape, the proliferation of credit card fraud and the evolution of sophisticated fraudulent techniques have led to substantial financial losses. Automating credit card fraud detection is a viable way to accelerate detection, reducing response times and minimizing potential financial losses. However, addressing this challenge is complicated by the highly imbalanced nature of the datasets, where genuine transactions vastly outnumber fraudulent ones. Furthermore, the high number of dimensions within the feature set gives rise to the "curse of dimensionality". In this paper, we investigate subspace learning-based approaches centered on One-Class Classification (OCC) algorithms, which excel in handling imbalanced data distributions and possess the capability to anticipate and counter the transactions carried out by yet-to-be-invented fraud techniques. The study highlights the potential of subspace learning-based OCC algorithms by investigating the limitations of current fraud detection strategies and the specific challenges of credit card fraud detection. These algorithms integrate subspace learning into the data description; hence, the models transform the data into a lower-dimensional subspace optimized for OCC. Through rigorous experimentation and analysis, the study validated that the proposed approach helps tackle the curse of dimensionality and the imbalanced nature of credit card data for automatic fraud detection to mitigate financial losses caused by fraudulent activities. Comment: 6 pages, 1 figure, 2 tables. Accepted at IEEE Symposium Series on Computational Intelligence 202

    Theory and Applications of Robust Optimization

    In this paper we survey the primary research, both theoretical and applied, in the area of Robust Optimization (RO). Our focus is on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying prominent theoretical results of RO, we also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we highlight applications of RO across a wide spectrum of domains, including finance, statistics, learning, and various areas of engineering. Comment: 50 page
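The computational attractiveness mentioned in the abstract can be seen in one standard textbook case, chosen here as an illustration and not taken from the survey itself: a linear constraint whose coefficients lie in a box. The worst case over the box has the closed form a0·x + Σ δ_i |x_i|, so the robust constraint stays easy to evaluate. The names `worst_case_lhs` and `brute_force_lhs` and the numbers are hypothetical.

```python
from itertools import product

def worst_case_lhs(a0, delta, x):
    """Closed-form maximum of a.x over the box |a_i - a0_i| <= delta_i."""
    nominal = sum(a * xi for a, xi in zip(a0, x))
    margin = sum(d * abs(xi) for d, xi in zip(delta, x))
    return nominal + margin

def brute_force_lhs(a0, delta, x):
    """Same maximum, found by enumerating all corners of the box."""
    best = float("-inf")
    for signs in product((-1.0, 1.0), repeat=len(x)):
        a = [ai + s * d for ai, s, d in zip(a0, signs, delta)]
        best = max(best, sum(ai * xi for ai, xi in zip(a, x)))
    return best

a0 = [2.0, 1.0, -1.0]      # nominal coefficients
delta = [0.1, 0.2, 0.3]    # half-widths of the uncertainty box
x = [1.0, -2.0, 0.5]       # candidate decision

closed = worst_case_lhs(a0, delta, x)
brute = brute_force_lhs(a0, delta, x)
```

The enumeration visits 2^n corners while the closed form costs one pass over the coefficients, which is the kind of tractable reformulation that makes robust counterparts usable in practice.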