
    Geometric margin domain description with instance-specific margins

    Support vector domain description (SVDD) is a useful tool in data mining, used for analysing the within-class distribution of multi-class data and for ascertaining membership of a class with a known training distribution. An important property of the method is its inner-product-based formulation, which makes it applicable in reproducing kernel Hilbert spaces via the "kernel trick". This practice relies on full knowledge of the feature values in the training set, so incomplete data must be pre-processed via imputation, sometimes adding unnecessary or incorrect data to the classifier. Based on an existing study of support vector machine (SVM) classification with structurally missing data, we present a method of domain description for incomplete data without imputation, and generalise it to some types of kernel space. We review statistical techniques for dealing with missing data, and explore the properties and limitations of the SVM procedure. We present two methods to achieve this aim: the first provides an input-space solution, and the second uses a given imputation of a dataset to calculate an improved solution. We apply our methods first to synthetic and commonly used datasets, then to non-destructive assay (NDA) data provided by a third party. We compare our classification machines to the use of a standard SVDD boundary, and highlight where performance improves upon the use of imputation.
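
    For reference, the sketch below fits a standard SVDD-style boundary after simple mean imputation, in the spirit of the imputation-based baseline the paper compares against; it is not the paper's imputation-free method. The dataset, missingness pattern, and nu/gamma settings are illustrative assumptions.

```python
# A minimal baseline sketch: impute missing values, then fit a one-class
# boundary with scikit-learn's OneClassSVM (which closely matches SVDD when
# an RBF kernel is used). All data here is synthetic.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # synthetic single-class training data
X[rng.random(X.shape) < 0.1] = np.nan      # knock out ~10% of feature values

X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# nu bounds the fraction of training points left outside the description.
svdd_like = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_imputed)

X_test = rng.normal(size=(10, 4))
print(svdd_like.predict(X_test))           # +1 = inside the description, -1 = outside
```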

    Newton Method-based Subspace Support Vector Data Description

    In this paper, we present an adaptation of Newton's method for the optimization of Subspace Support Vector Data Description (S-SVDD). The objective of S-SVDD is to map the original data to a subspace optimized for one-class classification, and the iterative optimization of the data mapping and description in S-SVDD relies on gradient descent. However, gradient descent uses only first-order information, which may lead to suboptimal results. To address this limitation, we leverage Newton's method to enhance the data mapping and data description for improved optimization of subspace learning-based one-class classification. By incorporating second-order (curvature) information, Newton's method offers a more efficient strategy for subspace learning in one-class classification than gradient-based optimization. The paper discusses the limitations of gradient descent and the advantages of using Newton's method in subspace learning for one-class classification tasks. We provide both linear and nonlinear formulations of Newton's-method-based optimization for S-SVDD. In our experiments, we explore both the minimization and maximization strategies of the objective. The results demonstrate that the proposed optimization strategy outperforms the gradient-based S-SVDD in most cases.
    Comment: 8 pages, 2 figures, 2 tables, 1 algorithm. Accepted at the IEEE Symposium Series on Computational Intelligence 202
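
    The first-order versus second-order point can be illustrated on a generic objective (not the S-SVDD objective itself): on an ill-conditioned quadratic, gradient descent creeps toward the minimum, while a single Newton step, which uses the Hessian, lands on it exactly. The matrix, step size, and iteration count below are illustrative assumptions.

```python
# Gradient descent vs. a Newton step on a toy ill-conditioned quadratic.
import numpy as np

A = np.diag([1.0, 100.0])                 # ill-conditioned Hessian
b = np.array([1.0, 1.0])
f = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b

w_gd = np.zeros(2)
for _ in range(100):                      # step size limited by the largest curvature
    w_gd -= 0.009 * grad(w_gd)

w_newton = np.zeros(2)
w_newton -= np.linalg.solve(A, grad(w_newton))   # Newton step: H^{-1} g with H = A

print("gradient descent:", w_gd, "f =", f(w_gd))
print("Newton          :", w_newton, "f =", f(w_newton))
```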

    Exploration of a Polarized Surface Bidirectional Reflectance Model Using the Ground-Based Multiangle Spectropolarimetric Imager

    Accurate characterization of surface reflection is essential for retrieval of aerosols using downward-looking remote sensors. In this paper, observations from the Ground-based Multiangle SpectroPolarimetric Imager (GroundMSPI) are used to evaluate a surface polarized bidirectional reflectance distribution function (PBRDF) model. GroundMSPI is an eight-band spectropolarimetric camera mounted on a rotating gimbal to acquire pushbroom imagery of outdoor landscapes. The camera uses a very accurate photoelastic-modulator-based polarimetric imaging technique to acquire Stokes vector measurements in three of the instrument's bands (470, 660, and 865 nm). A description of the instrument is presented, and observations of selected targets within a scene acquired on 6 January 2010 are analyzed. Data collected during the course of the day, as the Sun moved across the sky, provided a range of illumination geometries that facilitated evaluation of the surface model, which comprises a volumetric reflection term represented by the modified Rahman-Pinty-Verstraete function plus a specular reflection term generated by a randomly oriented array of Fresnel-reflecting microfacets. While the model is fairly successful in predicting the polarized reflection from two grass targets in the scene, it does a poorer job for two man-made targets (a parking lot and a truck roof), possibly due to their greater degree of geometric organization. Several empirical adjustments to the model are explored and lead to improved fits to the data. For all targets, the data support the notion of spectral invariance in the angular shape of the unpolarized and polarized surface reflection. As noted by others, this behavior provides valuable constraints on the aerosol retrieval problem and highlights the importance of multiangle observations.
    NASA; JPL; Center for Space Research
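
    A narrow sketch of one ingredient of the specular term mentioned above, the Fresnel reflectances governing a single microfacet; it omits the modified Rahman-Pinty-Verstraete volumetric term and the average over microfacet orientations, and the refractive index n = 1.5 is an illustrative assumption.

```python
# s- and p-polarized Fresnel reflectance of a dielectric facet, and the
# resulting degree of linear polarization for unpolarized incident light.
import numpy as np

def fresnel_reflectance(theta_i, n=1.5):
    """Return (R_s, R_p) for light incident from air onto a dielectric."""
    theta_t = np.arcsin(np.sin(theta_i) / n)          # Snell's law
    r_s = (np.cos(theta_i) - n * np.cos(theta_t)) / (np.cos(theta_i) + n * np.cos(theta_t))
    r_p = (n * np.cos(theta_i) - np.cos(theta_t)) / (n * np.cos(theta_i) + np.cos(theta_t))
    return r_s**2, r_p**2

theta = np.radians(np.arange(0, 90, 15))
R_s, R_p = fresnel_reflectance(theta)
dolp = np.abs(R_s - R_p) / (R_s + R_p)                # degree of linear polarization
for t, d in zip(np.degrees(theta), dolp):
    print(f"incidence {t:4.0f} deg  DoLP of reflected light {d:.2f}")
```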

    Star-galaxy separation strategies for WISE-2MASS all-sky infrared galaxy catalogs

    We combine photometric information from the WISE and 2MASS all-sky infrared databases and demonstrate how to produce clean and complete galaxy catalogs for future analyses. Adding 2MASS colors to WISE photometry improves star-galaxy separation efficiency substantially at the expense of losing a small fraction of the galaxies. We find that 93% of the WISE objects within W1 < 15.2 mag have a 2MASS match, and that a class of supervised machine learning algorithms, Support Vector Machines (SVM), are efficient classifiers of objects in our multicolor data set. We constructed a training set from the SDSS PhotoObj table with known star-galaxy separation, and determined the redshift distribution of our sample from the GAMA spectroscopic survey. Varying the combination of photometric parameters input to our algorithm, we show that W1 - J is a simple and effective star-galaxy separator, capable of producing results comparable to the multi-dimensional SVM classification. We present a detailed description of our star-galaxy separation methods, and characterize the robustness of our tools in terms of contamination, completeness, and accuracy. We explore systematics of the full-sky WISE-2MASS galaxy map, such as contamination from Moon glow. We show that the homogeneity of the full-sky galaxy map is improved by an additional J < 16.5 mag flux limit. The all-sky galaxy catalog we present in this paper covers 21,200 sq. degrees with dusty regions masked out, and has an estimated stellar contamination of 1.2% and completeness of 70.1% among 2.4 million galaxies with z_med = 0.14. WISE-2MASS galaxy maps with well controlled stellar contamination will be useful for spatial statistical analyses, including cross-correlations with other cosmological random fields, such as the Cosmic Microwave Background. The same techniques also yield a statistically controlled sample of stars.
    Comment: 10 pages, 11 figures. Accepted for publication in MNRAS
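
    A minimal sketch of the classification idea, assuming synthetic placeholder colours rather than WISE/2MASS photometry and SDSS labels: train an SVM on two infrared colours and compare it with a single W1 - J colour cut (the colour distributions and the cut threshold here are arbitrary, not the values derived in the paper).

```python
# SVM star/galaxy separation on two (hypothetical) infrared colours vs. a
# single colour cut, on synthetic placeholder data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
is_galaxy = rng.random(n) < 0.5
w1_minus_j = np.where(is_galaxy, rng.normal(-1.7, 0.3, n), rng.normal(-0.5, 0.3, n))
j_minus_k = rng.normal(np.where(is_galaxy, 1.3, 0.8), 0.25)
X = np.column_stack([w1_minus_j, j_minus_k])

X_tr, X_te, y_tr, y_te = train_test_split(X, is_galaxy, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("SVM accuracy       :", svm.score(X_te, y_te))
print("W1 - J cut accuracy:", ((X_te[:, 0] < -1.1) == y_te).mean())
```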

    Fault detection in operating helicopter drive train components based on support vector data description

    The objective of the paper is to develop a vibration-based automated procedure for early detection of mechanical degradation of helicopter drive train components using Health and Usage Monitoring Systems (HUMS) data. An anomaly-detection method devoted to quantifying the degree of deviation of the mechanical state of a component from its nominal condition is developed. This method is based on an Anomaly Score (AS) formed by combining a set of statistical features correlated with specific damage types, also known as Condition Indicators (CI); operational variability is thus implicitly included in the model through the CI correlation. The problem of fault detection is then recast as a one-class classification problem in the space spanned by a set of CIs, with the aim of globally differentiating normal from anomalous observations, respectively related to healthy and supposedly faulty components. In this paper, a procedure based on an efficient one-class classification method that does not require any assumption on the data distribution is used. The core of this approach is the Support Vector Data Description (SVDD), which allows an efficient data description without the need for a significant amount of statistical data. Several analyses have been carried out to validate the proposed procedure, using flight vibration data collected from an in-service H135 (formerly known as EC135) helicopter, for which micro-pitting damage on a gear was detected by HUMS and assessed through visual inspection. The capability of the proposed approach to provide a better trade-off between false alarm rates and missed detection rates than either individual CIs or an AS obtained assuming jointly Gaussian-distributed CIs has also been analysed.
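
    A schematic sketch of the one-class workflow, using scikit-learn's OneClassSVM as a stand-in for SVDD and simulated Condition Indicator values rather than HUMS data: fit the boundary on healthy observations, then sweep a threshold on the anomaly score to trade false alarms against missed detections.

```python
# One-class boundary on "healthy" CI vectors, then a threshold sweep on the
# anomaly score. All CI values are simulated placeholders.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
ci_healthy = rng.normal(0.0, 1.0, size=(500, 6))   # CIs from a healthy component
ci_faulty = rng.normal(1.5, 1.2, size=(60, 6))     # CIs after damage onset

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.02).fit(ci_healthy)

# Higher score = more anomalous (sign flipped relative to decision_function).
scores = -model.decision_function(np.vstack([ci_healthy, ci_faulty]))
labels = np.r_[np.zeros(len(ci_healthy)), np.ones(len(ci_faulty))]

fpr, tpr, _ = roc_curve(labels, scores)
for fa, det in zip(fpr[::20], tpr[::20]):
    print(f"false alarm rate {fa:.2f}  missed detection rate {1 - det:.2f}")
```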

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures

    Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative

    Accurately predicting conversions in advertisements is generally a challenging task, because such conversions do not occur frequently. In this paper, we propose a new framework to support the creation of high-performing ad creatives, including the accurate prediction of ad creative text conversions before delivery to the consumer. The proposed framework includes three key ideas: multi-task learning, conditional attention, and attention highlighting. Multi-task learning improves conversion prediction accuracy by predicting clicks and conversions simultaneously, which mitigates the difficulty of data imbalance. Conditional attention focuses the attention for each ad creative according to its genre and target gender, further improving conversion prediction accuracy. Attention highlighting visualizes important words and/or phrases based on the conditional attention. We evaluated the proposed framework on actual delivery history data (14,000 creatives from Gunosy Inc., each displayed more than a certain number of times), and confirmed that these ideas improve the prediction performance for conversions and visualize noteworthy words according to the creatives' attributes.
    Comment: 9 pages, 6 figures. Accepted at the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019) as an applied data science paper
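
    A compact PyTorch sketch of the two central ideas, attention over the ad text conditioned on (genre, gender) and two task heads trained jointly on clicks and conversions; the layer sizes, vocabulary, conditioning scheme, and random labels are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalAttentionCTRCVR(nn.Module):
    def __init__(self, vocab_size=1000, n_genres=10, n_genders=3, dim=64):
        super().__init__()
        self.n_genders = n_genders
        self.tok = nn.Embedding(vocab_size, dim)
        self.cond = nn.Embedding(n_genres * n_genders, dim)  # one embedding per (genre, gender)
        self.score = nn.Linear(dim, 1)
        self.click_head = nn.Linear(dim, 1)                  # click (CTR) task
        self.conv_head = nn.Linear(dim, 1)                   # conversion (CVR) task

    def forward(self, tokens, genre, gender):
        h = self.tok(tokens)                                 # (batch, seq, dim)
        c = self.cond(genre * self.n_genders + gender).unsqueeze(1)
        attn = torch.softmax(self.score(torch.tanh(h * c)), dim=1)  # condition-aware word weights
        pooled = (attn * h).sum(dim=1)
        return self.click_head(pooled), self.conv_head(pooled), attn

model = ConditionalAttentionCTRCVR()
tokens = torch.randint(0, 1000, (4, 12))                     # a batch of 4 tokenised ad texts
genre = torch.randint(0, 10, (4,))
gender = torch.randint(0, 3, (4,))
click_logit, conv_logit, attn = model(tokens, genre, gender)

# Joint multi-task loss; random 0/1 targets stand in for click/conversion labels.
click_y = torch.randint(0, 2, (4, 1)).float()
conv_y = torch.randint(0, 2, (4, 1)).float()
loss = nn.functional.binary_cross_entropy_with_logits(click_logit, click_y) \
     + nn.functional.binary_cross_entropy_with_logits(conv_logit, conv_y)
loss.backward()
```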

    A new feature extraction approach based on non linear source separation

    A new feature extraction approach is proposed in this paper to improve classification performance on remotely sensed data. The proposed method is based on a primary sources subset (PSS) obtained by a nonlinear transform that provides a lower-dimensional space for land pattern recognition. First, the underlying sources are approximated using multilayer neural networks. Bayesian inference then updates the knowledge of the unknown sources and the model parameters with information from the data. A source-dimension minimisation technique is then adopted to provide a more efficient land cover description. A support vector machine (SVM) scheme is developed using the extracted features. The experimental results on real multispectral imagery demonstrate that the proposed approach ensures efficient feature extraction when using several descriptors for texture identification and multiscale analysis. In a pixel-based approach, the reduced PSS space improves the overall classification accuracy by 13%, reaching 82%. Using texture and multi-resolution descriptors, the overall accuracy is 75.87% for the original observations, while in the reduced source space it reaches 81.67% when using wavelet and Gabor transforms jointly and 86.67% when using the Gabor transform alone. Thus, the source space enhances the feature extraction process and allows better land use discrimination than the raw multispectral observations.
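
    A schematic sketch of the overall pipeline (nonlinear transform to a reduced feature space, then SVM classification), with KernelPCA standing in for the paper's neural-network and Bayesian source separation and synthetic placeholder pixels in place of multispectral imagery; it only shows how the two stages plug together, not the reported accuracy gains.

```python
# Baseline SVM on raw bands vs. SVM on a reduced nonlinear feature space.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_pixels, n_bands = 600, 8
X = rng.normal(size=(n_pixels, n_bands))                     # placeholder multispectral pixels
y = (X[:, :3].sum(axis=1) + 0.3 * rng.normal(size=n_pixels) > 0).astype(int)  # two land classes

baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
reduced = make_pipeline(StandardScaler(),
                        KernelPCA(n_components=3, kernel="rbf", gamma=0.1),
                        SVC(kernel="rbf", gamma="scale"))

print("SVM on raw bands      :", cross_val_score(baseline, X, y, cv=5).mean())
print("SVM on reduced sources:", cross_val_score(reduced, X, y, cv=5).mean())
```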