10,831 research outputs found

    Feature selection using misclassification counts

    Get PDF
    Dimensionality reduction of the problem space through detection and removal of variables, contributing little or not at all to classification, is able to relieve the computational load and instance acquisition effort, considering all the data attributes accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are being verified using the Nearest Neighbor classifier. This also helps to address the feature irrelevance and redundancy, what ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for the direct comparison

    Using Empirical Recurrence Rates Ratio For Time Series Data Similarity

    Full text link
    Several methods exist in classification literature to quantify the similarity between two time series data sets. Applications of these methods range from the traditional Euclidean type metric to the more advanced Dynamic Time Warping metric. Most of these adequately address structural similarity but fail in meeting goals outside it. For example, a tool that could be excellent to identify the seasonal similarity between two time series vectors might prove inadequate in the presence of outliers. In this paper, we have proposed a unifying measure for binary classification that performed well while embracing several aspects of dissimilarity. This statistic is gaining prominence in various fields, such as geology and finance, and is crucial in time series database formation and clustering studies

    Highly comparative feature-based time-series classification

    Full text link
    A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one nearest neighbor classifiers using Euclidean distances and dynamic time warping and, most importantly, the features selected provide an understanding of the properties of the dataset, insight that can guide further scientific investigation

    Feature-Guided Black-Box Safety Testing of Deep Neural Networks

    Full text link
    Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples necessitate some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player's objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can con- verge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions that provide safety guarantees that no adversarial examples exist. Using Monte Carlo tree search we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimization procedures. Finally, we show how our method can be used to evaluate robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.Comment: 35 pages, 5 tables, 23 figure
    • …
    corecore