1,910 research outputs found

    Optimization of distributions differences for classification

    Full text link
    In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform 6 other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in 12 standard classification datasets. Our results show that the method is less sensitive to the imbalanced number of instances comparing to these methods. We also show that ODD maintains its performance better than other classification methods in these datasets, hence, offers a better generalization ability

    A survey of outlier detection methodologies

    Get PDF
    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review

    A comprehensible SOM-based scoring system.

    Get PDF
    The significant growth of consumer credit has resulted in a wide range of statistical and non-statistical methods for classifying applicants in 'good' and 'bad' risk categories. Traditionally, (logistic) regression used to be one of the most popular methods for this task, but recently some newer techniques like neural networks and support vector machines have shown excellent classification performance. Self-organizing maps (SOMs) have existed for decades and although they have been used in various application areas, only little research has been done to investigate their appropriateness for credit scoring. In this paper, it is shown how a trained SOM can be used for classification and how the basic SOM-algorithm can be integrated with supervised techniques like the multi-layered perceptron. Classification accuracy of the models is benchmarked with results reported previously.Decision; Knowledge; Knowledge discovery; Systems; Growth; Credit; Methods; Risk; Regression; Neural networks; Networks; Classification; Performance; Area; Research; Credit scoring; Models; Model;

    Modeling, forecasting and trading the EUR exchange rates with hybrid rolling genetic algorithms: support vector regression forecast combinations

    Get PDF
    The motivation of this paper is to introduce a hybrid Rolling Genetic Algorithm-Support Vector Regression (RG-SVR) model for optimal parameter selection and feature subset combination. The algorithm is applied to the task of forecasting and trading the EUR/USD, EUR/GBP and EUR/JPY exchange rates. The proposed methodology genetically searches over a feature space (pool of individual forecasts) and then combines the optimal feature subsets (SVR forecast combinations) for each exchange rate. This is achieved by applying a fitness function specialized for financial purposes and adopting a sliding window approach. The individual forecasts are derived from several linear and non-linear models. RG-SVR is benchmarked against genetically and non-genetically optimized SVRs and SVMs models that are dominating the relevant literature, along with the robust ARBF-PSO neural network. The statistical and trading performance of all models is investigated during the period of 1999–2012. As it turns out, RG-SVR presents the best performance in terms of statistical accuracy and trading efficiency for all the exchange rates under study. This superiority confirms the success of the implemented fitness function and training procedure, while it validates the benefits of the proposed algorithm

    Deep learning for time series classification: a review

    Get PDF
    Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.Comment: Accepted at Data Mining and Knowledge Discover

    Forecasting Government Bond Spreads with Heuristic Models:Evidence from the Eurozone Periphery

    Get PDF
    This study investigates the predictability of European long-term government bond spreads through the application of heuristic and metaheuristic support vector regression (SVR) hybrid structures. Genetic, krill herd and sine–cosine algorithms are applied to the parameterization process of the SVR and locally weighted SVR (LSVR) methods. The inputs of the SVR models are selected from a large pool of linear and non-linear individual predictors. The statistical performance of the main models is evaluated against a random walk, an Autoregressive Moving Average, the best individual prediction model and the traditional SVR and LSVR structures. All models are applied to forecast daily and weekly government bond spreads of Greece, Ireland, Italy, Portugal and Spain over the sample period 2000–2017. The results show that the sine–cosine LSVR is outperforming its counterparts in terms of statistical accuracy, while metaheuristic approaches seem to benefit the parameterization process more than the heuristic ones
    • 

    corecore