
    An improved EEG pattern classification system based on dimensionality reduction and classifier fusion

    University of Technology, Sydney. Faculty of Engineering and Information Technology. Analysis of brain electrical activity (electroencephalography, EEG) presents a rich source of information that helps in the advancement of affordable and effective biomedical applications such as psychotropic drug research, sleep studies, seizure detection and brain computer interfaces (BCI). Interpreting and understanding the EEG signal provides clinicians and physicians with useful information for disease diagnosis and for monitoring biological activity, and can also enable a new way of communicating through brain waves. This thesis investigates new algorithms for improving pattern recognition systems in two main EEG-based applications. The first is a simple Brain Computer Interface (BCI) based on imagined motor tasks; the second is an automatic sleep scoring system for the intensive care unit. A BCI system aims to create a non-muscular link between the brain and external devices, providing a new control scheme of most benefit to severely immobilised persons. This link is created by using a pattern recognition approach to interpret EEG into device commands, which can then be used to control wheelchairs, computers or other equipment. The second application creates an automatic scoring system by interpreting certain properties of several biomedical signals. Traditionally, sleep specialists record and analyse brain signals using the electroencephalogram (EEG), muscle tone (EMG), eye movement (EOG) and other biomedical signals to detect five sleep stages: Rapid Eye Movement (REM) and stages 1 to 4. Acquired signals are scored in 30-second intervals, which requires manually inspecting one segment at a time for properties that indicate the sleep stage. The process is time consuming and demands competence.
It is thought that an automatic scoring system mimicking sleep experts' rules will speed up the process and reduce the cost. The practicality of any EEG-based system depends upon accuracy and speed: the more accurate and faster classification systems are, the better the chance of integrating them into a wider range of applications. Thus, the performance of the above systems is further enhanced using improved feature selection, projection and classification algorithms. As processing EEG signals requires dealing with multi-dimensional data, the dimensionality must be reduced in order to achieve acceptable performance at lower computational cost. The first candidate for dimensionality reduction is a channel/feature selection approach. Four novel feature selection methods are developed, utilizing genetic algorithms, ant colony, particle swarm and differential evolution optimization. The methods provide fast and accurate selection of the most informative features/channels that best represent the mental tasks, so the computational burden of the classifier is kept as light as possible by removing irrelevant and highly redundant features. As an alternative to the feature selection approach, a novel feature projection method is also introduced. The method maps the original feature set into a small informative subset of features that can best discriminate between the different classes. Unlike most existing methods based on discriminant analysis, the proposed method considers the fuzzy nature of the input measurements in discovering the local manifold structure: it finds a projection that maximizes the margin between data points from different classes in each local area while accounting for this fuzziness. In the classification phase, a number of improvements to the traditional nearest neighbour classifier (kNN) are introduced. The improvements address the limitations of the kNN weighting scheme.
The traditional kNN does not take into account class distribution, the importance of each feature, the contribution of each neighbour, or the number of instances of each class. The proposed kNN variants are based on an improved distance measure and on weight optimization using differential evolution. The differential evolution optimizer is utilized to enhance kNN performance by optimizing the metric weights of features, neighbours and classes. Additionally, a fuzzy kNN variant has been developed to favour the classification of certain classes; this variant may find use in medical examination. Finally, an alternative classifier fusion method is introduced that aims to create a diverse neural network ensemble. Diversity is enhanced by altering the target output of each network to create a certain amount of bias towards each class, enabling the construction of a set of neural network classifiers that complement each other.
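    The kNN weight-optimization idea above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a DE/rand/1/bin loop searches for per-feature distance weights that maximise leave-one-out kNN accuracy on a synthetic two-feature dataset. All names and parameter values (population size, F, CR, the toy data) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 separates the classes, feature 1 is high-variance noise.
X = np.vstack([
    np.column_stack([rng.normal(0.0, 0.5, 20), rng.normal(0, 20, 20)]),
    np.column_stack([rng.normal(5.0, 0.5, 20), rng.normal(0, 20, 20)]),
])
y = np.array([0] * 20 + [1] * 20)

def loo_knn_accuracy(w, k=3):
    """Leave-one-out accuracy of kNN under a feature-weighted Euclidean metric."""
    d = np.sqrt((((X[:, None, :] - X[None, :, :]) * w) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)           # exclude each point from its own neighbours
    nbrs = np.argsort(d, axis=1)[:, :k]
    pred = (y[nbrs].mean(axis=1) > 0.5).astype(int)  # majority vote (binary labels)
    return (pred == y).mean()

# Differential evolution over the weight vector (DE/rand/1/bin).
NP, F, CR, GENS = 12, 0.7, 0.9, 30
pop = rng.uniform(0.01, 1.0, (NP, 2))
pop[0] = np.array([1.0, 1.0])             # keep equal weights as a baseline member
fit = np.array([loo_knn_accuracy(w) for w in pop])
for _ in range(GENS):
    for i in range(NP):
        a, b, c = pop[rng.choice([j for j in range(NP) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0.01, 10.0)
        cross = rng.random(2) < CR
        trial = np.where(cross, mutant, pop[i])
        f = loo_knn_accuracy(trial)
        if f >= fit[i]:                   # greedy selection keeps the better vector
            pop[i], fit[i] = trial, f

baseline = loo_knn_accuracy(np.array([1.0, 1.0]))
best = fit.max()
print(f"equal weights: {baseline:.2f}, DE-optimised: {best:.2f}")
```

    Because the equal-weight vector is seeded into the population and selection is greedy, the optimised accuracy can never fall below the unweighted baseline.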

    Classifiers accuracy improvement based on missing data imputation

    In this paper we further investigate and extend our previous work on radar signal identification and classification, based on a data set comprising continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. As with most real-world datasets, it contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI), K-Nearest Neighbour Imputation (KNNI) and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The imputation models’ performance is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metric. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN), Support Vector Machines (SVM) and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several subclasses, and propose two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complementary to the OCA when choosing the best classifier for the problem at hand.
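    A minimal sketch of the KNNI idea on a numeric toy array (MI and BTI are not shown; the data and k are illustrative, and a production implementation would also handle categorical features and rows with no observed values):

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs: for each incomplete row, average the k nearest complete rows,
    with distance computed only over the features the incomplete row observes."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nbrs = complete[np.argsort(d)[:k]]
        X[i, miss] = nbrs[:, miss].mean(axis=0)  # average the neighbours' values
    return X

X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, np.nan],
              [0.9, 1.9, 2.9],
              [8.0, 9.0, 10.0],
              [7.9, np.nan, 9.8]])
print(knn_impute(X))
```

    Row 1's missing value is filled from the two complete rows closest to it in the observed features (rows 0 and 2), giving (3.0 + 2.9) / 2 = 2.95.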

    Nonparametric Transient Classification using Adaptive Wavelets

    Classifying transients from multi-band light curves is a challenging but crucial problem in the era of Gaia and LSST, since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses a transient's light curve measurements to predict its class given training data. It implements two novel components. The first is the use of the BAGIDIS wavelet methodology, a characterization of functional data using hierarchical wavelet coefficients. The second is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement, while a major advantage of the BAGIDIS wavelets is that they are translation invariant, so the light curves need not be aligned to extract features. Further, BAGIDIS is nonparametric, so it can be used in blind searches for new objects. We demonstrate the effectiveness of our ranked wavelet classifier on the well-tested Supernova Photometric Classification Challenge dataset, in which the challenge is to correctly classify light curves as Type Ia or non-Ia supernovae. We train our ranked probability classifier on the spectroscopically confirmed subsample (which is not representative) and show that it gives good results for all supernovae with observed light curve timespans greater than 100 days (roughly 55% of the dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of 82.4%, yielding a highly competitive score of 0.49 whilst implementing a truly "model-blind" approach to supernova classification. Consequently this approach may be particularly suitable for the classification of astronomical transients in the era of large synoptic sky surveys. (14 pages, 8 figures; published in MNRAS.)
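    BAGIDIS itself builds hierarchies over unbalanced Haar bases; as a much simpler illustration of what "hierarchical wavelet coefficients" of a light-curve-like series look like, here is a plain Haar decomposition. The data are synthetic and this is not the paper's method:

```python
import numpy as np

def haar_coeffs(x):
    """Full Haar wavelet decomposition of a length-2^n series.
    Returns detail coefficients from finest to coarsest, then the final
    approximation (overall-level) coefficient."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # local averages
        detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # local differences
        details.append(detail)
        x = approx
    return np.concatenate(details + [x])

flat = haar_coeffs(np.ones(8))                 # a constant "light curve"
burst = haar_coeffs([0, 0, 0, 4, 4, 0, 0, 0])  # a short flux burst
print(flat)
print(burst)
```

    A constant series has zero detail coefficients at every scale, while the burst concentrates energy in a few coefficients; sorting coefficients by absolute magnitude, as ranked schemes do, gives a crude robustness to where the burst sits in the series.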

    Training and assessing classification rules with unbalanced data

    The problem of modeling binary responses using cross-sectional data has been addressed with a number of satisfying solutions that draw on both parametric and nonparametric methods. However, there exist many real situations where one of the two responses (usually the more interesting one for the analysis) is rare. It has been widely reported that this class imbalance heavily compromises the learning process, because the model tends to focus on the prevalent class and to ignore the rare events. Not only is the estimation of the classification model affected by a skewed class distribution; the evaluation of its accuracy is also jeopardized, because the scarcity of data leads to poor estimates of the model’s accuracy. In this work, the effects of class imbalance on model training and model assessment are discussed. Moreover, a unified and systematic framework for dealing with both problems is proposed, based on a smoothed bootstrap re-sampling technique.
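    The smoothed bootstrap at the heart of such a framework can be sketched as follows. This is an illustrative version, not the paper's exact scheme: the bandwidth rule and the toy data are assumptions. Rare-class points are resampled with replacement and jittered with Gaussian kernel noise, so synthetic points spread around the observed ones instead of duplicating them:

```python
import numpy as np

def smoothed_bootstrap(X, n_out, h=0.1, seed=0):
    """Draw n_out points by resampling rows of X with replacement and adding
    Gaussian kernel noise with bandwidth h * per-feature std (smoothed bootstrap)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), n_out)
    noise = rng.normal(0.0, 1.0, (n_out, X.shape[1])) * (h * X.std(axis=0))
    return X[idx] + noise

rare = np.array([[0.0, 1.0], [0.2, 0.9], [0.1, 1.1]])   # the rare class
synth = smoothed_bootstrap(rare, n_out=20)
print(synth.shape)
```

    Unlike a plain bootstrap, no two synthetic points are exact copies of a training point, which reduces the overfitting that duplicated rare-class examples can cause.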

    On the suitability of resampling techniques for the class imbalance problem in credit scoring

    In real-life credit scoring applications, the case in which the class of defaulters is under-represented in comparison with the class of non-defaulters is very common, but it has still received little attention. The present paper investigates the suitability and performance of several resampling techniques when applied in conjunction with statistical and artificial intelligence prediction models over five real-world credit data sets, which have been artificially modified to derive different imbalance ratios (proportions of defaulter and non-defaulter examples). Experimental results demonstrate that the use of resampling methods consistently improves the performance obtained with the original imbalanced data. It is also important to note that, in general, over-sampling techniques perform better than any under-sampling approach. This work has been partially supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028.
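    As an illustration of what over-sampling the minority (defaulter) class can look like, here is a simplified SMOTE-style over-sampler. It is a sketch under assumed parameters, not necessarily one of the specific resampling techniques evaluated in the paper:

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating each drawn point
    towards one of its k nearest minority-class neighbours (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    d = np.sqrt(((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    out = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X_min))            # pick a minority point
        j = nbrs[i, rng.integers(k)]            # and one of its neighbours
        lam = rng.random()
        out[t] = X_min[i] + lam * (X_min[j] - X_min[i])  # interpolate between them
    return out

defaulters = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # minority class
synthetic = smote_like(defaulters, n_new=12)
print(synthetic.shape)
```

    Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies rather than drifting into the majority class.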

    Statistical methods for the detection of non-technical losses: a case study for the Nelson Mandela Bay Municipality

    Electricity is one of the most stolen commodities in the world. Electricity theft can be defined as the criminal act of stealing electrical power; several types exist, including illegal connections and bypassing or tampering with energy meters. The negative financial impacts of electricity theft, due to lost revenue, are far reaching and affect both developing and developed countries. In South Africa, Eskom loses over R2 billion annually due to electricity theft. Data mining and nonparametric statistical methods have been used to detect fraudulent usage of electricity by assessing abnormalities and abrupt changes in kilowatt-hour (kWh) consumption patterns, and identifying effective measures to detect fraudulent usage is an active area of research in the electrical domain. In this study, Support Vector Machine (SVM), Naïve Bayes (NB) and k-Nearest Neighbour (KNN) algorithms were used to design and propose an electricity fraud detection model. Using the Nelson Mandela Bay Municipality as a case study, three classifiers were built with the SVM, NB and KNN algorithms, and their performance was evaluated and compared.
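    A toy version of the kWh-pattern idea, using only a hand-rolled kNN: the features, the synthetic consumption profiles and the labels are illustrative assumptions, not the study's actual model or data. The intuition is that a sustained, abrupt drop in consumption relative to a customer's earlier level is a tampering signal:

```python
import numpy as np

def profile_features(kwh):
    """Two simple features of a monthly kWh series: the mean level and the ratio
    of the last quarter's mean to the first quarter's (a sustained-drop signal)."""
    kwh = np.asarray(kwh, dtype=float)
    q = len(kwh) // 4
    return np.array([kwh.mean(), kwh[-q:].mean() / kwh[:q].mean()])

def knn_predict(X_train, y_train, x, k=3):
    """Plain majority-vote kNN (in practice features would be standardised first)."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    top = np.argsort(d)[:k]
    return int(np.bincount(y_train[top]).argmax())

# Synthetic 12-month profiles: label 1 = suspected tampering (abrupt drop), 0 = normal.
normal = [[300] * 12, [280] * 12, [310] * 12, [290] * 12]
fraud = [[300] * 6 + [60] * 6, [320] * 6 + [50] * 6, [290] * 6 + [70] * 6]
X = np.array([profile_features(p) for p in normal + fraud])
y = np.array([0] * 4 + [1] * 3)

query = profile_features([305] * 6 + [55] * 6)   # a new meter with a sudden drop
print(knn_predict(X, y, query))
```

    The query's low overall mean and drop ratio place it among the tampering profiles, so the classifier flags it for inspection.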

    Improving binary classification using filtering based on k-NN proximity graphs

    Get PDF
    © 2020, The Author(s). One way of increasing recognition ability in a classification problem is removing outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to learn wrong patterns in the training data. The common approach to data filtering is to use proximity graphs; however, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper a filtering procedure based on the k-nearest-neighbours proximity graph is used. Filtering parameter selection is cast as an outlier minimization problem: the k-NN proximity graph, the power of the distance and the threshold parameters are selected so as to minimize the percentage of outliers in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the level of competence of each classifier in a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case the combiner is based on the local accuracy of the single classifiers, and its output is a linear combination of the single classifiers’ rankings. With filtering, the accuracy of the DES-LA combiner increases markedly on low-accuracy datasets, but filtering does not have a significant impact on DES-LA performance on high-accuracy datasets.
The results are discussed, and the classifiers whose performance was most affected by the filtering pre-processing step are identified. The main contribution of the paper is the introduction of modifications to the DES-LA combiner, together with a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
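    The kind of k-NN proximity-graph filtering described above can be sketched as an edited-nearest-neighbour rule: drop any training point whose neighbourhood labels mostly disagree with its own. The threshold, k and toy data below are illustrative assumptions, not the paper's optimised parameters:

```python
import numpy as np

def knn_filter(X, y, k=3, threshold=0.5):
    """Drop training points whose fraction of disagreeing labels among their
    k nearest neighbours exceeds `threshold` (treated as likely outliers)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    disagree = (y[nbrs] != y[:, None]).mean(axis=1)
    keep = disagree <= threshold
    return X[keep], y[keep], keep

# Two clean clusters plus one mislabelled point sitting inside the wrong cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [5.05, 5.05]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])       # the last label is wrong
Xf, yf, keep = knn_filter(X, y)
print(keep)
```

    Only the mislabelled point is removed: all three of its neighbours carry the other label, while the clean points keep enough agreeing neighbours to stay below the threshold.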