An improved EEG pattern classification system based on dimensionality reduction and classifier fusion
University of Technology, Sydney. Faculty of Engineering and Information Technology. Analysis of brain electrical activity (electroencephalography, EEG) provides a rich source of information that supports the development of affordable and effective biomedical applications such as psychotropic drug research, sleep studies, seizure detection and brain-computer interfaces (BCI). Interpreting and understanding the EEG signal gives clinicians and physicians useful information for disease diagnosis and for monitoring biological activity. It also opens a new channel of communication through brain waves.
This thesis investigates new algorithms for improving pattern recognition systems in two main EEG-based applications. The first is a simple brain-computer interface (BCI) based on imagined motor tasks; the second is an automatic sleep scoring system for the intensive care unit. A BCI system aims to create a non-muscular link between the brain and external devices, providing a new control scheme that can most benefit severely immobilised persons. This link is created by using a pattern recognition approach to translate EEG into device commands, which can then be used to control wheelchairs, computers or other equipment. The second application creates an automatic scoring system by interpreting certain properties of several biomedical signals. Traditionally, sleep specialists record and analyse brain activity (EEG), muscle tone (EMG), eye movement (EOG) and other biomedical signals to detect five sleep stages: Rapid Eye Movement (REM) and stages 1 through 4. Acquired signals are scored in 30-second intervals, with each segment inspected manually for the properties that identify a sleep stage. The process is time-consuming and demands expertise; an automatic scoring system mimicking sleep experts' rules is expected to speed up the process and reduce the cost.
The practicality of any EEG-based system depends on accuracy and speed: the more accurate and the faster a classification system is, the better its chance of being integrated into a wide range of applications. The performance of the two systems above is therefore further enhanced through improved feature selection, projection and classification algorithms.
Because processing EEG signals means dealing with multi-dimensional data, dimensionality must be reduced to achieve acceptable performance at a lower computational cost. The first candidate for dimensionality reduction is a channel/feature selection approach. Four novel feature selection methods are developed, based on genetic algorithms, ant colony optimization, particle swarm optimization and differential evolution. The methods provide fast and accurate selection of the most informative features/channels for representing mental tasks, keeping the classifier's computational burden as light as possible by removing irrelevant and highly redundant features.
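The thesis does not reproduce its implementation here; as a rough illustration of how a differential-evolution wrapper can select features, the sketch below (function name, hyperparameters and the leave-one-out 1-NN fitness are my own illustrative choices, not the thesis code) thresholds a real-valued DE population into binary feature masks:

```python
import numpy as np

def de_feature_select(X, y, n_iter=30, pop_size=12, F=0.5, CR=0.9, seed=0):
    """Select features with differential evolution: each individual is a
    real vector; entries > 0.5 mark a feature as kept.  Fitness is
    leave-one-out 1-NN accuracy on the selected columns."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat))

    def fitness(mask):
        if not mask.any():
            return 0.0
        Xs = X[:, mask]
        # pairwise squared distances; exclude self-matches on the diagonal
        d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)
        return float((y[d.argmin(1)] == y).mean())

    scores = np.array([fitness(ind > 0.5) for ind in pop])
    for _ in range(n_iter):
        for i in range(pop_size):
            # classic DE/rand/1 mutation and binomial crossover
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            trial = np.where(rng.random(n_feat) < CR, a + F * (b - c), pop[i])
            s = fitness(trial > 0.5)
            if s >= scores[i]:           # greedy selection
                pop[i], scores[i] = trial, s
    return pop[scores.argmax()] > 0.5
```

The same wrapper structure applies to the GA, ant colony and PSO variants; only the population-update rule changes.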
As an alternative to feature selection, a novel feature projection method is also introduced. The method maps the original feature set into a small, informative subset of features that best discriminates between the different classes. Unlike most existing methods based on discriminant analysis, the proposed method considers the fuzzy nature of the input measurements when discovering the local manifold structure, finding a projection that maximizes the margin between data points from different classes in each local area.
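For context, the discriminant-analysis baseline that such projection methods extend is classical Fisher LDA. A minimal sketch of that baseline (not the fuzzy, locality-aware variant proposed in the thesis) is:

```python
import numpy as np

def fisher_projection(X, y, n_components=1):
    """Classical Fisher discriminant projection: find directions that
    maximise between-class scatter relative to within-class scatter."""
    classes = np.unique(y)
    mean_all = X.mean(0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # eigenvectors of Sw^-1 Sb give the discriminant directions
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    W = vecs[:, order[:n_components]].real
    return X @ W, W
```

The fuzzy method described above replaces these global scatter matrices with membership-weighted, locally computed ones.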
In the classification phase, a number of improvements to the traditional k-nearest neighbour (kNN) classifier are introduced. They address limitations of the kNN weighting scheme: traditional kNN takes no account of class distribution, the importance of each feature, the contribution of each neighbour, or the number of instances per class. The proposed kNN variants are based on an improved distance measure and on weight optimization using differential evolution, which tunes the metric weights of features, neighbours and classes. Additionally, a fuzzy kNN variant has been developed to favour classification of certain classes, which may be useful in medical examination. Finally, an alternative classifier fusion method is introduced to create a diverse neural network ensemble: diversity is enhanced by altering the target output of each network to introduce a controlled bias towards each class, yielding a set of neural network classifiers that complement each other.
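A minimal sketch of the kind of weighted kNN described, with fixed illustrative weights (in the thesis the feature, neighbour and class weights are tuned by differential evolution; the function and parameter names here are my own):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=3, feature_w=None, class_w=None):
    """kNN with feature-, neighbour- and class-level weights.  In the thesis
    these weights are optimized by differential evolution; here they are
    simply supplied by the caller."""
    if feature_w is None:
        feature_w = np.ones(X_train.shape[1])
    # feature weights rescale each dimension of the distance metric
    d = np.sqrt(((X_train - x) ** 2 * feature_w).sum(1))
    idx = np.argsort(d)[:k]
    votes = {}
    for i in idx:
        w = 1.0 / (d[i] + 1e-9)                 # neighbour weight: closer counts more
        if class_w is not None:
            w *= class_w.get(y_train[i], 1.0)   # class weight: can favour rare classes
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```

Setting `class_w` above a class's default of 1.0 biases predictions toward it, which is the behaviour the fuzzy variant exploits for medical screening.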
Classifiers accuracy improvement based on missing data imputation
In this paper we extend our previous work on radar signal identification and classification, based on a data set comprising continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it contains a high percentage of missing values; to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI), K-Nearest Neighbour Imputation (KNNI) and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The imputation models' performance is assessed with Wilcoxon's test for statistical significance and Cohen's effect size metrics. For the classification task we employ three intelligent approaches: Neural Networks (NN), Support Vector Machines (SVM) and Random Forests (RF). We then critically analyse which imputation method most influences the classifiers' performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses ('military' and 'civil'), each containing several 'subclasses', and propose two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complements to the OCA when choosing the best classifier for the problem at hand.
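Of the three imputation techniques, KNNI is the easiest to sketch. The toy implementation below is my own simplification, not the paper's code: it fills each missing value from the k rows closest on the jointly observed columns.

```python
import numpy as np

def knn_impute(X, k=3):
    """K-nearest-neighbour imputation (KNNI) sketch: fill each NaN with the
    mean of that column over the k donor rows that are closest on the
    columns both rows have observed."""
    X = X.astype(float).copy()
    nan_mask = np.isnan(X)
    for i in np.where(nan_mask.any(1))[0]:
        obs = ~nan_mask[i]
        dists = []
        for j in range(len(X)):
            if j == i:
                continue
            shared = obs & ~nan_mask[j]      # columns observed in both rows
            if not shared.any():
                continue
            rmse = np.sqrt(((X[i, shared] - X[j, shared]) ** 2).mean())
            dists.append((rmse, j))
        dists.sort()
        for col in np.where(nan_mask[i])[0]:
            # take the k nearest donors that actually observe this column
            donors = [X[j, col] for _, j in dists if not nan_mask[j, col]][:k]
            if donors:
                X[i, col] = float(np.mean(donors))
    return X
```

MI and BTI replace the donor-averaging step with draws from a fitted model and with bagged regression trees, respectively.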
Nonparametric Transient Classification using Adaptive Wavelets
Classifying transients from multi-band light curves is a challenging but crucial problem in the era of Gaia and LSST, since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses a transient's light curve measurements to predict its class given training data. It implements two novel components. The first is the use of the BAGIDIS wavelet methodology, a characterization of functional data using hierarchical wavelet coefficients. The second is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement, while a major advantage of the BAGIDIS wavelets is that they are translation invariant, so the light curves do not need to be aligned before features are extracted. Further, BAGIDIS is nonparametric, so it can be used in blind searches for new objects. We demonstrate the effectiveness of our ranked wavelet classifier on the well-tested Supernova Photometric Classification Challenge dataset, in which the challenge is to correctly classify light curves as Type Ia or non-Ia supernovae. We train our ranked probability classifier on the spectroscopically confirmed subsample (which is not representative) and show that it gives good results for all supernovae with observed light curve timespans greater than 100 days (roughly 55% of the dataset). For such data we obtain a Ia efficiency of 80.5% and a purity of 82.4%, yielding a highly competitive score of 0.49 while implementing a truly "model-blind" approach to supernova classification. Consequently this approach may be particularly suitable for classifying astronomical transients in the era of large synoptic sky surveys. Comment: 14 pages, 8 figures. Published in MNRAS.
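BAGIDIS builds an unbalanced, data-driven wavelet basis, which is where its translation invariance comes from; the general idea of hierarchical wavelet coefficients, though, can be illustrated with a plain Haar decomposition (note that plain Haar, unlike BAGIDIS, is not translation invariant):

```python
import numpy as np

def haar_coeffs(signal):
    """Hierarchical Haar wavelet decomposition of a length-2^n series.
    Each pass splits the signal into a smooth part and detail coefficients
    at one scale; repeating yields a coarse-to-fine hierarchy."""
    s = np.asarray(signal, dtype=float)
    coeffs = []
    while len(s) > 1:
        avg = (s[0::2] + s[1::2]) / np.sqrt(2)   # smooth (approximation) part
        det = (s[0::2] - s[1::2]) / np.sqrt(2)   # detail at this scale
        coeffs.append(det)
        s = avg
    coeffs.append(s)          # coarsest coefficient (scaled overall mean)
    return coeffs[::-1]       # coarsest scale first
```

Because the transform is orthonormal, the coefficients preserve the signal's energy, so distances between coefficient vectors reflect distances between light curves.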
Training and assessing classification rules with unbalanced data
The problem of modeling binary responses from cross-sectional data has been addressed with a number of satisfying solutions drawing on both parametric and nonparametric methods. However, in many real situations one of the two responses (usually the one most interesting for the analysis) is rare. It has been widely reported that this class imbalance heavily compromises the learning process, because the model tends to focus on the prevalent class and to ignore the rare events. Not only is the estimation of the classification model affected by a skewed class distribution; the evaluation of its accuracy is also jeopardized, because the scarcity of data leads to poor estimates of the model's accuracy.
In this work, the effects of class imbalance on model training and model assessment are discussed. Moreover, a unified and systematic framework for dealing with both problems is proposed, based on a smoothed bootstrap re-sampling technique.
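A smoothed bootstrap for the rare class amounts to resampling with replacement and jittering each draw with kernel noise, so that synthetic points come from a smoothed estimate of the class density rather than being exact duplicates. The bandwidth handling below is an illustrative choice, not necessarily the authors':

```python
import numpy as np

def smoothed_bootstrap_oversample(X_min, n_new, h=0.1, seed=0):
    """Smoothed bootstrap for the rare class: draw minority points with
    replacement, then add Gaussian kernel noise with bandwidth h scaled
    by each feature's standard deviation."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X_min), size=n_new)            # plain bootstrap draws
    noise = rng.normal(0.0, h, size=(n_new, X_min.shape[1])) * X_min.std(0)
    return X_min[idx] + noise                                # smoothed resamples
```

With h = 0 this reduces to the ordinary bootstrap; larger h trades fidelity to the observed points for coverage of the surrounding density.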
On the suitability of resampling techniques for the class imbalance problem in credit scoring
In real-life credit scoring applications, the case in which the class of defaulters is under-represented relative to the class of non-defaulters is very common, but it has received little attention. The present paper investigates the suitability and performance of several resampling techniques applied in conjunction with statistical and artificial intelligence prediction models over five real-world credit data sets, which have been artificially modified to derive different imbalance ratios (proportions of defaulter and non-defaulter examples). Experimental results demonstrate that the use of resampling methods consistently improves on the performance obtained with the original imbalanced data. It is also important to note that, in general, over-sampling techniques perform better than any under-sampling approach. This work has been partially supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028.
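As an illustration of the simplest member of the over-sampling family the paper evaluates, random over-sampling duplicates minority examples until the classes are balanced (a generic sketch, not the paper's exact procedure):

```python
import numpy as np

def random_oversample(X, y, minority_label, seed=0):
    """Random over-sampling: duplicate minority examples (drawn with
    replacement) until both classes have the same number of instances."""
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority_label)[0]
    maj_idx = np.where(y != minority_label)[0]
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    keep = np.concatenate([maj_idx, min_idx, extra])
    return X[keep], y[keep]
```

Under-sampling instead discards majority examples, which is cheaper but loses information; the paper's finding that over-sampling generally wins is consistent with that trade-off.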
Statistical methods for the detection of non-technical losses: a case study for the Nelson Mandela Bay Municipality
Electricity is one of the most stolen commodities in the world. Electricity theft can be defined as the criminal act of stealing electrical power, and it takes several forms, including illegal connections and bypassing or tampering with energy meters. The negative financial impact of electricity theft, through lost revenue, is far-reaching and affects both developing and developed countries: in South Africa, Eskom loses over R2 billion annually to electricity theft. Data mining and nonparametric statistical methods have been used to detect fraudulent electricity usage by assessing abnormalities and abrupt changes in kilowatt-hour (kWh) consumption patterns, and identifying effective detection measures remains an active area of research in the electrical domain. In this study, Support Vector Machine (SVM), Naïve Bayes (NB) and k-Nearest Neighbour (KNN) algorithms were used to design and propose an electricity fraud detection model. Using the Nelson Mandela Bay Municipality as a case study, three classifiers were built with the SVM, NB and KNN algorithms, and their performance was evaluated and compared.
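The study's detectors are trained classifiers, but the kind of abrupt kWh change they are trained to recognise can be illustrated with a simple screening rule (the window length and drop ratio below are arbitrary choices for illustration, not values from the paper):

```python
def flag_abrupt_drop(kwh_history, recent_months=3, drop_ratio=0.5):
    """Illustrative screening rule: flag a meter when mean consumption over
    the last few months falls below drop_ratio times the mean of the earlier
    history - the abrupt kWh change pattern associated with meter tampering."""
    past = kwh_history[:-recent_months]
    recent = kwh_history[-recent_months:]
    if not past:
        return False                      # not enough history to compare
    return sum(recent) / len(recent) < drop_ratio * (sum(past) / len(past))
```

A classifier such as SVM or KNN generalises this idea by learning the decision boundary from many such consumption features instead of a single hand-set threshold.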
Improving binary classification using filtering based on k-NN proximity graphs
© 2020, The Author(s). One way of improving recognition in a classification problem is to remove outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to learn wrong patterns from the training data. A common approach to data filtering uses proximity graphs; however, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper a filtering procedure based on the k-nearest-neighbour proximity graph is used. Filtering parameter selection is cast as an outlier minimization problem: the k-NN proximity graph, the power of the distance and the threshold parameters are selected so as to minimize the percentage of outliers in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the level of competence of each classifier in a pool; only the most competent ones are selected to classify a given test sample. This requires a criterion for measuring the competence of base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case the combiner is based on the local accuracy of the single classifiers, and its output is a linear combination of the single classifiers' rankings. After filtering, the accuracy of the DES-LA combiner increases substantially on low-accuracy datasets, but filtering has little impact on DES-LA performance on high-accuracy datasets.
The results are discussed, and the classifiers whose performance was most affected by the pre-processing filtering step are identified. The main contributions of the paper are modifications to the DES-LA combiner and a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
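The paper optimizes the graph parameters automatically; the simpler edited-nearest-neighbour-style rule below conveys the core idea of proximity-graph filtering (the function name, k and threshold are illustrative, not the paper's tuned values):

```python
import numpy as np

def knn_filter(X, y, k=3, threshold=0.5):
    """Proximity-graph filtering in the spirit of edited nearest neighbours:
    drop a training point when more than `threshold` of its k nearest
    neighbours carry a different label (a likely outlier or noisy entry)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    keep = []
    for i in range(len(X)):
        nbrs = np.argsort(d[i])[:k]
        disagree = (y[nbrs] != y[i]).mean()
        if disagree <= threshold:
            keep.append(i)
    return X[keep], y[keep]
```

Raising `threshold` keeps more borderline points; lowering it filters more aggressively, which is the trade-off the paper's outlier-minimization step tunes.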