47 research outputs found

    Combining univariate approaches for ensemble change detection in multivariate data

    Detecting change in multivariate data is a challenging problem, especially when class labels are not available. There is a large body of research on univariate change detection, notably in control charts developed originally for engineering applications. We evaluate univariate change detection approaches, including those in the MOA framework, built into ensembles where each member observes a feature in the input space of an unsupervised change detection problem. We present a comparison between the ensemble combinations and three established ‘pure’ multivariate approaches over 96 data sets, and a case study on the KDD Cup 1999 network intrusion detection dataset. We found that ensemble combination of univariate methods consistently outperformed multivariate methods on the four experimental metrics. Project RPG-2015-188 funded by The Leverhulme Trust, UK; Spanish Ministry of Economy and Competitiveness through project TIN 2015-67534-P; and the Spanish Ministry of Education, Culture and Sport through Mobility Grant PRX16/00495. The 96 datasets were originally curated for use in the work of Fernández-Delgado et al. [53] and accessed from the personal web page of the author. The KDD Cup 1999 dataset used in the case study was accessed from the UCI Machine Learning Repository [10
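
The idea of an ensemble in which each member watches a single feature can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the member detector is a two-sided CUSUM control chart, the members are combined by a simple vote fraction, and the names `cusum_alarm`, `ensemble_change`, `threshold`, `slack` and `min_vote_fraction` are hypothetical.

```python
from statistics import mean, stdev


def cusum_alarm(reference, window, threshold=5.0, slack=0.5):
    """Two-sided CUSUM on ONE feature (assumed member detector).

    Values in `window` are standardised against `reference`; an alarm is
    raised when either cumulative sum exceeds `threshold`.
    """
    mu = mean(reference)
    sigma = stdev(reference) or 1e-12  # guard against a constant feature
    up = down = 0.0
    for x in window:
        z = (x - mu) / sigma
        up = max(0.0, up + z - slack)      # accumulates upward drift
        down = max(0.0, down - z - slack)  # accumulates downward drift
        if up > threshold or down > threshold:
            return True
    return False


def ensemble_change(reference_rows, window_rows, min_vote_fraction=0.5):
    """Run one univariate detector per feature and combine by voting.

    Returns (change_detected, number_of_alarming_features).
    The voting rule is an assumption, not the paper's combination scheme.
    """
    n_features = len(reference_rows[0])
    votes = 0
    for j in range(n_features):
        ref_col = [row[j] for row in reference_rows]
        win_col = [row[j] for row in window_rows]
        votes += int(cusum_alarm(ref_col, win_col))
    return votes / n_features >= min_vote_fraction, votes
```

No class labels are needed: each member only compares a recent window of its own feature with a reference window, which matches the unsupervised setting the abstract describes.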

    A Method for Automatic and Objective Scoring of Bradykinesia Using Orientation Sensors and Classification Algorithms

    Correct assessment of bradykinesia is a key element in the diagnosis and monitoring of Parkinson's disease. Its evaluation is based on a careful assessment of symptoms and it is quantified using rating scales, where the Movement Disorders Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is the gold standard. Despite their importance, the bradykinesia-related items show low agreement between different evaluators. In this study, we design an applicable tool that provides an objective quantification of bradykinesia and that evaluates all characteristics described in the MDS-UPDRS. Twenty-five patients with Parkinson's disease performed three of the five bradykinesia-related items of the MDS-UPDRS. Their movements were assessed by four evaluators and were recorded with a nine degrees-of-freedom sensor. Sensor fusion was employed to obtain a 3-D representation of movements. Based on the resulting signals, a set of features related to the characteristics described in the MDS-UPDRS was defined. Feature selection methods were employed to determine the most important features to quantify bradykinesia. The features selected were used to train support vector machine classifiers to obtain an automatic score of the movements of each patient. The best results were obtained when seven features were included in the classifiers. The classification errors for finger tapping, diadochokinesis and toe tapping were 15-16.5%, 9.3-9.8%, and 18.2-20.2% smaller than the average interrater scoring error, respectively. The introduction of objective scoring in the assessment of bradykinesia might eliminate inconsistencies within evaluators and interrater assessment disagreements and might improve the monitoring of movement disorders.

    The Unbalanced Classification Problem: Detecting Breaches in Security

    This research proposes several methods designed to improve solutions for security classification problems. The security classification problem involves unbalanced, high-dimensional, binary classification problems that are prevalent today. The imbalance within this data involves a significant majority of the negative class and a minority positive class. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances which require classification, state of the art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered prior to applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. Detailed analysis of the one-class support vector machine, as well as its weaknesses and proposals to overcome these shortcomings are included. A fundamental method for evaluation of the binary classification model is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics involved with ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques to include a novel graphic for illustrating the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to defend against computer intrusion detection. 
The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios
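
The link between classifier decision values and the ROC curve mentioned in the abstract can be made concrete with a short sketch. This is a generic textbook construction, not the thesis's own procedure; the function names `roc_points` and `auc` are hypothetical, labels are assumed to be 1 for the positive (breach) class and 0 for the negative class, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Sweep a threshold over decision values and collect (FPR, TPR) points.

    A higher score is assumed to mean "more likely positive".
    Tied scores are processed together so they produce a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9. Because the curve depends only on the ranking of the decision values, it is unaffected by how unbalanced the two classes are, which is one reason it is commonly used in this setting.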

    Detecting error related negativity using EEG potentials generated during simulated brain computer interaction

    2014 Summer. Includes bibliographical references. Error-related negativity (ERN) is one of the components of the event-related potential (ERP) observed during stimulus-based tasks. In order to improve the performance of a brain-computer interface (BCI) system, it is important to capture the ERN, classify the trials as correct or incorrect, and feed this information back to the system. The objective of this study was to investigate techniques to detect the presence of ERN in trials. In this thesis, features based on averaged ERP recordings were used to distinguish incorrect from correct actions. One feature selection technique coupled with four classification methods was used and compared in this work. Data were obtained from healthy subjects who performed an interaction experiment, and the presence of ERN indicating incorrect responses was studied. Using suitable classifiers trained on data recorded earlier, the average recognition rate of correct and erroneous trials was reported and analyzed. The significance of selecting a subset of features to reduce the data dimensionality and to improve the classification performance was explored and discussed. We obtained success rates as high as 72% using a highly compact feature subset.

    Short-term wind energy forecasting using support vector regression

    Wind energy prediction has an important part to play in a smart energy grid for load balancing and capacity planning. In this paper we explore whether wind measurements based on the existing infrastructure of windmills in neighboring wind parks can be learned with a soft computing approach for wind energy prediction in the ten-minute to six-hour range. To this end we employ Support Vector Regression (SVR) for time series forecasting, and run experimental analyses on real-world wind data from the NREL western wind resource dataset. In the experimental part of the paper we concentrate on loss function parameterization of SVR. We try to answer how far ahead a reliable wind forecast is possible, and how much information from the past is necessary. We demonstrate the capabilities of SVR-based wind energy forecasting on the micro-scale level of one wind grid point, and on the larger scale of a whole wind park.
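
The two questions raised in the abstract (how far ahead, and how much of the past) map naturally onto two parameters of a sliding-window framing of the time series. The sketch below shows only that framing, under assumed names (`make_lagged_dataset`, `n_lags`, `horizon`) that do not come from the paper; the regressor that would be fitted on the resulting pairs, SVR in the paper's case, is deliberately left out.

```python
def make_lagged_dataset(series, n_lags, horizon):
    """Turn a 1-D series into supervised (input, target) pairs.

    Each input holds `n_lags` consecutive past values; the target is the
    value `horizon` steps after the last value of that input.
    Returns two parallel lists: inputs X and targets y.
    """
    X, y = [], []
    last_start = len(series) - n_lags - horizon
    for t in range(last_start + 1):  # empty when the series is too short
        X.append(list(series[t:t + n_lags]))
        y.append(series[t + n_lags + horizon - 1])
    return X, y
```

With ten-minute measurements, `horizon=1` would correspond to a ten-minute-ahead forecast and `horizon=36` to six hours ahead, while `n_lags` controls how much history each prediction sees.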

    Method for solving nonlinearity in recognising tropical wood species

    Classifying tropical wood species poses a considerable economic challenge, and failure to classify the wood species accurately can have significant effects on timber industries. Hence, an automatic tropical wood species recognition system was developed at the Centre for Artificial Intelligence and Robotics (CAIRO), Universiti Teknologi Malaysia. The system classifies wood species based on texture analysis: wood surface images are captured, and wood features extracted from these images are used for classification. Previous research on tropical wood species recognition systems considered methods for wood species classification based on linear features. Since wood species are known to exhibit nonlinear features, a Kernel-Genetic Algorithm (Kernel-GA) is proposed in this thesis to perform nonlinear feature selection. This method combines the Kernel Discriminant Analysis (KDA) technique with a Genetic Algorithm (GA) to generate nonlinear wood features and also reduce the dimension of the wood database. The proposed system achieved a classification accuracy of 98.69%, showing marked improvement over previous work. In addition, a fuzzy logic-based pre-classifier is proposed in this thesis to mimic human interpretation of wood pores, which has been shown to ease the data acquisition bottleneck and to serve as a clustering mechanism for large databases, simplifying the classification. The fuzzy logic-based pre-classifier reduced the processing time for training and testing by more than 75% and 26% respectively. Finally, the fuzzy pre-classifier is combined with the Kernel-GA algorithm to improve the performance of the tropical wood species recognition system. The experimental results show that the combination of fuzzy pre-classifier and nonlinear feature selection improves the performance of the tropical wood species recognition system in terms of memory space, processing time and classification accuracy.

    Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams

    Data Mining – known as the process of extracting knowledge from massive data sets – leads to phenomenal impacts on our society, and now affects nearly every aspect of our lives: from the layout in our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes. However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is impractical because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and lead to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about data over time, i.e., to monitor the stream. While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. Nevertheless, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns from high-dimensional data streams in real-time may lead to larger production volumes, or reduce operational costs. The goal of this dissertation is to bridge this gap. We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring. 
Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, extending the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria. Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, in high-dimensional data streams. Also, we present a new approach, that we name kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora. We support our algorithmic contributions with theoretical guarantees, as well as extensive experiments against both synthetic and real-world data. We demonstrate the benefits of our methods against real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many applications in the industry, e.g., anomaly detection, or predictive maintenance. To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms

    EEG-Controlling Robotic Car and Alphabetic Display by Support Vector Machine for Aiding Amyotrophic Lateral Sclerosis Patients

    This thesis presents the design of, and experiments with, a system that can detect what a person is thinking, such as driving directions and letters, using the brainwave signals known as the electroencephalogram (EEG) and a machine learning algorithm called the support vector machine (SVM). This research is motivated by amyotrophic lateral sclerosis (ALS), a disease which causes patients to seriously lose mobility and speaking capabilities. The system developed in this thesis has three main steps. First, wearing an EPOC headset from Emotiv, a user can record the EEG signals while thinking of a direction or a letter, and save the data wirelessly to a personal computer. Next, a large amount of EEG data carrying the information of different directions and letters from this user is used to train the SVM classification model exhaustively. Finally, the well-trained SVM model is used to detect any new thought about directions and letters from the user. The detection results from the SVM model are transmitted wirelessly to a robotic car with an LCD display, built with Arduino microcontrollers, to control its motions as well as the alphabetic display on the LCD. One potential application of the developed system is an advanced brain-controlled wheelchair system with an LCD display for aiding ALS patients with their mobility and daily communications.