
526 research outputs found

Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications

Author: Razzaghi Talayeh
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2014

Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Most traditional approaches for classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with the Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures to tackle imperfect data, along with addressing real problems in quality control and business analytics.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
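The cost-sensitive idea in the abstract above can be illustrated with a class-weighted hinge loss, the loss that a linear SVM minimizes. This is only a minimal sketch under stated assumptions: the function name `cost_sensitive_hinge` and the default costs are hypothetical, and the thesis may formulate its CSL-SVM differently.

```python
def cost_sensitive_hinge(scores, labels, cost_pos=5.0, cost_neg=1.0):
    """Mean hinge loss in which each class has its own misclassification cost.

    scores: real-valued decision values f(x), one per sample.
    labels: +1 (assumed to be the minority class) or -1.
    cost_pos, cost_neg: illustrative costs, not values from the thesis.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        cost = cost_pos if y == 1 else cost_neg
        # Standard hinge term, scaled by the cost of erring on this class.
        total += cost * max(0.0, 1.0 - y * s)
    return total / len(labels)
```

Raising `cost_pos` relative to `cost_neg` penalizes margin violations on the minority class more heavily, which is one common way to counter class imbalance.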

Doctor of Philosophy

Author: Mohammadabadi Sayed Mehdi Sajjadi
Publication venue: University of Utah
Publication date: 01/01/2017

The goal of machine learning is to develop efficient algorithms that use training data to create models that generalize well to unseen data. Learning algorithms can use labeled data, unlabeled data, or both. Supervised learning algorithms learn a model using labeled data only. Unsupervised learning methods learn the internal structure of a dataset using only unlabeled data. Lastly, semisupervised learning is the task of finding a model using both labeled and unlabeled data. In this research work, we contribute to both supervised and semisupervised learning. We contribute to supervised learning by proposing an efficient high-dimensional space coverage scheme that is based on the disjunctive normal form. We use conjunctions of a set of half-spaces to create a set of convex polytopes. Disjunction of these polytopes can provide desirable coverage of space. Unlike traditional methods based on neural networks, we do not initialize the model parameters randomly. As a result, our model minimizes the risk of poor local minima, and higher learning rates can be used, which leads to faster convergence. We contribute to semisupervised learning by proposing two unsupervised loss functions that form the basis of a novel semisupervised learning method. The first loss function is called Mutual-Exclusivity. The motivation of this method is the observation that an optimal decision boundary lies between the manifolds of different classes, where there are no or very few samples. Decision boundaries can be pushed away from training samples by maximizing their margin, and it is not necessary to know the class labels of the samples to maximize the margin. The second loss is named Transformation/Stability and is based on the fact that the prediction of a classifier for a data sample should not change with respect to transformations and perturbations applied to that data sample. In addition, internal variations of a learning system should have little to no effect on the output. The proposed loss minimizes the variation in the prediction of the network for a specific data sample. We also show that the same technique can be used to improve the robustness of a learning model with respect to adversarial examples.

The University of Utah: J. Willard Marriott Digital Library
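The Transformation/Stability loss described in the abstract above penalizes variation in the predictions for one sample. A minimal sketch follows, assuming the variation is measured as the sum of squared differences over all pairs of prediction vectors. That exact form and the name `stability_loss` are assumptions, since the abstract does not give a formula.

```python
from itertools import combinations


def stability_loss(predictions):
    """Sum of squared differences between every pair of prediction vectors.

    predictions: list of equal-length vectors, each the model output for the
    SAME unlabeled sample under a different transformation or stochastic pass.
    No class label is needed, so the loss can be used on unlabeled data.
    """
    loss = 0.0
    for p, q in combinations(predictions, 2):
        loss += sum((a - b) ** 2 for a, b in zip(p, q))
    return loss
```

The loss is zero when all passes agree and grows as the predictions for the same sample drift apart.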

A Robust Algorithm to Detect Multiple Centrifugal Pump Faults with Corrupted Vibration and Current Signatures Using Continuous Wavelet Transform

Author: Rapur Janani Shruti
Tiwari Rajiv
Publication venue: Turbomachinery Laboratory, Texas A&M Engineering Experiment Station
Publication date: 01/01/2018

Centrifugal pumps (CPs) are susceptible to seizures owing to fluid flow abnormalities and/or mechanical component failures. Consequently, it is crucial to recognize these faults and estimate their severity. The present work shows the development of a robust algorithm based on support vector machines (SVM) to classify multiple CP faults, such as suction and discharge blockages (with varying severities), impeller defects, pitted cover plate faults, and dry runs, using continuous wavelet transform (CWT) analysis. For the sake of classification, the CP vibration data and motor line-current data are generated experimentally for each of these faults. Furthermore, in an industrial setting, CP signatures are susceptible to noise corruption due to other operating equipment on the premises. Hence, to assess the versatility of the developed methodology, the generated experimental data is further corrupted with 5%, 10%, and 25% additive white Gaussian noise and used to test the algorithm.
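The noise-corruption step in the abstract above can be sketched as follows. The abstract does not say what the percentages are relative to, so this sketch assumes the noise standard deviation is the stated fraction of the signal's RMS amplitude. The function name `add_white_gaussian_noise` and its parameters are hypothetical.

```python
import math
import random


def add_white_gaussian_noise(signal, level, rng=None):
    """Return a copy of `signal` with zero-mean white Gaussian noise added.

    level: noise standard deviation as a fraction of the signal RMS
           (assumption; e.g. 0.05, 0.10, 0.25 for 5%, 10%, 25%).
    rng:   optional random.Random instance for reproducible noise.
    """
    if rng is None:
        rng = random.Random()
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    sigma = level * rms
    # Independent draws per sample give a flat (white) noise spectrum.
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

Calling the function once per level (0.05, 0.10, 0.25) on the same clean vibration or current record yields the three corrupted test sets.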

Optimal set of EEG features for emotional state classification and trajectory visualization in Parkinson's disease

Author: Ibrahim Norlinah Mohamed
Mohamad Khairiyah
Murugappan Murugappan
Omar Mohd Iqbal
Palaniappan Ramaswamy
Sundaraj Kenneth
Yuvaraj Rajamanickam
Publication venue: 'Elsevier BV'
Publication date: 31/07/2014

In addition to classic motor signs and symptoms, individuals with Parkinson's disease (PD) are characterized by emotional deficits. Ongoing brain activity can be recorded by electroencephalography (EEG) to discover the links between emotional states and brain activity. This study utilized machine-learning algorithms to categorize emotional states in PD patients compared with healthy controls (HC) using EEG. Twenty non-demented PD patients and 20 healthy age-, gender-, and education level-matched controls viewed happiness, sadness, fear, anger, surprise, and disgust emotional stimuli while fourteen-channel EEG was being recorded. Multimodal stimuli (a combination of audio and visual) were used to evoke the emotions. To classify the EEG-based emotional states and visualize the changes of emotional states over time, this paper compares four kinds of EEG features for emotional state classification and proposes an approach to track the trajectory of emotion changes with manifold learning. From the experimental results using our EEG data set, we found that (a) the bispectrum feature is superior to the other three kinds of features, namely power spectrum, wavelet packet, and nonlinear dynamical analysis; (b) higher frequency bands (alpha, beta, and gamma) play a more important role in emotion activities than lower frequency bands (delta and theta) in both groups; and (c) the trajectory of emotion changes can be visualized by reducing subject-independent features with manifold learning. This provides a promising way of implementing visualization of a patient's emotional state in real time and leads to a practical system for noninvasive assessment of the emotional impairments associated with neurological disorders.

Kent Academic Repository

Software design and optimization of ECG signal analysis and diagnosis for embedded IoT devices

Author: Azariadi Dimitra
Αζαριάδη Δήμητρα
Publication date: 17/05/2016

DSpace at NTUA

Crowdsourcing traffic data for travel time estimation

Author: Gadde Chanukya Chowdary
Publication venue: The Research Repository @ WVU
Publication date: 01/05/2014

Travel time estimation is a fundamental measure used in routing and navigation applications, in particular in emerging intelligent transportation systems (ITS). For example, many users may prefer the fastest route to their destination and would rely on real-time predicted travel times. It also helps real-time traffic management and traffic light control. Accurate estimation of travel time requires collecting a lot of real-time data from road networks. This data can be collected using a wide variety of sources, such as inductive loop detectors, video cameras, and radio frequency identification (RFID) transponders. But these systems require deployment of infrastructure, which has some limitations and drawbacks. The main drawbacks are the high cost, the high probability of error caused by the prevalence of equipment malfunctions, and, in the case of sensor-based methods, the problem of spatial coverage. As an alternative to the traditional way of collecting data using expensive equipment, the development of cellular and mobile technology allows for leveraging embedded GPS sensors in smartphones carried by millions of road users. Crowd-sourcing GPS data will allow building traffic monitoring systems that utilize this opportunity for the purpose of accurate and real-time prediction of traffic measures. However, the effectiveness of these systems has not yet been proven or shown in real applications. In this thesis, we study some of the currently available data sets and identify the requirements for accurate prediction. In our work, we propose the design for a crowd-sourcing traffic application, including an Android-based mobile client and a server architecture. We also develop a map-matching method. More importantly, we present prediction methods using machine learning techniques such as support vector regression. Machine learning provides an alternative to traditional statistical methods, such as using averaged historic data for estimation of travel time. Machine learning techniques have played a key role in estimation in the last two decades. They have proven themselves by providing better accuracy in estimation and in classification. However, employing a machine learning technique in any application requires creative modeling of the system and its sensory data. In this thesis, we model the road network as a graph and train different models for different links on the road. Modeling a road network as a graph with nodes and links enables the learner to capture patterns occurring on each segment of road, thereby providing better accuracy. To evaluate the prediction models, we use three sets of data, of which two sets are collected using mobile probing and one set is generated using the VISSIM traffic simulator. The results show that crowdsourcing is more accurate than traditional statistical methods only if the input values are very close to the actual values. In particular, where the speed of vehicles on a link is concerned, we need to provide the machine learning model with data that is only a few minutes old; using the average speed of vehicles, for example from the past half hour, as is usually seen in many web-based traffic information sources, may not allow for better performance.

The Research Repository @ WVU (West Virginia University)
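The abstract above stresses two modeling points: data are kept per link of the road graph, and the speed input should be only a few minutes old. The sketch below illustrates just that feature-preparation step, not the support vector regression model itself. The class `LinkSpeedStore`, its methods, and the helper `travel_time_s` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class LinkSpeedStore:
    """Keeps crowd-sourced speed observations separately for each road link."""

    def __init__(self):
        # link_id -> list of (timestamp in seconds, speed in m/s)
        self._obs = defaultdict(list)

    def add(self, link_id, timestamp_s, speed_mps):
        self._obs[link_id].append((timestamp_s, speed_mps))

    def mean_speed(self, link_id, now_s, window_s):
        """Mean speed on one link over the last `window_s` seconds, or None."""
        recent = [v for t, v in self._obs.get(link_id, [])
                  if 0 <= now_s - t <= window_s]
        return sum(recent) / len(recent) if recent else None


def travel_time_s(length_m, speed_mps):
    """Travel time over a link of the given length at a constant speed."""
    return length_m / speed_mps
```

Comparing `mean_speed` with a 300-second window against an 1800-second window on the same link shows how a half-hour average can mask a recent slowdown, which is the effect the abstract reports.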

Optimization techniques for data mining and information reconstruction

Author: Bianchi Gianpiero
Publication date: 18/11/2013

Archivio della ricerca- Università di Roma La Sapienza