    Modified Mahalanobis Taguchi System for Imbalance Data Classification

    The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the MMTS classification efficacy, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study in the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with the Mahalanobis Genetic Algorithm (MGA).
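
The threshold idea in the abstract above can be illustrated with a minimal sketch. It assumes the Mahalanobis distances have already been computed, takes the ideal ROC point to be (FPR = 0, TPR = 1), and swaps the paper's nonlinear optimization model for a plain grid search over candidate thresholds. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def roc_point(distances, labels, threshold):
    """Return (FPR, TPR) when samples whose distance exceeds
    `threshold` are classified as abnormal (label 1)."""
    tp = fp = tn = fn = 0
    for d, y in zip(distances, labels):
        predicted = 1 if d > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr

def select_threshold(distances, labels):
    """Grid search (an assumption, not the paper's optimizer): keep the
    threshold whose ROC point is closest to the ideal point (0, 1)."""
    values = sorted(set(distances))
    # Candidates: one below the minimum, midpoints between consecutive
    # observed distances, and one above the maximum.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)
    best_threshold, best_gap = None, math.inf
    for t in candidates:
        fpr, tpr = roc_point(distances, labels, t)
        gap = math.hypot(fpr, 1.0 - tpr)  # Euclidean distance to (0, 1)
        if gap < best_gap:
            best_threshold, best_gap = t, gap
    return best_threshold, best_gap

# Hypothetical toy data: 8 normal samples (0) and 2 abnormal samples (1).
md = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 2.1, 3.5, 4.2]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
threshold, gap = select_threshold(md, y)
```

Because the criterion uses rates (FPR and TPR) rather than raw counts, the minority class is not swamped by the majority class, which is one plausible reason a ROC-based threshold suits imbalanced data.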

    Automatic Analysis of People in Thermal Imagery

    Development of a Wireless Mobile Computing Platform for Fall Risk Prediction

    Falls are a major health risk with which the elderly and disabled must contend. Scientific research on smartphone-based gait detection systems using the Internet of Things (IoT) has recently become an important component in monitoring injuries due to these falls. Analysis of human gait for detecting falls is the subject of many research projects. Progress in these systems, the capabilities of smartphones, and the IoT are enabling the advancement of sophisticated mobile computing applications that detect falls after they have occurred. This detection has been the focus of most fall-related research; however, ensuring preventive measures that predict a fall is the goal of this health monitoring system. By performing a thorough investigation of existing systems and using predictive analytics, we built a novel mobile application/system that uses smartphone and smart-shoe sensors to predict a fall and alert the user before it happens. The major focus of this dissertation has been to develop and implement this unique system to help predict the risk of falls. We used the built-in sensors of smartphones (accelerometer and gyroscope) and a sensor-embedded smart-shoe. The smart-shoe contains four pressure sensors with a Wi-Fi communication module to unobtrusively collect data. The interactions between these sensors and the user resulted in distinct challenges for this research while also creating new performance goals based on the unique characteristics of this system. In addition to providing an exciting new tool for fall prediction, this work makes several contributions to current and future generation mobile computing research.

    Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset

    Epileptic seizure or epilepsy is a chronic neurological disorder that occurs due to brain neurons' abnormal activities and has affected approximately 50 million people worldwide. Epilepsy can affect patients’ health and lead to life-threatening emergencies. Early detection of epilepsy is highly effective in avoiding seizures by intervening in treatment. The electroencephalogram (EEG) signal, which contains valuable information on electrical activity in the brain, is a standard neuroimaging tool used by clinicians to monitor and diagnose epilepsy. Visually inspecting the EEG signal is an expensive, tedious, and error-prone practice. Moreover, the result varies between neurophysiologists for an identical reading. Thus, automatically classifying epilepsy into different epileptic states with a high accuracy rate is an urgent requirement and has long been investigated. This PhD thesis contributes to the epileptic seizure detection problem using Machine Learning (ML) techniques. Machine learning algorithms have been implemented to automatically classify epilepsy from EEG data. Imbalanced class distributions and effective feature extraction from the EEG signals are the two major concerns in applying machine learning algorithms effectively and efficiently to epilepsy classification. The algorithms exhibit biased results towards the majority class when classes are imbalanced, while effective feature extraction can improve classification performance. In this thesis, we presented three different novel frameworks to effectively classify epileptic states while addressing the above issues. Firstly, a deep neural network-based framework exploring different sampling techniques was proposed, in which both traditional and state-of-the-art sampling techniques were tested and evaluated for their capability of improving the imbalance ratio and classification performance.
Secondly, a novel integrated machine learning-based framework was proposed to effectively learn from imbalanced EEG data, leveraging the Principal Component Analysis method to extract high- and low-variance principal components, which are empirically customized for imbalanced data classification. This study showed that principal components associated with low variances can capture implicit patterns of the minority class of a dataset. Next, we proposed a novel framework to effectively classify epilepsy leveraging summary statistics analysis of window-based features of EEG signals. The framework first denoised the signals using power spectrum density analysis and replaced outliers using a k-NN imputer. Next, window-level features were extracted from the statistical, temporal, and spectral domains. Basic summary statistics were then computed from the extracted features to feed into different machine learning classifiers. An optimal set of features was selected using variance thresholding and by dropping correlated features before classification. Finally, we applied traditional machine learning classifiers such as Support Vector Machine, Decision Tree, Random Forest, and k-Nearest Neighbors along with Deep Neural Networks to classify epilepsy. We evaluated the frameworks on a benchmark dataset through rigorous experimental settings and demonstrated the effectiveness of the proposed frameworks in terms of accuracy, precision, recall, and F-beta score.
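
The window-based framework described above can be sketched roughly as follows. This is an illustrative outline only: the specific window-level features (mean, standard deviation, line length), the window size, and both selection thresholds are assumptions rather than values from the thesis, and the denoising, k-NN imputation, and spectral features are omitted to keep the example dependency-free.

```python
import math
import statistics

def window_features(signal, window_size):
    """Split `signal` into non-overlapping windows and compute per-window
    mean, standard deviation (statistical) and line length (temporal)."""
    rows = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        w = signal[start:start + window_size]
        rows.append({
            "mean": statistics.fmean(w),
            "std": statistics.pstdev(w),
            "line_length": sum(abs(b - a) for a, b in zip(w, w[1:])),
        })
    return rows

def summarise(rows):
    """Collapse window-level features into one record of summary statistics."""
    summary = {}
    for name in rows[0]:
        column = [r[name] for r in rows]
        summary[name + "_mean"] = statistics.fmean(column)
        summary[name + "_std"] = statistics.pstdev(column)
        summary[name + "_min"] = min(column)
        summary[name + "_max"] = max(column)
    return summary

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(table, min_variance=1e-8, max_abs_corr=0.95):
    """Variance thresholding, then drop any feature that is highly
    correlated with a feature that has already been kept."""
    candidates = [n for n in table[0]
                  if statistics.pvariance([row[n] for row in table]) > min_variance]
    kept = []
    for n in candidates:
        col = [row[n] for row in table]
        if all(abs(pearson(col, [row[k] for row in table])) <= max_abs_corr
               for k in kept):
            kept.append(n)
    return kept

# Hypothetical toy signal: two windows of four samples each.
summary = summarise(window_features([0, 1, 0, 1, 2, 4, 2, 4], window_size=4))

# Hypothetical feature table: "a" is constant, "c" duplicates "b" up to scale.
table = [
    {"a": 1, "b": 1, "c": 2, "d": 5},
    {"a": 1, "b": 2, "c": 4, "d": 1},
    {"a": 1, "b": 3, "c": 6, "d": 4},
]
selected = select_features(table)
```

In this sketch one summary record would be produced per EEG recording, and the records stacked into a table before feature selection and classification.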

    DeepLOB: Deep Convolutional Neural Networks for Limit Order Books

    We develop a large-scale deep learning model to predict price movements from limit order book (LOB) data of cash equities. The architecture utilises convolutional filters to capture the spatial structure of the limit order books as well as LSTM modules to capture longer time dependencies. The proposed network outperforms all existing state-of-the-art algorithms on the benchmark LOB dataset [1]. In a more realistic setting, we test our model using one year of market quotes from the London Stock Exchange, and the model delivers a remarkably stable out-of-sample prediction accuracy for a variety of instruments. Importantly, our model translates well to instruments which were not part of the training set, indicating the model's ability to extract universal features. In order to better understand these features and to go beyond a "black box" model, we perform a sensitivity analysis to understand the rationale behind the model predictions and reveal the components of LOBs that are most relevant. The ability to extract robust features which translate well to other instruments is an important property of our model which has many other applications. Comment: 12 pages, 9 figures

    Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model

    Adaptive Activation Function Generation Through Fuzzy Inference for Grooming Text Categorisation

    The activation function is introduced to determine the output of neural networks by mapping the resulting values of neurons into a specific range. Activation functions often suffer from ‘vanishing gradients’, ‘non zero-centred function outputs’, ‘exploding gradients’, and ‘dead neurons’, which may lead to deterioration in the classification performance. This paper proposes an activation function generation approach using Takagi-Sugeno-Kang inference in an effort to address such challenges. In addition, the proposed method further optimises the coefficients in the activation function using a genetic algorithm so that the activation function can adapt to different applications. This approach has been applied to a digital forensics application of online grooming detection. The evaluations confirm the superiority of the proposed activation function for online grooming detection using an imbalanced data set.
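
To make the idea of generating an activation function by Takagi-Sugeno-Kang (TSK) inference more concrete, here is a minimal sketch. It is not the paper's formulation: the two-rule base, the sigmoid membership functions, the `steepness` parameter, and the example coefficients are all assumptions chosen for illustration. The four consequent coefficients are the kind of values a genetic algorithm could tune, but no optimiser is shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tsk_activation(x, coeffs, steepness=1.0):
    """First-order TSK inference with two hypothetical rules:
         IF x is NEGATIVE THEN y = a_neg * x + b_neg
         IF x is POSITIVE THEN y = a_pos * x + b_pos
    The output is the firing-strength-weighted average of the two
    rule consequents."""
    a_neg, b_neg, a_pos, b_pos = coeffs
    w_pos = sigmoid(steepness * x)  # membership of x in POSITIVE
    w_neg = 1.0 - w_pos             # membership of x in NEGATIVE
    numerator = w_neg * (a_neg * x + b_neg) + w_pos * (a_pos * x + b_pos)
    return numerator / (w_neg + w_pos)  # weights sum to 1; kept for clarity

# Hypothetical coefficients: a small negative-side slope, unit positive slope.
coeffs = (0.1, 0.0, 1.0, 0.0)
outputs = [tsk_activation(x, coeffs) for x in (-10.0, 0.0, 10.0)]
```

With these example coefficients the function behaves like a smooth leaky rectifier: it is close to the identity for large positive inputs and keeps a nonzero slope for negative inputs, which is one way such a construction could avoid dead neurons.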

    Predictive Modelling Approach to Data-Driven Computational Preventive Medicine

    This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In its early parts, this thesis proposes a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine-learning methods, data preprocessing techniques, models’ training estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus. Midway through this research, this thesis advances research in preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, a novel imbalanced resampling approach, and minority pattern reconstruction (MPR) led by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance. In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields domain experts' approval of the new reliable models’ performance. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to weigh predictive reliability above interpretability. The application of MPR and its hybrid resampling strategies led to better performance, in line with experts' success criteria, than the traditional imbalanced data resampling techniques.
Finally, the XDistance performance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics. The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework to produce new reliable classifiers. In addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, new methods were developed within the new framework. Finally, the newly developed and accepted predictive models help detect adverse health events, namely, visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.