
    Data prediction for cases of incorrect data in multi-node electrocardiogram monitoring

    The development of mesh topologies for multi-node electrocardiogram (ECG) monitoring based on the ZigBee protocol still has limitations. When more than one active ECG node sends a data stream, synchronization failures can corrupt the data, and the incorrect data will affect signal interpretation. A mechanism is therefore needed to correct or predict the damaged data. In this study, the expectation-maximization (EM) and regression imputation (RI) methods were proposed to overcome these problems. Real data from previous studies are the main modality used in this study. The predicted ECG signal data are compared with the actual ECG data stored in the main controller memory, and root mean square error (RMSE) is calculated to measure system performance. The simulation was performed on 13 ECG waves, each of which has 1000 samples. The results show that the EM method has a lower prediction error than the RI method: the average RMSE for the EM and RI methods is 4.77 and 6.63, respectively. The proposed method is expected to be applicable to multi-node ECG monitoring, especially in ZigBee applications, to minimize errors.
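    As a rough illustration of the regression-imputation idea scored by RMSE (not the authors' EM/RI pipeline — the test signal, window size, and polynomial degree below are all invented for the sketch), corrupted samples of a smooth signal can be predicted from a local least-squares fit on their observed neighbours:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000)
signal = np.sin(5 * t) + 0.3 * np.sin(12 * t)   # smooth ECG-like stand-in

corrupted = signal.copy()
missing = rng.choice(1000, size=100, replace=False)
corrupted[missing] = np.nan                      # simulated transmission damage

# Regression imputation: fit a local polynomial on observed neighbours
# and predict each missing sample from it.
def regression_impute(x, y, window=25, degree=3):
    y = y.copy()
    for i in np.flatnonzero(np.isnan(y)):
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        idx = np.arange(lo, hi)
        obs = idx[~np.isnan(y[idx])]             # usable neighbours only
        coeffs = np.polyfit(x[obs], y[obs], degree)
        y[i] = np.polyval(coeffs, x[i])
    return y

restored = regression_impute(t, corrupted)
# RMSE against the ground truth, evaluated only at the damaged positions
rmse = np.sqrt(np.mean((restored[missing] - signal[missing]) ** 2))
```

    The same RMSE comparison would then be repeated for an EM-based imputer to reproduce the study's head-to-head evaluation.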

    Regression Analysis of University Giving Data

    This project analyzed the giving data of Worcester Polytechnic Institute's alumni and other constituents (parents, friends, neighbors, etc.) from fiscal year 1983 to 2007 using a two-stage modeling approach. Logistic regression analysis was conducted in the first stage to predict the likelihood of giving for each constituent, followed by linear regression in the second stage to predict the amount of contribution to be expected from each contributor. A Box-Cox transformation was performed in the linear regression phase to ensure that the assumptions underlying the model hold. Due to the nature of the data, multiple imputation was performed on the missing information to validate generalization of the models to a broader population. Concepts from the field of direct and database marketing, such as score and lift, were also introduced in this report.
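    A minimal sketch of such a two-stage model, on synthetic data (the features, coefficients, and log transform — the Box-Cox transform with lambda = 0 — are all assumptions for illustration, not the project's actual variables):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                      # stand-ins for constituent features
true_w = np.array([1.0, -0.5, 0.8])
gives = rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))
amount = np.where(gives,
                  np.exp(2 + X @ [0.5, 0.2, -0.1] + 0.3 * rng.normal(size=n)),
                  0.0)

# Stage 1: logistic regression (probability of giving), fit by gradient ascent
# on the log-likelihood.
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w += 0.1 * X.T @ (gives - p) / n

# Stage 2: linear regression on log-amounts for donors only.
Xd = np.c_[np.ones(gives.sum()), X[gives]]
beta, *_ = np.linalg.lstsq(Xd, np.log(amount[gives]), rcond=None)

# Expected contribution = P(give) * predicted amount given giving.
expected = (1 / (1 + np.exp(-(X @ w)))) * np.exp(np.c_[np.ones(n), X] @ beta)
```

    Multiplying the two stages gives an expected-value score per constituent, which is what lift and score rankings in database marketing are computed from.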

    Effects of data cleaning on machine learning model performance

    Abstract. This thesis focuses on the preprocessing and challenges of a university student data set and on how different levels of data preprocessing affect the performance of a prediction model, both in general and in selected groups of interest. The data set comprises the students at the University of Oulu who were admitted to the Faculty of Information Technology and Electrical Engineering during the years 2006–2015. This data set was cleaned at three different levels, resulting in three differently processed data sets: the first is the original data set with only basic cleaning, the second has been cleaned of the most obvious anomalies, and the third has been systematically cleaned of all possible anomalies. Each of these data sets was used to build a Gradient Boosting Machine model that predicted the cumulative number of ECTS credits the students would achieve by the end of their second-year studies, based on their first-year studies and Matriculation Examination results. The effects of the cleaning on model performance were examined by comparing the prediction accuracy and the information the models gave about the factors that might indicate slow ECTS accumulation. The results showed that the prediction accuracy improved after each cleaning stage and that the influences of the features changed significantly, becoming more reasonable.
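    A toy version of the prediction task (gradient boosting for squared loss, built here from regression stumps in plain numpy rather than a GBM library; the features, coefficients, and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
first_year = rng.uniform(0, 60, n)        # first-year ECTS (stand-in feature)
exam = rng.uniform(0, 7, n)               # matriculation grade (stand-in)
X = np.c_[first_year, exam]
y = 1.6 * first_year + 4 * exam + rng.normal(0, 5, n)   # cumulative 2nd-year ECTS

def fit_stump(X, r):
    """Best single-split regression stump on residuals r."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for s in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= s
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, s, lv, rv)
    return best[1:]

# Gradient boosting for squared loss: repeatedly fit stumps to the residuals
# and add them with a shrinkage factor.
pred = np.full(n, y.mean())
lr = 0.3
for _ in range(100):
    j, s, lv, rv = fit_stump(X, y - pred)
    pred += lr * np.where(X[:, j] <= s, lv, rv)

rmse = np.sqrt(np.mean((y - pred) ** 2))
```

    Rerunning the fit on each cleaning level of the data and comparing the RMSEs mirrors the thesis's evaluation design.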

    Near-Lossless Compression for Large Traffic Networks

    With advancements in sensor technologies, intelligent transportation systems can collect traffic data with high spatial and temporal resolution. However, the size of the networks combined with the huge volume of the data puts serious constraints on system resources. Low-dimensional models can help ease these constraints by providing compressed representations of the networks. In this paper, we analyze the reconstruction efficiency of several low-dimensional models for large and diverse networks. The compression performed by low-dimensional models is lossy in nature. To address this issue, we propose a near-lossless compression method for traffic data by applying the principle of lossy-plus-residual coding. To this end, we first develop a low-dimensional model of the network. We then apply Huffman coding (HC) in the residual layer. The resulting algorithm guarantees that the maximum reconstruction error will remain below a desired tolerance limit. For analysis, we consider a large and heterogeneous test network comprising more than 18 000 road segments. The results show that the proposed method can efficiently compress data obtained from a large and diverse road network while maintaining the upper bound on the reconstruction error. Funding: Singapore National Research Foundation (Singapore-MIT Alliance for Research and Technology Center, Future Urban Mobility Program).
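    The lossy-plus-residual principle can be sketched in a few lines: a low-rank SVD model of the traffic matrix supplies the lossy layer, and uniformly quantizing the residual with step 2·tol bounds the reconstruction error by tol (the Huffman stage would then entropy-code the integer symbols; the toy data, rank, and tolerance below are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy traffic-speed matrix: segments x time, smooth daily pattern + noise.
t = np.linspace(0, 2 * np.pi, 288)
data = 60 + 15 * np.sin(t) + rng.normal(0, 3, (200, 288))

# Lossy layer: rank-k SVD approximation of the network data.
k, tol = 5, 1.0
U, s, Vt = np.linalg.svd(data, full_matrices=False)
lossy = U[:, :k] * s[:k] @ Vt[:k]

# Residual layer: uniform quantization with step 2*tol guarantees that
# the reconstruction error never exceeds tol; the integer symbols would
# then be entropy-coded (e.g. Huffman) for storage.
symbols = np.round((data - lossy) / (2 * tol)).astype(np.int32)
recon = lossy + symbols * (2 * tol)

max_err = np.abs(data - recon).max()
```

    Because the low-rank layer captures most of the structure, the residual symbols concentrate around zero, which is exactly the skewed distribution Huffman coding compresses well.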

    Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques

    Wind turbine condition monitoring can detect anomalies in turbine performance that have the potential to result in unexpected failure and financial loss. This study examines common Supervisory Control And Data Acquisition (SCADA) data over a period of 20 months for 21 pitch-regulated 2.3 MW turbines and is presented in three manuscripts. First, power curve monitoring is targeted by applying various types of Artificial Neural Networks to increase modeling accuracy; it is shown how the proposed method can significantly improve network reliability compared with existing models. Then, an advanced technique is utilized to create a smoother dataset for network training, followed by the establishment of a dynamic ANFIS network; at this stage, the designed network aims to predict power generation in future hours. Finally, a recursive principal component analysis is performed to extract significant features to be used as input parameters of the network, and a novel fusion technique is then employed to build an advanced model that predicts turbine performance with favorably low errors.
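    The power-curve modeling step amounts to regressing power output on wind speed with a neural network. A minimal one-hidden-layer network in plain numpy (the synthetic sigmoid-shaped curve, layer size, and learning rate are all assumptions; the study's actual ANN and ANFIS architectures are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic power curve for a 2.3 MW turbine: smooth rise toward rated power.
wind = rng.uniform(0, 25, (1000, 1))                  # wind speed, m/s
power = 2.3 / (1 + np.exp(-(wind - 7.5))) + rng.normal(0, 0.05, wind.shape)

# One-hidden-layer network trained with full-batch gradient descent.
W1, b1 = rng.normal(0, 0.5, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)
lr = 0.05
x = wind / 25.0                                       # scale input to [0, 1]
for _ in range(3000):
    h = np.tanh(x @ W1 + b1)                          # hidden activations
    err = h @ W2 + b2 - power                         # prediction error
    gW2, gb2 = h.T @ err / len(x), err.mean(0)
    gh = err @ W2.T * (1 - h ** 2)                    # backprop through tanh
    gW1, gb1 = x.T @ gh / len(x), gh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

rmse = np.sqrt(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - power) ** 2))
```

    Condition monitoring then flags turbines whose observed power deviates persistently from the fitted curve.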

    ARDP: SIMPLIFIED MACHINE LEARNING PREDICTOR FOR MISSING UNIDIMENSIONAL ACADEMIC RESULTS DATASET

    We present a machine learning predictor for academic results datasets (PARD) that predicts missing academic results based on chi-squared expected-value calculation, positional clustering, progressive approximation of relative residuals, and positional averages of the data in a sampled population. Academic results datasets are data originating from academic institutions' results repositories. PARD is a technique designed specifically for predicting missing academic results. Since the whole essence of data mining is to elicit useful information and gain knowledge-driven insights into datasets, PARD positions the data explorer at this advantageous perspective. PARD promises to solve the problem of missing academic results more quickly than the approaches currently found in the literature. The predictor was implemented in Python, and the results obtained show that it achieves an average prediction accuracy of at least 93.6% on the sampled cases. The results demonstrate that PARD tends toward greater precision in solving the problem of predicting missing academic results in university datasets.
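    The chi-squared expected-value idea can be illustrated with a simple additive analogue: a missing score is estimated from its row mean (student strength) and column mean (course difficulty). This is a hedged sketch in the spirit of the abstract, not the actual ARDP/PARD algorithm, and the score matrix is synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy results matrix: 40 students x 6 courses, scores 0-100, with a
# per-student ability offset.
full = np.clip(rng.normal(60, 12, (40, 6)) + rng.normal(0, 8, (40, 1)), 0, 100)
scores = full.copy()
scores[rng.random(scores.shape) < 0.1] = np.nan   # knock out ~10% of results

# Positional-average imputation: expected cell value from row and column
# means, analogous to a chi-squared expected-count calculation.
row_mean = np.nanmean(scores, axis=1, keepdims=True)
col_mean = np.nanmean(scores, axis=0, keepdims=True)
grand = np.nanmean(scores)
estimate = row_mean + col_mean - grand
imputed = np.where(np.isnan(scores), estimate, scores)

mask = np.isnan(scores)
rmse = np.sqrt(np.mean((imputed[mask] - full[mask]) ** 2))
```

    The published method layers positional clustering and residual refinement on top of this kind of expected-value baseline.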

    Data mining tool for academic data exploitation: selection of most suitable algorithms

    The SPEET project aims to exploit the potential synergy between the huge amount of academic data already existing at universities and the maturity of data science, in order to provide tools for extracting information from student data. A rich picture can be extracted from this data if it is conveniently processed. The purpose of the project is to apply data mining algorithms to this data in order to extract information about, and identify, student profiles. This document presents the results obtained in the SPEET project during the development of the data mining tools. More specifically, two mechanisms have been developed: a clustering/classification scheme that groups students in terms of academic performance, and a drop-out prediction system. The document starts by addressing the motivation for developing data mining tools, along with the considerations taken into account for academic data gathering; these include the proposed unified dataset format and some details about confidentiality issues. Next, the student clustering and classification schemes are presented in detail, including a description of the machine learning algorithms considered and a discussion of the results obtained on data belonging to the different SPEET project partners. The results show how groups of clusters can be automatically identified and how new students can be classified into existing groups with high accuracy. Finally, the implemented drop-out prediction system is considered and several algorithmic alternatives are presented; in this case, evaluation of the drop-out mechanism is focused on one institution, showing a prediction accuracy of around 91%. The algorithms presented in this document are available in repositories or as inline code, as indicated.
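    The cluster-then-classify workflow — identify student groups automatically, then assign new students to an existing group — can be sketched as k-means clustering followed by nearest-centroid classification (the two synthetic performance profiles and their features are assumptions, not SPEET's dataset format):

```python
import numpy as np

rng = np.random.default_rng(6)
# Two synthetic student-performance profiles: (avg grade, credits/semester).
strong = rng.normal([8.0, 30.0], [0.8, 3.0], (100, 2))
weak = rng.normal([5.5, 15.0], [0.8, 4.0], (100, 2))
X = np.vstack([strong, weak])

# Cluster the existing students with 2-means (one seed point per profile).
centroids = X[[0, 100]].copy()
for _ in range(20):
    labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([X[labels == c].mean(0) for c in range(2)])

# Classify a new student by the nearest cluster centroid.
def classify(student):
    return int(np.argmin(((student - centroids) ** 2).sum(-1)))

group = classify(np.array([7.8, 28.0]))   # grades suggest the "strong" profile
```

    A drop-out predictor would sit alongside this, trained as a supervised classifier on labeled historical outcomes rather than on clusters.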

    Segmentation with unsupervised learning: An application using the Walker's data

    In this project, walkers suitable for the service were filtered using a dataset shared by the DogGo company. Unsupervised machine learning methods, namely K-Means, Gaussian mixture models, and Principal Component Analysis, were then used to score and cluster the most suitable walkers according to performance, willingness, and experience. DogGo is the first mobile application in Turkey that provides pet walking and grooming services to its customers in a safe and professional manner: dogs are taken care of in the dog owners' own homes or at the caretaker's home, for whatever need the families may have. The DogGo company wants to provide the best matching of walkers and animals, using machine learning algorithms, through a five-step acquisition process for its walkers. The K-Means models built on the unique walkers were compared with the help of the Elbow method and the Silhouette score, while the Gaussian mixture models were compared using the AIC and BIC criteria. In addition, a classical RFM scoring was also created. When the results of the study were examined in light of the Elbow and Silhouette scores, the model created with K-Means gave the best results, and the number of clusters was decided to be 2.
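    The model-selection step — sweeping k, reading the inertia curve for the elbow, and checking the silhouette of the chosen solution — can be sketched in plain numpy (the three walker features and the two synthetic profiles are invented; the study's actual data and the GMM/AIC/BIC comparison are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy walker features: (performance, willingness, experience), two profiles.
X = np.vstack([rng.normal(0.3, 0.08, (30, 3)), rng.normal(0.8, 0.08, (30, 3))])

def kmeans(X, k, iters=30):
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - C) ** 2).sum(-1).argmin(1)
        C = np.array([X[labels == c].mean(0) if (labels == c).any() else C[c]
                      for c in range(k)])
    return labels, ((X - C[labels]) ** 2).sum()

# Elbow method: within-cluster sum of squares (inertia) for k = 1..5.
inertia = {k: kmeans(X, k)[1] for k in range(1, 6)}

# Silhouette of the k = 2 solution: (b - a) / max(a, b) per point.
labels, _ = kmeans(X, 2)
D = np.sqrt(((X[:, None] - X) ** 2).sum(-1))
sil = []
for i in range(len(X)):
    same = labels == labels[i]
    a = D[i, same & (np.arange(len(X)) != i)].mean()   # own-cluster distance
    b = D[i, ~same].mean()                             # other-cluster distance
    sil.append((b - a) / max(a, b))
silhouette = np.mean(sil)
```

    A high silhouette at the elbow's k is what justified settling on two clusters in the study.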

    Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model

    A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to identifying the most critical process inputs by scoring the contribution each input makes to the prediction power for the process output. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for the discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effects of potential interventions, or of proposed confirmation trials, are quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions with Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities with minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values in mixed datasets (continuous and categorical process parameters); it is shown that the algorithm is robust even at high proportions of missing values. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. They demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and entirely feasible for heterogeneous datasets.
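    The odds-ratio quantification of an intervention can be shown directly on toy data (the process variable, its operating range, and the outcome probabilities are all invented; the MRF and DPS machinery that would identify the range is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(8)
# Toy process data: the output is more often "good" when a continuous
# input sits inside a favourable operating range.
temp = rng.uniform(150, 250, 5000)
good = rng.random(5000) < np.where((temp > 180) & (temp < 220), 0.9, 0.4)

# Quantify the proposed intervention (restrict temp to 180-220) as an
# odds ratio: odds of a good outcome inside vs. outside the range.
in_range = (temp > 180) & (temp < 220)
p_in = good[in_range].mean()
p_out = good[~in_range].mean()
odds_ratio = (p_in / (1 - p_in)) / (p_out / (1 - p_out))
```

    An odds ratio well above 1 supports running the intervention as a confirmation trial, which is how the posterior-odds evidence is used in the proposed workflow.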