
    Predictive Maintenance on the Machining Process and Machine Tool

    This paper presents the process required to implement data-driven Predictive Maintenance (PdM), covering not only the decision making on the machine but also data acquisition and processing. A short review of the different approaches and techniques in maintenance is given. The main contribution of the paper is a solution to the predictive maintenance problem in a real machining process; the steps needed to reach that solution are explained in detail. The results show that the Preventive Maintenance (PM) carried out in the real machining process could be replaced by a PdM approach. A decision-making application was developed to provide a visual analysis of the Remaining Useful Life (RUL) of the machining tool. This work is a proof of concept of the presented methodology on a single process, but it is replicable for most processes in the serial production of parts.
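
    The abstract does not disclose the paper's actual models or data, so the following is only a minimal sketch of the general idea of data-driven RUL estimation, assuming hypothetical per-cycle sensor features (vibration, spindle load, feed rate) and a synthetic RUL label; it is not the authors' pipeline.

    # Minimal RUL-regression sketch on synthetic machining data (assumption:
    # the real pipeline would use logged sensor/process signals instead).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "vibration_rms": rng.normal(1.0, 0.2, n),   # hypothetical feature
        "spindle_load": rng.normal(60, 10, n),      # hypothetical feature
        "feed_rate": rng.normal(200, 25, n),        # hypothetical feature
    })
    # Synthetic remaining-useful-life label in machining cycles, for illustration only.
    df["rul_cycles"] = 300 - 150 * df["vibration_rms"] + rng.normal(0, 10, n)

    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns="rul_cycles"), df["rul_cycles"], random_state=0
    )
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    print("MAE (cycles):", mean_absolute_error(y_test, model.predict(X_test)))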

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data are invariably problematic: noisy, with missing entries, and with imbalanced classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized data-preprocessing and classification techniques. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework combining the cost-sensitive SVM with an expectation-maximization imputation method for missing values that relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces faster, more accurate, and more robust classification results. (arXiv admin note: substantial text overlap with arXiv:1503.0625)
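
    As a rough illustration of the combination described above, the sketch below pairs an iterated-regression imputer with a class-weighted (cost-sensitive) SVM on synthetic imbalanced data with injected missing values; the paper's multilevel framework is more elaborate, and everything here is a placeholder.

    # Imputation + cost-sensitive SVM sketch (a simplification, not the
    # authors' multilevel algorithm).
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import make_classification

    # Synthetic imbalanced dataset with roughly 10% of the entries set to missing.
    X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
    X[np.random.default_rng(0).random(X.shape) < 0.1] = np.nan

    clf = make_pipeline(
        IterativeImputer(max_iter=10, random_state=0),  # iterated regression imputation
        StandardScaler(),
        SVC(class_weight="balanced"),  # cost-sensitive weighting of the minority class
    )
    print("Mean CV AUC:", cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean())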

    Mining large-scale human mobility data for long-term crime prediction

    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is an opportunity to improve such models with data that serve as better proxies of human presence in cities. In this paper, we leverage large-scale human mobility data to craft an extensive set of features for crime prediction, informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning to investigate their power in predicting yearly counts of different types of crimes occurring in New York City at the census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and check-ins, subway rides, and taxi rides improve on baseline models relying on census and POI data. The proposed models achieve absolute R^2 values of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This confirms that, next to the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories and find that the predictive gain of the human-dynamics features varies across crime types: such features bring the biggest boost for grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss the top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement.
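
    To make the averaging-versus-boosting comparison concrete, here is a minimal sketch with a random forest (averaging) and a gradient boosting (boosting) regressor predicting yearly tract-level counts; the feature names are hypothetical stand-ins for the census, POI, and mobility features, not the paper's dataset.

    # Averaging vs. boosting ensembles on synthetic tract-level data.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    n_tracts = 800
    X = pd.DataFrame({
        "residential_pop": rng.normal(4000, 800, n_tracts),   # census-style feature
        "poi_count": rng.poisson(50, n_tracts),                # POI-style feature
        "foursquare_checkins": rng.poisson(300, n_tracts),     # mobility-style feature
        "subway_rides": rng.poisson(1000, n_tracts),           # mobility-style feature
        "taxi_pickups": rng.poisson(500, n_tracts),            # mobility-style feature
    })
    # Synthetic yearly crime counts, for illustration only.
    y = 0.02 * X["residential_pop"] + 0.5 * X["foursquare_checkins"] + rng.normal(0, 30, n_tracts)

    # A simple random split stands in for the paper's geographical/temporal hold-outs.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    for model in (RandomForestRegressor(random_state=0), GradientBoostingRegressor(random_state=0)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, "R^2:", round(r2_score(y_te, model.predict(X_te)), 3))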

    Evaluation of Machine Learning Techniques for Early Identification of At-Risk Students

    Student attrition is one of the long-standing problems facing higher education institutions despite the extensive research that has been undertaken to address it. To increase students' success and retention rates, there is a need for early alert systems that facilitate the identification of at-risk students so that remedial measures can be taken in time to reduce the risk. However, incorporating ML predictive models into early warning systems faces two main challenges: improving the accuracy of timely predictions and the generalizability of predictive models across on-campus and online courses. The goal of this study was to develop and evaluate predictive models that can be applied to on-campus and online courses to predict at-risk students based on data collected at different stages of a course: the start of the course, the 4th week, the 8th week, and the 12th week. In this research, several supervised machine learning algorithms were trained and evaluated on their performance. The study compared the performance of single classifiers (Logistic Regression, Decision Trees, Naïve Bayes, and Artificial Neural Networks) and ensemble classifiers (using bagging and boosting techniques). Their performance was evaluated in terms of sensitivity, specificity, and Area Under the Curve (AUC). A total of four experiments were conducted based on data collected at different stages of the course. In the first experiment, the classification algorithms were trained and evaluated on data collected before the beginning of the semester; in the second, on week-4 data; and in the third and fourth, on week-8 and week-12 data, respectively. The results demonstrated that ensemble classifiers achieved the highest classification performance in all experiments. Additionally, the generalizability analysis showed that the predictive models attained similar performance when used to classify on-campus and online students. Moreover, the Extreme Gradient Boosting (XGBoost) classifier was found to be the best-performing classifier for the at-risk student classification problem, achieving an AUC of ≈ 0.89, a sensitivity of ≈ 0.81, and a specificity of ≈ 0.81 using data available at the start of a course. Finally, the XGBoost classifier improved by about 1% with each subsequent four-week dataset, reaching an AUC of ≈ 0.92, a sensitivity of ≈ 0.84, and a specificity of ≈ 0.84 by week 12. While the additional learning management system (LMS) data consistently improved prediction accuracy as the course progressed, the improvement was marginal. Such findings suggest that the predictive models can be used to identify at-risk students even in courses that do not make significant use of the LMS. The results of this research demonstrate the usefulness and effectiveness of ML techniques for early identification of at-risk students. Interestingly, fairly reliable predictions can be made at the start of the semester, which is significant in that help can be provided to at-risk students even before the course starts. Finally, it is hoped that the results of this study advance the understanding of the appropriateness and effectiveness of ML techniques for early identification of at-risk students.
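
    The evaluation setup described above (AUC, sensitivity, and specificity for an XGBoost classifier) can be sketched as follows; the features are synthetic stand-ins for pre-course and LMS data, so the printed numbers will not match the study's.

    # XGBoost + AUC/sensitivity/specificity sketch on synthetic student data.
    from xgboost import XGBClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score, confusion_matrix

    # Stand-in for pre-course demographics plus cumulative LMS activity features.
    X, y = make_classification(n_samples=1000, n_features=15, weights=[0.75, 0.25], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = XGBClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
    clf.fit(X_tr, y_tr)

    proba = clf.predict_proba(X_te)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_te, (proba >= 0.5).astype(int)).ravel()
    print("AUC:", round(roc_auc_score(y_te, proba), 3))
    print("Sensitivity:", round(tp / (tp + fn), 3))
    print("Specificity:", round(tn / (tn + fp), 3))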

    A Hybrid Machine Learning Framework for Predicting Students’ Performance in Virtual Learning Environment

    Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast amounts of data that can help identify students' performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions by providing machine learning models to predict at-risk students and improve their performance. However, constructing a model that ultimately provides accurate predictions requires an efficient approach. Consequently, this study proposes a hybrid machine learning framework to predict students' performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, the study used filter-based and wrapper-based feature selection techniques to select the features of the dataset most related to students' performance. The results reveal that the ensemble methods recorded higher predictive accuracy than the single classifiers. Furthermore, the accuracy of the models improved thanks to the feature selection techniques utilized in this study.
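
    A minimal sketch of that hybrid idea follows: a filter-based selector and a wrapper-based selector are each combined with a small voting ensemble. The choice of classifiers, the value of k, and the synthetic data are illustrative assumptions, not the study's exact configuration.

    # Filter vs. wrapper feature selection feeding a voting ensemble.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif, RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import VotingClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for VLE activity/demographic features.
    X, y = make_classification(n_samples=800, n_features=30, n_informative=8, random_state=0)

    # Filter-based selection (univariate F-test) vs. wrapper-based selection (RFE).
    selectors = {
        "filter": SelectKBest(f_classif, k=10),
        "wrapper": RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
    }
    voter = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("nb", GaussianNB()),
        ],
        voting="soft",
    )
    for name, sel in selectors.items():
        pipe = make_pipeline(sel, voter)
        print(name, "CV accuracy:", round(cross_val_score(pipe, X, y, cv=5).mean(), 3))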

    Prediction of student success: A smart data-driven approach

    Predicting students' academic performance is one of the topics of Educational Data Mining, which aims to extract useful information and new patterns from educational data. Understanding the drivers of student success may assist educators in developing pedagogical methods, providing a tool for personalized feedback and advice. In order to improve the academic performance of students and create a decision-support solution for higher education institutions, this dissertation proposes a methodology that uses educational data mining to compare prediction models for student success. The data belong to master's students at ISCTE, a Portuguese university, and cover the 2012 to 2022 academic years. In addition, the factors that are the strongest predictors of student success were studied. The PyCaret library was used to compare the performance of several algorithms. Factors proposed to influence success include, for example, the student's gender, previous educational background, the existence of a special statute, and the parents' educational degree. The analysis revealed that the Light Gradient Boosting Machine classifier had the best performance with an accuracy of 87.37%, followed by the Gradient Boosting classifier (accuracy = 85.11%) and the Adaptive Boosting classifier (accuracy = 83.37%). Hyperparameter tuning improved the performance of all the algorithms. Feature importance analysis revealed that the factors with the greatest impact on student success were the average grade, the duration of the master's, and the gap between degrees, i.e., the number of years between the last degree and the start of the master's.
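
    The PyCaret workflow described above can be sketched roughly as follows; the DataFrame and its columns are hypothetical placeholders (the ISCTE dataset is not public), so only the shape of the workflow, not the results, is meaningful here.

    # PyCaret sketch: compare LightGBM, Gradient Boosting, and AdaBoost,
    # tune the best model, and inspect feature importance (assumed workflow).
    import numpy as np
    import pandas as pd
    from pycaret.classification import setup, compare_models, tune_model, plot_model

    # Synthetic stand-in for the student data; column names loosely echo the
    # factors listed in the abstract and are purely illustrative.
    rng = np.random.default_rng(0)
    n = 400
    students_df = pd.DataFrame({
        "average_grade": rng.normal(14, 2, n),
        "gap_between_degrees_years": rng.integers(0, 10, n),
        "special_statute": rng.integers(0, 2, n),
        "success": rng.integers(0, 2, n),
    })

    setup(data=students_df, target="success", session_id=0)
    best = compare_models(include=["lightgbm", "gbc", "ada"])  # LGBM, GBC, AdaBoost
    best_tuned = tune_model(best)           # hyperparameter tuning, as in the study
    plot_model(best_tuned, plot="feature")  # feature importance plot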