
    Predictive Maintenance on the Machining Process and Machine Tool

    This paper presents the process required to implement data-driven Predictive Maintenance (PdM), covering not only the decision making on the machine but also data acquisition and processing. A short review of the different approaches and techniques in maintenance is given. The main contribution of the paper is a solution to the predictive maintenance problem in a real machining process; the steps needed to reach that solution are explained in detail. The results show that the Preventive Maintenance (PM) carried out in the real machining process could be replaced by a PdM approach. A decision-making application was developed to provide a visual analysis of the Remaining Useful Life (RUL) of the machining tool. This work is a proof of concept of the presented methodology on a single process, but it is replicable for most processes in the serial production of parts.
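
    The abstract does not disclose the paper's actual models or data, so the following is only a minimal sketch of the general idea of data-driven RUL estimation, assuming hypothetical per-cycle sensor features (vibration, spindle load, feed rate) and a synthetic RUL label; it is not the authors' pipeline.

    # Minimal RUL-regression sketch on synthetic machining data (assumption:
    # the real pipeline would use logged sensor/process signals instead).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "vibration_rms": rng.normal(1.0, 0.2, n),   # hypothetical feature
        "spindle_load": rng.normal(60, 10, n),      # hypothetical feature
        "feed_rate": rng.normal(200, 25, n),        # hypothetical feature
    })
    # Synthetic remaining-useful-life label in machining cycles, for illustration only.
    df["rul_cycles"] = 300 - 150 * df["vibration_rms"] + rng.normal(0, 10, n)

    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns="rul_cycles"), df["rul_cycles"], random_state=0
    )
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    print("MAE (cycles):", mean_absolute_error(y_test, model.predict(X_test)))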

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data are invariably problematic: noisy, with missing entries, and with imbalanced classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized data-preprocessing and classification techniques. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework combining the cost-sensitive SVM with an expectation-maximization imputation method for missing values that relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces faster, more accurate, and more robust classification results. (arXiv admin note: substantial text overlap with arXiv:1503.0625)
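
    As a rough illustration of the combination described above, the sketch below pairs an iterated-regression imputer with a class-weighted (cost-sensitive) SVM on synthetic imbalanced data with injected missing values; the paper's multilevel framework is more elaborate, and everything here is a placeholder.

    # Imputation + cost-sensitive SVM sketch (a simplification, not the
    # authors' multilevel algorithm).
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import make_classification

    # Synthetic imbalanced dataset with roughly 10% of the entries set to missing.
    X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
    X[np.random.default_rng(0).random(X.shape) < 0.1] = np.nan

    clf = make_pipeline(
        IterativeImputer(max_iter=10, random_state=0),  # iterated regression imputation
        StandardScaler(),
        SVC(class_weight="balanced"),  # cost-sensitive weighting of the minority class
    )
    print("Mean CV AUC:", cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean())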

    Mining large-scale human mobility data for long-term crime prediction

    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is an opportunity to improve such models with data that serve as better proxies of human presence in cities. In this paper, we leverage large-scale human mobility data to craft an extensive set of features for crime prediction, informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning to investigate their power in predicting yearly counts of different types of crimes occurring in New York City at the census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and check-ins, subway rides, and taxi rides improve on baseline models relying on census and POI data. The proposed models achieve absolute R^2 values of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This confirms that, next to the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories and find that the predictive gain of the human-dynamics features varies across crime types: such features bring the biggest boost for grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss the top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement.
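
    To make the averaging-versus-boosting comparison concrete, here is a minimal sketch with a random forest (averaging) and a gradient boosting (boosting) regressor predicting yearly tract-level counts; the feature names are hypothetical stand-ins for the census, POI, and mobility features, not the paper's dataset.

    # Averaging vs. boosting ensembles on synthetic tract-level data.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    n_tracts = 800
    X = pd.DataFrame({
        "residential_pop": rng.normal(4000, 800, n_tracts),   # census-style feature
        "poi_count": rng.poisson(50, n_tracts),                # POI-style feature
        "foursquare_checkins": rng.poisson(300, n_tracts),     # mobility-style feature
        "subway_rides": rng.poisson(1000, n_tracts),           # mobility-style feature
        "taxi_pickups": rng.poisson(500, n_tracts),            # mobility-style feature
    })
    # Synthetic yearly crime counts, for illustration only.
    y = 0.02 * X["residential_pop"] + 0.5 * X["foursquare_checkins"] + rng.normal(0, 30, n_tracts)

    # A simple random split stands in for the paper's geographical/temporal hold-outs.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    for model in (RandomForestRegressor(random_state=0), GradientBoostingRegressor(random_state=0)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, "R^2:", round(r2_score(y_te, model.predict(X_te)), 3))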

    Evaluation of Machine Learning Techniques for Early Identification of At-Risk Students

    Student attrition is one of the long-standing problems facing higher education institutions despite the extensive research that has been undertaken to address it. To increase students' success and retention rates, there is a need for early alert systems that facilitate the identification of at-risk students so that remedial measures can be taken in time to reduce the risk. However, incorporating ML predictive models into early warning systems faces two main challenges: improving the accuracy of timely predictions and the generalizability of predictive models across on-campus and online courses. The goal of this study was to develop and evaluate predictive models that can be applied to on-campus and online courses to predict at-risk students based on data collected at different stages of a course: the start of the course, the 4th week, the 8th week, and the 12th week. In this research, several supervised machine learning algorithms were trained and evaluated on their performance. The study compared the performance of single classifiers (Logistic Regression, Decision Trees, Naïve Bayes, and Artificial Neural Networks) and ensemble classifiers (using bagging and boosting techniques). Their performance was evaluated in terms of sensitivity, specificity, and Area Under the Curve (AUC). A total of four experiments were conducted based on data collected at different stages of the course. In the first experiment, the classification algorithms were trained and evaluated on data collected before the beginning of the semester; in the second, on week-4 data; and in the third and fourth, on week-8 and week-12 data, respectively. The results demonstrated that ensemble classifiers achieved the highest classification performance in all experiments. Additionally, the generalizability analysis showed that the predictive models attained similar performance when used to classify on-campus and online students. Moreover, the Extreme Gradient Boosting (XGBoost) classifier was found to be the best-performing classifier for the at-risk student classification problem, achieving an AUC of ≈ 0.89, a sensitivity of ≈ 0.81, and a specificity of ≈ 0.81 using data available at the start of a course. Finally, the XGBoost classifier improved by about 1% with each subsequent four-week dataset, reaching an AUC of ≈ 0.92, a sensitivity of ≈ 0.84, and a specificity of ≈ 0.84 by week 12. While the additional learning management system (LMS) data consistently improved prediction accuracy as the course progressed, the improvement was marginal. Such findings suggest that the predictive models can be used to identify at-risk students even in courses that do not make significant use of the LMS. The results of this research demonstrate the usefulness and effectiveness of ML techniques for early identification of at-risk students. Interestingly, fairly reliable predictions can be made at the start of the semester, which is significant in that help can be provided to at-risk students even before the course starts. Finally, it is hoped that the results of this study advance the understanding of the appropriateness and effectiveness of ML techniques for early identification of at-risk students.
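
    The evaluation setup described above (AUC, sensitivity, and specificity for an XGBoost classifier) can be sketched as follows; the features are synthetic stand-ins for pre-course and LMS data, so the printed numbers will not match the study's.

    # XGBoost + AUC/sensitivity/specificity sketch on synthetic student data.
    from xgboost import XGBClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score, confusion_matrix

    # Stand-in for pre-course demographics plus cumulative LMS activity features.
    X, y = make_classification(n_samples=1000, n_features=15, weights=[0.75, 0.25], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = XGBClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
    clf.fit(X_tr, y_tr)

    proba = clf.predict_proba(X_te)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_te, (proba >= 0.5).astype(int)).ravel()
    print("AUC:", round(roc_auc_score(y_te, proba), 3))
    print("Sensitivity:", round(tp / (tp + fn), 3))
    print("Specificity:", round(tn / (tn + fp), 3))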

    A Hybrid Machine Learning Framework for Predicting Students’ Performance in Virtual Learning Environment

    Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast amounts of data that can help identify students' performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions by providing machine learning models to predict at-risk students and improve their performance. However, constructing a model that ultimately provides accurate predictions requires an efficient approach. Consequently, this study proposes a hybrid machine learning framework to predict students' performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, the study used filter-based and wrapper-based feature selection techniques to select the features of the dataset most related to students' performance. The results reveal that the ensemble methods recorded higher predictive accuracy than the single classifiers. Furthermore, the accuracy of the models improved thanks to the feature selection techniques utilized in this study.
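
    A minimal sketch of that hybrid idea follows: a filter-based selector and a wrapper-based selector are each combined with a small voting ensemble. The choice of classifiers, the value of k, and the synthetic data are illustrative assumptions, not the study's exact configuration.

    # Filter vs. wrapper feature selection feeding a voting ensemble.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif, RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import VotingClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for VLE activity/demographic features.
    X, y = make_classification(n_samples=800, n_features=30, n_informative=8, random_state=0)

    # Filter-based selection (univariate F-test) vs. wrapper-based selection (RFE).
    selectors = {
        "filter": SelectKBest(f_classif, k=10),
        "wrapper": RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
    }
    voter = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("nb", GaussianNB()),
        ],
        voting="soft",
    )
    for name, sel in selectors.items():
        pipe = make_pipeline(sel, voter)
        print(name, "CV accuracy:", round(cross_val_score(pipe, X, y, cv=5).mean(), 3))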

    Prediction of student success: A smart data-driven approach

    Predicting students' academic performance is one of the topics of Educational Data Mining, which aims to extract useful information and new patterns from educational data. Understanding the drivers of student success may assist educators in developing pedagogical methods, providing a tool for personalized feedback and advice. In order to improve the academic performance of students and create a decision-support solution for higher education institutions, this dissertation proposes a methodology that uses educational data mining to compare prediction models for student success. The data belong to master's students at ISCTE, a Portuguese university, and cover the 2012 to 2022 academic years. In addition, the factors that are the strongest predictors of student success were studied. The PyCaret library was used to compare the performance of several algorithms. Factors proposed to influence success include, for example, the student's gender, previous educational background, the existence of a special statute, and the parents' educational degree. The analysis revealed that the Light Gradient Boosting Machine classifier had the best performance with an accuracy of 87.37%, followed by the Gradient Boosting classifier (accuracy = 85.11%) and the Adaptive Boosting classifier (accuracy = 83.37%). Hyperparameter tuning improved the performance of all the algorithms. Feature importance analysis revealed that the factors with the greatest impact on student success were the average grade, the duration of the master's, and the gap between degrees, i.e., the number of years between the last degree and the start of the master's.
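
    The PyCaret workflow described above can be sketched roughly as follows; the DataFrame and its columns are hypothetical placeholders (the ISCTE dataset is not public), so only the shape of the workflow, not the results, is meaningful here.

    # PyCaret sketch: compare LightGBM, Gradient Boosting, and AdaBoost,
    # tune the best model, and inspect feature importance (assumed workflow).
    import numpy as np
    import pandas as pd
    from pycaret.classification import setup, compare_models, tune_model, plot_model

    # Synthetic stand-in for the student data; column names loosely echo the
    # factors listed in the abstract and are purely illustrative.
    rng = np.random.default_rng(0)
    n = 400
    students_df = pd.DataFrame({
        "average_grade": rng.normal(14, 2, n),
        "gap_between_degrees_years": rng.integers(0, 10, n),
        "special_statute": rng.integers(0, 2, n),
        "success": rng.integers(0, 2, n),
    })

    setup(data=students_df, target="success", session_id=0)
    best = compare_models(include=["lightgbm", "gbc", "ada"])  # LGBM, GBC, AdaBoost
    best_tuned = tune_model(best)           # hyperparameter tuning, as in the study
    plot_model(best_tuned, plot="feature")  # feature importance plot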