
    Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark

    One of the main goals of Big Data research is to find new data mining methods that can process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed; in the Big Data case, the solution must also run in an acceptable execution time. In this paper we present Approx-SMOTE, a parallel implementation of the SMOTE algorithm for the Apache Spark framework. The key difference from the original SMOTE, besides parallelism, is that it uses an approximate k-Nearest Neighbor search, which makes it highly scalable. Although an implementation of SMOTE for Big Data already exists (SMOTE-BD), it uses an exact Nearest Neighbor search, which limits its scalability. Approx-SMOTE, on the other hand, achieves run times up to 30 times faster without sacrificing the improvement in classification performance offered by the original SMOTE.

    “La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was supported by the Junta de Castilla y León under project BU055P20 and by the Ministry of Science and Innovation of Spain under project PID2020-119894 GB-I00, co-financed through European Union FEDER funds. It was also supported by the Consejería de Educación of the Junta de Castilla y León and the European Social Fund through a pre-doctoral grant (EDU/1100/2017). This material is based upon work supported by Google Cloud.
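
    To make the underlying idea concrete, here is a minimal sketch of the classic SMOTE interpolation step in pure Python: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors. This sketch uses an exact brute-force neighbor search on a single machine, so it is neither Approx-SMOTE nor a Spark implementation; the function name `smote` and its parameters are illustrative assumptions. As the abstract describes, Approx-SMOTE replaces the exact k-NN step with an approximate one and runs in parallel on Spark.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Illustrative sketch: uses an exact brute-force k-NN search, which is
    the step Approx-SMOTE replaces with an approximate search.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Exact k nearest neighbors of each minority sample (excluding itself).
    neighbors = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a random minority sample
        j = rng.choice(neighbors[i])       # pick one of its k neighbors
        gap = rng.random()                 # position on the segment, in [0, 1)
        x, y = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic
```

    For example, `smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 10, k=2)` returns ten new points inside the unit square. The brute-force neighbor search is at least quadratic in the number of minority samples, which is the kind of cost that motivates an approximate search on large data.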

    Monitoring Activity and Dropout in Moodle with the UBUMonitor Application

    Teaching with online learning platforms should simplify the monitoring of students’ activity, particularly when evaluating student dropout. Popular learning environments such as Moodle should implement visual analytics tools that facilitate such tasks; nevertheless, institutions are usually reluctant to incorporate them. This paper presents UBUMonitor, a desktop application for visualizing students’ activity data, extended as a proof of concept with a module for dropout tracking. By using UBUMonitor, teachers can easily visualize their students’ engagement with their subject, which can help them act early and more effectively to prevent students from dropping out.

    Parallelization and Adaptation of Maintenance and Fault Detection Algorithms to Cloud Computing Platforms

    This thesis focuses on the role that big data plays in the new industrial revolution currently under way, which we will commonly refer to as Industry 4.0. The feature of this new industry that interests us most is the growing use of sensors that continuously monitor and record the operation of its machinery. This creates new opportunities to optimize processes such as maintenance, moving toward more effective strategies that help reduce costs and maximize profits. One example is predictive maintenance: through the early detection of faults in all kinds of machinery, such as induction motors, maintenance can be scheduled to help avoid unexpected stoppages in the production process. As a result, lines of research have emerged on developing new predictive algorithms, or on adapting existing ones so that they can work with the large amounts of data generated in these problems. For the latter, the adaptation chosen has been algorithmic parallelization for execution on cloud computing platforms.

    Rotation Forest for multi-target regression

    The prediction of multiple numeric outputs at the same time is called multi-target regression (MTR), and it has gained attention in recent decades. This task is a challenging research topic in supervised learning because it poses additional difficulties compared with traditional single-target regression (STR), and many real-world problems involve predicting multiple targets at once. One of the most successful approaches to MTR, although not the only one, consists of transforming the problem into several STR problems whose outputs are then combined to build the MTR output. In this paper, the Rotation Forest ensemble method, previously proposed for single-label classification and single-target regression, is adapted to MTR tasks and tested with several regressors and data sets. Our proposal rotates the input space in an efficient and novel way, avoiding the extra rotations forced by MTR problem decomposition. Four approaches for MTR are used: single-target (ST), stacked single-target (SST), Ensembles of Regressor Chains (ERC), and Multi-target Regression via Quantization (MRQ). To assess the benefits of the proposal, a thorough experimental study with 28 MTR data sets and statistical tests is carried out, concluding that Rotation Forest, adapted by means of these approaches, outperforms other popular ensembles such as Bagging and Random Forest.

    Ministerio de Economía y Competitividad of the Spanish Government under project TIN2015-67534-P (MINECO-FEDER, UE), by the Junta de Castilla y León under project BU085P17 (JCyL/FEDER, UE) (both projects co-financed through European Union FEDER funds), and by the Consejería de Educación of the Junta de Castilla y León and the European Social Fund with the EDU/1100/2017 pre-doctoral grant.
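
    For readers unfamiliar with Rotation Forest, the following NumPy sketch shows a simplified version of the rotation step of the original single-output method, not the MTR adaptation proposed in this paper: the features are randomly split into disjoint subsets, PCA is run on a bootstrap sample of each subset, and the principal axes are assembled into one rotation matrix on whose output a base regressor is trained. The function name `rotation_matrix` and its parameters are hypothetical, and details of the original algorithm (such as sampling only part of the data for each subset) are simplified away.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng):
    """Build one (simplified) Rotation Forest rotation matrix for X.

    X has shape (n_samples, n_features). All principal components are kept,
    so the result is a square orthogonal matrix.
    """
    n_samples, n_features = X.shape
    # Randomly split the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        # PCA on a bootstrap sample of the rows, restricted to this subset.
        rows = rng.choice(n_samples, size=n_samples, replace=True)
        cov = np.atleast_2d(np.cov(X[np.ix_(rows, cols)], rowvar=False))
        _, vecs = np.linalg.eigh(cov)  # orthonormal principal axes
        # Place this subset's axes at the rows/columns of its own features.
        R[np.ix_(cols, cols)] = vecs
    return R


# Usage: one rotation per ensemble member; a base regressor is fit on X @ R.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rot = X @ R
```

    Because each block holds orthonormal eigenvectors and the feature subsets are disjoint, `R` is orthogonal: the rotation preserves distances and discards no information, while different random splits give each ensemble member a different view of the input space.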

    Early and extremely early multi-label fault diagnosis in induction motors

    The detection of faulty machinery and its automated diagnosis is an industrial priority, because efficient fault diagnosis enables efficient management of maintenance times, reduces energy consumption and overall costs and, most importantly, ensures the availability of the machinery. This paper therefore presents a new intelligent multi-fault diagnosis method, based on information from multiple sensors, for assessing the occurrence of single, combined, and simultaneous faulty conditions in an induction motor. The contribution and novelty of the proposed method include the use of different physical magnitudes, such as vibrations, stator currents, voltages, and rotational speed, as meaningful sources of information about the machine condition. Moreover, for each available physical magnitude, Principal Component Analysis reduces the original number of attributes to a small number of significant features, from which a multi-label classification tree produces the final diagnosis. The effectiveness of the method was validated using a complete set of experimental data acquired from a laboratory electromechanical system, in which one healthy and seven faulty scenarios were assessed. In addition, interpreting the results does not require any prior expert knowledge, and the robustness of the proposal makes it suitable for industrial applications, since it can handle different operating conditions such as different loads and operating frequencies. Finally, the performance was evaluated using multi-label measures, which, to the best of our knowledge, is an innovative development in the field of condition monitoring and fault identification.

    project TIN2015-67534-P (MINECO, Spain/FEDER, UE) of the Ministerio de Economía y Competitividad of the Spanish Government, project BU085P17 (JCyL/FEDER, UE) of the Consejería de Educación of the Junta de Castilla y León, Spain (both projects co-financed through European Union FEDER funds), and by the pre-doctoral grant (EDU/1100/2017), also of the Consejería de Educación of the Junta de Castilla y León, Spain and the European Social Fund.
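
    The abstract does not say which multi-label measures were used. As an illustration only, the following Python sketch computes two common ones, Hamming loss and subset accuracy. Each diagnosis is a 0/1 vector with one entry per fault type, so combined and simultaneous faults are simply vectors with several 1s. The three fault labels in the usage example are hypothetical and are not the paper's data.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong,
    averaged over all samples and all labels (lower is better)."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / (n_samples * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly
    (a strict measure: one wrong label makes the sample count as wrong)."""
    return sum(list(t) == list(p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels per sample: [fault A, fault B, fault C].
y_true = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]  # healthy, single fault, two simultaneous faults
y_pred = [[0, 0, 0], [1, 0, 1], [1, 0, 0]]
print(hamming_loss(y_true, y_pred))     # 2 wrong labels out of 9, about 0.222
print(subset_accuracy(y_true, y_pred))  # 1 of 3 samples fully correct, about 0.333
```

    Hamming loss gives partial credit when some but not all faults in a combined condition are detected, whereas subset accuracy only rewards a completely correct diagnosis; reporting both shows how the classifier behaves on combined faults.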