4 research outputs found

    Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation

    Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Such shift means that the data distributions in the training and test sets differ, which degrades the estimate of model performance. Stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, but its use in regression is not widespread in the literature. This paper analyzes how including different regressand stratification schemes in cross-validation with regression data affects dataset shift. The results show that these schemes create more similar training and test sets, reducing the dataset shift related to cross-validation. The bias and deviation of the performance estimates obtained by regression algorithms improve when the highest numbers of strata are used, as does the number of cross-validation repetitions needed to obtain these better results. Funding: MCIU/AEI/ERDF, UE PGC2018098860-B-I00; ERDF Operational Programme 2014-2020; Economy and Knowledge Council of the Regional Government of Andalusia, Spain; MCIN/AEI CEX2020-001105-M; A-FQM-345-UGR1
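A minimal sketch of one way regressand stratification could be applied when building cross-validation folds. The abstract does not spell out the schemes the paper compares, so this assumes equal-frequency strata over the sorted target values; the function name `stratified_regression_folds` and its parameters are hypothetical.

```python
import random


def stratified_regression_folds(y, k=5, n_strata=10, seed=0):
    """Assign sample indices to k folds, stratifying on the regression target y.

    Assumption (not from the paper): strata are equal-frequency bins of the
    sorted target values, and samples are dealt to folds within each stratum.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices sorted by target
    stratum_size = max(1, -(-len(y) // n_strata))  # ceiling division
    folds = [[] for _ in range(k)]
    offset = 0  # rotates the starting fold so fold sizes stay balanced
    for start in range(0, len(order), stratum_size):
        stratum = order[start:start + stratum_size]
        rng.shuffle(stratum)  # random assignment within a stratum
        for j, idx in enumerate(stratum):
            folds[(offset + j) % k].append(idx)
        offset += len(stratum)
    return folds


# Usage: each fold serves once as the test set, the rest as training data.
y = [float(i) ** 2 for i in range(100)]
folds = stratified_regression_folds(y, k=5, n_strata=10)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
```

Because every fold draws from every stratum, each test set covers the full range of the target, which is the property the abstract associates with reduced dataset shift.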

    Predictive analytics in agribusiness industries

    Agriculturally related industries are routinely among the most hazardous work environments. Workplace injuries directly impact labor-market outcomes, including income reduction, job loss, and the health of injured workers. In addition to medical and indemnity costs, workplace incidents incur indirect costs such as equipment damage and repair, incident investigation time, training of new personnel to replace injured workers, an increase in insurance premiums for the year following the incidents, a slowdown of production schedules, damage to companies’ reputation, and lower worker motivation to return to work. The main purpose of incident analysis is to derive and develop preventative measures from injury data. Applying proper analytical tools aimed at discovering the causes of occupational incidents is essential to gain useful information that contributes to preventing those incidents in the future. Insight gained from the analysis of workers’ compensation data can efficiently direct preventative activities toward high-risk industries. Since incidents arise from a combination of factors rather than a single cause, research on occupational incidents must go deeper into identifying the underlying causes and their relationships by applying more comprehensive analyses. Therefore, this study aimed to identify underlying patterns in occupational injury occurrence and costs using data mining and predictive modeling techniques instead of traditional statistical methods. Using a workers’ compensation claims dataset, the objectives of this study were to: investigate the use of predictive modeling techniques in forecasting future claims costs based on historical data; identify distinctive patterns of high-cost occupational injuries; and examine how well machine learning methods find the predictive relationship between factors influencing occupational injuries and the occurrence and severity of workers’ compensation claims.
The results lead to a better understanding of injury patterns, identification of prevalent causes of occupational injuries, and identification of high-risk industries and occupations. Various stakeholders, such as policymakers, insurance companies, safety standard writers, and manufacturers of safety equipment, can therefore use the findings of the study to plan remedial actions and revise safety standards. The implementation of safety measures by agribusiness organizations can prevent occupational injuries, save lives, and reduce the occurrence and cost of such incidents in agricultural work environments.

    Modelo matemático como soporte para la planificación del transporte masivo de pasajeros aplicando una estrategia de cambio de resolución [Mathematical model to support mass passenger transport planning using a change-of-scale strategy]

    In this thesis, a mathematical optimization model is formulated to solve, in an integrated way, the itinerary design and fleet assignment stages of a passenger air transportation system. A change-of-scale strategy is used to reduce the size of the resulting problem in terms of the number of decision variables and constraints, as well as the time and number of iterations required to solve it.
To reduce the size of the resulting model, a data clustering strategy is implemented using Machine Learning and Artificial Intelligence algorithms. These algorithms group data into clusters in a non-trivial way, so that the elements within each cluster are similar to one another while elements in different clusters are dissimilar. The original data set is then replaced by the centroids of the clusters found. An application case is developed in which the two planning stages are solved using the proposed optimization model and change-of-scale strategy. The model is solved with and without data clustering. Besides drastically reducing the solution time of the model, the data clustering strategy improves the quality of the solution found: the combination of flights included in the final itinerary is operated at a lower cost than the optimum found without data clustering, with better connectivity between flights. Maestrí
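A minimal sketch of the "replace the data set by cluster centroids" step described above. The abstract does not name the clustering algorithm, so plain k-means (Lloyd's algorithm) with a deterministic farthest-point initialization is assumed here purely for illustration; `sq_dist` and `kmeans_centroids` are hypothetical names.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_centroids(points, k, iters=100):
    """Return k centroids that stand in for the original points.

    Assumption (not from the thesis): k-means with farthest-point
    initialization, starting from the first point in the list.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster ended up empty.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids


# Usage: eight illustrative 2-D data points are reduced to two representatives.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
reduced = kmeans_centroids(points, k=2)
```

An optimization model built on `reduced` instead of `points` has fewer data-dependent variables and constraints, which is the size reduction the abstract refers to.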

    Proceedings - 29. Workshop Computational Intelligence, Dortmund, 28. - 29. November 2019

    These proceedings contain the contributions to the 29th Workshop Computational Intelligence. The focus is on methods, applications, and tools for fuzzy systems, artificial neural networks, evolutionary algorithms, and data mining techniques, as well as the comparison of methods on industrial and benchmark problems.