    An alternative approach to dimension reduction for pareto distributed data: a case study

    Deep learning models are data analysis tools suited to approximating (non-linear) relationships among variables in order to predict an outcome. While these models can be used to answer many important questions, their utility is still harshly criticized, as it is extremely challenging to identify which data descriptors most adequately represent a given phenomenon of interest. From a recent experience in developing a deep learning model designed to detect failures in mechanical water meter devices, we have learnt that a noticeable deterioration of the prediction accuracy can occur if one tries to train a deep learning model by adding specific device descriptors based on categorical data. This can happen because of an excessive increase in the dimensionality of the data, with a corresponding loss of statistical significance. After several unsuccessful experiments with alternative methodologies that either reduce the dimensionality of the data space or employ more traditional machine learning algorithms, we changed our training strategy, reconsidering those categorical data in the light of a Pareto analysis. In essence, we used those categorical descriptors not as an input on which to train our deep learning model, but as a tool to reshape the dataset according to the Pareto rule. With this data adjustment, we trained a better-performing deep learning model able to detect defective water meter devices with a prediction accuracy in the range of 87-90%, even in the presence of categorical descriptors.
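
    The abstract does not specify exactly how the Pareto rule is used to reshape the dataset. As a minimal sketch of one plausible reading, the Python snippet below uses a hypothetical categorical descriptor (device_model) only to select the "vital few" categories that together cover roughly 80% of the records, instead of encoding the descriptor as a model input; the column name, file name, and 80% threshold are illustrative assumptions, not details taken from the paper.

    import pandas as pd

    def pareto_reshape(df, category_col, coverage=0.8):
        # Relative frequency of each category, sorted in descending order.
        freqs = df[category_col].value_counts(normalize=True)
        # Cumulative share of records covered by the most frequent categories.
        cumulative = freqs.cumsum()
        # Keep the "vital few" categories that jointly cover ~`coverage` of the data;
        # always retain at least the single most frequent category.
        keep = cumulative[cumulative <= coverage].index
        if len(keep) == 0:
            keep = freqs.index[:1]
        # The categorical column shapes the training set but is dropped from the inputs.
        return df[df[category_col].isin(keep)].drop(columns=[category_col])

    # Hypothetical usage with water meter readings (names are assumptions):
    # readings = pd.read_csv("water_meter_readings.csv")
    # train_df = pareto_reshape(readings, category_col="device_model", coverage=0.8)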

    On the implications of big data and machine learning in the interplay between humans and machines

    Big data and machine learning are profoundly shaping social, economic, and political spheres, and have become part of the collective imagination. In recent years, barriers have fallen and a wide range of products, services, and resources that exploit Artificial Intelligence has emerged. It is therefore of fundamental importance to understand the limits and, consequently, the potential of predictions made by a machine that learns directly from data. Understanding the limits of machine predictions would help dispel false beliefs about the capabilities of machine learning algorithms while also avoiding possible misuses. To tackle this problem, quite different research lines are emerging, each focusing on different aspects. In this thesis, we study how the presence of big data and artificial intelligence influences the interaction between humans and computers. Such a study should produce high-level reflections that contribute to framing how this interaction has changed now that big data and algorithms can make computers, to some extent, intelligent, albeit with limitations. The different chapters of the thesis describe various case studies that we faced during the Ph.D., chosen specifically for their distinctive characteristics. Starting from the obtained results, we provide several high-level reflections on the implications of the interaction between humans and machines.