
    AI in the newsroom: A data quality assessment framework for employing machine learning in journalistic workflows

    AI-driven journalism refers to various methods and tools for gathering, verifying, producing, and distributing news information. Its potential is to extend human capabilities and create new forms of augmented journalism. Although scholars agree on the necessity of embedding journalistic values in these systems to make them accountable, less attention has been paid to data quality, even though the accuracy and efficiency of the results depend on high-quality data. Data quality remains complex to define insofar as it is a multidimensional, highly domain-dependent concept. Assessing data quality in the context of AI-driven journalism therefore requires a broader, interdisciplinary approach, drawing on both the challenges of data quality in machine learning and the ethical challenges of using machine learning in journalism. These considerations ground a conceptual data quality assessment framework that aims to support the collection and pre-processing stages of machine learning. It seeks to strengthen data literacy in journalism and to build a bridge between scientific disciplines that should be viewed through the lens of their complementarity.
    Dierickx, L.; Lindén, C.; Opdahl, A.; Khan, S.; Guerrero Rojas, D. (2023). AI in the newsroom: A data quality assessment framework for employing machine learning in journalistic workflows. Editorial Universitat Politècnica de València. 217-225. https://doi.org/10.4995/CARMA2023.2023.1644021722
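A framework of this kind is ultimately operationalised at the collection and pre-processing stages. As a minimal sketch (not the paper's actual framework: the dimensions, thresholds, and field names here are illustrative assumptions), a batch of news records could be scored on a few common quality dimensions before being used for training:

```python
# Illustrative data quality scoring for a batch of records; the quality
# dimensions (completeness, validity, uniqueness) and the example fields
# are assumptions for the sketch, not the paper's framework.

def assess_quality(records, required_fields, validators):
    """Score a batch of records on three common quality dimensions."""
    n = len(records)
    # Completeness: fraction of records with all required fields populated.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    # Validity: fraction of records whose fields pass their validators.
    valid = sum(
        all(v(r.get(f)) for f, v in validators.items() if f in r) for r in records
    )
    # Uniqueness: fraction of distinct records (exact-duplicate detection).
    unique = len({tuple(sorted(r.items())) for r in records})
    return {
        "completeness": complete / n,
        "validity": valid / n,
        "uniqueness": unique / n,
    }

articles = [
    {"headline": "Flood hits city", "date": "2023-05-01", "source": "wire"},
    {"headline": "", "date": "2023-05-01", "source": "wire"},          # incomplete
    {"headline": "Flood hits city", "date": "2023-05-01", "source": "wire"},  # duplicate
]
report = assess_quality(
    articles,
    required_fields=["headline", "date", "source"],
    validators={"date": lambda d: isinstance(d, str) and len(d) == 10},
)
print(report)
```

Such a per-batch report makes quality defects visible before pre-processing, which is where the framework described above intervenes.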

    Towards Risk Modeling for Collaborative AI

    Collaborative AI systems aim to work together with humans in a shared space to achieve a common goal. This setting imposes potentially hazardous circumstances due to contacts that could harm human beings. Thus, building such systems with strong assurances of compliance with domain-specific requirements, standards, and regulations is of the greatest importance. The challenges associated with achieving this goal become even more severe when such systems rely on machine learning components rather than top-down, rule-based AI. In this paper, we introduce a risk modeling approach tailored to Collaborative AI systems. The risk model includes goals, risk events, and domain-specific indicators that potentially expose humans to hazards. The risk model is then leveraged to drive assurance methods, which in turn feed the risk model with insights extracted from run-time evidence. Our envisioned approach is described by means of a running example in the domain of Industry 4.0, where a robotic arm endowed with a visual perception component, implemented with machine learning, collaborates with a human operator on a production-relevant task.
    Comment: 4 pages, 2 figures
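The structure the abstract describes (goals, risk events, and domain-specific indicators, with indicator values fed back from run-time evidence) could be sketched roughly as below. All class names, field names, and the robot-arm example values are illustrative assumptions, not the paper's notation:

```python
# A hedged sketch of a risk model: a goal, risk events, and domain-specific
# indicators updated from run-time monitoring evidence.
from dataclasses import dataclass, field

@dataclass
class Indicator:
    name: str            # e.g. a proximity measure between arm and operator
    threshold: float     # domain-specific safe limit
    value: float = 0.0   # updated from run-time evidence

    def violated(self):
        return self.value > self.threshold

@dataclass
class RiskEvent:
    description: str
    indicators: list = field(default_factory=list)

    def active(self):
        # A risk event is considered active if any linked indicator
        # exceeds its threshold.
        return any(i.violated() for i in self.indicators)

@dataclass
class RiskModel:
    goal: str
    events: list = field(default_factory=list)

    def update(self, evidence):
        # Feed run-time evidence (indicator name -> observed value) back
        # into the model, mirroring the assurance loop described above,
        # and return the descriptions of the currently active risk events.
        for ev in self.events:
            for ind in ev.indicators:
                if ind.name in evidence:
                    ind.value = evidence[ind.name]
        return [ev.description for ev in self.events if ev.active()]

proximity = Indicator("operator_distance_violation", threshold=0.0)
collision = RiskEvent("arm may contact operator", [proximity])
model = RiskModel("assemble part safely with the operator", [collision])
print(model.update({"operator_distance_violation": 1.0}))
```

The point of the loop is that assurance methods do not evaluate a static model: each round of run-time evidence can activate or deactivate risk events.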

    Data Pipeline Quality: Influencing Factors, Root Causes of Data-related Issues, and Processing Problem Areas for Developers

    Data pipelines are an integral part of various modern data-driven systems. However, despite their importance, they are often unreliable and deliver poor-quality data. A critical step toward improving this situation is a solid understanding of the aspects contributing to the quality of data pipelines. Therefore, this article first introduces a taxonomy of 41 factors that influence the ability of data pipelines to provide quality data. The taxonomy is based on a multivocal literature review and validated by eight interviews with experts from the data engineering domain. Data, infrastructure, life cycle management, development & deployment, and processing were found to be the main influencing themes. Second, we investigate the root causes of data-related issues, their location in data pipelines, and the main topics of data pipeline processing issues for developers by mining GitHub projects and Stack Overflow posts. We found data-related issues to be primarily caused by incorrect data types (33%), mainly occurring in the data cleaning stage of pipelines (35%). Data integration and ingestion tasks were found to be the topics developers asked about most, accounting for nearly half (47%) of all questions. Compatibility issues were found to be a separate problem area in addition to issues corresponding to the usual data pipeline processing areas (i.e., data loading, ingestion, integration, cleaning, and transformation). These findings suggest that future research efforts should focus on analyzing compatibility and data type issues in more depth and assisting developers in data integration and ingestion tasks. The proposed taxonomy is valuable to practitioners in the context of quality assurance activities and fosters future research into data pipeline quality.
    Comment: To be published by The Journal of Systems & Software
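Given the finding that incorrect data types are the leading root cause of data-related issues and that they surface mostly in the cleaning stage, a minimal guard at that stage could look like the sketch below; the schema, column names, and sample rows are illustrative assumptions:

```python
# Illustrative cleaning-stage type check: report, per column, the fraction
# of values that do not match the expected type before data moves downstream.

def check_types(rows, schema):
    """Return the fraction of values per column not matching the expected type."""
    issues = {}
    for col, expected in schema.items():
        values = [r[col] for r in rows if col in r]
        bad = sum(not isinstance(v, expected) for v in values)
        issues[col] = bad / len(values) if values else 0.0
    return issues

rows = [
    {"user_id": 1, "amount": 9.99},
    {"user_id": "2", "amount": 4.50},   # user_id arrived as a string
    {"user_id": 3, "amount": "n/a"},    # amount arrived as text
]
print(check_types(rows, {"user_id": int, "amount": float}))
```

Reporting mismatch rates rather than failing outright lets a pipeline flag creeping type drift without blocking every batch.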

    Quality Issues in Machine Learning Software Systems

    Context: An increasing demand is observed in various domains to employ Machine Learning (ML) for solving complex problems. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem: There is a strong need to ensure the serving quality of MLSSs. False or poor decisions of such systems can lead to the malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and is currently an active research topic. Objective: This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners, and through this empirical study to identify a catalog of quality issues in MLSSs. Method: We conduct a set of interviews with practitioners/experts to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners. Results: Based on the content of 37 interviews, we identified 18 recurring quality issues and 24 strategies to mitigate them. For each identified issue, we describe its causes and consequences according to the practitioners' experience. Conclusion: We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available in our public GitHub repository.

    Performance assessment and optimisation of a novel guideless irregular dew point cooler using artificial intelligence

    Air Conditioners (ACs) are a vital need in modern buildings to provide comfortable indoor air for the occupants. Several alternatives to traditional coolers have been introduced to improve cooling efficiency, but among them Evaporative Coolers (ECs) have attracted the most attention owing to their straightforward structure and high efficiency. ECs are categorised into two types: Direct Evaporative Coolers (DECs) and Indirect Evaporative Coolers (IECs). Continuous efforts to improve ECs resulted in the development of Dew Point Coolers (DPCs), which enable the supply air to reach the dew point temperature. The main innovation of DPCs lies in the invention of the M-cycle Heat and Mass Exchanger (HMX), which improves the efficiency of ECs by up to 30%. A state-of-the-art counter-flow DPC in which the flat plates of traditional HMXs are replaced by corrugated plates is called the Guideless Irregular DPC (GIDPC). This technology has 30-60% higher cooling efficiency compared to the flat-plate HMX in traditional DPCs.

    Owing to the empirical success of Artificial Intelligence (AI) in different fields and the growing importance of Machine Learning (ML) models, this study pioneers the development of two ML models, using Multiple Polynomial Regression (MPR) and Deep Neural Network (DNN) methods, and three Multi-Objective Evolutionary Optimisation (MOEO) models, using Genetic Algorithms (GA), Particle Swarm Optimisation (PSO), and a novel bio-inspired algorithm, the Slime Mould Algorithm (SMA), for the performance prediction and optimisation of the GIDPC in all possible operating climates. Furthermore, this study pioneers the development of an explainable and interpretable DNN model for the GIDPC. To this end, a game-theory-based SHapley Additive exPlanations (SHAP) method is used to interpret the contribution of the operating conditions to the performance parameters.

    The ML models take the intake air characteristics as well as the main operating and design parameters of the HMX as inputs to predict the GIDPC's performance parameters, e.g., cooling capacity, coefficient of performance (COP), and thermal efficiencies. The results revealed that both models have high prediction accuracy: MPR performs with a maximum average error of 1.22%, and the Mean Square Error (MSE) of the selected DNN model is only 0.04. The objectives of the MOEO models are to simultaneously maximise the cooling efficiency and minimise the construction cost of the GIDPC by determining the optimum values of the selected decision variables.

    The performance of the optimised GIDPCs is compared in a deterministic way, with the comparisons carried out in diverse climates in 2020 and 2050; the hourly future weather data are projected using a high-emission scenario defined by the Intergovernmental Panel on Climate Change (IPCC). The results revealed that the hourly COP of the optimised systems outperforms the base design. Moreover, although the power consumption of all systems increases from 2020 to 2050 owing to more operating hours caused by global warming, power savings of up to 72%, 69.49%, 63.24%, and 69.21% can be achieved in hot summer continental, arid, tropical rainforest, and Mediterranean hot summer climates, respectively, compared to the base system when the systems run optimally.
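The surrogate-modelling idea behind the MPR model can be illustrated with a toy example: fit a polynomial to (synthetic) operating data and use it to predict a performance metric. The single input variable, the coefficients, and the noise level below are illustrative assumptions; the thesis uses multiple inputs and compares MPR against a DNN:

```python
# Toy polynomial-regression surrogate: predict a COP-like metric from
# intake air temperature using synthetic data (all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
intake_temp = rng.uniform(25.0, 45.0, 200)            # intake air temperature (degC)
cop_true = 12.0 - 0.15 * intake_temp + 0.001 * intake_temp**2
cop_obs = cop_true + rng.normal(0.0, 0.05, 200)       # noisy "measurements"

coeffs = np.polyfit(intake_temp, cop_obs, deg=2)      # least-squares polynomial fit
predict = np.poly1d(coeffs)                           # callable surrogate model

# Once fitted, the cheap surrogate can stand in for the physical model
# inside an optimisation loop, which is the role the MPR model plays here.
err = np.abs(predict(intake_temp) - cop_true)
print(f"max abs error vs. true curve: {err.max():.3f}")
```

A surrogate like this is what makes evolutionary optimisation over "all possible operating climates" tractable: each candidate design is evaluated by the model rather than by simulation or experiment.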

    5th International Conference on Advanced Research Methods and Analytics (CARMA 2023)

    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 5th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences, as well as to discuss current and future challenges.
    Martínez Torres, MDR.; Toral Marín, S. (2023). 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2023.2023.1700