AI in the newsroom: A data quality assessment framework for employing machine learning in journalistic workflows
[EN] AI-driven journalism refers to various methods and tools for gathering, verifying, producing, and distributing news information. Their potential is to extend human capabilities and create new forms of augmented journalism. Although scholars agreed on the necessity to embed journalistic values in these systems to make AI-driven systems accountable, less attention is paid to data quality, while the results' accuracy and efficiency depend on high-quality data. However, data quality remains complex to define insofar as it is a multidimensional, highly domain-dependent concept. Assessing data quality in the context of AI-driven journalism requires a broader and interdisciplinary approach, relying on the challenges of data quality in machine learning and the ethical challenges of using machine learning in journalism. These considerations ground a conceptual data quality assessment framework that aims to support the collection and pre-processing stages in machine learning. It aims to contribute to strengthening data literacy in journalism and to make a bridge between scientific disciplines that should be viewed through the lenses of their complementarity.
Dierickx, L.; Lindén, C.; Opdahl, A.; Khan, S.; Guerrero Rojas, D. (2023). AI in the newsroom: A data quality assessment framework for employing machine learning in journalistic workflows. Editorial Universitat Politècnica de València. 217-225. https://doi.org/10.4995/CARMA2023.2023.1644021722
Towards Risk Modeling for Collaborative AI
Collaborative AI systems aim at working together with humans in a shared
space to achieve a common goal. This setting imposes potentially hazardous
circumstances due to contacts that could harm human beings. Thus, building such
systems with strong assurances of compliance with requirements, domain-specific
standards, and regulations is of greatest importance. Challenges associated with
the achievement of this goal become even more severe when such systems rely on
machine learning components rather than on top-down rule-based AI. In this
paper, we introduce a risk modeling approach tailored to Collaborative AI
systems. The risk model includes goals, risk events and domain-specific
indicators that potentially expose humans to hazards. The risk model is then
leveraged to drive assurance methods that feed in turn the risk model through
insights extracted from run-time evidence. Our envisioned approach is described
by means of a running example in the domain of Industry 4.0, where a robotic
arm endowed with a visual perception component, implemented with machine
learning, collaborates with a human operator for a production-relevant task.
Comment: 4 pages, 2 figures
Data Pipeline Quality: Influencing Factors, Root Causes of Data-related Issues, and Processing Problem Areas for Developers
Data pipelines are an integral part of various modern data-driven systems.
However, despite their importance, they are often unreliable and deliver
poor-quality data. A critical step toward improving this situation is a solid
understanding of the aspects contributing to the quality of data pipelines.
Therefore, this article first introduces a taxonomy of 41 factors that
influence the ability of data pipelines to provide quality data. The taxonomy
is based on a multivocal literature review and validated by eight interviews
with experts from the data engineering domain. Data, infrastructure, life cycle
management, development & deployment, and processing were found to be the main
influencing themes. Second, we investigate the root causes of data-related
issues, their location in data pipelines, and the main topics of data pipeline
processing issues for developers by mining GitHub projects and Stack Overflow
posts. We found data-related issues to be primarily caused by incorrect data
types (33%), mainly occurring in the data cleaning stage of pipelines (35%).
Data integration and ingestion tasks were found to be the topics developers
asked about most, accounting for nearly half (47%) of all questions. Compatibility
issues were found to be a separate problem area in addition to issues
corresponding to the usual data pipeline processing areas (i.e., data loading,
ingestion, integration, cleaning, and transformation). These findings suggest
that future research efforts should focus on analyzing compatibility and data
type issues in more depth and assisting developers in data integration and
ingestion tasks. The proposed taxonomy is valuable to practitioners in the
context of quality assurance activities and fosters future research into data
pipeline quality.
Comment: To be published by The Journal of Systems & Software
Quality Issues in Machine Learning Software Systems
Context: An increasing demand is observed in various domains to employ
Machine Learning (ML) for solving complex problems. ML models are implemented
as software components and deployed in Machine Learning Software Systems
(MLSSs). Problem: There is a strong need for ensuring the serving quality of
MLSSs. False or poor decisions of such systems can lead to malfunction of other
systems, significant financial losses, or even threats to human life. The
quality assurance of MLSSs is considered a challenging task and currently is a
hot research topic. Objective: This paper aims to investigate the
characteristics of real quality issues in MLSSs from the viewpoint of
practitioners. This empirical study aims to identify a catalog of quality
issues in MLSSs. Method: We conduct a set of interviews with
practitioners/experts, to gather insights about their experience and practices
when dealing with quality issues. We validate the identified quality issues via
a survey with ML practitioners. Results: Based on the content of 37 interviews,
we identified 18 recurring quality issues and 24 strategies to mitigate them.
For each identified issue, we describe the causes and consequences according to
the practitioners' experience. Conclusion: We believe the catalog of issues
developed in this study will allow the community to develop efficient quality
assurance tools for ML models and MLSSs. A replication package of our study is
available on our public GitHub repository.
Performance assessment and optimisation of a novel guideless irregular dew point cooler using artificial intelligence
Air Conditioners (ACs) are vital in modern buildings for providing comfortable indoor air to the occupants. Several alternatives to traditional coolers have been introduced to improve cooling efficiency; among them, Evaporative Coolers (ECs) have attracted the most attention owing to their straightforward structure and high efficiency. ECs are categorized into two types: Direct Evaporative Coolers (DECs) and Indirect Evaporative Coolers (IECs). Continuous efforts to improve ECs resulted in the development of Dew Point Coolers (DPCs), which enable the supply air to reach the dew point temperature. The main innovation of DPCs is the M-cycle Heat and Mass Exchanger (HMX), which improves the ECs’ efficiency by up to 30%. A state-of-the-art counter-flow DPC in which the flat plates of traditional HMXs are replaced by corrugated plates is called a Guideless Irregular DPC (GIDPC). This technology offers 30-60% higher cooling efficiency than the flat-plate HMX in traditional DPCs. Owing to the empirical success of Artificial Intelligence (AI) in different fields and the growing importance of Machine Learning (ML) models, this study develops two ML models using Multiple Polynomial Regression (MPR) and Deep Neural Network (DNN) methods, and three Multi Objective Evolutionary Optimisation (MOEO) models using Genetic Algorithms (GA), Particle Swarm Optimisation (PSO), and a novel bio-inspired algorithm, the Slime Mould Algorithm (SMA), for the performance prediction and optimisation of the GIDPC in all possible operating climates. Furthermore, this study pioneers the development of an explainable and interpretable DNN model for the GIDPC.
To this end, a game theory-based SHapley Additive exPlanations (SHAP) method is used to interpret the contribution of the operating conditions to the performance parameters. The ML models take the intake air characteristics as well as the main operating and design parameters of the HMX as inputs to predict the GIDPC’s performance parameters, e.g., cooling capacity, coefficient of performance (COP), and thermal efficiencies. The results revealed that both models have high prediction accuracy: MPR performs with a maximum average error of 1.22%, and the Mean Square Error (MSE) of the selected DNN model is only 0.04. The objectives of the MOEO models are to simultaneously maximise the cooling efficiency and minimise the construction cost of the GIDPC by determining the optimum values of the selected decision variables. The performance of the optimised GIDPCs is compared deterministically across diverse climates in 2020 and 2050, with the hourly future weather data projected using a high-emission scenario defined by the Intergovernmental Panel on Climate Change (IPCC). The results revealed that the hourly COP of the optimised systems outperforms that of the base design. Moreover, although the power consumption of all systems increases from 2020 to 2050 owing to more operating hours as a result of global warming, power savings of up to 72%, 69.49%, 63.24%, and 69.21% can be achieved in hot summer continental, arid, tropical rainforest, and Mediterranean hot summer climates, respectively, compared to the base system when the systems run optimally.
5th International Conference on Advanced Research Methods and Analytics (CARMA 2023)
Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 5th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges.
Martínez Torres, MDR.; Toral Marín, S. (2023). 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2023.2023.1700