Computationally intensive, distributed and decentralised machine learning: from theory to applications

Author: Fiosina Jelena
Publication date: 29/06/2022

Machine learning (ML) is currently one of the most important research fields, spanning computer science, statistics, pattern recognition, data mining, and predictive analytics. It plays a central role in automatic data processing and analysis in numerous research domains, owing to widely distributed and geographically scattered data sources, powerful computing clouds, and high digitisation requirements. However, aspects such as the accuracy of methods, data privacy, and model explainability remain challenging and require further research. It is therefore necessary to analyse centralised and distributed data processing architectures, to create novel computationally intensive, explainable, and privacy-preserving ML methods, to investigate their properties, to propose distributed versions of promising baseline ML methods, and to evaluate and apply these methods in various applications. This thesis addresses the theoretical and practical aspects of state-of-the-art ML methods. Its contributions are threefold. In Chapter 2, novel non-distributed, centralised, computationally intensive ML methods are proposed and their properties are investigated, and state-of-the-art ML methods are applied to real-world data from two domains, namely transportation and bioinformatics. Moreover, algorithms for ‘black-box’ model interpretability are presented. Decentralised ML methods are considered in Chapter 3. First, we investigate data processing as a preliminary step in data-driven, agent-based decision-making. Thereafter, we propose novel decentralised ML algorithms based on the collaboration of the agents' local models. Within this context, we consider various regression models. Finally, the explainability of multiagent decision-making is addressed. In Chapter 4, we investigate distributed centralised ML methods. We propose a distributed parallelisation algorithm for the semi-parametric and non-parametric regression types and implement them in the computational environment and data structures of Apache SPARK. Scalability, speed-up, and goodness-of-fit experiments using real-world data demonstrate the excellent performance of the proposed methods. Moreover, a federated deep-learning approach enables us to address the data privacy challenges that arise when distributed private data sources are processed to solve the travel-time prediction problem. Finally, we propose an explainability strategy to interpret the influence of the input variables in this federated deep-learning application. This thesis is based on 11 papers that contribute to the theoretical and practical aspects of state-of-the-art and proposed ML methods. We successfully address the stated challenges with various data processing architectures, validate the proposed approaches in diverse scenarios from the transportation and bioinformatics domains, and demonstrate their effectiveness in scalability, speed-up, and goodness-of-fit experiments with real-world data. However, substantial future research is required to address the stated challenges and to identify new issues in ML. It is thus necessary to advance the theoretical side by creating novel ML methods and investigating their properties, and to contribute to the application side by using state-of-the-art ML methods and their combinations and interpreting their results for different problem settings.

Publikationsserver der Technischen Universität Clausthal
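The thesis abstract mentions a federated deep-learning approach that addresses data privacy when private, distributed data sources are used for travel-time prediction. The sketch below illustrates the general idea with federated averaging (FedAvg), a standard technique swapped in here for illustration; it is not the thesis's implementation. A one-feature linear model stands in for a deep network, and the client datasets, learning rate, epoch count, and round count are all hypothetical. Only model parameters leave each client; the raw records never do.

```python
"""Minimal federated-averaging (FedAvg) sketch.
Assumption: a 1-D linear model y = w*x + b stands in for a deep network;
all client data and hyperparameters are hypothetical."""

from typing import List, Tuple

Sample = Tuple[float, float]  # (feature, target), e.g. hypothetical (distance, travel time)


def local_update(w: float, b: float, data: List[Sample],
                 lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error, using only this client's data."""
    n = len(data)
    for _ in range(epochs):
        # Both gradients are computed from the same (w, b) before either is updated.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(clients: List[List[Sample]], rounds: int = 50) -> Tuple[float, float]:
    """Server loop: broadcast global parameters, collect local updates, average them."""
    w, b = 0.0, 0.0
    total = sum(len(c) for c in clients)
    for _ in range(rounds):
        updates = [local_update(w, b, c) for c in clients]
        # Weight each client's parameters by its share of the training data.
        w = sum(len(c) * uw for c, (uw, _) in zip(clients, updates)) / total
        b = sum(len(c) * ub for c, (_, ub) in zip(clients, updates)) / total
    return w, b


if __name__ == "__main__":
    # Two hypothetical private datasets, both generated from y = 2x + 1.
    client_a = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)]
    client_b = [(x, 2 * x + 1) for x in (3.0, 4.0)]
    w, b = fed_avg([client_a, client_b])
    print(f"w={w:.2f}, b={b:.2f}")  # prints w=2.00, b=1.00
```

Weighting by local dataset size is the usual FedAvg choice; a privacy-focused deployment would typically add secure aggregation or differential privacy on top, which this sketch omits.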

AI for Explaining Decisions in Multi-Agent Environments

Authors: Azaria Amos
Fiosina Jelena
Greve Maike
Hazon Noam
Kolbe Lutz
Kraus Sarit
Lembcke Tim-Benjamin
Müller Jörg P.
Schleibaum Sören
Vollrath Mark
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 12/10/2019

Explanation is necessary for humans to understand and accept decisions made by an AI system when the system's goal is known. It is even more important when the AI system makes decisions in multi-agent environments where the human does not know the system's goals, since they may depend on other agents' preferences. In such situations, explanations should aim to increase user satisfaction, taking into account the system's decision, the preferences of the user and of the other agents, the environment settings, and properties such as fairness, envy and privacy. Generating explanations that will increase user satisfaction is very challenging; to this end, we propose a new research direction: xMASE. We then review the state of the art and discuss research directions towards efficient methodologies and algorithms for generating explanations that will increase users' satisfaction with AI systems' decisions in multi-agent environments.

Comment: This paper has been submitted to the Blue Sky Track of the AAAI 2020 conference. At the time of submission, it is under review. The tentative notification date is November 10, 2019. Current version: the name of the first author has been added to the metadata.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting

Authors: Jelena Fiosina
Maksims Fiosins
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Forecasting in big datasets is a common but complicated task, which the well-known parametric linear regression cannot handle adequately. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and lack the scalability needed to process big datasets and obtain results in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with that of the linear regression model, which is currently the only forecasting model with a parallel distributed implementation in the SPARK framework for big data problems. The advantages of parallelizing the algorithms are also presented. We validate our models by conducting various numerical experiments: evaluating the goodness of fit, analyzing how increasing the dataset size influences time consumption, and analyzing time consumption as the degree of parallelism (number of workers) in the distributed implementation varies.

Directory of Open Access Journals

Publikationsserver der Technischen Universität Clausthal
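Nonparametric kernel regression fits the MapReduce pattern well because the Nadaraya-Watson estimator is a ratio of two sums over the training data, m(x0) = Σ K((x0 − xi)/h)·yi / Σ K((x0 − xi)/h). Each partition can compute its partial numerator and denominator independently, and a reduce step only has to add them. The sketch below simulates this in plain Python, with lists standing in for SPARK partitions; it shows the general decomposition rather than the paper's exact algorithm, and the Gaussian kernel, bandwidth `h`, and data are assumptions. In PySpark, the same steps could be expressed with `RDD.mapPartitions` and `RDD.reduce`.

```python
"""Map/reduce sketch of Nadaraya-Watson kernel regression.
Assumptions: Python lists simulate SPARK partitions; Gaussian kernel; fixed bandwidth h."""

import math
from functools import reduce
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def gaussian_kernel(u: float) -> float:
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def map_partition(partition: List[Sample], x0: float, h: float) -> Tuple[float, float]:
    """Map step: partial numerator sum(K*y) and partial denominator sum(K) for one partition."""
    num, den = 0.0, 0.0
    for x, y in partition:
        k = gaussian_kernel((x0 - x) / h)
        num += k * y
        den += k
    return num, den


def reduce_pair(a: Tuple[float, float], b: Tuple[float, float]) -> Tuple[float, float]:
    """Reduce step: partial sums are additive, so combining is element-wise addition."""
    return a[0] + b[0], a[1] + b[1]


def nw_predict(partitions: List[List[Sample]], x0: float, h: float) -> float:
    """Estimate m(x0) = sum(K*y) / sum(K) from per-partition partial sums."""
    num, den = reduce(reduce_pair, (map_partition(p, x0, h) for p in partitions), (0.0, 0.0))
    if den == 0.0:
        raise ValueError("all kernel weights are zero; check the data or increase h")
    return num / den


if __name__ == "__main__":
    data = [(x / 10, math.sin(x / 10)) for x in range(100)]
    # Hypothetical round-robin split into 4 partitions, as if assigned to 4 workers.
    partitions = [data[i::4] for i in range(4)]
    distributed = nw_predict(partitions, x0=5.0, h=0.3)
    single = nw_predict([data], x0=5.0, h=0.3)
    print(abs(distributed - single) < 1e-12)  # partitioning changes only summation order
```

Because each worker returns just two floats per query point, communication cost does not grow with partition size, which is one reason this kind of estimator can scale with the number of workers.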

Polymer Reaction Engineering meets Explainable Machine Learning

Authors: Jelena Fiosina
Marco Drache
Philipp Sievers
Sabine Beuermann
Publication date: 06/04/2023

Owing to the complicated polymerization technique and the statistical composition of the polymer, tailoring its characteristics is a challenging task. Modeling the polymerization can provide deeper insights into the process. This study applies state-of-the-art machine learning (ML) methods to the modeling and reverse engineering of polymerization processes. ML methods (random forest, XGBoost and CatBoost) are trained on data sets generated by an in-house kinetic Monte Carlo simulator. The applied ML models predict monomer concentration, average molar masses and full molar mass distributions with excellent accuracy (R² > 0.96). Reverse engineering, which delivers the polymerization recipe for a targeted molar mass distribution, is less accurate, but only minor deviations from the targeted molar mass distribution are observed. The influences of the input variables on the ML models, obtained with explainability methods, agree with expert expectations.

ChemRxiv
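The polymer study reports accuracy as R² and uses explainability methods to obtain the influence of each input variable, without naming a specific method in the abstract. As an illustration only, the sketch below computes R² and permutation feature importance, a model-agnostic technique swapped in here; it is not necessarily what the authors used. The `model` argument can be any fitted predictor (for example, a random forest wrapped as a function); the toy model and synthetic data are hypothetical.

```python
"""Permutation feature importance measured as the drop in R^2.
Assumption: `model` is any callable mapping one feature row to a prediction;
the toy model and data below are hypothetical."""

import random
from typing import Callable, List, Sequence

Row = List[float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (y_true must not be constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(model: Callable[[Row], float], X: List[Row], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in R^2 when one feature column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = r2_score(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, [model(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a trained model; feature 0 dominates by construction."""
    return 3 * row[0] + 0.1 * row[1]


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
    y = [toy_model(row) for row in X]
    print(permutation_importance(toy_model, X, y))  # feature 0 scores far higher
```

Unlike impurity-based importances built into tree ensembles, the permutation approach treats the model as a black box, so the same code applies to random forest, XGBoost, or CatBoost predictors alike.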