1,946 research outputs found
On the Generation of Realistic and Robust Counterfactual Explanations for Algorithmic Recourse
The recent widespread deployment of machine learning algorithms presents many new challenges. Machine learning algorithms are usually opaque and can be particularly difficult to interpret. When humans are involved, algorithmic and automated decisions can negatively impact people’s lives. Therefore, end users would like to be insured against potential harm. One popular way to achieve this is to provide end users access to algorithmic recourse, which gives end users negatively affected by algorithmic decisions the opportunity to reverse unfavorable decisions, e.g., from a loan denial to a loan acceptance. In this thesis, we design recourse algorithms to meet various end user needs. First, we propose methods for the generation of realistic recourses. We use generative models to suggest recourses likely to occur under the data distribution. To this end, we shift the recourse action from the input space to the generative model’s latent space, allowing us to generate counterfactuals that lie in regions with data support. Second, we observe that small changes applied to the recourses prescribed to end users are likely to invalidate the suggested recourse when it is noisily implemented in practice. Motivated by this observation, we design methods for the generation of robust recourses and for assessing the robustness of recourse algorithms to data deletion requests. Third, the lack of a commonly used code base for counterfactual explanation and algorithmic recourse algorithms and the vast array of evaluation measures in the literature make it difficult to compare the performance of different algorithms. To solve this problem, we provide an open source benchmarking library that streamlines the evaluation process and can be used for benchmarking, rapidly developing new methods, and setting up new experiments. In summary, our work contributes to a more reliable interaction between end users and machine-learned models by covering fundamental aspects of the recourse process, and it suggests new solutions towards generating realistic and robust counterfactual explanations for algorithmic recourse.
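The robustness concern in the abstract above (a recourse that is invalidated once it is implemented noisily) can be illustrated with a minimal sketch. This is not the thesis's method: it swaps in a plain linear classifier, for which the smallest change has a closed form, and it assumes Gaussian implementation noise. The names `linear_recourse`, `invalidation_rate`, `margin`, and `sigma` are hypothetical.

```python
import random


def score(x, w, b):
    """Linear decision score; a score >= 0 is treated as the favorable outcome."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linear_recourse(x, w, b, margin=0.0):
    """Smallest L2 change that moves x to score == margin (linear model only)."""
    current = score(x, w, b)
    if current >= margin:
        return list(x)  # already on the favorable side, nothing to change
    norm_sq = sum(wi * wi for wi in w)
    step = (margin - current) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]


def invalidation_rate(x_cf, w, b, sigma, trials=2000, seed=0):
    """Fraction of noisy implementations of x_cf that fall back to a negative score.

    Assumption: the end user implements the recourse with independent
    Gaussian noise of standard deviation sigma on every feature.
    """
    rng = random.Random(seed)
    invalid = 0
    for _ in range(trials):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x_cf]
        if score(noisy, w, b) < 0:
            invalid += 1
    return invalid / trials


if __name__ == "__main__":
    w, b, x = [1.0, 2.0], -5.0, [1.0, 1.0]
    on_boundary = linear_recourse(x, w, b, margin=0.0)
    with_margin = linear_recourse(x, w, b, margin=1.0)
    print(invalidation_rate(on_boundary, w, b, sigma=0.3))
    print(invalidation_rate(with_margin, w, b, sigma=0.3))
```

In this toy setting, a recourse placed exactly on the decision boundary is invalidated by noise far more often than one placed a margin beyond it, which is the trade-off between recourse cost and robustness that the abstract alludes to.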
Fairness-aware Machine Learning in Educational Data Mining
Fairness is an essential requirement of every educational system, which is reflected in a variety of educational activities. With the extensive use of Artificial Intelligence (AI) and Machine Learning (ML) techniques in education, researchers and educators can analyze educational (big) data and propose new (technical) methods in order to support teachers, students, or administrators of (online) learning systems in the organization of teaching and learning. Educational data mining (EDM) is the result of the application and development of data mining (DM) and ML techniques to deal with educational problems, such as student performance prediction and student grouping. However, ML-based decisions in education can be based on protected attributes, such as race or gender, leading to discrimination against individual students or subgroups of students. Therefore, ensuring fairness in ML models also contributes to equity in educational systems. In addition, bias can appear in the data obtained from learning environments. Hence, bias-aware exploratory educational data analysis is important to support unbiased decision-making in EDM.
In this thesis, we address the aforementioned issues and propose methods that mitigate discriminatory outcomes of ML algorithms in EDM tasks. Specifically, we make the following contributions:
We perform bias-aware exploratory analysis of educational datasets using Bayesian networks to identify the relationships among attributes in order to understand bias in the datasets. We focus the exploratory data analysis on features having a direct or indirect relationship with the protected attributes w.r.t. prediction outcomes.
We perform a comprehensive evaluation of the sufficiency of various group fairness measures in predictive models for student performance prediction problems. A variety of experiments on various educational datasets with different fairness measures are performed to provide users with a broad view of unfairness from diverse aspects.
We deal with the student grouping problem in collaborative learning. We introduce the fair-capacitated clustering problem that takes into account cluster fairness and cluster cardinalities. We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain fair-capacitated clustering.
We introduce the multi-fair capacitated (MFC) students-topics grouping problem that satisfies students' preferences while ensuring balanced group cardinalities and maximizing the diversity of members regarding the protected attribute. We propose three approaches: a greedy heuristic approach, a knapsack-based approach using vanilla maximal 0-1 knapsack formulation, and an MFC knapsack approach based on group fairness knapsack formulation.
In short, the findings described in this thesis demonstrate the importance of fairness-aware ML in educational settings. We show that bias-aware data analysis, fairness measures, and fairness-aware ML models are essential aspects to ensure fairness in EDM and the educational environment.
Ministry of Science and Culture of Lower Saxony/LernMINT/51410078/E
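The abstract above evaluates group fairness measures for student performance prediction without naming them here. As a minimal sketch, two widely used group fairness measures can be computed as follows. The choice of these two measures, the sign convention (group 0 minus group 1), and the function names are assumptions made for illustration, not the thesis's definitions.

```python
def _rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def statistical_parity_difference(y_pred, group):
    """P(pred = 1 | group 0) - P(pred = 1 | group 1).

    `group` holds 0/1 membership of a protected attribute; a value of 0
    means both groups receive positive predictions at the same rate.
    """
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return _rate(g0) - _rate(g1)


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true positive rates between group 0 and group 1."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return _rate(g0) - _rate(g1)


if __name__ == "__main__":
    # Toy data: 1 = predicted/actual "pass", group = protected attribute value.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(statistical_parity_difference(y_pred, group))
    print(equal_opportunity_difference(y_true, y_pred, group))
```

Different measures can disagree on the same predictions, which is one reason an evaluation across several measures, as described in the abstract, gives a broader view than any single number.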
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Models and algorithms for real-world optimization problems
This thesis deals with the efficient solution of optimization problems of practical interest.
The first part of the thesis deals with bin packing problems. The bin packing problem (BPP) is one of the oldest and most fundamental combinatorial optimization problems.
The bin packing problem and its generalizations arise often in real-world applications: in the manufacturing industry, in logistics and the transportation of goods, and in scheduling.
After an introductory chapter, I will present two applications of two of the most natural extensions of bin packing. Chapter 2 is dedicated to an application of two-dimensional bin packing to the problem of scheduling a set of computational tasks on a computer cluster. Chapter 3 deals with the generalization of the BPP to three dimensions, which arises frequently in logistics and transportation and is often complemented with additional constraints on the placement of items and on characteristics of the solution, for example guarantees on the stability of the items (to avoid potential damage to the transported goods), on the distribution of the total weight of the bins, and on compatibility with loading and unloading operations.
The second part of the thesis, and in particular Chapter 4, considers the Transmission Expansion Problem (TEP), where an electrical transmission grid must be expanded so as to satisfy future energy demand at minimum cost while maintaining some guarantees of robustness to potential line failures. These problems are gaining importance in a world where a shift towards renewable energy can impose a significant geographical reallocation of generation capacities, making it necessary to expand current power transmission grids.
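For readers unfamiliar with the one-dimensional bin packing problem that the first part of this thesis builds on, the following is a minimal sketch of the classical first-fit decreasing heuristic. It is a textbook baseline, not one of the thesis's algorithms, and it ignores the two- and three-dimensional extensions and side constraints discussed above.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of equal capacity with first-fit decreasing.

    Items are sorted from largest to smallest; each item goes into the first
    open bin with enough residual capacity, or into a new bin if none fits.
    Returns a list of bins, each bin being a list of item sizes.
    """
    bins = []   # contents of each bin
    loads = []  # current load of each bin, kept in step with `bins`
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No open bin can take the item: open a new one.
            bins.append([size])
            loads.append(size)
    return bins


if __name__ == "__main__":
    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

The heuristic is fast and usually close to optimal, but it gives no optimality certificate, which is one reason exact and model-based approaches remain of interest for the richer variants.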
Adjustable robust optimization with nonlinear recourses
Over the last century, mathematical optimization has become a prominent tool for decision making. Its systematic application in practical fields such as economics, logistics or defense led to the development of algorithmic methods with ever increasing efficiency. Indeed, for a variety of real-world problems, finding an optimal decision among a set of (implicitly or explicitly) predefined alternatives has become conceivable in reasonable time. In recent decades, however, the research community has paid more and more attention to the role of uncertainty in the optimization process. In particular, one may question the notion of optimality, and even feasibility, when studying decision problems with unknown or imprecise input parameters. This concern is even more critical in a world becoming more and more complex (by which we mean interconnected), where each individual variation inside a system inevitably causes other variations in the system itself.
In this dissertation, we study a class of optimization problems which suffer from imprecise input data and feature a two-stage decision process, i.e., where decisions are made in sequential order (in so-called stages) and where unknown parameters are revealed throughout the stages. Such problems have many applications in practical fields, e.g., facility location problems with uncertain demands, transportation problems with uncertain costs or scheduling under uncertain processing times. The uncertainty is handled from a robust optimization (RO) viewpoint (also known as the "worst-case perspective"), and we present original contributions to the RO literature on both the theoretical and practical side.
Matheuristics: survey and synthesis
In integer programming and combinatorial optimisation, people use the term matheuristics to refer to methods that are heuristic in nature, but draw on concepts from the literature on exact methods. We survey the literature on this topic, with a particular emphasis on matheuristics that yield both primal and dual bounds (i.e., upper and lower bounds in the case of a minimisation problem). We also make some comments about possible future developments.
Towards the reduction of greenhouse gas emissions: models and algorithms for ridesharing and carbon capture and storage
With the ratification of the Paris Agreement, countries committed to limiting global warming to well below 2, preferably to 1.5 degrees Celsius, compared to pre-industrial levels. To this end, anthropogenic greenhouse gas (GHG) emissions (such as CO2) must be reduced to reach net-zero carbon emissions by 2050. This ambitious target may be met by means of different GHG mitigation strategies, such as electrification, changes in consumer behavior, improving the energy efficiency of processes, using substitutes for fossil fuels (such as bioenergy or hydrogen), and carbon capture and storage (CCS). This thesis aims at contributing to two of these strategies: ridesharing (which belongs to the category of changes in consumer behavior) and carbon capture and storage. This thesis provides mathematical and optimization models and algorithms for the operational and tactical planning of ridesharing systems, and heuristics for the strategic planning of a carbon capture and storage network.
In ridesharing, emissions are reduced when individuals travel together instead of driving alone. In this context, this thesis provides novel mathematical models to represent ridesharing systems, ranging from two-stage stochastic assignment problems to two-stage stochastic set packing problems that can represent a wide variety of ridesharing systems. These models aid decision makers in their operational planning of rideshares, where drivers and riders have to be matched for ridesharing in the short term. Additionally, this thesis explores the tactical planning of ridesharing systems by comparing different modes of ridesharing operation and platform parameters (e.g., revenue share and penalties). Novel problem characteristics are studied, such as driver and rider uncertainty, rematching flexibility, and reservation of driver supply through booking fees and penalties. In particular, rematching flexibility may increase the efficiency of a ridesharing platform, and the reservation of driver supply through booking fees and penalties may increase user satisfaction through guaranteed compensation if a rideshare is not provided. Extensive computational experiments are conducted and managerial insights are given.
Despite the opportunity to reduce emissions through ridesharing and other mitigation strategies, global macroeconomic studies show that even if several GHG mitigation strategies are used simultaneously, achieving net-zero emissions by 2050 will likely not be possible without CCS. Here, CO2 is captured from emitter sites and transported to geological reservoirs, where it is injected for long-term storage. This thesis considers a multiperiod strategic planning problem for the optimization of a CCS value chain. This problem is a combined facility location and network design problem where a CCS infrastructure is planned for the next decades. Due to the computational challenges associated with that problem, a slope scaling heuristic is introduced, which is capable of finding better solutions than a state-of-the-art general-purpose mathematical programming solver, in a fraction of the computational time. This heuristic has intensification and diversification phases, improved generation of feasible solutions through dynamic programming, and a final refining step based on a restricted model. Overall, the contributions of this thesis on ridesharing and CCS provide mathematical programming models, algorithms, and managerial insights that may help practitioners and stakeholders plan for net-zero emissions.
Gaps and requirements for applying automatic architectural design to building renovation
The renovation of existing buildings provides an opportunity to change the layout to meet the needs of facilities and accomplish sustainability in the built environment at high utilisation rates and low cost. However, building renovation design is complex, and completing architectural design schemes manually lacks efficiency and overall robustness. With the use of computational optimisation, automatic architectural design (AAD) can efficiently assist in building renovation through decision-making based on performance evaluation. This paper comprehensively analyses AAD's current research status and provides a state-of-the-art overview of applying AAD technology to building renovation. In addition, gaps and requirements of using AAD for building renovation are explored from quantitative and qualitative aspects, providing ideas for future research. The research shows that there is still much work to be done to apply AAD to building renovation, including quickly obtaining input data, expanding optimisation topics, selecting design methods, and improving workflow and efficiency.
Self-adjusting Population Sizes for Non-elitist Evolutionary Algorithms: Why Success Rates Matter
Evolutionary algorithms (EAs) are general-purpose optimisers that come with several parameters like the sizes of parent and offspring populations or the mutation rate. It is well known that the performance of EAs may depend drastically on these parameters. Recent theoretical studies have shown that self-adjusting parameter control mechanisms that tune parameters during the algorithm run can provably outperform the best static parameters in EAs on discrete problems. However, the majority of these studies concerned elitist EAs and we do not have a clear answer on whether the same mechanisms can be applied for non-elitist EAs. We study one of the best-known parameter control mechanisms, the one-fifth success rule, to control the offspring population size λ in the non-elitist (1, λ) EA. It is known that the (1, λ) EA has a sharp threshold with respect to the choice of λ where the expected runtime on the benchmark function OneMax changes from polynomial to exponential time. Hence, it is not clear whether parameter control mechanisms are able to find and maintain suitable values of λ. For OneMax we show that the answer crucially depends on the success rate s (i.e., a one-(s + 1)-th success rule). We prove that, if the success rate is appropriately small, the self-adjusting (1, λ) EA optimises OneMax in O(n) expected generations and O(n log n) expected evaluations, the best possible runtime for any unary unbiased black-box algorithm. A small success rate is crucial: we also show that if the success rate is too large, the algorithm has an exponential runtime on OneMax and other functions with similar characteristics.
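The mechanism described in this abstract can be sketched in a few lines. The version below is an illustrative reading, not the paper's exact pseudocode: it assumes standard bit mutation with rate 1/n, an update factor `F`, division of λ by `F` after a generation whose best offspring strictly improves on the parent, and multiplication by `F ** (1 / s)` otherwise. The function name and the defaults `s=0.5`, `F=1.5` are assumptions.

```python
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def self_adjusting_one_comma_lambda_ea(n, s=0.5, F=1.5, max_generations=20000, seed=1):
    """Non-elitist (1, lambda) EA on OneMax with a one-(s+1)-th success rule.

    Returns (generations, evaluations, final_fitness).
    """
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    lam = 1.0
    evaluations = 0
    for generation in range(1, max_generations + 1):
        offspring_count = max(1, round(lam))
        best_child, best_fitness = None, -1
        for _ in range(offspring_count):
            # Standard bit mutation: flip each bit independently with prob. 1/n.
            child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
            fitness = onemax(child)
            if fitness > best_fitness:
                best_child, best_fitness = child, fitness
        evaluations += offspring_count
        if best_fitness > onemax(parent):
            lam = max(1.0, lam / F)   # success: shrink the offspring population
        else:
            lam *= F ** (1.0 / s)     # failure: grow the offspring population
        # Comma selection: the best offspring replaces the parent even if worse.
        parent = best_child
        if best_fitness == n:
            return generation, evaluations, best_fitness
    return max_generations, evaluations, onemax(parent)


if __name__ == "__main__":
    print(self_adjusting_one_comma_lambda_ea(30))
```

With a small `s`, each failure raises λ sharply while a success lowers it only modestly, so the population size tends to stay large enough to avoid losing fitness under comma selection; a large `s` reverses that balance, which matches the dichotomy the abstract reports.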
On a Vehicle Routing Problem with Customer Costs and Multi Depots
The Vehicle Routing Problem with Customer Costs (VRPCC for short) was developed for railway maintenance scheduling. Specifically, corrective maintenance jobs for unexpectedly occurring failures are planned over a short time horizon. These jobs are geographically distributed across the railway network. Furthermore, depending on the severity of the failure, it can be necessary to reduce the top speed on the track section in order to avoid safety risks or overly rapid deterioration. For fatal failures, it can even be necessary to close the track section. The resulting limitations on railway service lead to penalty costs for the maintenance operator. These must be paid until the track is repaired and the restrictions are removed. When scheduling the maintenance tasks, these penalty costs can be reduced by performing the corresponding maintenance tasks earlier. However, this may in turn lead to increased costs for moving the maintenance machines and crews.
The VRPCC was developed for this scheduling problem. For each maintenance vehicle and crew, it defines a route that describes the order in which the maintenance tasks are performed. Two kinds of costs are considered: first, travel costs for machinery and crew; and second, penalty costs for an unsafe track condition, which have to be paid for each day from failure detection to maintenance completion. To model the penalties, the novel customer costs are defined. Specifically, each maintenance activity is given a customer cost coefficient, which is incurred for each day between failure detection and failure repair. The objective function of this problem is defined as the sum of travel costs and time-dependent customer costs. In this way, the priority of customers can be taken into account without losing sight of travel costs.
This new vehicle routing problem was introduced in this thesis by means of a non-linear partition and permutation model. In this model, a feasible solution is defined by a partition of the job set into subsets that represent the allocation of jobs to vehicles, and a permutation for each subset that represents the order in which the jobs are processed. The start times of the jobs were then calculated based on the order given by the permutations. It was taken into account that work can only be done in eight-hour shifts during the night. Based on the start times, the customer cost value of each job is computed, which equals the penalty costs paid. The cost of a schedule is then calculated as the sum of travel costs and customer costs.
To solve the VRPCC with a commercial linear programming solver, different formulations of the VRPCC as a mixed-integer linear program were developed. In these formulations, the start times became decision variables. It turned out that including customer costs led to problems that are harder to solve than vehicle routing problems where only travel costs are minimized.
Furthermore, several construction heuristics for the VRPCC were designed and investigated in the thesis. In addition, two local search algorithms, first improvement and best improvement, were applied. The computational experiments showed that the solutions generated by the local search algorithms were much better than the solutions of the construction heuristics.
The main part of this thesis was the design of a Branch-and-Bound algorithm for the VRPCC. For this purpose, new lower bounds for the customer cost part of the objective function were formulated. The computational experiments showed that a lower bound computed from the LP relaxation of a specific bin packing problem had the best trade-off between computational effort and bound quality. For the travel cost part of the objective function, several known lower bounds from the TSP were compared.
To design a Branch-and-Bound algorithm, suitable branching strategies are needed in addition to efficient lower bounds, in order to split the problem space into smaller subspaces. In this thesis, two branching strategies were developed which are based on the non-linear partition and permutation model in order to take advantage of the problem structure. More precisely, new branches are generated by appending or including a job in an incomplete schedule. Consequently, the start times can be computed directly from the jobs planned so far, and tighter lower bounds can be computed for the jobs not yet planned.
By means of computational experiments, the developed Branch-and-Bound algorithms were compared with the classical approach, which means solving a mixed-integer linear program of the VRPCC with a commercial solver. The results showed that both Branch-and-Bound algorithms solved the small instances faster than the classical approach.
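The objective described in this abstract (travel costs plus time-dependent customer costs, evaluated on a partition of jobs into routes with a processing order per route) can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's model: time is continuous and measured in days, every failure is detected at time 0, travel cost equals travel time multiplied by a hypothetical `travel_cost_per_day`, a single depot at index 0 is used, and the eight-hour night shift restriction is ignored.

```python
def schedule_cost(routes, travel_time, duration, cost_coeff,
                  travel_cost_per_day=1, depot=0):
    """Evaluate a VRPCC-style schedule.

    routes      -- list of routes; each route is an ordered list of job indices
    travel_time -- matrix of travel times in days (index 0 is the depot)
    duration    -- duration[j] is the repair time of job j in days
    cost_coeff  -- cost_coeff[j] is the penalty per day until job j is repaired
    Returns (travel_cost, customer_cost, total_cost).
    """
    travel_cost = 0
    customer_cost = 0
    for route in routes:
        clock = 0        # days elapsed for this vehicle
        position = depot
        for job in route:
            leg = travel_time[position][job]
            travel_cost += leg * travel_cost_per_day
            clock += leg + duration[job]
            # Penalty accrues from detection (time 0) until the repair is done.
            customer_cost += cost_coeff[job] * clock
            position = job
        if route:
            # Vehicle returns to the depot at the end of its route.
            travel_cost += travel_time[position][depot] * travel_cost_per_day
    return travel_cost, customer_cost, travel_cost + customer_cost


if __name__ == "__main__":
    travel_time = [[0, 1, 2],
                   [1, 0, 1],
                   [2, 1, 0]]
    duration = [0, 1, 1]      # index 0 (depot) is unused
    cost_coeff = [0, 5, 1]    # job 1 is the high-priority failure
    print(schedule_cost([[1, 2]], travel_time, duration, cost_coeff))
    print(schedule_cost([[2, 1]], travel_time, duration, cost_coeff))
```

In this toy instance both orders have the same travel cost, but serving the high-penalty job first gives a much lower total, which illustrates how customer costs encode priority in the objective.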