
    Rigging Nearly Acyclic Tournaments Is Fixed-Parameter Tractable

    Single-elimination tournaments (also known as knockout tournaments) are a popular format in sports competitions and are also widely used for decision making and elections. In this paper we study the algorithmic problem of manipulating the outcome of such a tournament: finding a seeding of the players under which a given player wins the resulting tournament. The problem is known to be NP-hard in general. We present an algorithm that exploits structural restrictions on the tournament, establishing that the problem is fixed-parameter tractable when parameterized by the size of a smallest feedback arc set of the tournament (interpreting the tournament as an oriented complete graph). This is a natural parameter because most problems on tournaments (including this one) are either trivial or easily solvable on acyclic tournaments, which raises the question of what happens on nearly acyclic tournaments, i.e., tournaments with a small feedback arc set. Our result significantly improves upon a recent algorithm by Aziz et al. (2014), whose running time is exponential with the size of a smallest feedback arc set in the exponent and the number of players as the base.
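    As a concrete illustration of the problem itself (not of the paper's fixed-parameter algorithm), the following minimal Python sketch simulates a knockout bracket round by round and brute-forces all seedings; this is exponential in the number of players and only feasible for tiny instances, and all names here (bracket_winner, can_rig) are illustrative.

        from itertools import permutations

        def bracket_winner(seeding, beats):
            """Simulate a single-elimination tournament round by round.

            seeding: sequence of players whose length is a power of two.
            beats[a][b] is True iff player a beats player b.
            """
            players = list(seeding)
            while len(players) > 1:
                players = [a if beats[a][b] else b
                           for a, b in zip(players[::2], players[1::2])]
            return players[0]

        def can_rig(target, players, beats):
            """Brute force over all seedings; exponential in len(players)."""
            return any(bracket_winner(s, beats) == target
                       for s in permutations(players))

        # Toy instance: a, b, c form a cycle (a beats b, b beats c, c beats a)
        # and everyone beats d.  Player c can be made the winner by seeding
        # the bracket as (c, d, a, b): c beats d, a beats b, then c beats a.
        players = ["a", "b", "c", "d"]
        wins = {("a", "b"), ("b", "c"), ("c", "a"),
                ("a", "d"), ("b", "d"), ("c", "d")}
        beats = {p: {q: (p, q) in wins for q in players} for p in players}
        print(can_rig("c", players, beats))  # True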

    Doing Transparent and Reproducible Quantitative Sociology

    The ongoing replication crisis (Baker 2016; Gelman and Loken 2016; Freese and Peterson 2017; Wiggins and Christopherson 2019; Bird 2020; Colling and Szűcs 2021) has laid bare quantitative sociology’s need for better standards of transparency and reproducibility in all published research. This dissertation’s core contribution is the proposal and articulation of a ‘foundational cycle’ of three interrelated methodological practices: Causal Inference (Rubin 1974; Pearl 2009b, 2009a; Pearl, Glymour, and Jewell 2016), Principled Data Processing (Ball 2016a, 2016b; Barrett 2022), and Bayesian Inference (Bayes 1763; Jaynes 2003). By enshrining the principles and practices of the ‘foundational cycle,’ researchers ensure that transparency and reproducibility are woven into each critical juncture of the research project, permitting other researchers to comprehend and reconstruct all aspects contributing to the published findings.

    The first of the dissertation’s four substantive chapters contributes an account of the development of causal inference, with a particular focus on the role the ‘graphical’ paradigm played in motivating the development (and, later, dismantling) of causal methods in quantitative sociology. It also provides a brief description of Judea Pearl’s theory of inferred causation (Pearl 2009b) and argues that quantitative sociology should adopt it as the baseline model for the purposes of causal transparency and reproducibility. The second substantive chapter, which builds directly on the first, addresses a gap in the extant sociological literature concerning the prevalence of explicit causal methodology in the field. It contributes a review of the causal methods employed in the quantitative articles published in ‘top’ sociological journals in the year 2022 (see Jacobs 2016). The review, which examined 283 quantitative sociological articles (out of a total of 574 in the review’s corpus), found that, as judged by the criteria articulated in Pearl (2009b), only 5 among them were ‘causally adequate.’

    The third substantive chapter’s contribution takes the form of a software-based implementation of Patrick Ball’s Principled Data Processing framework (Ball 2016b, 2016a), which is designed to permit the development and maintenance of transparent, reproducible data processing pipelines, even in the context of large, distributed, collaborative, and technically complex research efforts. The software package, titled pdpp (Browne et al. 2021), is an accessibility-oriented iteration on Ball’s original framework. The fourth and final substantive chapter makes two contributions. The first is the development and articulation of an ‘ameliorative’ class of argumentation designed to address gaps in Gelman’s typology of arguments in favour of Bayesian inference (Gelman 2008): extant modes of argumentation focus on ‘winning’ the debate between Frequentism and Bayesianism, whereas ‘ameliorative’ arguments seek to address Frequentists’ concerns and trepidations about the Bayesian paradigm or the transition thereto. The second contribution is an ameliorative argument reified as a software package titled pyKrusch (Browne 2021), which automates the creation of, and builds upon the functionality of, John Kruschke’s Bayesian dependency structure diagrams (Kruschke 2014).
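    To make the pipeline idea concrete: Ball’s Principled Data Processing is commonly described as decomposing a project into self-contained tasks, each a directory with its own inputs, source, and outputs, chained so that one task’s output becomes the next task’s input. A rough sketch of such a layout follows; the task names (import, clean, model) are hypothetical, and pdpp’s exact conventions may differ.

        import/
            Makefile      # how to (re)build this task's output
            input/        # raw data, never edited by hand
            src/          # the only code that touches input/
            output/       # artefacts consumed by downstream tasks
        clean/
            Makefile
            input/        # linked to import/output/
            src/
            output/
        model/
            Makefile
            input/        # linked to clean/output/
            src/
            output/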

    Democratizing machine learning

    Machine learning artifacts are increasingly embedded in society, often in the form of automated decision-making processes. One major reason for this, along with methodological improvements, is the increasing accessibility of data, but also of machine learning toolkits that open up machine learning methodology to non-experts. The core focus of this thesis is exactly this – democratizing access to machine learning in order to enable a wider audience to benefit from its potential.
    Contributions in this manuscript stem from several areas within this broad field. A major section is dedicated to automated machine learning (AutoML), with the goal of abstracting away the tedious task of obtaining an optimal predictive model for a given dataset. This process consists mostly of finding said optimal model, often through hyperparameter optimization, while the user in turn only selects the appropriate performance metric(s) and validates the resulting models. The process can be improved or sped up by learning from previous experiments. Three such methods are presented in this thesis: one aims to obtain a fixed set of hyperparameter configurations that likely contains good solutions for any new dataset, and two use dataset characteristics to propose new configurations. The thesis furthermore presents a collection of the required experiment metadata and shows how such metadata can be used for the development of, and as a test bed for, new hyperparameter optimization methods. The pervasiveness of ML-derived models in many aspects of society simultaneously calls for increased scrutiny of how such models shape society and of any biases they may exhibit. This thesis therefore presents an AutoML tool that allows fairness considerations to be incorporated into the search for an optimal model. This requirement for fairness in turn poses the question of whether a model’s fairness can be reliably estimated, which is studied in a further contribution of this thesis. Since access to machine learning methods also heavily depends on access to software and toolboxes, several contributions in the form of software are part of this thesis. The mlr3pipelines R package allows for embedding models in so-called machine learning pipelines that include pre- and postprocessing steps often required in machine learning and AutoML. The mlr3fairness R package, on the other hand, enables users to audit models for potential biases and to reduce those biases through different debiasing techniques. One such technique, multi-calibration, is published as a separate software package, mcboost.
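    As a hypothetical illustration of the first kind of method (not the thesis’s actual algorithm): a fixed portfolio of configurations can be chosen greedily from a meta-dataset of past performances, so that for every previously seen dataset at least one portfolio member scores well. In the Python sketch below, perf and k are assumed inputs.

        def greedy_portfolio(perf, k):
            """Pick k configurations from past results, greedily.

            perf[config][dataset] -> score on a past experiment
            (higher is better).  Each step adds the configuration that
            most raises the portfolio's best-achievable score, summed
            over all datasets.
            """
            datasets = list(next(iter(perf.values())))
            best = {d: float("-inf") for d in datasets}
            portfolio = []
            for _ in range(k):
                def gain(c):
                    return sum(max(perf[c][d] - best[d], 0.0) for d in datasets)
                chosen = max((c for c in perf if c not in portfolio), key=gain)
                portfolio.append(chosen)
                for d in datasets:
                    best[d] = max(best[d], perf[chosen][d])
            return portfolio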
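    On reliably estimating fairness, a simple way to see the difficulty is to bootstrap a fairness metric on a finite test set: on small or imbalanced samples the resulting interval can be wide. The sketch below uses the demographic parity gap as an example metric; it is a generic illustration, not mlr3fairness’s implementation.

        import random

        def dp_gap(pred, group):
            """Demographic parity gap: |P(pred=1|g=0) - P(pred=1|g=1)|."""
            r0 = [p for p, g in zip(pred, group) if g == 0]
            r1 = [p for p, g in zip(pred, group) if g == 1]
            if not r0 or not r1:
                return None  # a resample may miss one group entirely
            return abs(sum(r0) / len(r0) - sum(r1) / len(r1))

        def bootstrap_gap(pred, group, n_boot=1000, seed=0):
            """95% bootstrap interval for the gap on a finite test set."""
            rng = random.Random(seed)
            n = len(pred)
            gaps = []
            for _ in range(n_boot):
                idx = [rng.randrange(n) for _ in range(n)]
                g = dp_gap([pred[i] for i in idx], [group[i] for i in idx])
                if g is not None:
                    gaps.append(g)
            gaps.sort()
            return gaps[int(0.025 * len(gaps))], gaps[int(0.975 * len(gaps))]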

    Fuelling the zero-emissions road freight of the future: routing of mobile fuellers

    The future of zero-emissions road freight is closely tied to the sufficient availability of new, clean fuel options such as electricity and hydrogen. In goods distribution using Electric Commercial Vehicles (ECVs) and Hydrogen Fuel Cell Vehicles (HFCVs), a major challenge during the transition period is their limited range combined with scarce and unevenly distributed refuelling stations. One viable way to facilitate and speed up the adoption of ECVs/HFCVs in logistics is to bring the fuel to the point where it is needed (instead of diverting delivery vehicles to refuelling stations) using "Mobile Fuellers (MFs)": mobile battery-swapping/recharging vans or mobile hydrogen fuellers that travel to an en-route ECV/HFCV at a rendezvous time and place and provide the fuel it needs to complete its delivery route. This presentation introduces new vehicle routing models for a third-party company that provides MF services. In the proposed problem variant, the MF provider receives the routing plans of multiple customer companies and must design routes for a fleet of capacitated MFs that synchronise with the running vehicles to deliver the required amount of fuel on the fly. The presentation discusses and compares several mathematical models based on different business models and collaborative logistics scenarios.
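    To illustrate the synchronisation at the heart of this problem (purely a toy, not one of the presented models): each fuel request is a point in time and space on a customer vehicle’s route, and an MF can serve it only if it can drive there before the rendezvous time and still carries enough fuel. A single-MF greedy sketch in Python, with all names and parameters hypothetical:

        import math

        def reachable(mf_pos, mf_free_at, stop_pos, stop_time, speed):
            """Can the MF reach a timed stop on a vehicle's route in time?"""
            return mf_free_at + math.dist(mf_pos, stop_pos) / speed <= stop_time

        def greedy_schedule(depot, requests, speed, capacity):
            """Serve requests (pos, time, amount) in time order, greedily.

            A request is served if the MF can arrive by the rendezvous
            time and still carries enough fuel; otherwise it is skipped.
            """
            pos, free_at, fuel = depot, 0.0, capacity
            served = []
            for stop_pos, stop_time, amount in sorted(requests, key=lambda r: r[1]):
                if amount <= fuel and reachable(pos, free_at, stop_pos, stop_time, speed):
                    served.append((stop_pos, stop_time, amount))
                    pos, free_at, fuel = stop_pos, stop_time, fuel - amount
            return served

        # Example: one MF, two rendezvous requests; the second is missed
        # because the MF cannot reach it in time with enough fuel left.
        print(greedy_schedule(depot=(0.0, 0.0), speed=1.0, capacity=100.0,
                              requests=[((3.0, 4.0), 6.0, 40.0),
                                        ((6.0, 8.0), 10.0, 70.0)]))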