541 research outputs found

    Disruption analytics in urban metro systems with large-scale automated data

    Urban metro systems are frequently affected by disruptions such as infrastructure malfunctions, rolling stock breakdowns and accidents. Such disruptions give rise to delays, congestion and inconvenience for public transport users, which in turn lead to a wider range of negative impacts on the social economy and wellbeing. This PhD thesis aims to improve our understanding of disruption impacts and improve the ability of metro operators to detect and manage disruptions by using large-scale automated data. The crucial precondition of any disruption analytics is to have accurate information about the location, occurrence time, duration and propagation of disruptions. In pursuit of this goal, the thesis develops statistical models to detect disruptions via deviations in trains’ headways relative to their regular services. Our method is a unique contribution in the sense that it is data-driven, relying on automated vehicle location data, and its probabilistic framework is effective at detecting any type of service interruption, including minor delays that last just a few minutes. As an important research outcome, the thesis delivers novel analyses of the propagation of disruptions along metro lines, thus enabling us to distinguish primary and secondary disruptions as well as recovery interventions performed by operators. The other part of the thesis provides new insights for quantifying disruption impacts and measuring metro vulnerability. One of our key messages is that in metro systems there are factors influencing both the occurrence of disruptions and their outcomes. With such confounding factors, we show that causal inference is a powerful tool to estimate unbiased impacts on passenger demand and journey time, and that it is also capable of quantifying the spatial-temporal propagation of disruption impacts within metro networks. The causal inference approaches are applied to empirical studies based on the Hong Kong Mass Transit Railway (MTR).
Our conclusions can assist researchers and practitioners in two applications: (i) the evaluation of metro performance such as service reliability, system vulnerability and resilience, and (ii) the management of future disruptions.

    Machine learning in the social and health sciences

    The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data versus social and health sciences as well as a lack of accessible overviews and adequate training in ML techniques for researchers outside data science. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches, by incorporating the necessary requirements for statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims at overcoming disciplinary barriers and starting a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research.

    Bayesian Learning in the Counterfactual World

    Recent years have witnessed a surging interest in the use of machine learning tools for causal inference. In contrast to the usual large data settings where the primary goal is prediction, many disciplines, such as health, economic and social sciences, are instead interested in causal questions. Learning individualized responses to an intervention is a crucial task in many applied fields (e.g., precision medicine, targeted advertising, precision agriculture) where the ultimate goal is to design optimal and highly-personalized policies based on individual features. In this work, I thus tackle the problem of estimating causal effects of an intervention that are heterogeneous across a population of interest and depend on an individual set of characteristics (e.g., a patient's clinical record, user's browsing history) in high-dimensional observational data settings. This is done by utilizing Bayesian Nonparametric or Probabilistic Machine Learning tools that are specifically adjusted for the causal setting and have desirable uncertainty quantification properties, with a focus on the issues of interpretability/explainability and inclusion of domain experts' prior knowledge. I begin by introducing terminology and concepts from causality and causal reasoning in the first chapter. Then I include a literature review of some of the state-of-the-art regression-based methods for heterogeneous treatment effects estimation, with an attempt to build a unifying taxonomy and lay down the finite-sample empirical properties of these models.
The chapters forming the core of the dissertation instead present some novel methods addressing existing issues in individualized causal effects estimation: Chapter 3 develops both a Bayesian tree ensemble method and a deep learning architecture to tackle interpretability, uncertainty coverage and targeted regularization; Chapter 4 introduces a novel multi-task Deep Kernel Learning method particularly suited for multi-outcome | multi-action scenarios. The last chapter concludes with a discussion.