83 research outputs found

    Assigning times to minimise reachability in temporal graphs

    Temporal graphs (in which edges are active at specified times) are of particular relevance for spreading processes on graphs, e.g. the spread of disease or dissemination of information. Motivated by real-world applications, modification of static graphs to control this spread has proven a rich topic for previous research. Here, we introduce a new type of modification for temporal graphs: the number of active times for each edge is fixed, but we can change the relative order in which (sets of) edges are active. We investigate the problem of determining an ordering of edges that minimises the maximum number of vertices reachable from any single starting vertex; epidemiologically, this corresponds to the worst-case number of vertices infected in a single disease outbreak. We study two versions of this problem, both of which we show to be NP-hard, and identify cases in which the problem can be solved or approximated efficiently.

    Assigning times to minimise reachability in temporal graphs

    Temporal graphs (in which edges are active at specified times) are of particular relevance for spreading processes on graphs, e.g. the spread of disease or dissemination of information. Motivated by real-world applications, modification of static graphs to control this spread has proven a rich topic for previous research. Here, we introduce a new type of modification for temporal graphs: the number of active times for each edge is fixed, but we can change the relative order in which (sets of) edges are active. We investigate the problem of determining an ordering of edges that minimises the maximum number of vertices reachable from any single starting vertex; epidemiologically, this corresponds to the worst-case number of vertices infected in a single disease outbreak. We study two versions of this problem, both of which we show to be NP-hard, and identify cases in which the problem can be solved or approximated efficiently.
    Comment: Author final version, to appear in Journal of Computer and System Sciences. Material from the previous version has been reorganised substantially, and some results have been strengthened.
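    The objective described in this abstract can be illustrated with a minimal sketch. It assumes undirected edges given as hypothetical (u, v, time) triples and assumes that a spreading path must use strictly increasing times; the paper may define reachability differently, and the function names below are illustrative only. The sketch evaluates the maximum reachability for a given time assignment; it does not search for the best assignment.

```python
def reachable(timed_edges, source):
    """Vertices reachable from `source` along paths with strictly increasing edge times.

    Assumption: edges are undirected (u, v, t) triples.
    """
    # arrival[v] is the earliest time at which v is reached; the source is
    # treated as reached before any edge becomes active.
    arrival = {source: float("-inf")}
    # Scanning edges in time order means the first time a vertex is reached
    # is also its earliest possible arrival time.
    for u, v, t in sorted(timed_edges, key=lambda e: e[2]):
        for a, b in ((u, v), (v, u)):
            if a in arrival and arrival[a] < t and b not in arrival:
                arrival[b] = t
    return set(arrival)


def max_reachability(vertices, timed_edges):
    """Worst case over all starting vertices: the quantity to be minimised."""
    return max(len(reachable(timed_edges, s)) for s in vertices)


# Hypothetical example: a path a-b-c-d under two different time assignments.
vertices = ["a", "b", "c", "d"]
increasing = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3)]
reordered = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1)]

print(max_reachability(vertices, increasing))  # 4
print(max_reachability(vertices, reordered))   # 3
```

    In this toy instance, changing only the order in which the edges are active lowers the worst-case outbreak size from 4 to 3.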

    Greedy structure learning from data that contains systematic missing values

    Learning from data that contain missing values represents a common phenomenon in many domains. Relatively few Bayesian Network (BN) structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two of the variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm, both in terms of learning accuracy and efficiency, and both when data are missing at random and not at random.
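    As a rough illustration of why pairwise deletion retains more of the observed data than discarding every incomplete record, the sketch below compares the two on hypothetical toy data. The data, the function names and the use of None for missing values are assumptions made for illustration; this is not the paper's algorithm and it omits inverse probability weighting entirely.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": None, "C": 1},
    {"A": 1, "B": 1, "C": None},
    {"A": None, "B": 0, "C": 0},
]


def listwise_rows(rows):
    """Listwise deletion: keep only rows with no missing value at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_rows(rows, variables):
    """Pairwise deletion: keep rows in which the variables needed for the
    current computation (e.g. a node and its candidate parents) are observed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


print(len(listwise_rows(rows)))              # 1
print(len(pairwise_rows(rows, ["A", "C"])))  # 2
print(len(pairwise_rows(rows, ["B"])))       # 3
```

    Each local computation uses every row in which its own variables are observed, so fewer observations are thrown away than under listwise deletion.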

    A survey of Bayesian Network structure learning


    Treatment of missing data in Bayesian network structure learning : an application to linked biomedical and social survey data

    The authors acknowledge the Research/Scientific Computing teams at The James Hutton Institute and NIAB for providing computational resources and technical support for the “UK's Crop Diversity Bioinformatics HPC” (BBSRC grant BB/S019669/1), use of which has contributed to the results reported within this paper. Access to this was provided via the University of St Andrews Bioinformatics Unit, which is funded by a Wellcome Trust ISSF award (grants 105621/Z/14/Z and 204821/Z/16/Z). XK was supported by a World-Leading PhD Scholarship from St Leonard's Postgraduate School of the University of St Andrews. VAS and KK were partially supported by HATUA, The Holistic Approach to Unravel Antibacterial Resistance in East Africa, a three-year Global Context Consortia Award (MR/S004785/1) funded by the National Institute for Health Research, Medical Research Council and the Department of Health and Social Care. KK is supported by the Academy of Medical Sciences, the Wellcome Trust, the Government Department of Business, Energy and Industrial Strategy, the British Heart Foundation, Diabetes UK, and the Global Challenges Research Fund [Grant number SBF004\1093]. KK is additionally supported by the Economic and Social Research Council HIGHLIGHT CPC- Connecting Generations Centre [Grant number ES/W002116/1].

    Background: Availability of linked biomedical and social science data has risen dramatically in past decades, facilitating holistic and systems-based analyses. Among these, Bayesian networks have great potential to tackle complex interdisciplinary problems, because they can easily model inter-relations between variables. They work by encoding conditional independence relationships discovered via advanced inference algorithms. One challenge is dealing with missing data, ubiquitous in survey or biomedical datasets. Missing data is rarely addressed in an advanced way in Bayesian networks; the most common approach is to discard all samples containing missing measurements. This can lead to biased estimates. Here, we examine how Bayesian network structure learning can incorporate missing data.

    Methods: We use a simulation approach to compare a commonly used method in frequentist statistics, multiple imputation by chained equations (MICE), with one specific for Bayesian network learning, structural expectation-maximization (SEM). We simulate multiple incomplete categorical (discrete) data sets with different missingness mechanisms, variable numbers, data amount, and missingness proportions. We evaluate performance of MICE and SEM in capturing network structure. We then apply SEM combined with community analysis to a real-world dataset of linked biomedical and social data to investigate associations between socio-demographic factors and multiple chronic conditions in the US elderly population.

    Results: We find that applying either method (MICE or SEM) provides better structure recovery than doing nothing, and SEM in general outperforms MICE. This finding is robust across missingness mechanisms, variable numbers, data amount and missingness proportions. We also find that imputed data from SEM is more accurate than from MICE. Our real-world application recovers known inter-relationships among socio-demographic factors and common multimorbidities. This network analysis also highlights potential areas of investigation, such as links between cancer and cognitive impairment and disconnect between self-assessed memory decline and standard cognitive impairment measurement.

    Conclusion: Our simulation results suggest taking advantage of the additional information provided by network structure during SEM improves the performance of Bayesian networks; this might be especially useful for social science and other interdisciplinary analyses. Our case study shows that comorbidities of different diseases interact with each other and are closely associated with socio-demographic factors.

    Postprint; publisher PDF; peer reviewed.

    Open problems in causal structure learning: A case study of COVID-19 in the UK

    Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which is necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to COVID-19 UK pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.

    Bayesian network structure learning in the presence of data noise

    A Bayesian Network (BN) is a type of probabilistic graphical model that captures conditional and marginal independencies between variables. These models are generally represented by a Directed Acyclic Graph (DAG), which is composed of nodes and arcs. The nodes represent variables and the absence of arcs represents conditional or marginal independencies. When BNs are applied to real-world problems, the structure of these models is often assumed to be causal (often referred to as a causal BN), and is often constructed from either expert knowledge or Randomised Controlled Trials (RCTs). However, these two approaches can be time-consuming and expensive, and it might not always be possible or ethical to perform RCTs. As a result, structure learning algorithms that recover graphical structures from observational data, which in turn could be used to inform causal structures, have received increasing attention over the past few decades. To be able to guarantee the correctness of a structure learnt from data, a structure learning algorithm must rely on assumptions that may not hold in practice. One such crucial and commonly used assumption is that the observed data are independently and identically sampled from the underlying distribution, such that all statistical quantities of the distribution can be recovered with no bias from the observed data as the sample size goes to infinity. While such assumptions are often needed to be able to devise theoretical guarantees, the impact of violating these assumptions when working with real data tends to be overlooked. Empirical investigations show that structure learning algorithms perform considerably worse on noisy data that violate many of their theoretical assumptions, relative to how they perform on clean synthetic data that do not violate any of their data-generating assumptions. However, there has been limited research on how to deal with these problems effectively and efficiently.
    This thesis investigates this research direction and primarily focuses on improving structure learning in the presence of measurement error and systematic missing data, which are two of the most common types of data noise present in real data sets.
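    The DAG representation described in this abstract can be sketched with the Python standard library. The variable names and the parent-map encoding below are hypothetical choices made for illustration; the sketch only checks that a candidate structure is acyclic, which is a precondition for any BN structure, and says nothing about how the thesis learns structures.

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(parents):
    """Return True if the parent map describes a directed acyclic graph.

    `parents` maps each node to the nodes with an arc pointing into it
    (a hypothetical encoding chosen for this sketch).
    """
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


# Hypothetical three-variable structure: Rain -> Sprinkler, Rain -> WetGrass,
# Sprinkler -> WetGrass.
structure = {"Rain": [], "Sprinkler": ["Rain"], "WetGrass": ["Rain", "Sprinkler"]}
cyclic = {"A": ["B"], "B": ["A"]}

print(is_dag(structure))  # True
print(is_dag(cyclic))     # False
```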

    Learning Bayesian network equivalence classes with ant colony optimization

    Bayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm, called ACO-E, to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.