22 research outputs found

    Estimating causal structure using conditional DAG models

    ©2016 Chris J. Oates, Jim Q. Smith and Sach Mukherjee. This paper considers inference of causal structure in a class of graphical models called conditional DAGs. These are directed acyclic graph (DAG) models with two kinds of variables, primary and secondary. The secondary variables are used to aid in the estimation of the structure of causal relationships between the primary variables. We prove that, under certain assumptions, such causal structure is identifiable from the joint observational distribution of the primary and secondary variables. We give causal semantics for the model class, put forward a score-based approach for estimation and establish consistency results. Empirical results demonstrate gains compared with formulations that treat all variables on an equal footing, or that ignore secondary variables. The methodology is motivated by applications in biology that involve multiple data types and is illustrated here using simulated data and in an analysis of molecular data from the Cancer Genome Atlas.
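The score-based flavour of the approach can be sketched in a few lines of numpy: fit each node on its parents, score each candidate DAG by BIC, and check that a factorisation using a secondary (instrument-like) variable beats one that misuses it. This is a minimal illustrative sketch on simulated linear-Gaussian data, not the paper's estimator or its conditional-DAG score; all variable names and the toy DAGs are assumptions.

```python
import numpy as np

def bic_score(data, dag):
    """BIC of a linear-Gaussian DAG: sum over nodes of the BIC of
    regressing each node on its parents (higher is better)."""
    n, _ = data.shape
    score = 0.0
    for node, parents in dag.items():
        y = data[:, node]
        X = (np.column_stack([data[:, p] for p in parents] + [np.ones(n)])
             if parents else np.ones((n, 1)))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = max(resid @ resid / n, 1e-12)
        k = X.shape[1] + 1  # regression coefficients + noise variance
        score += (-0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * n
                  - 0.5 * k * np.log(n))
    return score

rng = np.random.default_rng(0)
z = rng.normal(size=500)            # secondary variable
x = 2.0 * z + rng.normal(size=500)  # primary
y = 1.5 * x + rng.normal(size=500)  # primary
data = np.column_stack([x, y, z])   # columns: 0=x, 1=y, 2=z

forward  = bic_score(data, {0: [2], 1: [0], 2: []})     # z -> x -> y (true)
backward = bic_score(data, {0: [2, 1], 1: [], 2: []})   # x <- z, x <- y
print(forward > backward)  # the generating structure scores higher
```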

    The development of object oriented Bayesian networks to evaluate the social, economic and environmental impacts of solar PV

    Domestic and community low carbon technologies are widely heralded as valuable means for delivering sustainability outcomes in the form of social, economic and environmental (SEE) policy objectives. To accelerate their diffusion they have benefited from a significant number and variety of subsidies worldwide. Considerable aleatory and epistemic uncertainties exist, however, both with regard to their net energy contribution and their SEE impacts. Furthermore the socio-economic contexts themselves exhibit enormous variability, and commensurate uncertainties in their parameterisation. This represents a significant risk for policy makers and technology adopters. This work describes an approach to these problems using Bayesian Network models. These are utilised to integrate extant knowledge from a variety of disciplines to quantify SEE impacts and endogenise uncertainties. A large-scale Object Oriented Bayesian network has been developed to model the specific case of solar photovoltaics (PV) installed on UK domestic roofs. Three specific model components have been developed. The PV component characterises the yield of UK systems, the building energy component characterises the energy consumption of the dwellings and their occupants, and a third component characterises the building stock in four English urban communities. Three representative SEE indicators (fuel affordability, carbon emission reduction and discounted cash flow) are integrated and used to test the model's ability to yield meaningful outputs in response to varying inputs. The variability in the percentage of the three indicators is highly responsive to the dwellings' built form, age and orientation; this responsiveness is due not only to building and solar physics but also to socio-economic factors. The model can accept observations or evidence in order to create scenarios which facilitate deliberative decision making.
The BN methodology contributes to the synthesis of new knowledge from extant knowledge located between disciplines. As well as insights into the impacts of high PV penetration, an epistemic contribution has been made to transdisciplinary building energy modelling which can be replicated with a variety of low carbon interventions.
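As a toy illustration of how such a network accepts evidence to create scenarios, the sketch below wires three invented binary nodes (Orientation, Yield, Affordability) with made-up CPTs and answers queries by exhaustive enumeration. It is far simpler than the OOBN described above; every probability and variable name is an assumption for illustration only.

```python
from itertools import product

# Toy chain: Orientation -> Yield -> Affordable, with invented CPTs.
p_orient = {"south": 0.5, "east_west": 0.5}
p_yield = {  # P(Yield | Orientation)
    ("high", "south"): 0.8, ("low", "south"): 0.2,
    ("high", "east_west"): 0.4, ("low", "east_west"): 0.6,
}
p_afford = {  # P(Affordable | Yield)
    ("yes", "high"): 0.7, ("no", "high"): 0.3,
    ("yes", "low"): 0.3, ("no", "low"): 0.7,
}

def prob_afford(evidence=None):
    """P(Affordable = yes | evidence) by exhaustive enumeration."""
    num = den = 0.0
    for o, y, a in product(p_orient, ["high", "low"], ["yes", "no"]):
        # skip worlds inconsistent with the entered evidence
        if evidence and any(evidence.get(k) not in (None, v)
                            for k, v in [("orientation", o), ("yield", y)]):
            continue
        p = p_orient[o] * p_yield[(y, o)] * p_afford[(a, y)]
        den += p
        if a == "yes":
            num += p
    return num / den

print(prob_afford())                          # prior, ~0.54
print(prob_afford({"orientation": "south"}))  # scenario: south-facing, ~0.62
```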

    Embedding Approaches for Relational Data

    Embedding methods for searching latent representations of the data are very important tools for unsupervised and supervised machine learning as well as information visualisation. Over the years, such methods have continually progressed towards the ability to capture and analyse the structure and latent characteristics of larger and more complex data. In this thesis, we examine the problem of developing efficient and reliable embedding methods for revealing, understanding, and exploiting the different aspects of relational data. We split our work into three parts, each dealing with a different relational data structure. In the first part, we deal with the weighted bipartite relational structure. Based on the relational measurements between two groups of heterogeneous objects, our goal is to generate low dimensional representations of these two different types of objects in a unified common space. We propose a novel method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimisation based on matrix decomposition. We also propose a simple way of measuring the conformity between the original object relations and the ones re-estimated from the embeddings, in order to achieve model selection by identifying the optimal model parameters with a simple search procedure. We show that our proposed method achieves consistently better or on-par results on multiple synthetic datasets and real world ones from the text mining domain when compared with existing embedding generation approaches. In the second part of this thesis, we focus on multi-relational data, where objects are interlinked by various relation types.
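A minimal flavour of the first part's idea, embedding both object types of a bipartite relation symmetrically in one common space via matrix decomposition, can be given with a plain truncated SVD. This is an assumption for illustration: the thesis's method adds scale constraints and weighting parameters that this sketch omits, and the toy document-term matrix is invented.

```python
import numpy as np

# Toy weighted bipartite relation: 3 documents x 4 terms (counts).
W = np.array([[3., 0., 1., 0.],
              [2., 1., 0., 0.],
              [0., 0., 2., 3.]])

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2  # embedding dimension
docs  = U[:, :k] * np.sqrt(s[:k])   # row objects in the shared space
terms = Vt[:k].T * np.sqrt(s[:k])   # column objects in the same space

# Conformity check: relations re-estimated from the embeddings.
W_hat = docs @ terms.T
print(np.linalg.norm(W - W_hat))  # equals the discarded singular value s[2]
```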
Embedding approaches are very popular in this field; they typically encode objects and relation types with hidden representations and use operations between them to compute positive scalars corresponding to each linkage's likelihood score. In this work, we aim at further improving the existing embedding techniques by taking into account the multiple facets of the different patterns and behaviours of each relation type. To the best of our knowledge, this is the first latent representation model in this field which considers relational representations to be dependent on the objects they relate. The multi-modality of the relation type over different objects is effectively formulated as a projection matrix over the space spanned by the object vectors. Two large benchmark knowledge bases are used to evaluate the performance with respect to the link prediction task, and a new test data partition scheme is proposed to offer a better understanding of the behaviour of a link prediction model. In the last part of this thesis, a much more complex relational structure is considered. In particular, we aim at developing novel embedding methods for jointly modelling the linkage structure and objects' attributes. Traditionally, the link prediction task is carried out on either the linkage structure or the objects' attributes alone, which ignores their semantic connections and is insufficient for handling complex link prediction tasks. Thus, our goal in this work is to build a reliable model that can fuse both sources of information to improve link prediction. The key idea of our approach is to encode both the linkage validities and the nodes' neighbourhood information into embedding-based conditional probabilities.
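The second part's general mechanism, hidden vectors for objects plus a relation-specific projection that reshapes the space per relation, can be sketched as follows. This is a generic TransR-style scorer with random toy parameters, not the thesis's object-dependent construction of the projection; all entity and relation names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # embedding dimension

# Hidden representations for objects and for relation types.
entities = {name: rng.normal(size=d) for name in ["alice", "bob", "paris"]}
relations = {  # each relation type: a projection matrix plus a translation
    "knows":    (rng.normal(size=(d, d)), rng.normal(size=d)),
    "lives_in": (rng.normal(size=(d, d)), rng.normal(size=d)),
}

def score(head, rel, tail):
    """Link plausibility (higher = more plausible): project both
    entities into the relation's own space, then compare with a
    translation-style distance."""
    M, v = relations[rel]
    return -np.linalg.norm(M @ entities[head] + v - M @ entities[tail])

print(score("alice", "knows", "bob"))
```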
Another important aspect of our proposed algorithm is that we utilise a margin-based contrastive training process for encoding the linkage structure, which relies on a more appropriate assumption and dramatically reduces the number of training links. In the experiments, our proposed method indeed improves the link prediction performance on three citation/hyperlink datasets when compared with those methods relying on only the nodes' attributes or the linkage structure, and it also achieves much better performance than the state of the art.
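The margin-based contrastive objective mentioned above has a standard hinge form: each observed (positive) link should outscore a corrupted (negative) link by at least a fixed margin. A minimal sketch, with invented scores and the common margin of 1.0 as an assumption:

```python
import numpy as np

def margin_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss pushing each observed link's score above its
    corrupted counterpart's score by at least `margin`."""
    return np.maximum(0.0, margin - pos_scores + neg_scores).mean()

pos = np.array([2.0, 1.5])   # scores of observed links
neg = np.array([0.2, 1.4])   # scores of corrupted links
print(margin_loss(pos, neg)) # ~0.45: only the second pair violates the margin
```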

    Efficient search for relevance explanations using MAP-independence in Bayesian networks

    MAP-independence is a novel concept concerned with explaining the (ir)relevance of intermediate nodes for maximum a posteriori (MAP) computations in Bayesian networks. Building upon properties of MAP-independence, we introduce and experiment with methods for finding sets of relevant nodes using both an exhaustive and a heuristic approach. Our experiments show that these properties significantly speed up run time for both approaches. In addition, we link MAP-independence to defeasible reasoning, a type of reasoning that analyses how new evidence may invalidate an already established conclusion. Ways to present users with an explanation using MAP-independence are also suggested.
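The core test can be sketched by brute force on a toy network: a hypothesis node H is MAP-independent of an intermediate node I given evidence E if clamping I to any of its values leaves the MAP value of H unchanged. All CPTs below are invented, and real algorithms avoid this enumeration, which is precisely why the exhaustive and heuristic searches above matter.

```python
# Toy chain H -> I -> E, all variables binary, invented CPTs.
pH = {0: 0.6, 1: 0.4}
pI = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # P(I=i | H=h)
pE = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}  # P(E=e | I=i)

def joint(h, i, e):
    return pH[h] * pI[(i, h)] * pE[(e, i)]

def map_h(evidence_e, clamp_i=None):
    """MAP value of H given E = evidence_e, optionally with I clamped."""
    def score(h):
        return sum(joint(h, i, evidence_e)
                   for i in (0, 1) if clamp_i in (None, i))
    return max((0, 1), key=score)

# H is MAP-independent of I given E=1 iff every clamping of I
# leaves the MAP assignment of H unchanged.
baseline = map_h(1)
independent = all(map_h(1, clamp_i=i) == baseline for i in (0, 1))
print(independent)  # False: clamping I can flip the MAP value of H
```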

    If interpretability is the answer, what is the question?

    Due to the ability to model even complex dependencies, machine learning (ML) can be used to tackle a broad range of (high-stakes) prediction problems. The complexity of the resulting models comes at the cost of transparency, meaning that it is difficult to understand the model by inspecting its parameters. This opacity is considered problematic since it hampers the transfer of knowledge from the model, undermines the agency of individuals affected by algorithmic decisions, and makes it more challenging to expose non-robust or unethical behaviour. To tackle the opacity of ML models, the field of interpretable machine learning (IML) has emerged. The field is motivated by the idea that if we could understand the model's behaviour -- either by making the model itself interpretable or by inspecting post-hoc explanations -- we could also expose unethical and non-robust behaviour, learn about the data generating process, and restore the agency of affected individuals. IML is not only a highly active area of research, but the developed techniques are also widely applied in both industry and the sciences. Despite the popularity of IML, the field faces fundamental criticism, questioning whether IML actually helps in tackling the aforementioned problems of ML and even whether it should be a field of research in the first place: First and foremost, IML is criticised for lacking a clear goal and, thus, a clear definition of what it means for a model to be interpretable. On a similar note, the meaning of existing methods is often unclear, and thus they may be misunderstood or even misused to hide unethical behaviour. Moreover, estimating conditional-sampling-based techniques poses a significant computational challenge. With the contributions included in this thesis, we tackle these three challenges for IML. We join a range of work by arguing that the field struggles to define and evaluate "interpretability" because incoherent interpretation goals are conflated. 
However, the different goals can be disentangled such that coherent requirements can inform the derivation of the respective target estimands. We demonstrate this with the examples of two interpretation contexts: recourse and scientific inference. To tackle the misinterpretation of IML methods, we suggest deriving formal interpretation rules that link explanations to aspects of the model and data. In our work, we specifically focus on interpreting feature importance. Furthermore, we collect interpretation pitfalls and communicate them to a broader audience. To efficiently estimate conditional-sampling-based interpretation techniques, we propose two methods that leverage the dependence structure in the data to simplify the estimation problems for Conditional Feature Importance (CFI) and SAGE. A causal perspective proved to be vital in tackling the challenges: first, since IML problems such as algorithmic recourse are inherently causal; second, since causality helps to disentangle the different aspects of model and data and, therefore, to distinguish the insights that different methods provide; and third, since algorithms developed for causal structure learning can be leveraged for the efficient estimation of conditional-sampling-based IML methods.
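To make the CFI idea concrete: conditional feature importance replaces a feature with a draw from its conditional distribution given the remaining features, so a feature whose information is duplicated elsewhere gets little credit, unlike marginal permutation importance. The two-feature sketch below fits the conditional by least squares under a Gaussian assumption; the data, model and estimator are illustrative, not the thesis's CFI or SAGE estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # x2 nearly duplicates x1
y = 2.0 * x1 + 0.1 * rng.normal(size=n)   # y depends only on x1
X = np.column_stack([x1, x2])

# Fit a linear model by least squares.
beta, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(n)]), y, rcond=None)
predict = lambda X: X @ beta[:2] + beta[2]
mse = lambda y, p: np.mean((y - p) ** 2)
base = mse(y, predict(X))

def conditional_importance(j):
    """Loss increase when feature j is replaced by a draw from a
    Gaussian fit of its conditional given the other feature."""
    k = 1 - j
    slope, intercept = np.polyfit(X[:, k], X[:, j], 1)
    resid_sd = np.std(X[:, j] - (slope * X[:, k] + intercept))
    Xc = X.copy()
    Xc[:, j] = slope * X[:, k] + intercept + rng.normal(size=n) * resid_sd
    return mse(y, predict(Xc)) - base

def marginal_importance(j):
    """Loss increase when feature j is permuted marginally."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(y, predict(Xp)) - base

# x1 carries the signal, but x2 duplicates it, so the conditional
# credit for x1 is much smaller than the marginal credit.
print(conditional_importance(0), marginal_importance(0))
```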

    Partial correlation based penalty functions and prior distributions for Gaussian graphical models

    Graphical models are a useful tool for encoding conditional independence relations. A common goal is to select the graphical model that best describes the conditional independence relationships between variables given observations of these variables. Under the additional Gaussian assumption, conditional independence is equivalent to zero entries in the inverse covariance matrix Ω. Thus sparse estimation of Ω in turn specifies a graphical model and the associated conditional independencies. Popular frequentist methods for this often involve placing a penalty function on Ω and maximising a penalised likelihood, whilst Bayesian methods require specification of a prior distribution on Ω. Conditional independence relations are invariant to non-zero scalar multiplication of the variables; however, in this thesis we show that essentially all current penalised likelihood methods and many prior distributions are not invariant to such transformations of the variables. In fact many methods are very sensitive to rescaling of the variables, which can, and often does, result in a vastly different selected graphical model. To remedy this issue we introduce new classes of penalty functions and prior distributions which are based on partial correlations. We show that such penalty functions and prior distributions lead to scale invariant estimation and posterior inference on Ω. We pay particular attention to two penalty functions in this class. The partial correlation graphical LASSO places an L1 penalty on the partial correlations, whilst the spike and slab partial correlation graphical LASSO is a penalty function based on a spike and slab prior formulation. The performance of these penalty functions is compared to that of current popular penalty functions in simulated and real world settings. We also investigate spike and slab priors in general for Gaussian graphical models and point out that care must be taken when considering the positive definiteness of Ω.
With this in mind we provide some theoretical results based on Wigner matrices.
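The scale-invariance point is easy to verify numerically: the partial correlations rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) are unchanged when the precision matrix is conjugated by any positive diagonal matrix, i.e. when the variables are rescaled. A small numpy check (the example precision matrix is invented but positive definite):

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlations from a precision matrix:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

omega = np.array([[ 2.0, -0.8, 0.0],
                  [-0.8,  1.5, 0.3],
                  [ 0.0,  0.3, 1.0]])
rho = partial_correlations(omega)

# Rescaling the variables maps omega -> D omega D with D positive
# diagonal, but leaves the partial correlations unchanged.
D = np.diag([3.0, 1.0, 0.5])
rho_scaled = partial_correlations(D @ omega @ D)
print(np.allclose(rho, rho_scaled))  # True
```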

    Learning and predicting with chain event graphs

    Graphical models provide a very promising avenue for making sense of large, complex datasets. The most popular graphical models in use at the moment are Bayesian networks (BNs). This thesis shows, however, that they are not always ideal factorisations of a system. Instead, I advocate for the use of a relatively new graphical model, the chain event graph (CEG), that is based on event trees. Event trees directly represent graphically the event space of a system. Chain event graphs reduce their potentially huge dimensionality by taking into account identical probability distributions on some of the event tree's subtrees. They have the added benefits of showing the conditional independence relationships of the system (one of the advantages of the Bayesian network representation that event trees lack) and of allowing implementation of causal hypotheses that is just as easy as, and arguably more natural than, with Bayesian networks, with a larger domain of implementation using purely graphical means. The trade-off for this greater expressive power, however, is that model specification and selection are much more difficult to undertake with the larger set of possible models for a given set of variables. My thesis is the first exposition of how to learn CEGs. I demonstrate that not only is conjugate (and hence quick) learning of CEGs possible, but I characterise priors that imply conjugate updating based on very reasonable assumptions that also have direct Bayesian network analogues. By re-casting CEGs as partition models, I show how established partition learning algorithms can be adapted for the task of learning CEGs. I then develop a robust yet flexible prediction machine based on CEGs for any discrete multivariate time series (the dynamic CEG model) which combines the power of CEGs, multi-process and steady modelling, lattice theory and Occam's razor. This is also an exact method that produces reliable predictions without requiring much a priori modelling.
I then demonstrate how easily causal analysis can be implemented with this model class, which can express a wide variety of causal hypotheses. I end with an application of these techniques to real educational data, drawing inferences that would not have been possible simply using BNs.
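The conjugate learning step can be sketched with the standard Dirichlet-multinomial marginal likelihood: two event-tree positions are merged into a single stage when the merged model's marginal likelihood beats keeping them separate. The counts and the uniform prior below are invented for illustration; the thesis characterises which priors make this conjugate updating valid.

```python
from math import lgamma

def log_marginal_likelihood(prior, counts):
    """Dirichlet-multinomial log marginal likelihood of the edge
    counts leaving one stage (positions assumed to share one
    probability distribution)."""
    a0, n0 = sum(prior), sum(counts)
    out = lgamma(a0) - lgamma(a0 + n0)
    for a, n in zip(prior, counts):
        out += lgamma(a + n) - lgamma(a)
    return out

# Two positions in an event tree: should they be merged into one stage?
counts_u = [30, 10]
counts_v = [28, 12]
prior = [1.0, 1.0]  # uniform Dirichlet prior per stage

separate = (log_marginal_likelihood(prior, counts_u)
            + log_marginal_likelihood(prior, counts_v))
merged = log_marginal_likelihood(prior,
                                 [u + v for u, v in zip(counts_u, counts_v)])
print(merged > separate)  # True: similar counts favour a common stage
```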