
    Investigating the environmental factors of depression with causal analysis methods

    In our research, we apply global and local causal discovery algorithms to identify causal relationships between environmental and other factors related to depression.

    Local Causal Discovery of Direct Causes and Effects

    We focus on the discovery and identification of the direct causes and effects of a target variable in a causal network. State-of-the-art causal learning algorithms generally need to find the global causal structure, in the form of a completed partially directed acyclic graph (CPDAG), in order to identify the direct causes and effects of a target variable. While these algorithms are effective, finding the global structure is often unnecessary and wasteful when we are only interested in the local structure around one target variable (such as a class label). We propose a new local causal discovery algorithm, called Causal Markov Blanket (CMB), which identifies the direct causes and effects of a target variable based on Markov blanket discovery. CMB conducts causal discovery among multiple variables but focuses only on the causal relationships between a specific target variable and the other variables. Under standard assumptions, we show both theoretically and experimentally that CMB achieves identification accuracy comparable to global methods while significantly improving their efficiency, often by more than an order of magnitude.
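
    To make the local strategy concrete, the sketch below runs grow-shrink Markov blanket discovery, the kind of local building block CMB rests on; it is not the CMB algorithm itself. The Fisher-z partial-correlation test, the significance level, and the toy linear-Gaussian data are illustrative assumptions.

        # Grow-shrink Markov blanket discovery: a minimal local-discovery sketch.
        import numpy as np
        from scipy import stats

        def independent(data, i, j, cond, alpha=0.05):
            """Fisher-z test of X_i _||_ X_j | X_cond; True means independent."""
            n = data.shape[0]
            sub = np.corrcoef(data[:, [i, j] + list(cond)], rowvar=False)
            prec = np.linalg.pinv(sub)  # partial correlation via the precision matrix
            r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
            r = np.clip(r, -0.999999, 0.999999)
            z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
            return 2 * stats.norm.sf(abs(z)) > alpha  # p-value above threshold

        def grow_shrink_mb(data, target, alpha=0.05):
            """Estimate the Markov blanket of column `target`."""
            mb, changed = [], True
            while changed:  # grow: add any variable dependent on the target given mb
                changed = False
                for v in range(data.shape[1]):
                    if v != target and v not in mb and \
                            not independent(data, target, v, mb, alpha):
                        mb.append(v)
                        changed = True
            for v in list(mb):  # shrink: drop variables independent given the rest
                rest = [u for u in mb if u != v]
                if independent(data, target, v, rest, alpha):
                    mb.remove(v)
            return mb

        # Toy linear-Gaussian chain A -> T -> B: the blanket of T should be {A, B}.
        rng = np.random.default_rng(0)
        A = rng.normal(size=5000)
        T = A + 0.5 * rng.normal(size=5000)
        B = T + 0.5 * rng.normal(size=5000)
        print(grow_shrink_mb(np.column_stack([A, T, B]), target=1))  # -> [0, 2]

    CMB goes further: it orients the edges within the discovered blanket to separate direct causes from direct effects, which is what lets it avoid building the full CPDAG. The sketch stops at the unoriented blanket.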

    A Survey on Causal Discovery Methods for Temporal and Non-Temporal Data

    Causal Discovery (CD) is the process of identifying the cause-effect relationships among the variables in a dataset. Over the years, several methods have been developed, primarily based on the statistical properties of data, to uncover the underlying causal mechanisms. In this study, we introduce the common terminology of causal discovery and provide a comprehensive discussion of the approaches designed to identify causal edges in different settings. We further discuss some of the benchmark datasets available for evaluating the performance of causal discovery algorithms, the tools available to readily perform causal discovery, and the common metrics used to evaluate these methods. Finally, we conclude by presenting the common challenges involved in CD and discussing its applications in multiple areas of interest.
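
    As a concrete instance of the evaluation metrics such surveys cover, the sketch below computes the Structural Hamming Distance (SHD), a widely used score counting the edge insertions, deletions, and reversals needed to turn an estimated graph into the true one. The adjacency-matrix encoding and the toy graphs are assumptions of the example.

        # Structural Hamming Distance between two directed graphs given as 0/1
        # adjacency matrices, where adj[i, j] = 1 encodes an edge i -> j.
        import numpy as np

        def shd(true_adj, est_adj):
            """One unit of distance per node pair whose (non-)edge disagrees."""
            diff, n = 0, true_adj.shape[0]
            for i in range(n):
                for j in range(i + 1, n):
                    t = (true_adj[i, j], true_adj[j, i])
                    e = (est_adj[i, j], est_adj[j, i])
                    if t != e:  # missing, extra, or wrongly oriented edge
                        diff += 1
            return diff

        # True graph 0 -> 1 -> 2; the estimate reverses 0 -> 1 and adds 0 -> 2.
        true_g = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
        est_g  = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0]])
        print(shd(true_g, est_g))  # -> 2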

    A Bayesian Local Causal Discovery Framework

    This work introduces the Bayesian local causal discovery framework, a method for discovering unconfounded causal relationships from observational data. It addresses the hypothesis that causal discovery using local search methods will outperform causal discovery algorithms that employ global search in the context of large datasets and limited computational resources. Several Bayesian local causal discovery (BLCD) algorithms are described, and results are presented comparing them with two well-known global causal discovery algorithms, PC and FCI, and a global Bayesian network learning algorithm, the optimal reinsertion (OR) algorithm, which was post-processed to identify relationships that, under assumptions, are causal. Methodologically, this research formalizes the task of causal discovery from observational data using a Bayesian approach and local search. It specifically investigates the so-called Y structure in causal discovery and classifies the various types of Y structures present in the data-generating networks. It identifies the Y structures in the Alarm, Hailfinder, Barley, Pathfinder and Munin networks and categorizes them. A proof of the convergence of the BLCD algorithm, based on the identification of Y structures, is also provided. Principled methods of combining global and local causal discovery algorithms to improve upon the performance of the individual algorithms are discussed. In particular, a post-processing method for identifying plausible causal relationships from the output of global Bayesian network learning algorithms is described, thereby extending them to be causal discovery algorithms. In an experimental evaluation, simulated data from synthetic causal Bayesian networks representing five different domains, as well as a real-world medical dataset, were used. Causal discovery performance was measured using precision and recall. Sometimes the local methods performed better than the global methods, and sometimes they did not (both in terms of precision/recall and in terms of computation time). When all the datasets were considered in aggregate, the local methods (BLCD and BLCDpk) had higher precision. The general performance of the BLCD class of algorithms was comparable to that of the global search algorithms, implying that the local search algorithms will perform well on very large datasets where the global methods fail to scale up. The limitations of this research and directions for future research are also discussed.
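
    The categorization of Y structures can be made concrete with a small enumeration sketch. Below, a Y structure is taken to be the pattern W1 -> X <- W2 with X -> Z, where W1 and W2 are non-adjacent and neither is adjacent to Z; this side condition follows one common formalization, and the networkx encoding is an assumption of the example rather than a restatement of the thesis.

        # Enumerate Y structures (w1, w2, x, z) in a known DAG.
        from itertools import combinations
        import networkx as nx

        def y_structures(g):
            und = g.to_undirected()  # adjacency checks ignore edge direction
            for x in g.nodes:
                parents = list(g.predecessors(x))
                for z in g.successors(x):
                    for w1, w2 in combinations(parents, 2):
                        # The parents must collide at x (be non-adjacent) and must
                        # not touch z except through x.
                        if not und.has_edge(w1, w2) and \
                                not und.has_edge(w1, z) and not und.has_edge(w2, z):
                            yield (w1, w2, x, z)

        g = nx.DiGraph([("W1", "X"), ("W2", "X"), ("X", "Z")])
        print(list(y_structures(g)))  # -> [('W1', 'W2', 'X', 'Z')]

    The practical appeal of this pattern, which BLCD exploits, is that under the standard assumptions the edge X -> Z in a Y structure is an unconfounded causal relationship identifiable from observational data alone.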

    Clinical foundations and information architecture for the implementation of a federated health record service

    Clinical care increasingly requires healthcare professionals to access patient record information that may be distributed across multiple sites, held in a variety of paper and electronic formats, and represented as mixtures of narrative, structured, coded and multi-media entries. A longitudinal person-centred electronic health record (EHR) is a much-anticipated solution to this problem, but its realisation is proving to be a long and complex journey. This thesis explores the history and evolution of clinical information systems, and establishes a set of clinical and ethico-legal requirements for a generic EHR server. A federation approach (FHR) to harmonising distributed heterogeneous electronic clinical databases is advocated as the basis for meeting these requirements. A set of information models and middleware services, needed to implement a Federated Health Record server, are then described, thereby supporting access by clinical applications to a distributed set of feeder systems holding patient record information. The overall information architecture thus defined provides a generic means of combining such feeder system data to create a virtual electronic health record. Active collaboration in a wide range of clinical contexts, across the whole of Europe, has been central to the evolution of the approach taken. A federated health record server based on this architecture has been implemented by the author and colleagues and deployed in a live clinical environment in the Department of Cardiovascular Medicine at the Whittington Hospital in North London. This implementation experience has fed back into the conceptual development of the approach and has provided "proof-of-concept" verification of its completeness and practical utility. This research has benefited from collaboration with a wide range of healthcare sites, informatics organisations and industry across Europe through several EU Health Telematics projects: GEHR, Synapses, EHCR-SupA, SynEx, Medicate and 6WINIT. The information models published here have been placed in the public domain and have substantially contributed to two generations of CEN health informatics standards, including CEN TC/251 ENV 13606.
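
    The federation approach can be schematized as middleware that pulls entries from heterogeneous feeder systems and merges them into a single virtual, longitudinal record. Every name in the sketch below (RecordEntry, FeederSystem, fetch_entries) is a hypothetical illustration, not the thesis's actual service interfaces or the CEN TC/251 information models.

        # A toy federated-record service: one adapter per feeder system, one merge.
        from dataclasses import dataclass
        from typing import Protocol

        @dataclass
        class RecordEntry:
            patient_id: str
            timestamp: str   # ISO 8601 date, so lexicographic sort is chronological
            source: str      # which feeder system produced the entry
            content: str     # narrative, coded, or structured payload (simplified)

        class FeederSystem(Protocol):
            def fetch_entries(self, patient_id: str) -> list[RecordEntry]: ...

        def federated_record(patient_id: str,
                             feeders: list[FeederSystem]) -> list[RecordEntry]:
            """Build a virtual EHR: gather entries from every feeder, order them."""
            entries = [e for f in feeders for e in f.fetch_entries(patient_id)]
            return sorted(entries, key=lambda e: e.timestamp)

        class LabFeeder:  # stub adapter standing in for one legacy database
            def fetch_entries(self, patient_id):
                return [RecordEntry(patient_id, "2004-03-01", "lab", "K+ 4.1 mmol/L")]

        print(federated_record("p1", [LabFeeder()]))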

    Flow duration curve prediction for ungauged basins: A data-driven study of the contiguous United States

    The flow duration curve (FDC) is one of the most widely used tools for displaying streamflow data, and percentile flows derived from the FDC provide essential information for managing rivers. These statistics are generally not available since most basins are ungauged. Percentile flows are frequently predicted using regression models developed from streamflow and ancillary data for gauged basins. Many potential independent variables are now available to predict percentile flows due to the ready availability of spatially distributed physical and climatic data for basins. A subset of the variables is often selected using automated regression procedures, but these procedures only evaluate a portion of the possible variable combinations. Other approaches for exploiting the information from physical and climatic data may produce stronger models for predicting percentile flows. The overarching hypothesis guiding this dissertation research was that more extensive approaches for extracting information from large sets of independent variables may improve percentile flow predictions. The dissertation was organized into the following three linked studies: (1) a performance evaluation of various approaches for selecting the independent variables of percentile flow regression models, (2) a comparison of different sets of variables for percentile flow regression modeling with increasing amounts of information in terms of the number of variables and their description of the statistical distribution of the data, and (3) a proof-of-concept study using a neural network approach called the self-organizing map (SOM) to account for the noise and non-linearity of the predictive relations between the independent variables and percentile flows. Key findings from these studies were as follows: (1) random forests was the best approach for selecting the independent variables for regression models used to predict percentile flows, but variables selected based on a conceptual understanding of the FDC performed nearly as well, (2) a model with only three variables (mean annual precipitation, potential evapotranspiration, and baseflow index) performed as well as models with larger sets of variables representing more physical and climatic information, and (3) the SOM performed similarly to global regression models based on all the basins, but did not outperform regression models developed for regions composed of similar basins. This may be because the SOM used all the independent variables, whereas the regression models discarded irrelevant variables that could increase the error in percentile flow predictions. All the studies of this dissertation were performed using 918 basins in the contiguous US, and the resulting predictive models provide a tool for local watershed managers to predict 13 percentile flows along with an estimate of the predictive error. These models could be improved through future research that (1) emphasizes the role of geology, as this provided the most valuable information for predicting the percentile flows, (2) exploits new sources of remotely sensed information, as classic topographic variables provided little predictive information, and (3) develops specialized models designed for high and low flows, as these were the most difficult to predict.
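
    The first key finding suggests a simple recipe: rank the candidate basin descriptors with a random forest and fit the percentile-flow regression on the top-ranked subset. The sketch below illustrates that recipe on synthetic data; the descriptor names, the synthetic target, and the choice to retain three variables are assumptions of the example, not the dissertation's actual dataset or models.

        # Random-forest variable selection followed by a linear percentile-flow model.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(42)
        n_basins = 918  # matches the study's basin count; the values are synthetic
        names = ["mean_annual_precip", "pot_evapotranspiration", "baseflow_index",
                 "mean_elevation", "drainage_area", "forest_fraction"]
        X = rng.normal(size=(n_basins, len(names)))
        # Synthetic stand-in for one percentile flow, driven by three descriptors.
        y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=n_basins)

        rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
        top = np.argsort(rf.feature_importances_)[::-1][:3]  # keep three variables
        print("selected:", [names[i] for i in top])

        model = LinearRegression().fit(X[:, top], y)
        print("training R^2:", round(model.score(X[:, top], y), 3))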