173 research outputs found

    Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

    In machine learning, the parent set identification problem is to find a set of random variables that best explains a selected variable, given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed memory approach to the exact parent sets assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of the currently available solutions.
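
For orientation, the scoring step mentioned in this abstract can be sketched as follows. This is a minimal single-machine illustration, not the paper's method: it assumes the common textbook form of the MDL score (data cost N * H(child | parents) plus a (log2 N)/2 penalty per free parameter) and replaces the paper's bounded dynamic programming on Spark with a brute-force search over small subsets. The names `mdl_score`, `best_parent_set`, `arity`, and `max_size`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_score(rows, child, parents, arity):
    """MDL score of `parents` as a parent set for `child` (lower is better).

    rows  : list of dicts mapping variable name -> discrete value
    arity : dict mapping variable name -> number of possible values
    """
    n = len(rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    joint_counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    # Data cost in bits: N * H(child | parents), estimated from counts.
    data_bits = -sum(
        c * math.log2(c / parent_counts[pa]) for (pa, _), c in joint_counts.items()
    )
    # Model cost: (log2 N / 2) bits per free parameter of the conditional table.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return data_bits + 0.5 * math.log2(n) * n_params


def best_parent_set(rows, child, candidates, arity, max_size):
    """Brute-force search over all candidate subsets of size <= max_size."""
    subsets = (
        s for k in range(max_size + 1) for s in combinations(candidates, k)
    )
    return min(subsets, key=lambda s: mdl_score(rows, child, s, arity))


# Toy data (made up): b copies a exactly, c is unrelated to b.
rows = [{"a": i % 2, "b": i % 2, "c": (i // 2) % 2} for i in range(8)]
arity = {"a": 2, "b": 2, "c": 2}
print(best_parent_set(rows, "b", ["a", "c"], arity, max_size=2))
```

The brute-force search is exponential in the number of candidates, which is the cost that the bounds and the reorganized dynamic programming described in the abstract are meant to control.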

    Structures in diagnosis: from theory to medical application

    Building Bayesian Networks: Elicitation, Evaluation, and Learning

    As a compact graphical framework for representation of multivariate probability distributions, Bayesian networks are widely used for efficient reasoning under uncertainty in a variety of applications, from medical diagnosis to computer troubleshooting and airplane fault isolation. However, construction of Bayesian networks is often considered the main difficulty when applying this framework to real-world problems. In real-world domains, Bayesian networks are often built by a knowledge engineering approach. Unfortunately, eliciting knowledge from domain experts is a very time-consuming process, and could result in poor-quality graphical models when not performed carefully. Over the last decade, the research focus has been shifting more towards learning Bayesian networks from data, especially with increasing volumes of data available in various applications, such as biomedical, internet, and e-business, among others. Aiming at solving the bottleneck problem of building Bayesian network models, this research work focuses on elicitation, evaluation, and learning of Bayesian networks. Specifically, the contribution of this dissertation involves research in the following five areas: a) graphical user interface tools for efficient elicitation and navigation of probability distributions, b) systematic and objective evaluation of elicitation schemes for probabilistic models, c) valid evaluation of performance robustness, i.e., sensitivity, of Bayesian networks, d) the sensitivity inequivalent characteristic of Markov equivalent networks, and the appropriateness of using sensitivity for model selection in learning Bayesian networks, e) selective refinement for learning probability parameters of Bayesian networks from limited data with availability of expert knowledge. In addition, an efficient algorithm for fast sensitivity analysis is developed based on a relevance reasoning technique. The implemented algorithm runs very fast and makes d) and e) more affordable for real domain practice.

    Impact of Bayesian network model structure on the accuracy of medical diagnostic systems

    While Bayesian network models may contain a handful of numerical parameters that are important for their quality, several empirical studies have confirmed that the overall precision of their probabilities is not crucial. In this paper, we study the impact of the structure of a Bayesian network on the precision of medical diagnostic systems. We show that the structure is also not that important: the diagnostic accuracy of several medical diagnostic models changes minimally when we subject their structures to such transformations as arc removal and arc reversal. © 2014 Springer International Publishing.

    A Knowledge-based Clinical Toxicology Consultant for Diagnosing Multiple Exposures

    Objective: This paper presents continued research toward the development of a knowledge-based system for the diagnosis of human toxic exposures. In particular, this research focuses on the challenging task of diagnosing exposures to multiple toxins. Although only 10% of toxic exposures in the United States involve multiple toxins, multiple exposures account for more than half of all toxin-related fatalities. Using simple medical mathematics, we seek to produce a practical decision support system capable of supplying useful information to aid in the diagnosis of complex cases involving multiple unknown substances. Methods: The system is automatically trained using data mining techniques to extract prior probabilities and likelihood ratios from a database managed by the Florida Poison Information Center (FPIC). When supplied with observed clinical effects, the system produces a ranked list of the most plausible toxic exposures. During testing, the system diagnosed toxins at three levels: identifying the substance, identifying the toxin’s major and minor categories, and identifying the toxin’s major category alone. To enable comparison between these three levels, accuracy was calculated as the percentage of exposures correctly identified in the top 10% of trained diagnoses. Results: System evaluation utilized a dataset of 8,901 multiple exposure cases and 37,617 single exposure cases. Initial system testing using only multiple exposure cases yielded poor results, with diagnosis accuracies ranging from 18.5% to 50.1%. Further investigation revealed that the system’s inability to diagnose multiple disorders resulted from insufficient data and that the clinical effects observed in multiple exposures are dominated by a single substance. Including single exposures when training, the system achieved accuracies as high as 83.5% when diagnosing the primary contributors in multiple exposure cases by substance, 86.9% when diagnosing by major and minor categories, and 79.9% when diagnosing by major category alone. Conclusions: Although the system failed to completely diagnose exposures to multiple toxins, the ability to identify the primary contributor in such cases may prove valuable in aiding medical personnel as they seek to diagnose and treat patients. As time passes and more cases are added to the FPIC database, we believe system accuracy will continue to improve, producing a viable decision support system for clinical toxicology.
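
The ranking step described in the Methods of this abstract can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes the standard odds-likelihood form of Bayes' rule with clinical effects treated as conditionally independent: posterior odds = prior odds × the product of the likelihood ratios of the observed effects. The function name `rank_exposures`, the substance and effect names, and all numbers are hypothetical and are not taken from the FPIC database.

```python
def rank_exposures(priors, likelihood_ratios, observed_effects):
    """Rank candidate substances by posterior probability.

    priors            : dict substance -> prior probability, strictly in (0, 1)
    likelihood_ratios : dict substance -> dict effect -> likelihood ratio
    observed_effects  : iterable of observed clinical effects

    Assumption: effects are conditionally independent, so their
    likelihood ratios simply multiply into the prior odds.
    """
    ranked = []
    for substance, prior in priors.items():
        odds = prior / (1.0 - prior)  # probability -> odds
        for effect in observed_effects:
            # An effect with no recorded ratio is treated as uninformative.
            odds *= likelihood_ratios.get(substance, {}).get(effect, 1.0)
        ranked.append((substance, odds / (1.0 + odds)))  # odds -> probability
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up inputs for illustration only.
priors = {"substance_A": 0.5, "substance_B": 0.2}
likelihood_ratios = {
    "substance_A": {"drowsiness": 2.0},
    "substance_B": {"drowsiness": 0.5},
}
for substance, posterior in rank_exposures(priors, likelihood_ratios, ["drowsiness"]):
    print(substance, round(posterior, 3))
```

Under this reading, the accuracy measure in the abstract would correspond to checking whether the true substance appears in the top 10% of the returned list.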

    Passing the UKMPPD (Ujian Kompetensi Mahasiswa Pendidikan Profesi Dokter, the Competency Examination for Medical Profession Students)

    This entry presents the monographs and books of all UNPRI lecturers for the year 202

    A Knowledge-based Clinical Toxicology Consultant for Diagnosing Single Exposures

    Objective: Every year, toxic exposures kill twelve hundred Americans. To aid in the timely diagnosis and treatment of such exposures, this research investigates the feasibility of a knowledge-based system capable of generating differential diagnoses for human exposures involving unknown toxins. Methods: Data mining techniques automatically extract prior probabilities and likelihood ratios from a database managed by the Florida Poison Information Center. Using observed clinical effects, the trained system produces a ranked list of plausible toxic exposures. The resulting system was evaluated using 30,152 single exposure cases. In addition, the effects of two filters for refining diagnosis, based on a minimum number of exposure cases and a minimum number of clinical effects, were explored. Results: The system achieved accuracies (calculated as the percentage of exposures correctly identified in the top 10% of trained diagnoses) as high as 79.8% when diagnosing by substance and 78.9% when diagnosing by the major and minor categories of toxins. Conclusions: The results of this research are modest, yet promising. No similar systems are currently in use in the United States, and it is hoped that these studies will yield an effective medical decision support system for clinical toxicology.

    Classifiers for modeling of mineral potential

    [Extract] Classification and allocation of land-use is a major policy objective in most countries. Such an undertaking, however, in the face of competing demands from different stakeholders, requires reliable information on resource potential. This type of information enables policy decision-makers to estimate socio-economic benefits from different possible land-use types and then to allocate the most suitable land-use. The potential for several types of resources occurring on the earth's surface (e.g., forest, soil, etc.) is generally easier to determine than those occurring in the subsurface (e.g., mineral deposits, etc.). In many situations, therefore, information on the potential for subsurface resources is not among the inputs to land-use decision-making [85]. Consequently, many potentially mineralized lands are alienated usually to, say, further exploration and exploitation of mineral deposits. Areas with mineral potential are characterized by geological features associated genetically and spatially with the type of mineral deposits sought. The term 'mineral deposits' means accumulations or concentrations of one or more useful naturally occurring substances, which are otherwise usually distributed sparsely in the earth's crust. The term 'mineralization' refers to collective geological processes that result in the formation of mineral deposits. The term 'mineral potential' describes the probability or favorability for occurrence of mineral deposits or mineralization. The geological features characteristic of mineralized land, which are called recognition criteria, are spatial objects indicative of or produced by individual geological processes that acted together to form mineral deposits. Recognition criteria are sometimes directly observable; more often, their presence is inferred from one or more geographically referenced (or spatial) datasets, which are processed and analyzed appropriately to enhance, extract, and represent the recognition criteria as spatial evidence or predictor maps. Mineral potential mapping then involves integration of predictor maps in order to classify areas of unique combinations of spatial predictor patterns, called unique conditions [51], as either barren or mineralized with respect to the mineral deposit-type sought.