16 research outputs found

    Conditional probability estimation

    This paper studies an aspect of the estimation of conditional probability distributions by maximum likelihood that seems to have been overlooked in the literature on Bayesian networks: the information conveyed by the conditioning event should be included in the likelihood function as well.

    Marginal and simultaneous predictive classification using stratified graphical models

    An inductive probabilistic classification rule must generally obey the principles of Bayesian predictive inference, such that all observed and unobserved stochastic quantities are jointly modeled and the parameter uncertainty is fully acknowledged through the posterior predictive distribution. Several such rules have been recently considered and their asymptotic behavior has been characterized under the assumption that the observed features or variables used for building a classifier are conditionally independent given a simultaneous labeling of both the training samples and those from an unknown origin. Here we extend the theoretical results to predictive classifiers acknowledging feature dependencies either through graphical models or sparser alternatives defined as stratified graphical models. We also show through experimentation with both synthetic and real data that the predictive classifiers based on stratified graphical models consistently have the best accuracy compared with the predictive classifiers based on either conditionally independent features or on ordinary graphical models. Comment: 18 pages, 5 figures

    Context-specific independence in graphical models

    The theme of this thesis is context-specific independence in graphical models. Considering a system of stochastic variables, it is often the case that the variables are dependent on each other. This can, for instance, be seen by measuring the covariance between a pair of variables. Using graphical models, it is possible to visualize the dependence structure found in a set of stochastic variables. Using ordinary graphical models, such as Markov networks, Bayesian networks, and Gaussian graphical models, the type of dependencies that can be modeled is limited to marginal and conditional (in)dependencies. The models introduced in this thesis enable the graphical representation of context-specific independencies, i.e. conditional independencies that hold only in a subset of the outcome space of the conditioning variables. In the articles included in this thesis, we introduce several types of graphical models that can represent context-specific independencies. Models for both discrete variables and continuous variables are considered. A wide range of properties are examined for the introduced models, including identifiability, robustness, scoring, and optimization. In one article, a predictive classifier which utilizes context-specific independence models is introduced. This classifier clearly demonstrates the potential benefits of the introduced models. The purpose of the material included in the thesis prior to the articles is to provide the basic theory needed to understand the articles.
    [Translated from Swedish] The topic of the thesis is context-specific independence in graphical models. In probability theory and statistics, a stochastic variable is a variable that is affected by chance. Unlike ordinary mathematical variables, a stochastic variable takes a given value with a certain probability. For a set of stochastic variables, it is generally the case that the variables depend on each other. The degree of dependence can, for example, be measured by the covariance between two variables. With graphical models it is possible to visualize the dependence structure of a system of stochastic variables. With traditional graphical models such as Markov networks, Bayesian networks, and Gaussian graphical models, it is possible to visualize marginal and conditional independence. The models introduced in this thesis enable a graphical representation of context-specific independence, i.e. conditional independence that holds only in a subset of the outcome space of the conditioning variables. The articles included in the thesis introduce several types of graphical models that can represent context-specific independencies. Both discrete and continuous systems are treated. For these models, many properties are examined, including identifiability, stability, model comparison, and optimization. One article introduces a predictive classifier that exploits context-specific independence in graphical models. This classifier clearly shows how the use of context-specific independencies can lead to improved results in practical applications.
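
    The definition of context-specific independence given in this abstract can be made concrete with a small numeric check. The sketch below is illustrative only: the table `p_y1`, its probabilities, and the helper `independent_in_context` are hypothetical and do not come from the thesis. It shows a binary variable Y that is independent of X2 in the context X1 = 0 but not in the context X1 = 1.

```python
# Hypothetical conditional probability table P(Y = 1 | X1, X2).
# The numbers are made up purely for illustration.
p_y1 = {
    (0, 0): 0.2,
    (0, 1): 0.2,  # same as (0, 0): X2 is irrelevant when X1 = 0
    (1, 0): 0.3,
    (1, 1): 0.9,  # differs from (1, 0): X2 matters when X1 = 1
}


def independent_in_context(cpt, x1, tol=1e-12):
    """Return True if P(Y | X1 = x1, X2) takes the same value for every X2."""
    values = [p for (a, _x2), p in cpt.items() if a == x1]
    return max(values) - min(values) <= tol


# Contexts (values of X1) in which Y is independent of X2.
csi_contexts = [x1 for x1 in (0, 1) if independent_in_context(p_y1, x1)]
print(csi_contexts)
```

    An ordinary conditional independence statement would require the check to pass for every value of X1; here it passes only in one context, which is the kind of structure the abstract says ordinary graphical models cannot represent.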

    Short CFD simulation activities in the context of fluid-mechanical learning in a multidisciplinary student body

    17 p. Simulation activities are a useful tool to improve competence in industrial engineering bachelor's degrees. Specifically, fluid simulation allows students to acquire important skills that strengthen their theoretical knowledge and improve their future professional careers. However, these tools usually require long training times and are usually not available in the subjects of B.Sc. degrees. In this article, a new methodology based on short lessons is proposed and evaluated in the fluid mechanics subject for students enrolled in three different bachelor degree groups: B.Sc. in Mechanical Engineering, B.Sc. in Electrical Engineering and B.Sc. in Electronic and Automatic Engineering. Statistical results show good acceptance in terms of usability, learning, motivation, reflection, satisfaction and scalability. Additionally, a machine-learning based approach was applied to find group peculiarities and differences among the groups in order to identify the need for further personalization of the learning activity.

    Bayes classifiers for imbalanced traffic accidents datasets

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: undersampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets (Averaged One-Dependence Estimators, Weightily Average One-Dependence Estimators, and Bayesian networks) in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. (C) 2015 Elsevier Ltd. All rights reserved.
    The authors are grateful to the Police Traffic Department in Jordan for providing the data necessary for this research. Griselda Lopez wishes to express her acknowledgement to the regional ministry of Economy, Innovation and Science of the regional government of Andalusia (Spain) for their scholarship to train teachers and researchers in Deficit Areas, which has made this work possible. The authors appreciate the reviewers' comments and effort in order to improve the paper. Mujalli, R.; López-Maldonado, G.; Garach, L. (2016). Bayes classifiers for imbalanced traffic accidents datasets. Accident Analysis & Prevention. 88:37-51. https://doi.org/10.1016/j.aap.2015.12.003
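
    The three balancing strategies named in this abstract (undersampling, oversampling, and a mix of both) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the abstract does not say how new minority instances were created, so random duplication is assumed here, and the toy data, class counts, and function names are all hypothetical.

```python
import random
from collections import Counter


def label_of(row):
    """Return the class label of a (feature, label) pair."""
    return row[1]


def undersample(rows, majority, target, rng):
    """Randomly keep only `target` rows of the majority class."""
    major = [r for r in rows if label_of(r) == majority]
    rest = [r for r in rows if label_of(r) != majority]
    return rest + rng.sample(major, target)


def oversample(rows, minority, target, rng):
    """Randomly duplicate minority rows until there are `target` of them.

    Assumption: plain duplication stands in for whatever instance-creation
    method the paper actually used.
    """
    minor = [r for r in rows if label_of(r) == minority]
    extra = [rng.choice(minor) for _ in range(target - len(minor))]
    return rows + extra


rng = random.Random(0)
# Toy imbalanced data: 90 "slight" accidents and 10 "severe" ones.
rows = [(i, "slight") for i in range(90)] + [(i, "severe") for i in range(10)]

under = undersample(rows, "slight", 10, rng)
over = oversample(rows, "severe", 90, rng)
mixed = oversample(undersample(rows, "slight", 50, rng), "severe", 50, rng)

for name, data in [("under", under), ("over", over), ("mixed", mixed)]:
    print(name, dict(Counter(label_of(r) for r in data)))
```

    Undersampling discards majority-class information, while duplication-based oversampling repeats minority rows exactly; the mixed variant trades off the two by meeting at an intermediate class size.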

    Monitoring and Managing Interaction Patterns in Human-Robot Interaction

    Nowadays, one of the most challenging problems in Human-Robot Interaction (HRI) is to make robots able to understand humans in order to successfully accomplish tasks in human environments. HRI plays a very different role across the fields of robotics. While autonomous robots do not require a complex HRI system, it is of vital importance for service robots. The goal of this thesis is to study whether behavioural patterns that users unconsciously apply when interacting with a robot can be useful to recognise the users' intentions in a particular situation. To carry out this study, a prototype has been developed to test, in an automatic and objective way, whether those interaction patterns performed by several users in the area of service robots are useful to recognise their intentions and disambiguate unclear situations. By using verbal and non-verbal communication that the user unconsciously applies when interacting with a robot, we want to determine automatically what the user is trying to present.

    Learning extended tree augmented naive structures

    This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.
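
    For background on the baseline this abstract extends: the classic TAN construction weights each pair of features by their conditional mutual information given the class and then keeps a maximum-weight spanning tree over the features. The sketch below computes only that pairwise weight from discrete samples. It is not the ETAN score or the Edmonds-based search described in the abstract, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats for equal-length discrete sequences."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))  # joint counts of (xi, xj, c)
    n_ic = Counter(zip(xi, c))       # counts of (xi, c)
    n_jc = Counter(zip(xj, c))       # counts of (xj, c)
    n_c = Counter(c)                 # class counts
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a,b,k) * log[ p(a,b,k) p(k) / (p(a,k) p(b,k)) ]; the 1/n factors cancel.
        ratio = count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)])
        total += (count / n) * math.log(ratio)
    return total


# Toy data: x2 copies x1, while x3 is independent of x1 within each class.
c = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [0, 0, 1, 1, 0, 0, 1, 1]
x2 = list(x1)
x3 = [0, 1, 0, 1, 0, 1, 0, 1]

cmi_dependent = conditional_mutual_information(x1, x2, c)
cmi_independent = conditional_mutual_information(x1, x3, c)
print(cmi_dependent, cmi_independent)
```

    In a TAN-style learner, pairs with a large weight such as (x1, x2) are the ones a spanning tree would prefer to connect, while pairs with weight near zero contribute little.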

    Analysis of Roadway Traffic Accidents Based on Rough Sets and Bayesian Networks

    The paper integrates Rough Sets (RS) and Bayesian Networks (BN) for roadway traffic accident analysis. RS reduction of attributes is first employed to generate the key set of attributes affecting accident outcomes, which are then fed into a BN structure as nodes for BN construction and accident outcome classification. This RS-based BN framework combines the knowledge reduction capability of RS with the ability of BNs to describe interrelationships among different attributes. The framework is demonstrated using the 100-car naturalistic driving data from Virginia Tech Transportation Institute to predict accident type. Comparative evaluation with the baseline BNs shows that the RS-based BNs generally have higher prediction accuracy and lower network complexity, with comparable prediction coverage and receiver operating characteristic curve area, indicating that the proposed RS-based BN overall outperforms the BNs with or without traditional feature selection approaches. The proposed RS-based BN indicates that the most significant attributes affecting accident types include pre-crash manoeuvre, driver’s attention from forward roadway to centre mirror, number of secondary tasks undertaken, traffic density, and relation to junction. Most of these describe pre-crash driver states and driver behaviours that have not been extensively researched in the literature, and they could give further insight into the nature of traffic accidents.
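
    The attribute reduction step described in this abstract can be illustrated with the standard rough-set dependency degree: the fraction of records whose decision value is fully determined by their values on a chosen attribute subset. The sketch below pairs it with a simple greedy search for a reduct. This is an assumption-laden illustration, not the paper's algorithm: the greedy heuristic is not guaranteed to find a minimal reduct, and the toy table and its attribute names are hypothetical rather than taken from the 100-car data.

```python
from collections import defaultdict


def dependency_degree(table, attrs, decision):
    """Fraction of rows whose decision is determined by their values on `attrs`."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    # A group is consistent (in the positive region) if all its decisions agree.
    consistent = sum(len(d) for d in groups.values() if len(set(d)) == 1)
    return consistent / len(table)


def greedy_reduct(table, attrs, decision):
    """Greedily add attributes until the full set's dependency degree is reached."""
    target = dependency_degree(table, attrs, decision)
    chosen = []
    while dependency_degree(table, chosen, decision) < target:
        candidates = [a for a in attrs if a not in chosen]
        best = max(
            candidates,
            key=lambda a: dependency_degree(table, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy decision table.
table = [
    {"density": "high", "junction": "yes", "weather": "dry", "outcome": "crash"},
    {"density": "high", "junction": "no", "weather": "wet", "outcome": "crash"},
    {"density": "low", "junction": "yes", "weather": "dry", "outcome": "near_crash"},
    {"density": "low", "junction": "no", "weather": "wet", "outcome": "near_crash"},
]

reduct = greedy_reduct(table, ["density", "junction", "weather"], "outcome")
print(reduct)
```

    In a pipeline like the one the abstract outlines, the attributes that survive the reduction would become the nodes of the Bayesian network, which is how the reduction lowers network complexity.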

    The classification performance of Bayesian Networks Classifiers: a case study of detecting Denial of Service (DoS) attacks in cloud computing environments

    In this research we propose a Bayesian networks approach as a promising classification technique for detecting malicious traffic due to Denial of Service (DoS) attacks. Bayesian networks have been applied in numerous fields fraught with uncertainty and have proved successful. They have excelled in classification tasks, e.g. text analysis, medical diagnosis, and environmental modeling and management. The detection of DoS attacks has received considerable attention in the field of network security. DoS attacks have proved to be detrimental and are the bane of cloud computing environments. Large business enterprises have been, or still are, unwilling to outsource their businesses to the cloud due to the intrusions that cloud platforms are prone to. To make use of Bayesian networks it is imperative to understand the "ecosystem" of factors that are external to modeling the Bayesian algorithm itself. Understanding these factors has proven to result in comparable improvement in classification performance beyond the augmentation of the existing algorithms. The literature discusses the factors that impact classification capability; however, the effects of these factors are not universal and tend to be unique to each problem domain. This study investigates the effects of modeling parameters on the classification performance of Bayesian network classifiers in detecting DoS attacks in cloud platforms. We analyzed how structural complexity, training sample size, the choice of discretization method, and the score function impact, both individually and collectively, the performance of classifying between normal traffic and DoS attacks on the cloud.
    To study these factors, we conducted a series of experiments in detecting live DoS attacks launched against a deployed cloud and then examined the classification performance, in terms of accuracy, of different classes of Bayesian networks. The NSL-KDD dataset was used as our training set. We used ownCloud software to deploy our cloud platform. To launch DoS attacks, we used the hacker-friendly hping3 utility. A live packet capture was used as our test set. WEKA version 3.7.12 was used for our experiments. Our results show that increasing model complexity improves classification performance. This is attributed to the increase in the number of attribute correlations. Increasing the training sample size also improved classification ability. Our findings show that the choice of discretization algorithm does matter in the quest for optimal classification performance. Furthermore, our results indicate that the choice of scoring function does not affect the classification performance of Bayesian networks. Conclusions drawn from this research are prescriptive, particularly for novice machine learning researchers, and offer recommendations for achieving optimal classification performance with Bayesian network classifiers.

    Advancing Environmental Human Health Risk Assessment through Bayesian Network Analysis

    Regulatory agencies rely on quantitative risk assessment to design policies, such as environmental quality standards, to protect public health. Although risk assessment forms the foundation of important policy decisions, recent reviews have indicated the need for technical and practical improvements to risk assessment. This dissertation advances the application of Bayesian networks (BNs) in environmental human health risk assessment in response to this need. BNs were developed to support causal inference in artificial intelligence applications but are not currently used by environmental regulatory agencies. First, a proof-of-concept BN is developed to test BN performance in predicting the effect of maternal exposure to arsenic in drinking water on the risk of newborn lower birthweight for gestational age. The network is the first of its kind to model a dose-response relationship connecting an environmental hazard to a human health outcome. In addition, unlike prevailing regulatory risk assessment approaches, it accounts for inter-individual metabolic differences. The BN is shown to outperform current regulatory risk assessment methods in balancing predictive sensitivity and specificity. Second, a BN is developed to predict the effect of arsenic exposure in drinking water on the risk of diabetes and prediabetes, while accounting for inter-individual differences in arsenic metabolism and body mass index. In addition, the BN’s utility to risk managers is demonstrated by using the model to predict the population-level health consequences of reduced arsenic exposure (including decreased diabetes prevalence). These predictions demonstrate the importance of considering both cancer and non-cancer outcomes when making policy. BNs’ ability to facilitate cost-benefit calculations in regulatory contexts is highlighted. 
Finally, improvements to risk assessment utility by using BNs are illustrated through a model developed to quantify risk to wastewater treatment workers of contracting Ebola virus disease from contact with contaminated wastewater during an outbreak. The model is used to identify key factors affecting risk and captures risk under different mitigation strategies. These results suggest that BNs offer a quantitatively sophisticated, flexible, and transparent method that addresses key challenges in current risk assessment practice in support of policymaking. Doctor of Philosophy