233 research outputs found

    Micro-analysis of seriation skills

    Get PDF

    A Bayesian method for evaluating and discovering disease loci associations

    Get PDF
    Background: A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need. Methodology/Findings: We introduce the Bayesian network posterior probability (BNPP) method which addresses the difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can not only be used to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are found. Conclusions/Significance: We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations. © 2011 Jiang et al

    Inequality and violent crime: evidence from data on robbery and violent theft

    Get PDF
    This article argues that the link between income inequality and violent property crime might be spurious, complementing a similar argument in prior analysis by the author on the determinants of homicide. In contrast, Fajnzylber, Lederman & Loayza (1998; 2002a, b) provide seemingly strong and robust evidence that inequality causes a higher rate of both homicide and robbery/violent theft even after controlling for country-specific fixed effects. Our results suggest that inequality is not a statistically significant determinant, unless either country-specific effects are not controlled for or the sample is artificially restricted to a small number of countries. The reason why the link between inequality and violent property crime might be spurious is that income inequality is likely to be strongly correlated with country-specific fixed effects such as cultural differences. A high degree of inequality might be socially undesirable for any number of reasons, but that it causes violent crime is far from proven

    Uniform random generation of large acyclic digraphs

    Full text link
    Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.Comment: 15 pages, 2 figures. To appear in Statistics and Computin

    Learning genetic epistasis using Bayesian network scoring criteria

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is <it>Multifactor Dimensionality Reduction </it>(MDR). Jiang et al. created a combinatorial epistasis learning method called <it>BNMBL </it>to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL.</p> <p>Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best or even well when we try learning the correct model without knowledge of the number of SNPs in that model.</p> <p>Results</p> <p>We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called α performed best. This score performed better than other BN scoring criteria and MDR at <it>recall </it>using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer's data set.</p> <p>Conclusions</p> <p>We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter α appears more promising than a number of alternatives.</p

    A spatio‑temporal model of homicide in El Salvador

    Get PDF
    This paper examines the spatio-temporal evolution of homicide across the municipalities of El Salvador. It aims at identifying both temporal trends and spatial clusters that may contribute to the formation of time-stable corridors lying behind a historically (recurrent) high homicide rate. The results from this study reveal the presence of significant clusters of high homicide municipalities in the Western part of the country that have remained stable over time, and a process of formation of high homicide clusters in the Eastern region. The results show an increasing homicide trend from 2002 to 2013 with significant municipality-specific differential trends across the country. The data suggests that links may exist between the dynamics of homicide rates, drug trafficking and organized crime

    An evolutionary technique to approximate multiple optimal alignments

    Get PDF
    The alignment of observed and modeled behavior is an essential aid for organizations, since it opens the door for root-cause analysis and enhancement of processes. The state-of-the-art technique for computing alignments has exponential time and space complexity, hindering its applicability for medium and large instances. Moreover, the fact that there may be multiple optimal alignments is perceived as a negative situation, while in reality it may provide a more comprehensive picture of the model’s explanation of observed behavior, from which other techniques may benefit. This paper presents a novel evolutionary technique for approximating multiple optimal alignments. Remarkably, the memory footprint of the proposed technique is bounded, representing an unprecedented guarantee with respect to the state-of-the-art methods for the same task. The technique is implemented into a tool, and experiments on several benchmarks are provided.Peer ReviewedPostprint (author's final draft

    Dynamic reliability assessment of flare systems by combining fault tree analysis and Bayesian networks

    Get PDF
    YesFlaring is a combustion process commonly used in the oil and gas industry to dispose flammable waste gases. Flare flameout occurs when these gases escape unburnt from the flare tip causing the discharge of flammable and/or toxic vapor clouds. The toxic gases released during this process have the potential to initiate safety hazards and cause serious harm to the ecosystem and human health. Flare flameout could be caused by environmental conditions, equipment failure, and human error. However, to better understand the causes of flare flameout, a rigorous analysis of the behavior of flare systems under failure conditions is required. In this article, we used fault tree analysis (FTA) and the dynamic Bayesian network (DBN) to assess the reliability of flare systems. In this study, we analyzed 40 different combinations of basic events that can cause flare flameout to determine the event with the highest impact on system failure. In the quantitative analysis, we use both constant and time-dependent failure rates of system components. The results show that combining these two approaches allows for robust probabilistic reasoning on flare system reliability, which can help improving the safety and asset integrity of process facilities. The proposed DBN model constitutes a significant step to improve the safety and reliability of flare systems in the oil and gas industry

    A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid detection of Bacillus spores and identification of Bacillus species

    Get PDF
    Background The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores giving use to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar and vegetative biomass and spores were analyzed by Curie-point Py-MS. Results We develop a novel genetic algorithm-Bayesian network algorithm that accurately identifies sand selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was then used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model specifically built for this reduced set of features. Conclusions This final compact Bayesian network classification model is parsimonious, computationally fast to run and its graphical visualization allows easy interpretation of the probabilistic relationships among selected biomarkers. In addition, we compare the features selected by the genetic algorithm-Bayesian network approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to PLS-DA

    Constraint solving in uncertain and dynamic environments - a survey

    Get PDF
    International audienceThis article follows a tutorial, given by the authors on dynamic constraint solving at CP 2003 (Ninth International Conference on Principles and Practice of Constraint Programming) in Kinsale, Ireland. It aims at offering an overview of the main approaches and techniques that have been proposed in the domain of constraint satisfaction to deal with uncertain and dynamic environments
    • …
    corecore