225 research outputs found

    Discovering a junction tree behind a Markov network by a greedy algorithm

    Full text link
    In an earlier paper we introduced a special kind of k-width junction tree, called k-th order t-cherry junction tree in order to approximate a joint probability distribution. The approximation is the best if the Kullback-Leibler divergence between the true joint probability distribution and the approximating one is minimal. Finding the best approximating k-width junction tree is NP-complete if k>2. In our earlier paper we also proved that the best approximating k-width junction tree can be embedded into a k-th order t-cherry junction tree. We introduce a greedy algorithm resulting very good approximations in reasonable computing time. In this paper we prove that if the Markov network underlying fullfills some requirements then our greedy algorithm is able to find the true probability distribution or its best approximation in the family of the k-th order t-cherry tree probability distributions. Our algorithm uses just the k-th order marginal probability distributions as input. We compare the results of the greedy algorithm proposed in this paper with the greedy algorithm proposed by Malvestuto in 1991.Comment: The paper was presented at VOCAL 2010 in Veszprem, Hungar

    The importance of sedimenting organic matter, relative to oxygen and temperature, in structuring lake profundal macroinvertebrate assemblages

    Get PDF
    We quantified the role of a main food resource, sedimenting organic matter (SOM), relative to oxygen (DO) and temperature (TEMP) in structuring profundal macroinvertebrate assemblages in boreal lakes. SOM from 26 basins of 11 Finnish lakes was analysed for quantity (sedimentation rates), quality (C:N:P stoichiometry) and origin (carbon stable isotopes, d13C). Hypolimnetic oxygen and temperature were measured from each site during summer stratification. Partial canonical correspondence analysis (CCA) and partial regression analyses were used to quantify contributions of SOM, DO and TEMP to community composition and three macroinvertebrate metrics. The results suggested a major contribution of SOM in regulating the community composition and total biomass. Oxygen best explained the Shannon diversity, whereas TEMP had largest contribution to the variation of Benthic Quality Index. Community composition was most strongly related to d13C of SOM. Based on additional d13C and stoichiometric analyses of chironomid taxa, marked differences were apparent in their utilization of SOM and body stoichiometry; taxa characteristic of oligotrophic conditions exhibited higher C:N ratios and lower C:P and N:P ratios compared to the species typical of eutrophic lakes. The results highlight the role of SOM in regulating benthic communities and the distributions of individual species, particularly in oligotrophic systems

    Modelling with non-stratified chain event graphs

    Get PDF
    © 2019, Springer Nature Switzerland AG. Chain Event Graphs (CEGs) are recent probabilistic graphical modelling tools that have proved successful in modelling scenarios with context-specific independencies. Although the theory underlying CEGs supports appropriate representation of structural zeroes, the literature so far does not provide an adaptation of the vanilla CEG methods for a real-world application presenting structural zeroes also known as the non-stratified CEG class. To illustrate these methods, we present a non-stratified CEG representing a public health intervention designed to reduce the risk and rate of falling in the elderly. We then compare the CEG model to the more conventional Bayesian Network model when applied to this setting

    Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants

    Get PDF
    BACKGROUND: One of the global targets for non-communicable diseases is to halt, by 2025, the rise in the age-standardised adult prevalence of diabetes at its 2010 levels. We aimed to estimate worldwide trends in diabetes, how likely it is for countries to achieve the global target, and how changes in prevalence, together with population growth and ageing, are affecting the number of adults with diabetes. METHODS: We pooled data from population-based studies that had collected data on diabetes through measurement of its biomarkers. We used a Bayesian hierarchical model to estimate trends in diabetes prevalence—defined as fasting plasma glucose of 7·0 mmol/L or higher, or history of diagnosis with diabetes, or use of insulin or oral hypoglycaemic drugs—in 200 countries and territories in 21 regions, by sex and from 1980 to 2014. We also calculated the posterior probability of meeting the global diabetes target if post-2000 trends continue. FINDINGS: We used data from 751 studies including 4 372 000 adults from 146 of the 200 countries we make estimates for. Global age-standardised diabetes prevalence increased from 4·3% (95% credible interval 2·4–7·0) in 1980 to 9·0% (7·2–11·1) in 2014 in men, and from 5·0% (2·9–7·9) to 7·9% (6·4–9·7) in women. The number of adults with diabetes in the world increased from 108 million in 1980 to 422 million in 2014 (28·5% due to the rise in prevalence, 39·7% due to population growth and ageing, and 31·8% due to interaction of these two factors). Age-standardised adult diabetes prevalence in 2014 was lowest in northwestern Europe, and highest in Polynesia and Micronesia, at nearly 25%, followed by Melanesia and the Middle East and north Africa. Between 1980 and 2014 there was little change in age-standardised diabetes prevalence in adult women in continental western Europe, although crude prevalence rose because of ageing of the population. By contrast, age-standardised adult prevalence rose by 15 percentage points in men and women in Polynesia and Micronesia. In 2014, American Samoa had the highest national prevalence of diabetes (>30% in both sexes), with age-standardised adult prevalence also higher than 25% in some other islands in Polynesia and Micronesia. If post-2000 trends continue, the probability of meeting the global target of halting the rise in the prevalence of diabetes by 2025 at the 2010 level worldwide is lower than 1% for men and is 1% for women. Only nine countries for men and 29 countries for women, mostly in western Europe, have a 50% or higher probability of meeting the global target. INTERPRETATION: Since 1980, age-standardised diabetes prevalence in adults has increased, or at best remained unchanged, in every country. Together with population growth and ageing, this rise has led to a near quadrupling of the number of adults with diabetes worldwide. The burden of diabetes, both in terms of prevalence and number of adults affected, has increased faster in low-income and middle-income countries than in high-income countries. FUNDING: Wellcome Trust

    Conditional independence relations among biological markers may improve clinical decision as in the case of triple negative breast cancers

    Get PDF
    The associations existing among different biomarkers are important in clinical settings because they contribute to the characterisation of specific pathways related to the natural history of the disease, genetic and environmental determinants. Despite the availability of binary/linear (or at least monotonic) correlation indices, the full exploitation of molecular information depends on the knowledge of direct/indirect conditional independence (and eventually causal) relationships among biomarkers, and with target variables in the population of interest. In other words, that depends on inferences which are performed on the joint multivariate distribution of markers and target variables. Graphical models, such as Bayesian Networks, are well suited to this purpose. Therefore, we reconsidered a previously published case study on classical biomarkers in breast cancer, namely estrogen receptor (ER), progesterone receptor (PR), a proliferative index (Ki67/MIB-1) and to protein HER2/neu (NEU) and p53, to infer conditional independence relations existing in the joint distribution by inferring (learning) the structure of graphs entailing those relations of independence. We also examined the conditional distribution of a special molecular phenotype, called triple-negative, in which ER, PR and NEU were absent. We confirmed that ER is a key marker and we found that it was able to define subpopulations of patients characterized by different conditional independence relations among biomarkers. We also found a preliminary evidence that, given a triple-negative profile, the distribution of p53 protein is mostly supported in 'zero' and 'high' states providing useful information in selecting patients that could benefit from an adjuvant anthracyclines/alkylating agent-based chemotherapy

    Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays

    Get PDF
    We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed

    Measuring inequality: tools and an illustration

    Get PDF
    BACKGROUND: This paper examines an aspect of the problem of measuring inequality in health services. The measures that are commonly applied can be misleading because such measures obscure the difficulty in obtaining a complete ranking of distributions. The nature of the social welfare function underlying these measures is important. The overall object is to demonstrate that varying implications for the welfare of society result from inequality measures. METHOD: Various tools for measuring a distribution are applied to some illustrative data on four distributions about mental health services. Although these data refer to this one aspect of health, the exercise is of broader relevance than mental health. The summary measures of dispersion conventionally used in empirical work are applied to the data here, such as the standard deviation, the coefficient of variation, the relative mean deviation and the Gini coefficient. Other, less commonly used measures also are applied, such as Theil's Index of Entropy, Atkinson's Measure (using two differing assumptions about the inequality aversion parameter). Lorenz curves are also drawn for these distributions. RESULTS: Distributions are shown to have differing rankings (in terms of which is more equal than another), depending on which measure is applied. CONCLUSION: The scope and content of the literature from the past decade about health inequalities and inequities suggest that the economic literature from the past 100 years about inequality and inequity may have been overlooked, generally speaking, in the health inequalities and inequity literature. An understanding of economic theory and economic method, partly introduced in this article, is helpful in analysing health inequality and inequity

    Lab Retriever: a software tool for calculating likelihood ratios incorporating a probability of drop-out for forensic DNA profiles

    Get PDF
    BACKGROUND: Technological advances have enabled the analysis of very small amounts of DNA in forensic cases. However, the DNA profiles from such evidence are frequently incomplete and can contain contributions from multiple individuals. The complexity of such samples confounds the assessment of the statistical weight of such evidence. One approach to account for this uncertainty is to use a likelihood ratio framework to compare the probability of the evidence profile under different scenarios. While researchers favor the likelihood ratio framework, few open-source software solutions with a graphical user interface implementing these calculations are available for practicing forensic scientists. RESULTS: To address this need, we developed Lab Retriever, an open-source, freely available program that forensic scientists can use to calculate likelihood ratios for complex DNA profiles. Lab Retriever adds a graphical user interface, written primarily in JavaScript, on top of a C++ implementation of the previously published R code of Balding. We redesigned parts of the original Balding algorithm to improve computational speed. In addition to incorporating a probability of allelic drop-out and other critical parameters, Lab Retriever computes likelihood ratios for hypotheses that can include up to four unknown contributors to a mixed sample. These computations are completed nearly instantaneously on a modern PC or Mac computer. CONCLUSIONS: Lab Retriever provides a practical software solution to forensic scientists who wish to assess the statistical weight of evidence for complex DNA profiles. Executable versions of the program are freely available for Mac OSX and Windows operating systems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0740-8) contains supplementary material, which is available to authorized users
    corecore