225 research outputs found
Discovering a junction tree behind a Markov network by a greedy algorithm
In an earlier paper we introduced a special kind of k-width junction tree,
called the k-th order t-cherry junction tree, in order to approximate a joint
probability distribution. The approximation is the best if the Kullback-Leibler
divergence between the true joint probability distribution and the
approximating one is minimal. Finding the best approximating k-width junction
tree is NP-complete if k>2. In our earlier paper we also proved that the best
approximating k-width junction tree can be embedded into a k-th order t-cherry
junction tree. We introduce a greedy algorithm that yields very good
approximations in reasonable computing time.
In this paper we prove that if the underlying Markov network fulfills certain
requirements, then our greedy algorithm is able to find the true probability
distribution or its best approximation in the family of the k-th order t-cherry
tree probability distributions. Our algorithm uses just the k-th order marginal
probability distributions as input.
We compare the results of the greedy algorithm proposed in this paper with
the greedy algorithm proposed by Malvestuto in 1991.
Comment: The paper was presented at VOCAL 2010 in Veszprem, Hungary
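The approximation criterion named in this abstract, the Kullback-Leibler divergence between the true joint distribution and its approximation, can be illustrated with a small sketch. This is not the authors' greedy algorithm: the three-variable toy distribution, the chain-structured approximation p(x1,x2) p(x2,x3) / p(x2), and the helper names `kl_divergence` and `marginal` are all assumptions made for illustration.

```python
import itertools
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q are dicts over the same outcomes."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def marginal(p, idx):
    """Marginal of p over the variable positions listed in idx."""
    out = {}
    for x, pv in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + pv
    return out


# Hypothetical joint distribution over three binary variables
# (illustrative weights only, not data from the paper).
weights = [4, 1, 2, 3, 1, 2, 3, 4]
outcomes = list(itertools.product([0, 1], repeat=3))
p = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

# Approximation built only from low-order marginals:
# q(x1, x2, x3) = p(x1, x2) * p(x2, x3) / p(x2)
p12 = marginal(p, (0, 1))
p23 = marginal(p, (1, 2))
p2 = marginal(p, (1,))
q = {x: p12[x[:2]] * p23[x[1:]] / p2[(x[1],)] for x in outcomes}

# A smaller value means q is a better approximation of p.
kl = kl_divergence(p, q)
```

Under this criterion, competing tree-structured approximations built from the same marginals would be ranked by their `kl` values, with the smallest one preferred.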
Identification and characterisation of constitutional chromosome abnormalities using arrays of bacterial artificial chromosomes
The importance of sedimenting organic matter, relative to oxygen and temperature, in structuring lake profundal macroinvertebrate assemblages
We quantified the role of a main food resource, sedimenting organic matter (SOM), relative to oxygen (DO) and temperature (TEMP) in structuring profundal macroinvertebrate assemblages in boreal lakes. SOM from 26 basins of 11 Finnish lakes was analysed for quantity (sedimentation rates), quality (C:N:P stoichiometry) and origin (carbon stable isotopes, δ13C). Hypolimnetic oxygen and temperature were measured at each site during summer stratification. Partial canonical correspondence analysis (CCA) and partial regression analyses were used to quantify the contributions of SOM, DO and TEMP to community composition and three macroinvertebrate metrics. The results suggested a major contribution of SOM in regulating community composition and total biomass. Oxygen best explained Shannon diversity, whereas TEMP made the largest contribution to the variation in the Benthic Quality Index. Community composition was most strongly related to δ13C of SOM. Additional δ13C and stoichiometric analyses of chironomid taxa revealed marked differences in their utilization of SOM and in body stoichiometry; taxa characteristic of oligotrophic conditions exhibited higher C:N ratios and lower C:P and N:P ratios than species typical of eutrophic lakes. The results highlight the role of SOM in regulating benthic communities and the distributions of individual species, particularly in oligotrophic systems.
Modelling with non-stratified chain event graphs
Chain Event Graphs (CEGs) are recent probabilistic graphical modelling tools that have proved successful in modelling scenarios with context-specific independencies. Although the theory underlying CEGs supports appropriate representation of structural zeroes, the literature so far does not provide an adaptation of the vanilla CEG methods for a real-world application presenting structural zeroes, also known as the non-stratified CEG class. To illustrate these methods, we present a non-stratified CEG representing a public health intervention designed to reduce the risk and rate of falling in the elderly. We then compare the CEG model to the more conventional Bayesian Network model when applied to this setting.
Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants
BACKGROUND: One of the global targets for non-communicable diseases is to halt, by 2025, the rise in the age-standardised adult prevalence of diabetes at its 2010 levels. We aimed to estimate worldwide trends in diabetes, how likely it is for countries to achieve the global target, and how changes in prevalence, together with population growth and ageing, are affecting the number of adults with diabetes. METHODS: We pooled data from population-based studies that had collected data on diabetes through measurement of its biomarkers. We used a Bayesian hierarchical model to estimate trends in diabetes prevalence—defined as fasting plasma glucose of 7·0 mmol/L or higher, or history of diagnosis with diabetes, or use of insulin or oral hypoglycaemic drugs—in 200 countries and territories in 21 regions, by sex and from 1980 to 2014. We also calculated the posterior probability of meeting the global diabetes target if post-2000 trends continue. FINDINGS: We used data from 751 studies including 4 372 000 adults from 146 of the 200 countries we make estimates for. Global age-standardised diabetes prevalence increased from 4·3% (95% credible interval 2·4–7·0) in 1980 to 9·0% (7·2–11·1) in 2014 in men, and from 5·0% (2·9–7·9) to 7·9% (6·4–9·7) in women. The number of adults with diabetes in the world increased from 108 million in 1980 to 422 million in 2014 (28·5% due to the rise in prevalence, 39·7% due to population growth and ageing, and 31·8% due to interaction of these two factors). Age-standardised adult diabetes prevalence in 2014 was lowest in northwestern Europe, and highest in Polynesia and Micronesia, at nearly 25%, followed by Melanesia and the Middle East and north Africa. Between 1980 and 2014 there was little change in age-standardised diabetes prevalence in adult women in continental western Europe, although crude prevalence rose because of ageing of the population. 
By contrast, age-standardised adult prevalence rose by 15 percentage points in men and women in Polynesia and Micronesia. In 2014, American Samoa had the highest national prevalence of diabetes (>30% in both sexes), with age-standardised adult prevalence also higher than 25% in some other islands in Polynesia and Micronesia. If post-2000 trends continue, the probability of meeting the global target of halting the rise in the prevalence of diabetes by 2025 at the 2010 level worldwide is lower than 1% for men and is 1% for women. Only nine countries for men and 29 countries for women, mostly in western Europe, have a 50% or higher probability of meeting the global target. INTERPRETATION: Since 1980, age-standardised diabetes prevalence in adults has increased, or at best remained unchanged, in every country. Together with population growth and ageing, this rise has led to a near quadrupling of the number of adults with diabetes worldwide. The burden of diabetes, both in terms of prevalence and number of adults affected, has increased faster in low-income and middle-income countries than in high-income countries. FUNDING: Wellcome Trust
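The split of the rise in case numbers into a prevalence part, a population part and an interaction part can be sketched with a simple identity: if cases = population × prevalence, the change in cases equals pop0·Δprev + prev0·Δpop + Δpop·Δprev. The sketch below is a simplified, non-age-stratified illustration with made-up inputs; the function name `decompose_change` is hypothetical, and the study's own decomposition accounts for age structure and is not reproduced here.

```python
def decompose_change(pop0, prev0, pop1, prev1):
    """Split the change in case numbers into three additive parts.

    cases = population * prevalence, so
    cases1 - cases0 = pop0 * d_prev + prev0 * d_pop + d_pop * d_prev.
    """
    d_pop = pop1 - pop0
    d_prev = prev1 - prev0
    return {
        "prevalence": pop0 * d_prev,    # prevalence rises, population fixed
        "population": prev0 * d_pop,    # population grows, prevalence fixed
        "interaction": d_pop * d_prev,  # both change together
    }


# Hypothetical inputs (not figures from the study).
parts = decompose_change(pop0=1000, prev0=0.05, pop1=2000, prev1=0.10)
total_change = 2000 * 0.10 - 1000 * 0.05
shares = {name: value / total_change for name, value in parts.items()}
```

The three entries of `shares` sum to 1, which is the sense in which percentage contributions such as those quoted in the abstract add up to 100%.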
Conditional independence relations among biological markers may improve clinical decision as in the case of triple negative breast cancers
The associations existing among different biomarkers are important in clinical settings because they contribute to the characterisation of specific pathways related to the natural history of the disease, genetic and environmental determinants. Despite the availability of binary/linear (or at least monotonic) correlation indices, the full exploitation of molecular information depends on the knowledge of direct/indirect conditional independence (and possibly causal) relationships among biomarkers, and with target variables in the population of interest. In other words, it depends on inferences performed on the joint multivariate distribution of markers and target variables. Graphical models, such as Bayesian Networks, are well suited to this purpose. Therefore, we reconsidered a previously published case study on classical biomarkers in breast cancer, namely estrogen receptor (ER), progesterone receptor (PR), a proliferative index (Ki67/MIB-1) and the proteins HER2/neu (NEU) and p53, to infer the conditional independence relations existing in the joint distribution by learning the structure of graphs entailing those relations of independence. We also examined the conditional distribution of a special molecular phenotype, called triple-negative, in which ER, PR and NEU were absent. We confirmed that ER is a key marker and we found that it was able to define subpopulations of patients characterized by different conditional independence relations among biomarkers. We also found preliminary evidence that, given a triple-negative profile, the distribution of p53 protein is mostly supported in the 'zero' and 'high' states, providing useful information for selecting patients who could benefit from an adjuvant anthracyclines/alkylating agent-based chemotherapy.
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed
Measuring inequality: tools and an illustration
BACKGROUND: This paper examines an aspect of the problem of measuring inequality in health services. The measures that are commonly applied can be misleading because such measures obscure the difficulty in obtaining a complete ranking of distributions. The nature of the social welfare function underlying these measures is important. The overall object is to demonstrate that varying implications for the welfare of society result from inequality measures. METHOD: Various tools for measuring a distribution are applied to some illustrative data on four distributions about mental health services. Although these data refer to this one aspect of health, the exercise is of broader relevance than mental health. The summary measures of dispersion conventionally used in empirical work are applied to the data here, such as the standard deviation, the coefficient of variation, the relative mean deviation and the Gini coefficient. Other, less commonly used measures also are applied, such as Theil's Index of Entropy, Atkinson's Measure (using two differing assumptions about the inequality aversion parameter). Lorenz curves are also drawn for these distributions. RESULTS: Distributions are shown to have differing rankings (in terms of which is more equal than another), depending on which measure is applied. CONCLUSION: The scope and content of the literature from the past decade about health inequalities and inequities suggest that the economic literature from the past 100 years about inequality and inequity may have been overlooked, generally speaking, in the health inequalities and inequity literature. An understanding of economic theory and economic method, partly introduced in this article, is helpful in analysing health inequality and inequity
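Several of the summary measures named in this abstract can be written down in a few lines. The sketch below uses standard textbook definitions of the Gini coefficient, the coefficient of variation, Theil's entropy index and Atkinson's measure; it is an illustration under those assumed definitions, not the paper's own computation, and it uses no data from the paper. The parameter `eps` stands for the inequality aversion parameter.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def gini(xs):
    """Gini coefficient: mean absolute difference divided by twice the mean."""
    n, mean = len(xs), _mean(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mean)


def coefficient_of_variation(xs):
    """Population standard deviation divided by the mean."""
    mean = _mean(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance) / mean


def theil(xs):
    """Theil's entropy index (T); zero values contribute nothing."""
    mean = _mean(xs)
    return sum((x / mean) * math.log(x / mean) for x in xs if x > 0) / len(xs)


def atkinson(xs, eps):
    """Atkinson's measure for strictly positive values and aversion eps >= 0."""
    mean = _mean(xs)
    if eps == 1:
        # Limit case: equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(x) for x in xs) / len(xs))
    else:
        ede = (sum(x ** (1 - eps) for x in xs) / len(xs)) ** (1 / (1 - eps))
    return 1 - ede / mean
```

Applying each function to the same set of distributions and sorting by the result gives one ranking per measure; the abstract's point is that these rankings need not agree, including between two choices of `eps`.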
Rapid detection of allelic losses in brain tumours using microsatellite repeat markers and high-performance liquid chromatography
Lab Retriever: a software tool for calculating likelihood ratios incorporating a probability of drop-out for forensic DNA profiles
BACKGROUND: Technological advances have enabled the analysis of very small amounts of DNA in forensic cases. However, the DNA profiles from such evidence are frequently incomplete and can contain contributions from multiple individuals. The complexity of such samples confounds the assessment of the statistical weight of such evidence. One approach to account for this uncertainty is to use a likelihood ratio framework to compare the probability of the evidence profile under different scenarios. While researchers favor the likelihood ratio framework, few open-source software solutions with a graphical user interface implementing these calculations are available for practicing forensic scientists. RESULTS: To address this need, we developed Lab Retriever, an open-source, freely available program that forensic scientists can use to calculate likelihood ratios for complex DNA profiles. Lab Retriever adds a graphical user interface, written primarily in JavaScript, on top of a C++ implementation of the previously published R code of Balding. We redesigned parts of the original Balding algorithm to improve computational speed. In addition to incorporating a probability of allelic drop-out and other critical parameters, Lab Retriever computes likelihood ratios for hypotheses that can include up to four unknown contributors to a mixed sample. These computations are completed nearly instantaneously on a modern PC or Mac computer. CONCLUSIONS: Lab Retriever provides a practical software solution to forensic scientists who wish to assess the statistical weight of evidence for complex DNA profiles. Executable versions of the program are freely available for Mac OSX and Windows operating systems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0740-8) contains supplementary material, which is available to authorized users
