
    A Quantile Variant of the EM Algorithm and Its Applications to Parameter Estimation with Interval Data

    The expectation-maximization (EM) algorithm is a powerful computational technique for finding maximum likelihood estimates in parametric models when the data are not fully observed. The EM algorithm is best suited to situations where the expectation in each E-step and the maximization in each M-step are straightforward. A difficulty with its implementation is that each E-step requires integrating the log-likelihood function in closed form. The explicit integration can be avoided by using what is known as the Monte Carlo EM (MCEM) algorithm, which uses a random sample to estimate the integral at each E-step. However, the Monte Carlo estimate often converges to the integral quite slowly, and its convergence can be unstable, which causes a computational burden. In this paper, we propose what we refer to as the quantile variant of the EM (QEM) algorithm. We prove that the proposed QEM method has an accuracy of $O(1/K^2)$, while the MCEM method has an accuracy of $O_p(1/\sqrt{K})$. Thus, the proposed QEM method possesses faster and more stable convergence properties than the MCEM algorithm. The improved performance is illustrated through numerical studies. Several practical examples illustrating its use in interval-censored data problems are also provided.
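
    The contrast between the two E-step approximations can be sketched on a toy conditional expectation. This is only an illustration of the quoted accuracy orders, not the authors' QEM algorithm; the truncated-exponential integrand, the function h, and all names below are assumptions.

```python
import numpy as np
from scipy import stats

# Toy E-step integrand: E[h(Z) | a < Z < b] with Z ~ Exponential(1),
# mimicking a term contributed by an interval-censored observation.
a, b = 1.0, 3.0
dist = stats.expon()
h = np.log                               # stand-in for a log-likelihood term
p_a, p_b = dist.cdf(a), dist.cdf(b)      # probability mass of the interval

def mcem_estimate(K, rng):
    """Monte Carlo E-step: average h over K random draws from Z | a < Z < b."""
    u = rng.uniform(p_a, p_b, size=K)
    z = dist.ppf(u)                      # inverse-CDF sampling from the truncation
    return h(z).mean()                   # stochastic error of order O_p(1/sqrt(K))

def qem_estimate(K):
    """Quantile-style E-step: average h over K evenly spaced conditional quantiles."""
    u = p_a + (np.arange(K) + 0.5) / K * (p_b - p_a)
    z = dist.ppf(u)                      # deterministic quantile grid
    return h(z).mean()                   # midpoint-rule error of order O(1/K^2) for smooth h

rng = np.random.default_rng(0)
exact = qem_estimate(10**6)              # very fine quantile grid as a reference value
for K in (10, 100, 1000):
    print(K, abs(mcem_estimate(K, rng) - exact), abs(qem_estimate(K) - exact))
```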

    Characterization of exponential distribution via regression of one record value on two non-adjacent record values

    We characterize the exponential distribution as the only one satisfying a certain regression condition. The condition involves the regression function of a fixed record value given two other record values, one preceding and one following the fixed record value, with neither adjacent to it. In particular, it turns out that the underlying distribution is exponential if and only if, given the first and last record values, the expected value of the median in a sample of record values equals the sample midrange. (To appear in Metrika.)
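
    In symbols, the special case described in the last sentence can be rendered schematically as follows; the notation is assumed here, and the precise regularity conditions are given in the paper.

```latex
% Notation (assumed): R_1 < R_2 < \dots < R_n are the first n upper record values
% of an i.i.d. sequence with continuous c.d.f. F, and n is odd, so the sample
% median of the records is R_{(n+1)/2}.
\[
  \mathbb{E}\left[\, R_{(n+1)/2} \,\middle|\, R_1 = u,\ R_n = v \,\right]
  \;=\; \frac{u+v}{2}
  \quad \text{for all } u < v
  \qquad \Longleftrightarrow \qquad
  F \text{ is exponential.}
\]
```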

    Reconsidering Association Testing Methods Using Single-Variant Test Statistics as Alternatives to Pooling Tests for Sequence Data with Rare Variants

    Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed methods for locus-wide inference based on nonnegative single-variant test statistics, but they did so on the basis of unrealistic comparisons. Such methods are robust to the inclusion of neutral and protective variants and may therefore be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics with two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk variant and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than that of two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than that of another existing method closely related to some recently proposed approaches. We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests for sequence data with rare variants.
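
    The two families of statistics being contrasted can be sketched roughly as follows. This is a minimal, unweighted illustration with hypothetical toy data, not the specific tests compared in the paper; the weighting schemes and permutation-based inference used in practice are omitted.

```python
import numpy as np
from scipy import stats

def trend_chisq(g, y):
    """Cochran-Armitage trend chi-square with additive genotype scores (0/1/2).
    With these scores it equals N * r^2, where r is the genotype-phenotype correlation."""
    r = np.corrcoef(g, y)[0, 1]
    return len(y) * r ** 2

def burden_chisq(G, y):
    """Simple pooling test: collapse minor-allele counts per person into a single
    burden score, then apply the same trend statistic to the burden."""
    burden = G.sum(axis=1)
    r = np.corrcoef(burden, y)[0, 1]
    return len(y) * r ** 2

def max_single_variant_p(G, y):
    """Locus-wide inference from single-variant statistics: Bonferroni-corrected
    p-value of the largest per-variant trend chi-square in the locus."""
    chis = np.array([trend_chisq(G[:, j], y) for j in range(G.shape[1])])
    return min(1.0, stats.chi2.sf(chis.max(), df=1) * G.shape[1])

# Hypothetical toy data: 2000 individuals, 20 rare variants, binary phenotype y.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.01, size=(2000, 20))
y = rng.binomial(1, 0.5, size=2000)
print(max_single_variant_p(G, y), stats.chi2.sf(burden_chisq(G, y), df=1))
```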

    The Newcomb-Benford Law in Its Relation to Some Common Distributions

    An observation that is often reported but remains persistently striking, formalized as the Newcomb-Benford law (NBL), is that the frequencies with which the leading digits of numbers occur in a wide variety of data are far from uniform. Most spectacular is the fact that in many data sets the leading digit 1 occurs in nearly one third of all cases. Proposed explanations for this uneven distribution of the leading digits include scale and base invariance. Little attention, however, has been paid to the interrelation between the distribution of the significant digits and the distribution of the observed variable. It is shown here by simulation that long right-tailed distributions of a random variable are compatible with the NBL, and that for distributions of the ratio of two random variables the fit generally improves. Distributions that do not put most of their mass on small values of the random variable (e.g. symmetric distributions) fail to fit. Hence, the validity of the NBL requires a predominance of small values and, in real-world data, a majority of small entities. Analyses of data on stock prices, the areas and numbers of inhabitants of countries, and the starting page numbers of papers from a bibliography support this conclusion. Altogether, these findings may help to explain the mechanisms behind the NBL and the conditions needed for its validity. That this law is not only of scientific interest per se but also has substantial practical implications can be seen from the fields where its use has been suggested, ranging from the detection of irregularities in data (e.g. economic fraud) to optimizing computer architectures with respect to number representation, storage, and round-off errors.
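
    The simulation idea can be sketched as follows: compare empirical leading-digit frequencies for a long right-tailed distribution, a ratio of two random variables, and a symmetric distribution concentrated away from small values against the NBL prediction. The specific distributions and sample size are assumptions for illustration only.

```python
import numpy as np

def leading_digit(x):
    """First significant digit of each positive number in x."""
    x = np.asarray(x, dtype=float)
    exponent = np.floor(np.log10(x))
    return (x / 10 ** exponent).astype(int)

def digit_freq(x):
    """Relative frequency of leading digits 1..9."""
    return np.bincount(leading_digit(x), minlength=10)[1:10] / len(x)

rng = np.random.default_rng(0)
n = 100_000
benford = np.log10(1 + 1 / np.arange(1, 10))              # NBL prediction for digits 1..9

lognormal = rng.lognormal(mean=0.0, sigma=2.0, size=n)     # long right tail
ratio = rng.standard_normal(n) ** 2 / rng.standard_normal(n) ** 2  # ratio of two variables
symmetric = rng.normal(loc=50.0, scale=5.0, size=n)        # mass far from small values

for name, sample in [("lognormal", lognormal), ("ratio", ratio), ("normal(50,5)", symmetric)]:
    print(name, np.round(digit_freq(sample), 3))
print("NBL    ", np.round(benford, 3))
```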

    Accuracy of clinical pallor in the diagnosis of anaemia in children: a meta-analysis

    BACKGROUND: Anaemia is highly prevalent in children of developing countries and is associated with impaired physical growth and mental development. Palmar pallor is recommended at the primary care level for diagnosing it, on the basis of only a few studies. The objective of this study was to systematically assess the accuracy of clinical signs in the diagnosis of anaemia in children. METHODS: We conducted a systematic review of the accuracy of clinical signs of anaemia in children, searching several databases and performing additional reference tracking. Eligible studies assessed the performance of clinical signs in the diagnosis of anaemia, using haemoglobin as the gold standard. We calculated pooled diagnostic likelihood ratios (LRs) and diagnostic odds ratios (DORs) for each clinical sign at different haemoglobin thresholds. RESULTS: Eleven articles met the inclusion criteria. Most studies were performed in Africa, in children under five. The chi-square test for proportions and Cochran's Q for DORs and LRs indicated heterogeneity. The type of observer and the haemoglobin measurement technique influenced the results. Pooling was done using a random-effects model. The pooled DOR at haemoglobin <11 g/dL was 4.3 (95% CI 2.6–7.2) for palmar pallor, 3.7 (2.3–5.9) for conjunctival pallor, and 3.4 (1.8–6.3) for nailbed pallor. DORs and LRs were slightly better for nailbed pallor at all other haemoglobin thresholds. The accuracy did not vary substantially after excluding outliers. CONCLUSION: This meta-analysis did not identify a highly accurate clinical sign of anaemia. In view of the poor performance of clinical signs, universal iron supplementation may be an adequate control strategy in high-prevalence areas. Further well-designed studies are needed in settings other than Africa. They should assess inter-observer variation, the performance of combined clinical signs, phenotypic differences, and different degrees of anaemia.
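
    A minimal sketch of how diagnostic odds ratios can be pooled under a random-effects model, in the spirit of the analysis described above, is shown below. The DerSimonian-Laird estimator and the 2x2 counts are assumptions for illustration; they are not the studies or the exact procedure used in the meta-analysis.

```python
import numpy as np

def dersimonian_laird_pooled_dor(tables):
    """Random-effects (DerSimonian-Laird) pooling of diagnostic odds ratios.
    Each table is (tp, fp, fn, tn) for one study; 0.5 is added to every cell
    as a simple continuity correction in case of zero counts."""
    tables = np.asarray(tables, dtype=float) + 0.5
    tp, fp, fn, tn = tables.T
    log_dor = np.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn            # variance of each log DOR
    w = 1 / var                                        # fixed-effect weights
    fixed = np.sum(w * log_dor) / np.sum(w)
    q = np.sum(w * (log_dor - fixed) ** 2)             # Cochran's Q (heterogeneity)
    k = len(log_dor)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1 / (var + tau2)                          # random-effects weights
    pooled = np.sum(w_star * log_dor) / np.sum(w_star)
    se = np.sqrt(1 / np.sum(w_star))
    ci = np.exp([pooled - 1.96 * se, pooled + 1.96 * se])
    return np.exp(pooled), ci

# Hypothetical (tp, fp, fn, tn) counts from three studies, for illustration only.
example = [(40, 20, 30, 110), (25, 15, 35, 125), (60, 40, 20, 80)]
print(dersimonian_laird_pooled_dor(example))
```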

    Survival dimensionality reduction (SDR): development and clinical application of an innovative approach to detect epistasis in presence of right-censored data

    BACKGROUND: Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model gene-gene interactions in case-control studies; however, none of them is suitable for time-to-event analysis. Here we introduce the Survival Dimensionality Reduction (SDR) algorithm, a non-parametric method specifically designed to detect epistasis in lifetime datasets. RESULTS: The algorithm requires no specification of the underlying survival distribution or of the underlying interaction model, and it proved sufficiently powerful to detect a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and a high degree of right-censoring (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis who were treated with anti-TNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on anti-TNF response, the SDR algorithm found that the rs1801274 SNP in the Fc gamma RIIa gene and the rs10954213 SNP in the IRF5 gene interact non-linearly to predict clinical remission after anti-TNF biologicals. CONCLUSIONS: Simulation studies and application in a real-world setting support the ability of the SDR algorithm to model epistatic interactions in candidate-gene studies in the presence of right-censored data. Availability: http://sourceforge.net/projects/sdrproject/
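
    For context, the kind of synthetic right-censored epistatic lifetime data described in the results can be generated roughly as follows. This is only a toy data-generating sketch, with assumed minor-allele frequencies, hazards, and interaction model; it does not reproduce the SDR algorithm or the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400                        # sample size of the same order as the clinical cohort above

# Two biallelic SNPs with assumed minor-allele frequencies
snp1 = rng.binomial(2, 0.3, size=n)
snp2 = rng.binomial(2, 0.4, size=n)

# Non-additive gene-gene interaction (assumed toy model): the hazard is elevated
# only when both SNPs carry at least one minor allele.
interaction = (snp1 > 0) & (snp2 > 0)
hazard = np.where(interaction, 2.0, 0.5)

event_time = rng.exponential(1 / hazard)               # exponential lifetimes
censor_time = rng.exponential(0.5, size=n)             # short horizon -> heavy censoring
time = np.minimum(event_time, censor_time)
observed = event_time <= censor_time                   # False = right-censored

print("achieved censoring rate:", round(1 - observed.mean(), 2))  # roughly 70%
```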

    Modelling and simulating non-stationary arrival processes to facilitate analysis

    This paper introduces a method for modelling and simulating non-stationary, non-renewal arrival processes that requires the analyst to set only intuitive and easily controllable parameters. It is therefore suitable for assessing the impact of non-stationary, non-exponential, and non-independent arrivals on simulated performance when such effects are suspected. A specific implementation of the method is also described and provided for download.
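
    The paper's own method is not reproduced here. As generic background for the simpler non-stationary Poisson special case (without the non-renewal dependence the paper addresses), arrivals can be generated by standard Lewis-Shedler thinning; the intensity function and parameters below are assumptions.

```python
import numpy as np

def nonstationary_poisson_arrivals(rate_fn, rate_max, horizon, rng):
    """Arrival times on [0, horizon) from a non-stationary Poisson process with
    intensity rate_fn(t) <= rate_max, generated by Lewis-Shedler thinning."""
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)           # candidate from the bounding process
        if t >= horizon:
            return np.array(arrivals)
        if rng.uniform() < rate_fn(t) / rate_max:      # accept with probability rate(t)/rate_max
            arrivals.append(t)

# Example: a sinusoidal daily demand pattern (arrivals per hour, assumed for illustration)
rate = lambda t: 10 + 8 * np.sin(2 * np.pi * t / 24)
rng = np.random.default_rng(7)
times = nonstationary_poisson_arrivals(rate, rate_max=18.0, horizon=24.0, rng=rng)
print(len(times), "arrivals in 24 hours")
```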