22 research outputs found

    Testing for an ignorable sampling bias under random double truncation

    In clinical and epidemiological research doubly truncated data often appear. This is the case, for instance, when the data registry is formed by interval sampling. Double truncation generally induces a sampling bias on the target variable, so proper corrections of ordinary estimation and inference procedures must be used. Unfortunately, the nonparametric maximum likelihood estimator of a doubly truncated distribution has several drawbacks, like potential nonexistence and nonuniqueness issues, or large estimation variance. Interestingly, no correction for double truncation is needed when the sampling bias is ignorable, which may occur with interval sampling and other sampling designs. In such a case the ordinary empirical distribution function is a consistent and fully efficient estimator that generally brings remarkable variance improvements compared to the nonparametric maximum likelihood estimator. Thus, identification of such situations is critical for the simple and efficient estimation of the target distribution. In this article, we introduce for the first time formal testing procedures for the null hypothesis of ignorable sampling bias with doubly truncated data. The asymptotic properties of the proposed test statistic are investigated. A bootstrap algorithm to approximate the null distribution of the test in practice is introduced. The finite sample performance of the method is studied in simulated scenarios. Finally, applications to data on onset for childhood cancer and Parkinson’s disease are given. Variance improvements in estimation are discussed and illustrated.
    Agencia Estatal de Investigación | Ref. PID2020-118101GB-I0

    ‘SGoFicance Trace’: Assessing Significance in High Dimensional Testing Problems

    Recently, an exact binomial test called SGoF (Sequential Goodness-of-Fit) has been introduced as a new method for handling high dimensional testing problems. SGoF looks for statistical significance when comparing the number of null hypotheses individually rejected at level γ = 0.05 with the expected number under the intersection null, and then proceeds to declare a number of effects accordingly. SGoF detects an increasing proportion of true effects with the number of tests, unlike other methods for which the opposite is true. It is worth mentioning that the choice γ = 0.05 is not essential to the SGoF procedure, and more power may be reached at other values of γ depending on the situation. In this paper we enhance the possibilities of SGoF by letting γ vary over the whole interval (0,1). In this way, we introduce the ‘SGoFicance Trace’ (from SGoF's significance trace), a graphical complement to SGoF which can help to make decisions in multiple-testing problems. A script has been written for the computation in R of the SGoFicance Trace. This script is available from the web site http://webs.uvigo.es/acraaj/SGoFicance.htm
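The binomial comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R script. The function names, the separate significance level `alpha` for the binomial metatest, and the rule "declare the excess of rejections over the binomial critical value" are assumptions made for illustration only.

```python
from math import comb


def binomial_upper_tail(n, p, b):
    """Return P(X >= b) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(b, n + 1))


def critical_value(n, gamma, alpha):
    """Smallest integer b such that P(Binomial(n, gamma) >= b) <= alpha."""
    for b in range(n + 2):  # b = n + 1 gives an empty tail (probability 0)
        if binomial_upper_tail(n, gamma, b) <= alpha:
            return b


def sgof_effects(p_values, gamma=0.05, alpha=0.05):
    """Number of effects declared by an SGoF-style rule (assumed form).

    Counts the p-values at or below gamma and compares that count with
    the critical value of the exact binomial test under the intersection
    null, where each p-value falls below gamma with probability gamma.
    """
    n = len(p_values)
    rejected = sum(p <= gamma for p in p_values)
    b = critical_value(n, gamma, alpha)
    return max(rejected - b + 1, 0)


def sgoficance_trace(p_values, gammas, alpha=0.05):
    """Evaluate the rule on a grid of gamma values, as (gamma, effects) pairs."""
    return [(g, sgof_effects(p_values, g, alpha)) for g in gammas]
```

Plotting the pairs returned by `sgoficance_trace` against γ gives a curve in the spirit of the trace described above. The direct summation in `binomial_upper_tail` is only suitable for a modest number of tests; a large-scale analysis would need a numerically stable binomial tail.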

    Estimation of Spanish Households’ Duration of Residence from Data on Current Residence Time

    The study of the housing market’s peculiarities is of great interest, since they influence the economy and welfare of a given country. Households’ duration of residence plays a key role in explaining private decisions (such as buying or renting a house) as well as public decisions (policies aimed at increasing the rental supply or at reducing the cost of entering a home for the first time). Despite the relevance of this durable good in socio-economics, we are not aware of any investigation of households’ duration of residence in Spain, which constitutes an extra motivation for performing such a study. The goal of the present work is to infer the distribution of households’ duration of residence from data on current residence time. The Theory of Renewal Processes will be a key tool in our study. Our data come from the Survey of Family Budgets (1980 and 1990) and from the Complete Panel Survey of Homes in the EU (2000). The richness of these data allows for an evaluation of the heterogeneity of the duration of residence according to some variables of interest: tenure, geographic location, total salary, and value of the housing.
    Keywords: Residential duration, EPF, PHOGUE, Renewal Processes
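As background on why current residence times carry information about complete durations, the standard stationary renewal identity can be written as below. The notation is an assumption introduced here (F for the distribution function of the complete duration, μ for its mean, g for the density of the current duration); the abstract does not state which form of the result the paper uses.

```latex
g(t) = \frac{1 - F(t)}{\mu},
\qquad
\mu = \int_0^\infty \bigl(1 - F(s)\bigr)\,ds,
\qquad\text{so}\qquad
1 - F(t) = \frac{g(t)}{g(0)} .
```

Under these assumptions, an estimate of the density of current residence time, rescaled by its value at zero, yields an estimate of the survival function of the complete duration.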

    Assessing Significance in High-Throughput Experiments by Sequential Goodness of Fit and q-Value Estimation

    We developed SGoF+, a new multiple hypothesis testing adjustment implemented as a sequential goodness-of-fit metatest. It modifies a previous algorithm, SGoF, by using the information in the distribution of p-values to fix the rejection region. The new method uses a discriminant rule based on the maximum distance between the uniform distribution of p-values and the observed one to set the null for a binomial test. This new approach shows a better power/pFDR ratio than SGoF. In fact, SGoF+ automatically sets the threshold leading to the maximum power and the minimum false non-discovery rate within the SGoF family of algorithms. Additionally, we suggest combining the information provided by SGoF+ with the estimate of the FDR incurred when rejecting a given set of nulls. We study different estimation methods for the positive false discovery rate (pFDR) in order to combine q-value estimates with the information provided by the SGoF+ method. Simulations suggest that combining the SGoF+ metatest with the q-value information is an interesting strategy for dealing with multiple testing issues. These techniques are provided in the latest version of the SGoF+ software, freely available at http://webs.uvigo.es/acraaj/SGoF.htm

    A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests

    Background: The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments reduce their statistical power as the number of tests increases. We propose a new multitest adjustment, based on a sequential goodness of fit metatest (SGoF), which increases its statistical power with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two different kinds of tests: 1) one-sample t-test, and 2) homogeneity G-test.
    Results: It is shown that SGoF behaves especially well with small sample sizes when 1) the alternative hypothesis is weakly to moderately deviated from the null model, 2) there are widespread effects through the family of tests, and 3) the number of tests is large.
    Conclusion: Therefore, SGoF should become an important tool for multitest adjustment when working with high-dimensional biological data.

    On the statistical properties of SGoF multitesting method

    In this paper we establish the statistical properties of SGoF multitesting method under a mixture model. It is assumed that the available set of p-values is statistically independent. Special attention is paid to the huge dimension problem in which the number of tests goes to infinity. Formulae for the power and the rate of false discoveries/non-discoveries of SGoF are given, so the role of the gamma-parameter of SGoF is understood. The existing connection between SGoF and a test of significance for the proportion of non-true nulls below gamma is explored. This connection suggests a possible modification of SGoF which may improve the power of the method. Simulation studies and a real data illustration are included.
    Xunta de Galicia | Ref. 10PXIB300068PR
    Ministerio de Ciencia e Innovación | Ref. MTM2008-0312

    Nonparametric estimation of transition probabilities for a general progressive multi-state model under cross-sectional sampling

    Nonparametric estimation of the transition probability matrix of a progressive multi‐state model is considered under cross‐sectional sampling. Two different estimators adapted to possibly right‐censored and left‐truncated data are proposed. The estimators require full retrospective information before the truncation time, which, when exploited, increases efficiency. They are obtained as differences between two survival functions constructed for sub‐samples of subjects occupying specific states at a certain time point. Both estimators correct the oversampling of relatively large survival times by using the left‐truncation times associated with the cross‐sectional observation. Asymptotic results are established, and finite sample performance is investigated through simulations. One of the proposed estimators performs better when there is no censoring, while the second one is strongly recommended with censored data. The new estimators are applied to data on patients in intensive care units (ICUs).
    Spanish Ministry of Economy and Competitiveness | Ref. MTM2014-55966-P
    The Israel Science Foundation | Ref. Grant No. 519/1

    Kernel density estimation with doubly truncated data

    Doubly truncated data are sometimes encountered in applications with astronomical and survival data. In this work we introduce kernel-type density estimation for a random variable which is sampled under random double truncation. Two different estimators are considered. As usual, the estimators are defined as a convolution between a kernel function and an estimator of the cumulative distribution function, which may be the NPMLE [2] or a semiparametric estimator [9]. Asymptotic properties of the introduced estimators are explored. Their finite sample behaviour is investigated through simulations.
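The convolution of a kernel with a discrete estimator of the distribution function reduces to a weighted sum of kernels, one per observation. The sketch below shows that reduction. It assumes the jumps of the distribution estimator (for example those of the NPMLE) are already available as `weights`; computing them is not shown. The Gaussian kernel and all function names are illustrative choices, not taken from the paper.

```python
from math import exp, pi, sqrt


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel function."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)


def weighted_kde(x, sample, weights, bandwidth):
    """Kernel density estimate at x from a weighted sample.

    `weights` holds the jump of the distribution function estimator at
    each sample point. The weights are normalised to sum to one, so
    equal weights give the ordinary kernel density estimator.
    """
    if len(sample) != len(weights):
        raise ValueError("sample and weights must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total = sum(weights)
    kernel_sum = sum(
        w * gaussian_kernel((x - xi) / bandwidth)
        for xi, w in zip(sample, weights)
    )
    return kernel_sum / (total * bandwidth)
```

With equal weights this is the usual estimator for an untruncated sample; under double truncation the weights would differ across observations to correct the sampling bias.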

    Estimation of Transition Probabilities for the Illness-Death Model: Package TP.idm

    In this paper the R package TP.idm to compute an empirical transition probability matrix for the illness-death model is introduced. This package implements a novel nonparametric estimator which is particularly well suited for non-Markov processes observed under right censoring. Variance estimates and confidence limits are also implemented in the package.
    Spanish Ministry of Economy and Competitiveness | Ref. MTM2014-55966-