Search CORE

22 research outputs found

Testing for an ignorable sampling bias under random double truncation

Author: De Uña Alvarez Jacobo
Publication venue: 'Wiley'
Publication date: 14/06/2023

In clinical and epidemiological research, doubly truncated data often appear. This is the case, for instance, when the data registry is formed by interval sampling. Double truncation generally induces a sampling bias on the target variable, so ordinary estimation and inference procedures must be properly corrected. Unfortunately, the nonparametric maximum likelihood estimator (NPMLE) of a doubly truncated distribution has several drawbacks, such as potential nonexistence and nonuniqueness, or large estimation variance. Interestingly, no correction for double truncation is needed when the sampling bias is ignorable, which may occur with interval sampling and other sampling designs. In such a case the ordinary empirical distribution function is a consistent and fully efficient estimator that generally brings remarkable variance improvements compared to the NPMLE. Thus, identifying such situations is critical for simple and efficient estimation of the target distribution. In this article, we introduce for the first time formal testing procedures for the null hypothesis of ignorable sampling bias with doubly truncated data. The asymptotic properties of the proposed test statistic are investigated. A bootstrap algorithm to approximate the null distribution of the test in practice is introduced. The finite sample performance of the method is studied in simulated scenarios. Finally, applications to data on the onset of childhood cancer and Parkinson’s disease are given. Variance improvements in estimation are discussed and illustrated. Agencia Estatal de Investigación | Ref. PID2020-118101GB-I0

Investigo
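
The abstract does not spell out the test statistic, so the following is only a hedged illustration of the two estimators it contrasts. It computes the NPMLE of a doubly truncated distribution with the Efron–Petrosian self-consistency iteration (a standard algorithm from the literature, not necessarily the one used in this article) and sets its probability masses next to the ordinary empirical distribution's 1/n. The function names, the toy data and the convergence settings `tol`/`max_iter` are assumptions of this sketch.

```python
from typing import List, Sequence


def efron_petrosian_npmle(x: Sequence[float], u: Sequence[float], v: Sequence[float],
                          tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Probability mass placed on each observed x[j] by the NPMLE under double truncation.

    Each observation satisfies u[i] <= x[i] <= v[i]. Self-consistency iteration:
        F_i = sum_j f_j * J_ij, where J_ij = 1 if u[i] <= x[j] <= v[i]
        f_j <- 1 / sum_i (J_ij / F_i), then renormalised to sum to one.
    """
    n = len(x)
    # J[i][j] = 1.0 when observation j lies inside subject i's truncation interval.
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)] for i in range(n)]
    f = [1.0 / n] * n  # start from the ordinary empirical distribution
    for _ in range(max_iter):
        # Estimated sampling probability of each subject's truncation interval.
        F = [sum(f[j] * J[i][j] for j in range(n)) for i in range(n)]
        raw = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(raw)
        new_f = [r / total for r in raw]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f  # may not have converged (e.g. when the NPMLE does not exist)


def empirical_masses(n: int) -> List[float]:
    """Ordinary empirical distribution: mass 1/n at each observation (no correction)."""
    return [1.0 / n] * n


# Hypothetical toy data: three observations with their truncation limits.
x, u, v = [1.0, 2.0, 3.0], [0.0, 1.5, 0.5], [2.5, 3.5, 3.5]
print(efron_petrosian_npmle(x, u, v))
print(empirical_masses(len(x)))
```

When every truncation interval covers every observed value, the iteration stays at 1/n, so the NPMLE and the ordinary empirical distribution coincide; in general the iteration may fail to settle when the NPMLE does not exist or is not unique, which is one of the drawbacks the abstract mentions.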

‘SGoFicance Trace’: Assessing Significance in High Dimensional Testing Problems

Author: Carvajal-Rodriguez Antonio
de Uña-Alvarez Jacobo
Publication venue: Public Library of Science
Publication date: 01/01/2010

Recently, an exact binomial test called SGoF (Sequential Goodness-of-Fit) has been introduced as a new method for handling high-dimensional testing problems. SGoF looks for statistical significance when comparing the number of null hypotheses individually rejected at level γ = 0.05 with the number expected under the intersection null, and then declares a number of effects accordingly. SGoF detects an increasing proportion of true effects as the number of tests grows, unlike other methods for which the opposite is true. The choice γ = 0.05 is not essential to the SGoF procedure, and more power may be reached at other values of γ depending on the situation. In this paper we extend SGoF by letting γ vary over the whole interval (0,1). In this way, we introduce the ‘SGoFicance Trace’ (from SGoF's significance trace), a graphical complement to SGoF that can help with decisions in multiple-testing problems. An R script for computing the SGoFicance Trace is available from the web site http://webs.uvigo.es/acraaj/SGoFicance.htm

CiteSeerX

Public Library of Science (PLOS)

Investigo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central
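
As a hedged sketch of the comparison described in the SGoFicance Trace abstract: count the p-values at or below γ, compare that count with its Binomial(n, γ) distribution under the intersection null through a one-sided exact binomial p-value, and repeat over a grid of γ values in (0,1) to obtain a trace. How SGoF then turns a significant result into a number of declared effects, and what exactly the published R script plots, are not reproduced here; the function names and the γ grid are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing pmf terms computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def sgof_comparison(pvalues: Sequence[float], gamma: float) -> Tuple[int, float, float]:
    """Observed count of p-values <= gamma, its expectation n*gamma under the
    intersection null, and the one-sided exact binomial p-value of the excess."""
    n = len(pvalues)
    observed = sum(1 for p in pvalues if p <= gamma)
    return observed, n * gamma, binom_upper_tail(observed, n, gamma)


def significance_trace(pvalues: Sequence[float],
                       gammas: Sequence[float]) -> List[Tuple[float, float]]:
    """Binomial p-value of the comparison as gamma varies (one hypothetical form of a trace)."""
    return [(g, sgof_comparison(pvalues, g)[2]) for g in gammas]


grid = [k / 20 for k in range(1, 20)]  # assumed gamma grid 0.05, 0.10, ..., 0.95
print(significance_trace([0.001, 0.004, 0.01, 0.03, 0.2, 0.45, 0.7, 0.9], grid))
```

Summing the binomial terms in log space keeps the tail probability finite even when n is in the thousands, where the raw binomial coefficients would overflow a float.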

Estimation of Spanish Households’ Duration of Residence from Data on Current Residence Time

Author: Jacobo de Uña Alvarez
Mª. Soledad Otero Giráldez
Raquel Arévalo Tomé

The peculiarities of the housing market are of great interest, since they influence the economy and welfare of a given country. Households’ duration of residence plays a key role in explaining private decisions (such as buying or renting a house) as well as public decisions (policies aimed at increasing the rental supply or at reducing the cost of entering a house for the first time). Despite the relevance of this durable good in socio-economics, we are not aware of any study of households’ duration of residence in Spain, which is an extra motivation for this study. The goal of the present work is to infer the distribution of households’ duration of residence from data on current residence time. Renewal process theory is a key tool in our study. Our data come from the Survey of Family Budgets (1980 and 1990) and from the Complete Panel Survey of Homes in the EU (2000). The richness of these data allows the heterogeneity of the duration of residence to be evaluated according to some variables of interest: tenure, geographic location, total salary, and value of the housing. Keywords: Residential duration, EPF, PHOGUE, Renewal Processes

Research Papers in Economics
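
The abstract names renewal theory as the key tool without giving formulas. Under the textbook assumption of a stationary renewal process (an assumption of this note, not a statement of the authors' exact model), the current residence time A of a household observed at the survey date is a backward recurrence time, and its density determines the survival function S of the full duration of residence:

```latex
\begin{aligned}
f_A(a) &= \frac{S(a)}{\mu}, \qquad S(a) = 1 - F(a), \qquad \mu = \int_0^\infty S(u)\,du,\\
f_A(0) &= \frac{1}{\mu} \quad (\text{since } S(0) = 1), \qquad \text{hence} \qquad S(a) = \frac{f_A(a)}{f_A(0)}.
\end{aligned}
```

In words, an estimate of the density of current residence times, normalised by its value at zero, yields an estimate of the survival function of the total duration of residence; estimating the density at the boundary a = 0 is typically the delicate step.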

Assessing Significance in High-Throughput Experiments by Sequential Goodness of Fit and q-Value Estimation

Author: A Carvajal-Rodriguez
A Farcomeni
Antonio Carvajal-Rodriguez
AP Diz
B Efron
C Dalmasso
DV Zaykin
Ioannis P. Androulakis
J de Uña-Alvarez
J Storey
Jacobo de Uña-Alvarez
JD Storey
KI Kim
N Meinshausen
SB Pounds
W Barry
WH Press
Y Benjamini
Publication venue: Public Library of Science
Publication date: 01/01/2011

We developed a new multiple hypothesis testing adjustment called SGoF+, implemented as a sequential goodness-of-fit metatest. It modifies a previous algorithm, SGoF, by using information from the distribution of p-values to set the rejection region. The new method uses a discriminant rule based on the maximum distance between the uniform distribution of p-values and the observed one to set the null for a binomial test. This new approach shows a better power/pFDR ratio than SGoF. In fact, SGoF+ automatically sets the threshold leading to the maximum power and the minimum false non-discovery rate within the SGoF family of algorithms. Additionally, we suggest combining the information provided by SGoF+ with an estimate of the FDR incurred when rejecting a given set of nulls. We study different methods for estimating the positive false discovery rate (pFDR) in order to combine q-value estimates with the information provided by SGoF+. Simulations suggest that combining the SGoF+ metatest with q-value information is an interesting strategy for dealing with multiple testing issues. These techniques are provided in the latest version of the SGoF+ software, freely available at http://webs.uvigo.es/acraaj/SGoF.htm

CiteSeerX

Public Library of Science (PLOS)

Investigo

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central
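
Two ingredients of the SGoF+ abstract can be sketched under stated assumptions. The "maximum distance between the uniform distribution of p-values and the observed one" is taken here to be the signed gap F_n(t) − t between the empirical CDF and the uniform CDF, evaluated at the observed p-values; and the q-value estimates use Storey's λ-based estimate of the null proportion with λ = 0.5. Neither choice is confirmed by the abstract, which says several pFDR estimators were studied; function names are illustrative.

```python
from typing import List, Sequence


def max_distance_threshold(pvalues: Sequence[float]) -> float:
    """Observed p-value t at which F_n(t) - t (ECDF minus uniform CDF) is largest."""
    p = sorted(pvalues)
    n = len(p)
    best_t, best_gap = p[0], float("-inf")
    for i, t in enumerate(p, start=1):
        # F_n(t) = i/n at the i-th order statistic; with ties the last tied
        # index carries the correct value and the larger gap.
        gap = i / n - t
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t


def storey_qvalues(pvalues: Sequence[float], lam: float = 0.5) -> List[float]:
    """q-values via pi0 = #{p > lam} / (n (1 - lam)), capped at 1, and the
    running minimum of pi0 * n * p_(i) / i taken from the largest p-value down."""
    n = len(pvalues)
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (n * (1.0 - lam)))
    order = sorted(range(n), key=lambda k: pvalues[k])
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * n * pvalues[idx] / rank)
        q[idx] = running_min
    return q


p = [0.01, 0.02, 0.03, 0.6, 0.8]
print(max_distance_threshold(p))  # 0.03
print(storey_qvalues(p))
```

The running minimum makes the q-values monotone in the p-values, so rejecting every test with q ≤ some level always rejects a set of smallest p-values.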

A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests

Author: A Farcomeni
A Gordon
AG Clark
Antonio Carvajal-Rodríguez
C Dalmasso
C Kendziorski
D Greenbaum
D Nguyen
Emilio Rolán-Alvarez
G Marenne
H Yang
J Lu
Jacobo de Uña-Alvarez
JD Storey
K Strimmer
KF Manly
KJ Rothman
LJ Martin
M Martínez-Fernández
P Broberg
P Perco
PH Westfall
RR Sokal
S Guindon
S Holm
S Pounds
T Fawcett
TS Mehta
W Barry
WH Press
WR Rice
Y Benjamini
Y Pawitan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments lose statistical power as the number of tests increases. We propose a new multitest adjustment, based on a sequential goodness-of-fit metatest (SGoF), whose statistical power increases with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two kinds of tests: 1) the one-sample t-test, and 2) the homogeneity G-test. Results: SGoF is shown to behave especially well with small sample sizes when 1) the alternative hypothesis deviates weakly to moderately from the null model, 2) there are widespread effects throughout the family of tests, and 3) the number of tests is large. Conclusion: Therefore, SGoF should become an important tool for multitest adjustment when working with high-dimensional biological data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
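
The abstract compares SGoF with Bonferroni and FDR-based adjustments. As a hedged reference point, the two baselines below count rejections at level α; taking the Benjamini–Hochberg step-up rule as the FDR-based alternative is an assumption, since the abstract does not say which FDR method was used. Function names and the example p-values are illustrative.

```python
from typing import Sequence


def bonferroni_count(pvalues: Sequence[float], alpha: float = 0.05) -> int:
    """Number of tests rejected when each p-value is compared with alpha / n."""
    n = len(pvalues)
    return sum(1 for p in pvalues if p <= alpha / n)


def bh_count(pvalues: Sequence[float], alpha: float = 0.05) -> int:
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is the
    largest i with p_(i) <= alpha * i / n (0 if no such i)."""
    p = sorted(pvalues)
    n = len(p)
    k = 0
    for i, p_i in enumerate(p, start=1):
        if p_i <= alpha * i / n:
            k = i  # keep the largest index satisfying the step-up condition
    return k


pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_count(pv), bh_count(pv))  # 1 2
```

Because the Bonferroni threshold α/n shrinks as n grows, its per-test power falls with the number of tests, which is the behaviour SGoF is designed to avoid.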

On the statistical properties of SGoF multitesting method

Author: De Uña Alvarez Jacobo
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 10/01/2019

In this paper we establish the statistical properties of the SGoF multitesting method under a mixture model. The available set of p-values is assumed to be statistically independent. Special attention is paid to the high-dimensional setting in which the number of tests goes to infinity. Formulae for the power and the rate of false discoveries/non-discoveries of SGoF are given, which clarifies the role of SGoF's gamma parameter. The connection between SGoF and a significance test for the proportion of false nulls below gamma is explored. This connection suggests a possible modification of SGoF that may improve the power of the method. Simulation studies and a real-data illustration are included. Xunta de Galicia | Ref. 10PXIB300068PR; Ministerio de Ciencia e Innovación | Ref. MTM2008-0312

Investigo
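
The analysis assumes independent p-values drawn from a mixture model. A hedged way to see that model concretely: with a proportion π0 of true nulls (uniform p-values) and, as an illustrative assumption, a Beta(a, 1) alternative with CDF G(t) = t^a, the expected fraction of p-values at or below γ is F(γ) = π0·γ + (1 − π0)·G(γ); SGoF's count at level γ divided by n estimates this quantity. The simulation below checks the identity; the names, the Beta alternative and the seed are assumptions.

```python
import random
from typing import List


def simulate_mixture_pvalues(n: int, pi0: float, a: float, seed: int = 0) -> List[float]:
    """Independent p-values from pi0 * Uniform(0, 1) + (1 - pi0) * Beta(a, 1), 0 < a < 1."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        if rng.random() < pi0:
            out.append(u)  # true null: uniform p-value
        else:
            out.append(u ** (1.0 / a))  # inverse-CDF draw from Beta(a, 1), CDF t**a
    return out


def mixture_cdf(gamma: float, pi0: float, a: float) -> float:
    """F(gamma) = pi0 * gamma + (1 - pi0) * G(gamma) with G(t) = t**a."""
    return pi0 * gamma + (1.0 - pi0) * gamma ** a


pv = simulate_mixture_pvalues(100_000, pi0=0.8, a=0.3, seed=1)
gamma = 0.05
print(sum(1 for p in pv if p <= gamma) / len(pv), mixture_cdf(gamma, 0.8, 0.3))
```

Choosing a < 1 concentrates the alternative p-values near zero, so F(γ) exceeds γ; that excess over the uniform line is the signal a binomial comparison at level γ can pick up.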

Nonparametric estimation of transition probabilities for a general progressive multi-state model under cross-sectional sampling

Author: De Uña Alvarez Jacobo
Mandel Micha
Publication venue: 'Wiley'
Publication date: 01/01/2018

Nonparametric estimation of the transition probability matrix of a progressive multi‐state model is considered under cross‐sectional sampling. Two different estimators adapted to possibly right‐censored and left‐truncated data are proposed. The estimators require full retrospective information before the truncation time, which, when exploited, increases efficiency. They are obtained as differences between two survival functions constructed for sub‐samples of subjects occupying specific states at a certain time point. Both estimators correct the oversampling of relatively large survival times by using the left‐truncation times associated with the cross‐sectional observation. Asymptotic results are established, and finite sample performance is investigated through simulations. One of the proposed estimators performs better when there is no censoring, while the second one is strongly recommended with censored data. The new estimators are applied to data on patients in intensive care units (ICUs). Spanish Ministry of Economy and Competitiveness | Ref. MTM2014-55966-P; The Israel Science Foundation | Ref. Grant No. 519/1

Investigo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas
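
The estimators are described as differences between two survival functions, each corrected for left truncation. A hedged, simplified sketch of that idea: a product-limit (Kaplan–Meier) estimator in which subject i is at risk only between its left-truncation (entry) time and its exit time, and, for a progressive illness-death layout with Z the time of leaving the initial state and T the total time, the occupation probability of the intermediate state at t written as P(T > t) − P(Z > t). This uses the whole sample at time zero rather than the sub-samples and retrospective information of the article; the names and toy data are assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def truncated_km(entry: Sequence[float], exit_: Sequence[float],
                 event: Sequence[bool]) -> Callable[[float], float]:
    """Product-limit survival estimate for left-truncated, right-censored data.

    Subject i (with entry[i] <= exit_[i]) is in the risk set at time t only
    when entry[i] <= t <= exit_[i]; event[i] is False for a censored exit.
    """
    steps: List[Tuple[float, float]] = []
    surv = 1.0
    for t in sorted({y for y, d in zip(exit_, event) if d}):
        at_risk = sum(1 for lo, y in zip(entry, exit_) if lo <= t <= y)
        events = sum(1 for y, d in zip(exit_, event) if d and y == t)
        surv *= 1.0 - events / at_risk
        steps.append((t, surv))

    def S(t: float) -> float:
        value = 1.0
        for s, v in steps:  # right-continuous step function
            if s > t:
                break
            value = v
        return value

    return S


def occupation_probability(S_total: Callable[[float], float],
                           S_first: Callable[[float], float], t: float) -> float:
    """P(Z <= t < T) = P(T > t) - P(Z > t), valid because Z <= T in a progressive model."""
    return S_total(t) - S_first(t)


# Hypothetical uncensored sample with no truncation: Z = exit from the initial state, T = total time.
S_T = truncated_km([0.0] * 4, [2.0, 3.0, 3.0, 4.0], [True] * 4)
S_Z = truncated_km([0.0] * 4, [1.0, 1.0, 2.0, 3.0], [True] * 4)
print(occupation_probability(S_T, S_Z, 2.5))  # 0.5
```

Restricting the risk set to entry ≤ t is what removes the oversampling of long durations: a subject recruited late contributes no information about times before its entry.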

Kernel density estimation with doubly truncated data

Author: De Uña Alvarez Jacobo
Moreira Carla Maria
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 29/06/2018

Doubly truncated data are sometimes encountered in applications with astronomical and survival data. In this work we introduce kernel-type density estimation for a random variable sampled under random double truncation. Two different estimators are considered. As usual, the estimators are defined as a convolution between a kernel function and an estimator of the cumulative distribution function, which may be the NPMLE [2] or a semiparametric estimator [9]. Asymptotic properties of the introduced estimators are explored. Their finite sample behaviour is investigated through simulations.

Investigo
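
The kernel estimators are described as the convolution of a kernel with an estimate of the cumulative distribution function. When that estimate is discrete, such as the NPMLE, the convolution reduces to a weighted sum of kernels centred at the data, with weights equal to the estimator's jumps. This sketch assumes a Gaussian kernel and a user-supplied bandwidth h; bandwidth selection and the semiparametric alternative are not covered, and the function name and toy values are illustrative.

```python
import math
from typing import List, Sequence


def weighted_kde(x_grid: Sequence[float], data: Sequence[float],
                 weights: Sequence[float], h: float) -> List[float]:
    """f_hat(x) = sum_i w_i * K_h(x - X_i) with a Gaussian kernel K and bandwidth h.

    `weights` are the jumps of a discrete CDF estimate (e.g. an NPMLE) and should
    sum to one; equal weights 1/n give the ordinary kernel density estimator.
    """
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return [sum(w * norm * math.exp(-0.5 * ((x - xi) / h) ** 2)
                for xi, w in zip(data, weights))
            for x in x_grid]


# Hypothetical observations and CDF jumps; in practice the jumps would come
# from an estimator that corrects for double truncation.
print(weighted_kde([0.0, 1.0], [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3], h=0.7))
```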

On the Statistical Properties of SGoF Multitesting Method

Author: de Uña-Alvarez Jacobo

Research Papers in Economics

Estimation of Transition Probabilities for the Illness-Death Model: Package TP.idm

Author: Balboa Vanesa
De Uña Alvarez Jacobo
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 21/03/2019

In this paper the R package TP.idm, which computes an empirical transition probability matrix for the illness-death model, is introduced. This package implements a novel nonparametric estimator that is particularly well suited for non-Markov processes observed under right censoring. Variance estimates and confidence limits are also implemented in the package. Spanish Ministry of Economy and Competitiveness | Ref. MTM2014-55966-

Investigo
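
TP.idm's own estimator is designed for right-censored, possibly non-Markov data, and its R interface is not reproduced here. To illustrate only the quantity being estimated, the hedged Python sketch below computes the empirical transition probability matrix p_hj(s, t) for fully observed (uncensored) illness-death trajectories: among subjects in state h at time s, the fraction found in state j at time t. Counting directly in this way needs no Markov assumption; handling censoring is what requires a more elaborate estimator. The state coding 0 = healthy, 1 = ill, 2 = dead, the tuple layout and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

# (time of leaving state 0, time of death, whether the subject passed through illness)
Subject = Tuple[float, float, bool]


def state_at(subject: Subject, time: float) -> int:
    """State occupied at `time`: 0 = healthy, 1 = ill, 2 = dead."""
    z, t_death, ill = subject
    if time < z:
        return 0
    if ill and time < t_death:
        return 1
    return 2


def empirical_tpm(subjects: Sequence[Subject], s: float, t: float) -> List[List[float]]:
    """p_hj(s, t): among subjects in state h at s, the fraction in state j at t.
    Rows for states nobody occupies at s are left as zeros."""
    counts = [[0] * 3 for _ in range(3)]
    for subj in subjects:
        counts[state_at(subj, s)][state_at(subj, t)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical fully observed trajectories (for a direct death, z equals the death time).
sample = [(1.0, 3.0, True), (2.0, 2.0, False), (4.0, 5.0, True), (0.5, 6.0, True)]
print(empirical_tpm(sample, s=1.5, t=3.5))
# [[0.5, 0.0, 0.5], [0.0, 0.5, 0.5], [0.0, 0.0, 0.0]]
```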