41 research outputs found

    Estimation of the density of regression errors by pointwise model selection

    Get PDF
    S. Plancade. Abstract: This paper presents two results: a density estimator and an estimator of the regression error density. We first propose a density estimator constructed by model selection, which is adaptive for the quadratic risk at a given point. We then apply this result to estimate the error density in a homoscedastic regression framework Yi = b(Xi) + εi, in which we observe a sample (Xi, Yi). Given an adaptive estimator b̂ of the regression function b, we apply the density estimation procedure to the residuals. We obtain an estimator of the density of εi whose rate of convergence for the quadratic pointwise risk is the maximum of two rates: the minimax rate we would get if the errors were directly observed, and the minimax rate of convergence of b̂ for the quadratic integrated risk.
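The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's estimator: the regression function is estimated here by a simple regressogram and the residual density by a fixed-bandwidth kernel estimator, whereas the paper uses adaptive model selection at both steps. All function names and tuning constants are illustrative assumptions.

```python
import math, random

def regressogram(xs, ys, nbins=20):
    """Piecewise-constant estimate of b on [0, 1] (stand-in for an adaptive estimator)."""
    sums = [0.0] * nbins
    counts = [0] * nbins
    for x, y in zip(xs, ys):
        j = min(int(x * nbins), nbins - 1)
        sums[j] += y
        counts[j] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * nbins), nbins - 1)]

def residual_density_at(t, xs, ys, b_hat, h=0.3):
    """Gaussian kernel density estimate, at the point t, of the residuals Y_i - b_hat(X_i)."""
    res = [y - b_hat(x) for x, y in zip(xs, ys)]
    k = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(k((t - e) / h) for e in res) / (len(res) * h)

random.seed(0)
n = 5000
xs = [random.random() for _ in range(n)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 1) for x in xs]  # Y = b(X) + eps, eps ~ N(0, 1)
b_hat = regressogram(xs, ys)
f0 = residual_density_at(0.0, xs, ys, b_hat)  # true density of eps at 0: 1/sqrt(2*pi) ≈ 0.399
```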

    Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays

    Full text link
    Motivation: Illumina BeadArray technology includes negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to an important loss of information by generating negative values, a background correction method modelling the observed intensities as the sum of an exponentially distributed signal and a normally distributed noise has been developed. Nevertheless, Wang and Ye (2011) present a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would provide a better model of the signal density. Hence, the normal-exponential model may not be appropriate for Illumina data, and background corrections derived from it may lead to incorrect estimation. Results: We propose a more flexible model based on a gamma-distributed signal and a normally distributed background noise, and develop the associated background correction. Our model proves markedly more accurate for Illumina BeadArrays: on the one hand, it offers a better fit of the observed intensities; on the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in data and on normal-gamma simulated data shows strong similarities, reinforcing the validity of the normal-gamma model. The performances of the background corrections based on the normal-gamma and normal-exponential models are compared on two dilution data sets. Surprisingly, we observe that implementing a more accurate parametrisation in the model-based background correction does not increase sensitivity. These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision.
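The normal-exponential correction that this abstract takes as its baseline has a well-known closed form, sketched below for fixed, known parameters; this is an illustrative stand-in (parameter estimation, and the normal-gamma variant proposed here, which has no closed form, are omitted). For X = S + B with signal S ~ Exp(rate λ) and noise B ~ N(μ, σ²), the corrected intensity E[S | X = x] equals μ_sx + σ·φ(μ_sx/σ)/Φ(μ_sx/σ) with μ_sx = x − μ − λσ².

```python
import math

def normexp_correct(x, mu, sigma, lam):
    """Normal-exponential background correction: E[S | X = x] for X = S + B,
    S ~ Exp(rate lam), B ~ N(mu, sigma^2). Always returns a positive value."""
    mu_sx = x - mu - lam * sigma ** 2                      # mean of the untruncated posterior of S
    t = mu_sx / sigma
    phi = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)    # standard normal pdf at t
    Phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))           # standard normal cdf at t
    return mu_sx + sigma * phi / Phi                       # mean of N(mu_sx, sigma^2) truncated to S > 0

# Illustrative parameters (assumed known, not estimated from data)
mu, sigma, lam = 100.0, 10.0, 0.01
corrected = [normexp_correct(x, mu, sigma, lam) for x in (50.0, 120.0, 1000.0)]
```

Unlike naive subtraction x − μ, this stays positive even when x is below the noise level, which is the point made in the abstract about BeadStudio's background subtraction.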

    A processual model for functional analyses of carcinogenesis in the prospective cohort design

    Get PDF
    Published version also available at http://dx.doi.org/10.1016/j.mehy.2015.07.006. Traditionally, the prospective design has been chosen for risk factor analyses of lifestyle and cancer, using mainly estimation by survival analysis methods. With new technologies, epidemiologists can expand their prospective studies to include functional genomics, given either as transcriptomics (mRNA and microRNA) or epigenetics, in blood or other biological materials. These novel functional analyses should not be assessed using classical survival analyses, since the main goal is not risk estimation but the analysis of functional genomics as part of the dynamic carcinogenic process over time, i.e., a "processual" approach. In the risk factor model, time to event is analysed as a function of exposure variables known at the start of follow-up (fixed covariates) or changing over the follow-up period (time-dependent covariates). In the processual model, transcriptomics or epigenetics is considered as a function of time and exposures. The success of this novel approach depends on the development of new statistical methods capable of describing and analysing the time-dependent curves, or trajectories, of tens of thousands of genes simultaneously. This approach also focuses on multilevel or integrative analyses, introducing novel statistical methods into epidemiology. The processual approach, as part of systems epidemiology, might in the near future represent an alternative to human in vitro studies using human biological material for understanding the mechanisms and pathways involved in carcinogenesis.

    A new statistical method for curve group analysis of longitudinal gene expression data illustrated for breast cancer in the NOWAC postgenome cohort as a proof of principle

    Get PDF
    Background: The understanding of changes in temporal processes related to human carcinogenesis is limited. One approach for prospective functional genomic studies is to compile trajectories of differential expression of genes, based on measurements from many case-control pairs. We propose a new statistical method that does not assume any parametric shape for the gene trajectories. Methods: The trajectory of a gene is defined as the curve representing the changes in gene expression levels in the blood as a function of time to cancer diagnosis. In a nested case-control design it consists of differences in gene expression levels between cases and controls. Genes can be grouped into curve groups, each corresponding to genes with a similar development over time. The proposed statistical approach is based on a set of hypothesis tests that can determine whether or not there is development in gene expression levels over time, and whether this development varies among different strata. Curve group analysis may reveal significant differences in gene expression levels over time among the strata considered. This new method was applied as a "proof of concept" to breast cancer in the Norwegian Women and Cancer (NOWAC) postgenome cohort, using blood samples collected prospectively that were specifically preserved for transcriptomic analyses (PAX tube). Cohort members diagnosed with invasive breast cancer through 2009 were identified through linkage to the Cancer Registry of Norway, and for each case a random control from the postgenome cohort was selected, matched by birth year and time of blood sampling, to create a case-control pair.
After exclusions, 441 case-control pairs were available for analyses, in which we considered strata of lymph node status at time of diagnosis and time of diagnosis with respect to breast cancer screening visits. Results: The development of gene expression levels in the NOWAC postgenome cohort varied in the last years before breast cancer diagnosis, and this development differed by lymph node status and participation in the Norwegian Breast Cancer Screening Program. The differences among the investigated strata appeared larger in the year before breast cancer diagnosis compared to earlier years. Conclusions: This approach shows good properties in terms of statistical power and type 1 error under minimal assumptions. When applied to a real data set, it was able to discriminate between groups of genes with similar non-linear patterns before diagnosis.
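The idea of grouping genes by the shape of their trajectories can be illustrated with a deliberately simplified sketch, assuming each trajectory is given as mean case-control expression differences in a few time-to-diagnosis bins. The method's actual hypothesis-testing machinery, and its control of power and type 1 error, is not reproduced here; all names and thresholds are illustrative.

```python
from collections import defaultdict

def curve_group(trajectory, eps=0.05):
    """Assign a gene to a curve group by the signs of successive changes in its
    mean case-control difference curve: '+' up, '-' down, '0' roughly flat."""
    signs = []
    for a, b in zip(trajectory, trajectory[1:]):
        d = b - a
        signs.append('+' if d > eps else '-' if d < -eps else '0')
    return tuple(signs)

# Toy trajectories: mean differences in 4 time bins (furthest to closest to diagnosis)
genes = {
    'geneA': [0.00, 0.10, 0.30, 0.60],   # steadily increasing towards diagnosis
    'geneB': [0.00, 0.12, 0.28, 0.55],   # same shape as geneA
    'geneC': [0.40, 0.20, 0.05, -0.10],  # decreasing
    'geneD': [0.01, 0.02, 0.01, 0.02],   # flat
}
groups = defaultdict(list)
for name, traj in genes.items():
    groups[curve_group(traj)].append(name)
```

Genes A and B land in the same curve group despite different magnitudes, which is the intended behaviour: a curve group captures a common shape over time, not a common level.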

    Model selection for hazard rate estimation in presence of censoring

    No full text
    This note presents an estimator of the hazard rate function based on right censored data. A collection of estimators is built from a regression-type contrast, in a general collection of linear models. Then, a penalised model selection procedure provides an estimator which satisfies an oracle inequality. In particular, we can prove that it is adaptive in the minimax sense on Hölder spaces.
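A crude sketch of the ingredients, under stated assumptions: a piecewise-constant hazard estimator on right-censored data (events in a bin divided by time at risk in the bin), with the number of bins chosen by a generic AIC-style penalised log-likelihood. The note's actual regression-type contrast, penalty, and model collection differ; everything below is illustrative.

```python
import math, random

def piecewise_hazard(times, events, nbins, tmax):
    """Hazard in each bin = number of events in the bin / total time at risk in the bin."""
    w = tmax / nbins
    deaths = [0] * nbins
    at_risk = [0.0] * nbins
    for t, d in zip(times, events):
        for j in range(nbins):
            at_risk[j] += max(0.0, min(t, (j + 1) * w) - j * w)  # time spent in bin j
        if d and t < tmax:
            deaths[min(int(t / w), nbins - 1)] += 1
    return [dj / tj if tj > 0 else 0.0 for dj, tj in zip(deaths, at_risk)]

def log_lik(haz, times, events, tmax):
    """Censored log-likelihood of a piecewise-constant hazard on [0, tmax]."""
    nbins = len(haz)
    w = tmax / nbins
    ll = 0.0
    for t, d in zip(times, events):
        for j in range(nbins):
            ll -= haz[j] * max(0.0, min(t, (j + 1) * w) - j * w)  # minus cumulative hazard
        j = min(int(t / w), nbins - 1)
        if d and t < tmax and haz[j] > 0:
            ll += math.log(haz[j])
    return ll

random.seed(1)
n = 3000
T = [random.expovariate(1.0) for _ in range(n)]   # true hazard is constant, equal to 1
C = [random.expovariate(0.5) for _ in range(n)]   # independent censoring times
times = [min(t, c) for t, c in zip(T, C)]
events = [1 if t <= c else 0 for t, c in zip(T, C)]
tmax = 2.0
# AIC-style selection over the number of bins (illustrative penalty = model dimension)
best = min(range(1, 9), key=lambda D: -log_lik(piecewise_hazard(times, events, D, tmax),
                                               times, events, tmax) + D)
haz = piecewise_hazard(times, events, best, tmax)
```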