272 research outputs found

    Semi-parametric estimation of the hazard function in a model with covariate measurement error

    We consider a model where the failure hazard function, conditional on a covariate $Z$, is given by $R(t,\theta^0|Z)=\eta_{\gamma^0}(t)f_{\beta^0}(Z)$, with $\theta^0=(\beta^0,\gamma^0)^\top\in \mathbb{R}^{m+p}$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families. The covariate $Z$ is measured through the error model $U=Z+\epsilon$, where $\epsilon$ is independent of $Z$, with known density $f_\epsilon$. We observe an $n$-sample $(X_i, D_i, U_i)$, $i=1,\dots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. We aim to estimate $\theta^0$ in the presence of the unknown density $g$. Our estimation procedure, based on a least squares criterion, provides two estimators. The first one minimizes an estimate of the least squares criterion in which $g$ is estimated by density deconvolution. Its rate depends on the smoothness of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$. We derive sufficient conditions that ensure $\sqrt{n}$-consistency. The second estimator is constructed under conditions ensuring that the least squares criterion can be directly estimated at the parametric rate. These estimators, studied in depth through examples, are in particular $\sqrt{n}$-consistent and asymptotically Gaussian in the Cox model and in the excess risk model, whatever the form of $f_\epsilon$.
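The observation scheme in this abstract can be illustrated with a small simulation. The sketch below is only an illustration under assumptions that are not in the abstract: a constant baseline hazard $\eta_\gamma(t)=\gamma$, a Cox-type relative risk $f_\beta(z)=\exp(\beta z)$, a standard Gaussian covariate, Gaussian measurement error, exponential censoring, and the convention $D_i=1$ when the failure is observed. The function name `simulate_sample` and all parameter values are hypothetical.

```python
import math
import random


def simulate_sample(n, beta=0.5, gamma=1.0, sigma_eps=0.3, cens_rate=0.5, seed=0):
    """Draw n triples (X_i, D_i, U_i) under the assumed toy model."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)              # true covariate Z (never observed)
        u = z + rng.gauss(0.0, sigma_eps)    # observed surrogate U = Z + eps
        hazard = gamma * math.exp(beta * z)  # assumed hazard: constant in t, Cox-type in Z
        t = rng.expovariate(hazard)          # failure time with that constant hazard
        c = rng.expovariate(cens_rate)       # independent censoring time (assumption)
        x = min(t, c)                        # X = min(failure time, censoring time)
        d = 1 if t <= c else 0               # assumed convention: 1 means failure observed
        sample.append((x, d, u))
    return sample


if __name__ == "__main__":
    for x, d, u in simulate_sample(5):
        print(f"X={x:.3f}  D={d}  U={u:.3f}")
```

Only $(X_i, D_i, U_i)$ is returned, which mirrors the difficulty described above: the estimator has to work without ever seeing $Z$.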

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density (up to 2,000,000 probes). The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification.

    Estimation of the hazard function in a semiparametric model with covariate measurement error

    We consider a failure hazard function, conditional on a time-independent covariate $Z$, given by $R(t,\theta^0|Z)=\eta_{\gamma^0}(t)f_{\beta^0}(Z)$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families, with $\theta^0=(\beta^0,\gamma^0)^\top\in\mathbb{R}^{m+p}$. The covariate $Z$ has an unknown density $g$ and is measured with error through an additive error model $U=Z+\epsilon$, where $\epsilon$ is a random variable, independent of $Z$, with known density $f_\epsilon$. We observe an $n$-sample $(X_i,D_i,U_i)$, $i=1,\dots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. Using a least squares criterion and deconvolution methods, we propose a consistent estimator of $\theta^0$ using the observations $(X_i,D_i,U_i)$, $i=1,\dots,n$. We give an upper bound for its risk, which depends on the smoothness properties of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$, and we derive sufficient conditions for $\sqrt{n}$-consistency. We give detailed examples considering various types of relative risks and various types of error density $f_\epsilon$. In particular, in the Cox model and in the excess risk model, the estimator of $\theta^0$ is $\sqrt{n}$-consistent and asymptotically Gaussian regardless of the form of $f_\epsilon$.

    Normalization for triple-target microarray experiments

    Background: Most microarray studies use labelling with one or two dyes, which allows the hybridization of one or two samples on the same slide. In such experiments, the most frequently used dyes are Cy3 and Cy5. Recent improvements in the technology (dye-labelling, scanner, and image analysis) allow hybridization of up to four samples simultaneously. The two additional dyes are Alexa488 and Alexa494. The triple-target or four-target technology is very promising, since it allows more flexibility in the design of experiments, an increase in statistical power when comparing gene expression induced by different conditions, and a reduced number of slides. However, few methods have been proposed for the statistical analysis of such data. Moreover, the lowess correction of the global dye effect is available only for two-color experiments, and even if its application can be derived, it does not allow simultaneous correction of the raw data.
    Results: We propose a two-step normalization procedure for triple-target experiments. First, the dye bleeding is evaluated and corrected if necessary. Then the signal in each channel is normalized using a generalized lowess procedure to correct a global dye bias. The normalization procedure is validated using triple-self experiments and by comparing the results of triple-target and two-color experiments. Although the focus is on triple-target microarrays, the proposed method can be used to normalize p differently labelled targets co-hybridized on the same array, for any value of p greater than 2.
    Conclusion: The proposed normalization procedure is effective: the technical biases are reduced, the number of false positives is under control in the analysis of differentially expressed genes, and the triple-target experiments are more powerful than the corresponding two-color experiments. There is room for improving microarray experiments by simultaneously hybridizing more than two samples.

    Clustering high-throughput sequencing data with Poisson mixture models

    In recent years, gene expression studies have increasingly made use of next-generation sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression has flourished, primarily in the context of normalization and differential analysis. In this work, we focus on the question of clustering digital gene expression profiles as a means to discover groups of co-expressed genes. We propose two parameterizations of a Poisson mixture model to cluster expression profiles of high-throughput sequencing data. A set of simulation studies compares the performance of the proposed models with that of an approach developed for a similar type of data, namely serial analysis of gene expression (SAGE). We also study the performance of these approaches on two real high-throughput sequencing data sets. The R package HTSCluster used to implement the proposed Poisson mixture models is available on CRAN.
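To make the idea of clustering count profiles with a Poisson mixture concrete, here is a minimal EM sketch. It is not the HTSCluster implementation and does not use either of the two parameterizations proposed in the paper: it assumes the simplest possible model, in which each cluster has its own free vector of Poisson means and the conditions are independent given the cluster. The function names, the optional `init_means` argument, and the toy data are all hypothetical.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass function at count y with mean lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def fit_poisson_mixture(data, k, n_iter=50, seed=0, init_means=None):
    """Fit a k-component Poisson mixture by EM; returns (weights, means, labels)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Start from k observed profiles, shifted by 0.5 so that no mean is exactly zero.
    start = init_means if init_means is not None else rng.sample(data, k)
    means = [[float(v) + 0.5 for v in row] for row in start]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in log space for stability.
        resp = []
        for row in data:
            logp = [
                math.log(weights[c])
                + sum(poisson_logpmf(y, means[c][j]) for j, y in enumerate(row))
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: weighted proportions and weighted mean counts, floored away from zero.
        for c in range(k):
            nk = sum(r[c] for r in resp)
            weights[c] = max(nk / n, 1e-12)
            for j in range(d):
                num = sum(r[c] * row[j] for r, row in zip(resp, data))
                means[c][j] = max(num / max(nk, 1e-12), 1e-8)
    # Hard assignment: each profile goes to its most probable cluster.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, means, labels


if __name__ == "__main__":
    toy = [[1, 20, 2], [2, 22, 1], [0, 19, 3], [1, 21, 2],
           [20, 1, 15], [22, 2, 14], [19, 0, 16], [21, 1, 15]]
    w, m, lab = fit_poisson_mixture(toy, k=2, init_means=[toy[0], toy[4]])
    print(lab)
```

Real sequencing data would also need library-size and gene-level normalization terms, which this sketch deliberately leaves out.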

    Variable Selection in Model-based Clustering: A General Variable Role Modeling

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are either all independent of or all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: the relevant clustering variables, the irrelevant clustering variables dependent on a subset of the relevant clustering variables, and the irrelevant clustering variables totally independent of all the relevant variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modeling. The model identifiability and the consistency of the variable selection criterion are also established. Numerical experiments on simulated datasets and on a real dataset highlight the benefits of this new modeling.

    Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering

    We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not usable in very high-dimensional settings. Finally, the comparison is also carried out on real data.

    Search for the genes involved in oocyte maturation and early embryo development in the hen

    Background: The initial stages of development depend on mRNA and proteins accumulated in the oocyte, and during these stages, certain genes are essential for fertilization, first cleavage and embryonic genome activation. The aim of this study was first to search for avian oocyte-specific genes using in silico and microarray approaches, then to investigate the temporal and spatial dynamics of the expression of some of these genes during follicular maturation and early embryogenesis.
    Results: The in silico approach allowed us to identify 18 chicken homologs of mouse potential oocyte genes found by digital differential display. Using the chicken Affymetrix microarray, we identified 461 genes overexpressed in granulosa cells (GCs) and 250 genes overexpressed in the germinal disc (GD) of the hen oocyte. Six genes were identified using both in silico and microarray approaches. Based on GO annotations, GC and GD genes were differentially involved in biological processes, reflecting different physiological destinations of these two cell layers. Finally, we studied the spatial and temporal dynamics of the expression of 21 chicken genes. According to their expression patterns, all these genes are involved in different stages of final follicular maturation and/or early embryogenesis in the chicken. Among them, 8 genes (btg4, chkmos, wee, zpA, dazL, cvh, zar1 and ktfn) were preferentially expressed in the maturing oocyte, and cvh, zar1 and ktfn were also highly expressed in the early embryo.
    Conclusion: We showed that in silico and Affymetrix microarray approaches were relevant and complementary for finding new avian genes potentially involved in oocyte maturation and/or early embryo development, and allowed the discovery of new potential chicken mature oocyte and chicken granulosa cell markers for future studies. Moreover, detailed study of the expression of some of these genes revealed promising candidates for maternal effect genes in the chicken. Finally, the finding concerning the different state of rRNA compared to that of mRNA during the postovulatory period sheds light on some mechanisms through which the oocyte-to-embryo transition occurs in the hen.