Search CORE


Semi-parametric estimation of the hazard function in a model with covariate measurement error

Author: Martin-Magniette Marie-Laure
Taupin Marie-Luce
Publication date: 08/06/2006

We consider a model where the failure hazard function, conditional on a covariate $Z$, is given by $R(t,\theta^0|Z)=\eta_{\gamma^0}(t)f_{\beta^0}(Z)$, with $\theta^0=(\beta^0,\gamma^0)^\top\in\mathbb{R}^{m+p}$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families. The covariate $Z$ is measured through the error model $U=Z+\epsilon$, where $\epsilon$ is independent of $Z$, with known density $f_\epsilon$. We observe an $n$-sample $(X_i, D_i, U_i)$, $i=1,\ldots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. We aim to estimate $\theta^0$ in the presence of the unknown density $g$ of $Z$. Our estimation procedure, based on a least squares criterion, provides two estimators. The first one minimizes an estimate of the least squares criterion in which $g$ is estimated by density deconvolution. Its rate depends on the smoothness of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$. We derive sufficient conditions that ensure $\sqrt{n}$-consistency. The second estimator is constructed under conditions ensuring that the least squares criterion can be estimated directly at the parametric rate. These estimators, studied in detail through examples, are in particular $\sqrt{n}$-consistent and asymptotically Gaussian in the Cox model and in the excess risk model, whatever $f_\epsilon$ is.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

Hal-Diderot
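To make the observation scheme concrete, the sketch below simulates one $n$-sample $(X_i, D_i, U_i)$ from a hazard model with covariate measurement error. The distributional choices are our own illustrative assumptions, not the authors': a Cox-type relative risk $f_\beta(z)=\exp(\beta z)$, a constant baseline hazard $\eta_\gamma(t)=\gamma$, $Z \sim N(0,1)$ standing in for the unknown density $g$, Gaussian error $\epsilon \sim N(0,\sigma^2)$, and independent exponential censoring. It only generates data and does not implement the least squares or deconvolution estimators. The function `simulate_sample` and its parameters are hypothetical.

```python
import math
import random


def simulate_sample(n, beta, gamma, sigma_eps, censor_rate, seed=None):
    """Draw (X_i, D_i, U_i), i = 1..n, from a hazard model with covariate error.

    Illustrative assumptions (not taken from the paper):
      - hazard R(t | Z) = eta_gamma(t) * f_beta(Z) with eta_gamma(t) = gamma
        (constant baseline) and f_beta(z) = exp(beta * z) (Cox relative risk);
      - Z ~ N(0, 1) plays the role of the unknown covariate density g;
      - eps ~ N(0, sigma_eps^2) plays the role of the known error density f_eps;
      - censoring time C ~ Exponential(censor_rate), independent of (T, Z).
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                # true covariate, never observed
        rate = gamma * math.exp(beta * z)      # constant-in-time hazard given Z
        t = rng.expovariate(rate)              # failure time T | Z ~ Exp(rate)
        c = rng.expovariate(censor_rate)       # censoring time C
        u = z + rng.gauss(0.0, sigma_eps)      # observed surrogate U = Z + eps
        x = min(t, c)                          # X_i = min(T_i, C_i)
        d = 1 if t <= c else 0                 # D_i = 1 when the failure is observed
        sample.append((x, d, u))
    return sample


if __name__ == "__main__":
    data = simulate_sample(1000, beta=0.5, gamma=1.0, sigma_eps=0.3,
                           censor_rate=0.5, seed=42)
    observed = sum(d for _, d, _ in data)
    print(f"{observed} uncensored failures out of {len(data)}")
```

Only $(X_i, D_i, U_i)$ is returned; the true $Z_i$ is discarded, which is exactly why an estimator of $\theta^0$ must account for the error density $f_\epsilon$.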

Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

Author: Aubourg Sébastien
Brunaud Véronique
Bérard Caroline
Martin-Magniette Marie-Laure
Robin Stéphane
Publication date: 01/01/2011

Tiling arrays enable a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density (up to 2,000,000 probes). The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work, we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

Estimation of the hazard function in a semiparametric model with covariate measurement error

Author: Martin-Magniette Marie-Laure
Taupin Marie-Luce
Publication venue: EDP Sciences
Publication date: 01/01/2008

We consider a failure hazard function, conditional on a time-independent covariate $Z$, given by $\eta_{\gamma^0}(t)f_{\beta^0}(Z)$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families with $\theta^0=(\beta^0,\gamma^0)^\top\in\mathbb{R}^{m+p}$. The covariate $Z$ has an unknown density and is measured with error through an additive error model $U=Z+\epsilon$, where $\epsilon$ is a random variable, independent of $Z$, with known density $f_\epsilon$. We observe an $n$-sample $(X_i, D_i, U_i)$, $i=1,\ldots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. Using a least squares criterion and deconvolution methods, we propose a consistent estimator of $\theta^0$ based on the observations $(X_i, D_i, U_i)$, $i=1,\ldots,n$. We give an upper bound for its risk, which depends on the smoothness properties of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$, and we derive sufficient conditions for $\sqrt{n}$-consistency. We give detailed examples considering various types of relative risks and various types of error density $f_\epsilon$. In particular, in the Cox model and in the excess risk model, the estimator of $\theta^0$ is $\sqrt{n}$-consistent and asymptotically Gaussian regardless of the form of $f_\epsilon$.

HAL Evry

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

HAL Descartes

Numérisation de Documents Anciens Mathématiques

Normalization for triple-target microarray experiments

Author: Aubert Julie
Bar-Hen Avner
Daudin Jean-Jacques
Elftieh Samira
Magniette Frederic
Martin-Magniette Marie-Laure
Renou Jean-Pierre
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Most microarray studies use labelling with one or two dyes, which allows the hybridization of one or two samples on the same slide. In such experiments, the most frequently used dyes are Cy3 and Cy5. Recent improvements in the technology (dye labelling, scanners and image analysis) allow the hybridization of up to four samples simultaneously. The two additional dyes are Alexa488 and Alexa494. The triple-target or four-target technology is very promising, since it allows more flexibility in the design of experiments, an increase in statistical power when comparing gene expression induced by different conditions, and a reduced number of slides. However, few methods have been proposed for the statistical analysis of such data. Moreover, the lowess correction of the global dye effect is available only for two-color experiments, and even if its application can be derived, it does not allow simultaneous correction of the raw data.

Results: We propose a two-step normalization procedure for triple-target experiments. First, the dye bleeding is evaluated and corrected if necessary. Then the signal in each channel is normalized using a generalized lowess procedure to correct a global dye bias. The normalization procedure is validated using triple-self experiments and by comparing the results of triple-target and two-color experiments. Although the focus is on triple-target microarrays, the proposed method can be used to normalize p differently labelled targets co-hybridized on the same array, for any value of p greater than 2.

Conclusion: The proposed normalization procedure is effective: the technical biases are reduced, the number of false positives is under control in the analysis of differentially expressed genes, and the triple-target experiments are more powerful than the corresponding two-color experiments. There is room for improving microarray experiments by simultaneously hybridizing more than two samples.

HAL Evry

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL Descartes

ProdInra
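The second step of the procedure, a lowess-type correction of a global dye bias applied jointly to p channels, can be illustrated as follows. This is only a sketch under our own assumptions, not the authors' generalized lowess: each channel's log2 intensity is compared with the spot's average log2 intensity across all channels, the intensity-dependent deviation is estimated with a basic (single-pass, non-robust) tricube-weighted local linear smoother, and that trend is subtracted. The dye-bleeding step is not covered, and the functions `local_linear_smooth` and `normalize_channels` and the span parameter `frac` are hypothetical.

```python
import math


def local_linear_smooth(x, y, frac=0.3):
    """Basic lowess: tricube-weighted local linear fit at every x (no robustness step)."""
    n = len(x)
    r = min(n, max(2, math.ceil(frac * n)))   # number of neighbours in each window
    fitted = []
    for x0 in x:
        dist = sorted(abs(xi - x0) for xi in x)
        h = dist[r - 1] or 1e-12               # bandwidth: distance to the r-th nearest point
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalize_channels(intensities, frac=0.3):
    """Remove an intensity-dependent bias from each of p channels.

    intensities: one list of p positive raw intensities per spot.
    Returns normalized log2 intensities with the same shape.
    """
    logs = [[math.log2(v) for v in spot] for spot in intensities]
    avg = [sum(spot) / len(spot) for spot in logs]   # average log2 intensity per spot
    p = len(logs[0])
    normalized = [list(spot) for spot in logs]
    for c in range(p):
        dev = [spot[c] - a for spot, a in zip(logs, avg)]  # channel c minus spot average
        trend = local_linear_smooth(avg, dev, frac)
        for i, t in enumerate(trend):
            normalized[i][c] = logs[i][c] - t
    return normalized
```

When a channel differs from the others only by a smooth, intensity-dependent distortion, the corrected channels line up with each other. Cleveland's full lowess adds robustness iterations that this sketch omits, which matters for noisy real data.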

Clustering high-throughput sequencing data with Poisson mixture models

Author: Celeux Gilles
Martin-Magniette Marie-Laure
Maugis-Rabusseau Cathy
Rau Andrea
Publication venue: HAL CCSD
Publication date: 01/01/2011

In recent years, gene expression studies have increasingly made use of next-generation sequencing technology. In turn, research on appropriate statistical methods for the analysis of digital gene expression has flourished, primarily in the context of normalization and differential analysis. In this work, we focus on clustering digital gene expression profiles as a means to discover groups of co-expressed genes. We propose two parameterizations of a Poisson mixture model to cluster expression profiles of high-throughput sequencing data. A set of simulation studies compares the performance of the proposed models with that of an approach developed for a similar type of data, namely serial analysis of gene expression (SAGE). We also study the performance of these approaches on two real high-throughput sequencing data sets. The R package HTSCluster, which implements the proposed Poisson mixture models, is available on CRAN.

HAL Evry

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL-INSA Toulouse

ProdInra

Hal-Diderot
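The clustering idea can be illustrated with a plain EM algorithm for a mixture of independent Poisson vectors. This is a simplified sketch, not either of the paper's two parameterizations and not the HTSCluster API: each cluster $k$ simply has its own rate $\lambda_{jk}$ for each condition $j$, with no library-size normalization. The functions `log_poisson` and `poisson_mixture_em` and the quantile-based initialization are our own hypothetical choices.

```python
import math


def log_poisson(y, lam):
    """log P(Y = y) for Y ~ Poisson(lam), lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def poisson_mixture_em(counts, k, n_iter=50):
    """Fit a k-component mixture of independent Poisson vectors by EM.

    counts: genes x conditions table (list of equal-length lists of ints >= 0).
    Returns (weights, rates, resp): mixing proportions, per-cluster rates
    rates[c][j], and posterior membership probabilities resp[i][c].
    """
    n, q = len(counts), len(counts[0])
    # Heuristic initialisation (our choice): seed cluster c with the gene at the
    # (c + 0.5)/k quantile of total counts; +0.5 keeps every rate positive.
    order = sorted(range(n), key=lambda i: sum(counts[i]))
    rates = [[y + 0.5 for y in counts[order[int((c + 0.5) * n / k)]]] for c in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: resp[i][c] is proportional to weight_c * prod_j Poisson(y_ij; rate_cj),
        # computed on the log scale with the log-sum-exp trick.
        resp = []
        for y in counts:
            logs = [math.log(weights[c])
                    + sum(log_poisson(y[j], rates[c][j]) for j in range(q))
                    for c in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: closed-form updates of mixing proportions and Poisson rates.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = max(nc / n, 1e-12)
            for j in range(q):
                mean = sum(r[c] * y[j] for r, y in zip(resp, counts)) / nc
                rates[c][j] = max(mean, 1e-8)
    return weights, rates, resp
```

Hard cluster labels can then be read off as the index of the largest entry in each row of `resp`.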

Variable Selection in Model-based Clustering: A General Variable Role Modeling

Author: Celeux Gilles
Martin-Magniette Marie-Laure
Maugis Cathy
Publication venue: HAL CCSD
Publication date: 01/12/2008

The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are either all independent of, or all linked with, the relevant clustering variables. We propose a more versatile variable selection model that describes three possible roles for each variable: relevant clustering variables, irrelevant clustering variables dependent on a subset of the relevant clustering variables, and irrelevant clustering variables totally independent of all the relevant variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modeling. Model identifiability and the consistency of the variable selection criterion are also established. Numerical experiments on simulated datasets and on a real dataset highlight the interest of this new modeling.

HAL Evry

INRIA a CCSD electronic archive server

Hal-Diderot

Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering

Author: Celeux Gilles
Martin-Magniette Marie-Laure
Maugis Cathy
Raftery Adrian E.
Publication venue: Société Française de Statistique et Société Mathématique de France
Publication date: 01/01/2014

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compare the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate than the regularization approach in terms of both classification and variable selection, and that both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not available in a very high-dimensional context. Finally, the comparison is also carried out on real data.

INRIA a CCSD electronic archive server

Search for the genes involved in oocyte maturation and early embryo development in the hen

Author: Balzergue Sandrine
Batellier Florence
Blesbois Elisabeth
Couty Isabelle
Elis Sebastien
Govoroun Marina S
Martin-Magniette Marie-Laure
Monget Philippe
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The initial stages of development depend on mRNA and proteins accumulated in the oocyte, and during these stages certain genes are essential for fertilization, first cleavage and embryonic genome activation. The aim of this study was first to search for avian oocyte-specific genes using in silico and microarray approaches, and then to investigate the temporal and spatial dynamics of the expression of some of these genes during follicular maturation and early embryogenesis.

Results: The in silico approach allowed us to identify 18 chicken homologs of potential mouse oocyte genes found by digital differential display. Using the chicken Affymetrix microarray, we identified 461 genes overexpressed in granulosa cells (GCs) and 250 genes overexpressed in the germinal disc (GD) of the hen oocyte. Six genes were identified by both the in silico and microarray approaches. Based on GO annotations, GC and GD genes were differentially involved in biological processes, reflecting the different physiological destinations of these two cell layers. Finally, we studied the spatial and temporal dynamics of the expression of 21 chicken genes. According to their expression patterns, all these genes are involved in different stages of final follicular maturation and/or early embryogenesis in the chicken. Among them, 8 genes (btg4, chkmos, wee, zpA, dazL, cvh, zar1 and ktfn) were preferentially expressed in the maturing oocyte, and cvh, zar1 and ktfn were also highly expressed in the early embryo.

Conclusion: We showed that the in silico and Affymetrix microarray approaches were relevant and complementary for finding new avian genes potentially involved in oocyte maturation and/or early embryo development, and they allowed the discovery of new potential chicken mature oocyte and granulosa cell markers for future studies. Moreover, a detailed study of the expression of some of these genes revealed promising candidates for maternal effect genes in the chicken. Finally, the finding concerning the different state of rRNA compared with that of mRNA during the postovulatory period sheds light on some mechanisms through which the oocyte-to-embryo transition occurs in the hen.

HAL Evry

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL Descartes

HAL Université de Tours

ProdInra