46 research outputs found

Segmentation of the Poisson and negative binomial rate models: a penalized estimator

Author: Cleynen Alice
Lebarbier Emilie
Publication venue
Publication date: 17/03/2013
Field of study

We consider the segmentation problem for Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, the choice of the number of segments remains an important issue. To this end, we propose a penalized log-likelihood estimator whose penalty function is constructed in a non-asymptotic context, following the work of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed on simulated and real datasets in the context of RNA-seq data analysis.
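As a rough illustration of the idea in this abstract, the sketch below segments a series of counts under a Poisson model and chooses the number of segments by minimizing a penalized negative log-likelihood. It is not the authors' estimator: the linear penalty `beta * K`, the value of `beta`, the function names, and the toy data are all assumptions made for this example, whereas the paper constructs its own non-asymptotic penalty.

```python
import math


def poisson_segment_cost(prefix, i, j):
    """Negative maximized Poisson log-likelihood of counts[i:j], up to a data-only constant."""
    total = prefix[j] - prefix[i]
    length = j - i
    if total == 0:
        return 0.0
    # The fitted rate is total / length, so the sum of rates over the segment equals total.
    return -(total * math.log(total / length) - total)


def best_segmentations(counts, max_segments):
    """Dynamic programming: cost[k][j] is the best cost of cutting counts[:j] into k segments."""
    n = len(counts)
    prefix = [0]
    for y in counts:
        prefix.append(prefix[-1] + y)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + poisson_segment_cost(prefix, i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    return cost, back


def select_segmentation(counts, max_segments, beta):
    """Pick the number of segments K minimizing cost + beta * K (assumed penalty shape)."""
    cost, back = best_segmentations(counts, max_segments)
    n = len(counts)
    best_k = min(range(1, max_segments + 1), key=lambda k: cost[k][n] + beta * k)
    # Walk the back-pointers to recover the start index of each segment.
    starts = []
    j = n
    for k in range(best_k, 0, -1):
        j = back[k][j]
        starts.append(j)
    change_points = sorted(starts)[1:]  # drop the leading 0
    return best_k, change_points


# Toy counts with one obvious jump in rate after the fifth observation.
counts = [2, 3, 2, 3, 2, 20, 22, 19, 21, 20]
print(select_segmentation(counts, max_segments=4, beta=2.0))  # (2, [5])
```

The dynamic program is cubic in the series length for a fixed maximum number of segments, which is fine for a toy series but not for chromosome-scale data.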

arXiv.org e-Print Archive

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Evaluation of mineralogy per geological layers by Approximate Bayesian Computation

Author: Bruned Vianney
Cleynen Alice
Mas André
Wlodarczyck Sylvain
Publication venue
Publication date: 21/05/2019
Field of study

We propose a new methodology to perform mineralogical inversion from wellbore logs based on a Bayesian linear regression model. Our method relies on three steps. The first step uses Approximate Bayesian Computation (ABC) to select, from the Bayesian generator, a set of candidate volumes whose responses closely match the wellbore data. The second step groups these candidates with a density-based clustering algorithm. Finally, a mineral scenario is assigned to each cluster through direct mineralogical inversion, and we provide a confidence estimate for each lithological hypothesis. The advantage of this approach is that it explores all possible mineralogy hypotheses that match the wellbore data. This pipeline is tested on both synthetic and real datasets.
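The first step described in this abstract can be sketched as plain ABC rejection sampling: draw candidate mineral volumes from a prior, simulate the log responses, and keep the candidates whose responses fall within a tolerance of the observed data. Everything concrete below is an assumption for illustration only: the linear forward model, the uniform prior on the simplex, the sup-norm distance, the endmember values, and all function names. The clustering and inversion steps are not shown.

```python
import random


def simulate_log_response(volumes, endmembers):
    """Hypothetical linear forward model: each log reading is a volume-weighted mix of endmember values."""
    return [sum(v * e for v, e in zip(volumes, row)) for row in endmembers]


def sample_volumes(n_minerals, rng):
    """Draw volume fractions uniformly on the simplex by normalizing exponential draws (assumed prior)."""
    draws = [rng.expovariate(1.0) for _ in range(n_minerals)]
    total = sum(draws)
    return [d / total for d in draws]


def abc_rejection(observed, endmembers, n_draws, tolerance, seed=0):
    """Keep the prior draws whose simulated responses lie within `tolerance` of the observations."""
    rng = random.Random(seed)
    n_minerals = len(endmembers[0])
    accepted = []
    for _ in range(n_draws):
        candidate = sample_volumes(n_minerals, rng)
        simulated = simulate_log_response(candidate, endmembers)
        distance = max(abs(s - o) for s, o in zip(simulated, observed))
        if distance <= tolerance:
            accepted.append(candidate)
    return accepted


# Made-up endmember table: 2 logs (rows) by 3 minerals (columns).
endmembers = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
true_volumes = [0.6, 0.3, 0.1]
observed = simulate_log_response(true_volumes, endmembers)

accepted = abc_rejection(observed, endmembers, n_draws=5000, tolerance=0.1)
print(len(accepted), "candidates accepted")
```

In a full pipeline the accepted candidates would then be passed to a density-based clustering step, as the abstract describes.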

arXiv.org e-Print Archive

HAL Descartes

Statistical approaches for segmentation (application to genome annotation)

Author: CLEYNEN Alice
DUDOIT Sandrine
ROBIN Stéphane
Publication venue
Publication date: 01/01/2013
Field of study

We propose to model the output of transcriptome sequencing technologies (RNA-Seq) using the negative binomial distribution, and we build segmentation models suited to their study at different biological scales, in a context where these technologies have become a valuable tool for genome annotation, gene expression analysis, and the detection of new transcripts. We develop a fast segmentation algorithm to analyze series at the scale of a whole chromosome, and we propose two methods for estimating the number of segments, a key quantity directly related to the number of genes expressed in the cell, whether they were previously annotated or are discovered on this occasion. The goal of precise gene annotation, and in particular the comparison of transcription start and end sites between individuals, naturally leads us to the statistical comparison of change-point locations in independent series. To address these questions, we build tools in a Bayesian segmentation framework for which we are able to provide uncertainty measures. We illustrate our models, all implemented in R packages, on RNA-Seq data from experiments on yeast, and show for instance that intron boundaries are conserved across conditions while the beginnings and ends of transcripts are subject to differential splicing.

OpenGrey Repository

Statistical approaches for segmentation (application to genome annotation)

Author: CLEYNEN Alice
DUDOIT Sandrine
ROBIN Stéphane
Publication venue
Publication date: 01/01/1995
Field of study

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

OpenGrey Repository

Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data

Author: A Cleynen
AB Olshen
Alice Cleynen
C Durot
C Erdman
C Rivera
C Xie
D Risso
DY Chiang
E Lebarbier
Emilie Lebarbier
F Picard
G Rigaill
Guillem Rigaill
H Akaike
J Bockhorst
J Bullard
J Franke
JJ Shen
JV Braun
Michel Koskas
N Johnson
NR Zhang
R Killick
S Arlot
S Yoon
Stéphane Robin
TD Hocking
TM Luong
V Boeva
Y Yao
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Two Isoforms of the mRNA Binding Protein IGF2BP2 Are Generated by Alternative Translational Initiation

Author: A Clery
A Crocoll
A Doria
A Stancakova
A Zhang
Alice M. Sorrell
B Liao
C Maris
CF Calkhoven
CR Tessier
E Zeggini
FC Nielsen
FM Spagnoli
H Kawaji
Hang T. T. Le
I Cleynen
J Christiansen
J Nielsen
J Vikesaa
J Zhang
JK Yisraeli
JR Brants
JR Hogg
JY Zhang
Kenneth Siddle
L Jonson
L Messaoudi
L Rahman
LJ Scott
M Kozak
M Wagner
MJ Groenewoud
NA Hammer
P Ioannidis
PH Giangrande
R Saxena
R Sladek
R Valverde
S Weinlich
TV Hansen
Wael El-Rifai
X Li
Y Wu
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

IGF2BP2 is a member of a family of mRNA binding proteins that, collectively, have been shown to bind to several different mRNAs in mammalian cells, including one of the mRNAs encoding insulin-like growth factor-2. Polymorphisms in the Igf2bp2 gene are associated with risk of developing type 2 diabetes, but detailed functional characterisation of the IGF2BP2 protein is lacking. By immunoblotting with C-terminally reactive antibodies, we identified a novel IGF2BP2 isoform with a molecular weight of 58 kDa in both humans and rodents, which is expressed at somewhat lower levels than the full-length 65 kDa protein. We demonstrated by mutagenesis that this isoform is generated by alternative translation initiation at the internal Met69. It lacks a conserved N-terminal RNA Recognition Motif (RRM) and would be predicted to differ functionally from the canonical full-length isoform. We further investigated IGF2BP2 mRNA transcripts by amplification of cDNA using 5′-RACE. We identified multiple transcription start sites of the human, mouse and rat Igf2bp2 genes in a highly conserved region only 50–90 nt upstream of the major translation start site, ruling out the existence of N-terminally extended isoforms. We conclude that structural heterogeneity of the IGF2BP2 protein should be taken into account when considering cellular function.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central