Search CORE

86 research outputs found

SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis

Author: Blanck Samuel
Marot Guillemette
Publication date: 22/02/2018

Background: With the proliferation of microarray and high-throughput sequencing experiments available in the public domain, meta-analysis methods are increasingly used. In these experiments, where the sample size is often limited, meta-analysis can considerably increase statistical power and give more accurate results. To do so, it combines either effect sizes or the results of single studies in an appropriate manner. The R packages metaMA and metaRNASeq perform meta-analysis on microarray and NGS data, respectively. They are not interchangeable, as each relies on statistical modeling specific to its technology. Results: SMAGEXP (Statistical Meta-Analysis for Gene EXPression) integrates the metaMA and metaRNASeq packages into Galaxy. We aim to propose a unified way to carry out meta-analysis of gene expression data while taking the specificities of each data type into account. We have developed this tool suite to analyse microarray data from the Gene Expression Omnibus (GEO) database or custom data from Affymetrix microarrays. These data are then combined to carry out meta-analysis with the metaMA package. SMAGEXP can also combine raw read counts from Next Generation Sequencing (NGS) experiments using the DESeq2 and metaRNASeq packages. In both cases, key values that are independent of the technology type are reported to judge the quality of the meta-analysis. These tools are available on the Galaxy main tool shed. Source code, help and installation instructions are available on GitHub. Conclusion: Galaxy provides an easy-to-use gene expression meta-analysis tool suite based on the metaMA and metaRNASeq packages.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
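The SMAGEXP abstract mentions combining effect sizes across studies. The following minimal sketch shows the classical DerSimonian-Laird random-effects pooling for a single gene. It assumes each study supplies an effect size and its within-study variance. This is a generic illustration of the technique, not code from SMAGEXP or metaMA, and the function name `random_effects_combination` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_combination(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Pool per-study effect sizes for one gene (DerSimonian-Laird).

    Hypothetical helper for illustration. Returns
    (combined effect, standard error, z-score, tau^2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return mu, se, mu / se, tau2
```

Take two studies that disagree (effects 0.0 and 1.0, each with variance 0.1). For these, `random_effects_combination([0.0, 1.0], [0.1, 0.1])` estimates tau^2 = 0.4 and gives a combined effect of 0.5 with standard error 0.5. The between-study variance therefore widens the interval compared with a fixed-effect pooling.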

Differential meta-analysis of RNA-seq data from multiple studies

Author: Jaffrézic Florence
Marot Guillemette
Rau Andrea
Publication date: 14/06/2013

High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. Currently, such experiments typically include a limited number of biological replicates, which leads to low power to detect differential expression. As sequencing costs continue to decrease, additional follow-up studies are likely to be conducted to re-address the same biological question. We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be applied to the differential analysis of RNA-seq data from multiple related studies. These techniques are compared with a negative binomial generalized linear model (GLM) that includes a fixed study effect. The comparison uses simulated data and real data from human melanoma cell lines. The GLM with a fixed study effect performed well for low inter-study variation and small numbers of studies. However, the meta-analysis methods outperformed it for moderate to large inter-study variability and larger numbers of studies. In conclusion, the p-value combination techniques illustrated here are a valuable tool for differential meta-analyses of RNA-seq data. They appropriately account for biological and technical variability within studies as well as for additional study-specific effects. An R package, metaRNASeq, is available on R-Forge.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes
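The metaRNASeq abstract refers to p-value combination techniques from microarray meta-analysis. Two classical ones are Fisher's method and the weighted inverse normal (Stouffer) method, sketched below for one gene. Three assumptions apply:

- the per-study p-values are one-sided, or consistent in direction;
- they lie strictly between 0 and 1;
- inverse normal weights are proportional to the square root of each study's number of replicates.

This is a generic illustration, not the metaRNASeq implementation. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def fisher_combination(pvalues: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under H0.

    Assumes every p-value is > 0.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def inverse_normal_combination(
    pvalues: Sequence[float], n_replicates: Sequence[int]
) -> float:
    """Weighted inverse normal method with weights sqrt(n_replicates).

    Assumes one-sided p-values strictly between 0 and 1.
    """
    std_normal = NormalDist()
    weights = [math.sqrt(n) for n in n_replicates]
    # Map each p-value to a z-score, then take a normalised weighted sum.
    z = sum(w * std_normal.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(z)
```

With a single study, both functions return the input p-value unchanged. Two studies that each give p = 0.01 combine to a smaller p-value under either method. The inverse normal variant lets studies with more replicates weigh more.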

New efficient algorithms for multiple change-point detection with kernels

Author: Celisse Alain
Marot Guillemette
Pierre-Jean Morgane
Rigaill Guillem
Publication date: 01/09/2016

Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations, not only in the mean or variance. Some of these approaches enjoy good statistical properties (e.g., oracle inequalities). Nonetheless, they have a high computational cost in terms of both time and memory, which makes them difficult to apply even for small and medium sample sizes ($n < 10^4$). This computational issue is addressed in two ways. First, a new efficient and exact algorithm for kernel multiple change-point detection is described, with an improved worst-case complexity that is quadratic in time and linear in space. It can handle medium-size signals (up to $n \approx 10^5$). Second, a faster approximation algorithm is described. It is based on a low-rank approximation to the Gram matrix and is linear in time and space. This approximation algorithm can be applied to large-scale signals ($n \geq 10^6$). Both the exact and the approximation algorithms have been implemented in R and C for various kernels. The computational and statistical performance of these new algorithms has been assessed through empirical experiments. In these experiments, the new algorithms ran faster than the other procedures considered. Finally, simulations confirmed the higher statistical accuracy of kernel-based approaches for detecting changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches for analyzing complex biological profiles made of DNA copy number and B-allele frequencies. An R package implementing the approach will be made available on GitHub.

arXiv.org e-Print Archive

HAL Evry

INRIA a CCSD electronic archive server

Hal-Diderot
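The exact kernel change-point approach in the abstract above can be illustrated with a dynamic program over segment costs. The sketch below assumes a Gaussian kernel on a one-dimensional signal and the usual kernel least-squares segment cost:

$C(s,t) = \sum_{i=s}^{t-1} k(x_i,x_i) - \frac{1}{t-s}\sum_{i=s}^{t-1}\sum_{j=s}^{t-1} k(x_i,x_j)$

When a new end point $t$ arrives, the costs of all segments ending there are updated incrementally. This takes $O(n^2)$ time overall with $O(n)$ running sums. The dynamic program over up to $D$ segments then costs $O(Dn^2)$ time and $O(Dn)$ memory. This is an illustrative reimplementation under these assumptions, not the authors' R/C code, and `kernel_segmentation` is a hypothetical name.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_segmentation(
    signal: Sequence[float],
    max_segments: int,
    kernel: Callable[[float, float], float] = gaussian_kernel,
) -> Dict[int, Tuple[float, List[int]]]:
    """Exact kernel change-point dynamic program (hypothetical helper).

    Returns {d: (cost, change_points)} for d = 1..max_segments, where
    change_points are the start indices of segments 2..d.
    """
    n = len(signal)
    inf = float("inf")
    # loss[d][t]: best cost of cutting signal[:t] into d segments.
    loss = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    loss[0][0] = 0.0
    # For the current end t: gram[s] = sum_{i,j in [s,t)} k(x_i, x_j)
    # and diag[s] = sum_{i in [s,t)} k(x_i, x_i); both have length t.
    gram: List[float] = []
    diag: List[float] = []
    for t in range(1, n + 1):
        x_new = signal[t - 1]
        k_new = kernel(x_new, x_new)
        cross = 0.0  # running sum_{i=s}^{t-2} k(x_i, x_new)
        for s in range(t - 2, -1, -1):
            cross += kernel(signal[s], x_new)
            gram[s] += 2.0 * cross + k_new
            diag[s] += k_new
        gram.append(k_new)
        diag.append(k_new)
        # Bellman update for every segment count d that fits in signal[:t].
        for d in range(1, min(max_segments, t) + 1):
            for s in range(d - 1, t):
                if loss[d - 1][s] == inf:
                    continue
                candidate = loss[d - 1][s] + diag[s] - gram[s] / (t - s)
                if candidate < loss[d][t]:
                    loss[d][t] = candidate
                    back[d][t] = s
    result: Dict[int, Tuple[float, List[int]]] = {}
    for d in range(1, min(max_segments, n) + 1):
        starts, t = [], n
        for dd in range(d, 0, -1):
            t = back[dd][t]
            starts.append(t)
        # The last backtracked start is always 0, so drop it.
        result[d] = (loss[d][n], sorted(starts)[1:])
    return result
```

As a usage example, `kernel_segmentation([0.0] * 20 + [5.0] * 20, max_segments=3)[2]` places the single change point at index 20, where both segments are constant and cost zero. Choosing the number of segments would additionally require a penalty, which this sketch leaves out.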

Meta-analysis of RNA-Seq data

Author: Marot Guillemette
Publication venue: HAL CCSD
Publication date: 28/06/2022

National audience

INRIA a CCSD electronic archive server

Introduction to statistics for omics data

Author: Marot Guillemette
Publication venue: HAL CCSD
Publication date: 14/11/2022

International audience

INRIA a CCSD electronic archive server

MPAgenomics: An R package for multi-patients analysis of genomic markers

Author: Blanck Samuel
Celisse Alain
Cheok Meyling
Figeac Martin
Grimonprez Quentin
Marot Guillemette
Publication date: 20/01/2014

MPAgenomics, standing for multi-patients analysis (MPA) of genomic markers, is an R package devoted to: (i) efficient segmentation, and (ii) genomic marker selection from multi-patient copy number and SNP data profiles. It provides wrappers around commonly used packages to facilitate their repeated (sometimes difficult) use, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is based on a new automatic choice of influential parameters, since the default ones were misleading in the original packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given response.

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

HAL Descartes

PubMed Central
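The MPAgenomics abstract above says it wraps penalized regression methods to select markers associated with a response. As a generic illustration of how an l1 penalty performs marker selection, here is a minimal lasso solved by cyclic coordinate descent. It assumes a centred response and markers on comparable scales. It is not the code of MPAgenomics or of any package it wraps, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of threshold * |x|."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    n_iter: int = 500,
) -> List[float]:
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 (hypothetical helper)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of marker j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta
```

The selected markers are the indices with non-zero coefficients, e.g. `[j for j, b in enumerate(beta) if b != 0.0]`. Larger values of `lam` select fewer markers. In practice `lam` would be tuned, for instance by cross-validation.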

Linking different kinds of Omics data through a model-based clustering approach

Author: Marot Guillemette
Ternynck Camille
Vandewalle V
Publication venue: HAL CCSD
Publication date: 26/08/2019

International audience

INRIA a CCSD electronic archive server

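Sélection de groupes de variables corrélées par classification ascendante hiérarchique et group-lasso [Selection of groups of correlated variables by hierarchical agglomerative clustering and group lasso]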

Author: Celisse Alain
Grimonprez Quentin
Marot Guillemette
Publication venue: HAL CCSD
Publication date: 01/06/2015

National audience. In a variable-selection context, using penalized regressions in the presence of high correlations can be problematic: only a subset of the correlated variables is selected. Aggregating related variables beforehand can help both selection and interpretation. However, variable-clustering methods require the calibration of additional parameters. We will introduce a new method combining hierarchical clustering and group selection.

INRIA a CCSD electronic archive server

HAL Descartes
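The two-step idea in the abstract above (first cluster correlated variables, then select whole groups) can be sketched as follows. The sketch makes these assumptions:

- single-linkage clustering on the distance $1 - |\mathrm{corr}|$;
- a dendrogram cut at a fixed, hand-picked height, whereas the abstract notes that such clustering needs extra calibration;
- a group lasso with the common $\sqrt{|g|}$ group weights, solved by proximal gradient descent.

The linkage, distance, cut height and solver are illustrative choices, not the authors' method. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither variable is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def single_linkage_groups(columns: Sequence[Sequence[float]], cut: float) -> List[List[int]]:
    """Cut a single-linkage dendrogram (distance 1 - |corr|) at height `cut`.

    This equals the connected components of the graph that links
    variables whose distance is <= cut (computed with union-find).
    """
    p = len(columns)
    parent = list(range(p))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if 1.0 - abs(pearson(columns[i], columns[j])) <= cut:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def group_lasso(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    n_iter: int = 2000,
) -> List[float]:
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) ||b_g||_2."""
    n, p = len(X), len(X[0])
    # Step 1/L with L = ||X||_F^2 / n, an upper bound on the gradient's Lipschitz constant.
    step = n / sum(v * v for row in X for v in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        residual = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
        z = [beta[j] - step * grad[j] for j in range(p)]
        # Group soft-thresholding: shrink each group's norm, zeroing small groups.
        for g in groups:
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            threshold = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
            for j in g:
                beta[j] = scale * z[j]
    return beta
```

The two steps take data in different layouts. `single_linkage_groups` expects the variables as columns. `group_lasso` expects the usual row-major design matrix `X` together with the groups returned by the first step. Because a whole group enters or leaves the model together, strongly correlated variables are no longer arbitrarily split by the penalty.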

Analyse multi-patients de données génomiques [Multi-patient analysis of genomic data]

Author: Celisse Alain
Grimonprez Quentin
Marot Guillemette
Publication venue: HAL CCSD
Publication date: 02/06/2014

National audience. MPAgenomics, standing for multi-patients analysis (MPA) of genomic markers, is an R package devoted to: (i) efficient segmentation, and (ii) genomic marker selection from multi-patient copy number and SNP data profiles. It provides wrappers around commonly used packages to facilitate their repeated (sometimes difficult) use, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is based on a new automatic choice of influential parameters, since the default ones were misleading in the original packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given response.

INRIA a CCSD electronic archive server

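Sélection de groupes de variables corrélées par classification ascendante hiérarchique et group-lasso [Selection of groups of correlated variables by hierarchical agglomerative clustering and group lasso]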

Author: Celisse Alain
Grimonprez Quentin
Marot Guillemette
Publication venue: HAL CCSD
Publication date: 28/08/2015

National audience. In a variable-selection context, using penalized regressions in the presence of high correlations can be problematic: only a subset of the correlated variables is selected. Aggregating related variables beforehand can help both selection and interpretation. However, variable-clustering methods require the calibration of additional parameters. We will present a new method combining hierarchical agglomerative clustering and selection of groups of variables.

INRIA a CCSD electronic archive server