SMAGEXP: a Galaxy tool suite for transcriptomics data meta-analysis
Background: With the proliferation of microarray and high-throughput
sequencing experiments available in the public domain, meta-analysis methods
are increasingly used. Because the sample size of these experiments is often
limited, meta-analysis offers the possibility to considerably enhance
statistical power and obtain more accurate results. To do so, it combines
either effect sizes or the results of single studies in an appropriate manner.
The R packages metaMA and metaRNASeq perform meta-analysis on microarray and NGS
data, respectively. They are not interchangeable, as they rely on statistical
modeling specific to each technology.
Results: SMAGEXP (Statistical Meta-Analysis for Gene EXPression) integrates
the metaMA and metaRNASeq packages into Galaxy. We aim to provide a unified way
to carry out meta-analysis of gene expression data while taking their
specificities into account. We developed this tool suite to analyse microarray
data from the Gene Expression Omnibus (GEO) database or custom data from
Affymetrix microarrays. These data are then combined to carry out meta-analysis
with the metaMA package. SMAGEXP can also combine raw read counts from Next
Generation Sequencing (NGS) experiments using DESeq2 and the metaRNASeq package.
In both cases, key values that do not depend on the technology are reported to
judge the quality of the meta-analysis. These tools are available on the main
Galaxy Tool Shed. Source code, help and installation instructions are available
on GitHub.
Conclusion: Through Galaxy, SMAGEXP offers an easy-to-use gene expression
meta-analysis tool suite based on the metaMA and metaRNASeq packages.
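To illustrate what combining effect sizes across studies means, the sketch below pools per-study effect sizes with textbook fixed-effect inverse-variance weights. It is a minimal illustration under that assumption, not the model implemented in metaMA or SMAGEXP; the function name fixed_effect_combination and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_combination(effects, variances):
    """Pool study-level effect sizes with inverse-variance weights.

    Hypothetical helper: a textbook fixed-effect model, assumed here for
    illustration only (not the metaMA implementation).
    Returns (pooled effect, z statistic, two-sided p-value).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    w_sum = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / w_sum
    se = math.sqrt(1.0 / w_sum)  # standard error of the pooled effect
    z = pooled / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, z, p


# Two hypothetical studies measuring the same gene
pooled, z, p = fixed_effect_combination([1.0, 0.0], [1.0, 3.0])
print(round(pooled, 3))  # 0.75
```

When studies are heterogeneous, a random-effects model, which adds a between-study variance term to each weight, is the usual alternative to this fixed-effect pooling.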
Differential meta-analysis of RNA-seq data from multiple studies
High-throughput sequencing is now regularly used for studies of the
transcriptome (RNA-seq), particularly for comparisons among experimental
conditions. At present, only a limited number of biological replicates is
typically considered in such experiments, leading to low detection power for
differential expression. As sequencing costs continue to decrease, it is likely
that additional follow-up studies will be conducted to re-address the same
biological question. We demonstrate how p-value combination techniques
previously used for microarray meta-analyses can be used for the differential
analysis of RNA-seq data from multiple related studies. These techniques are
compared, on simulated data and on real data from human melanoma cell lines, to
a negative binomial generalized linear model (GLM) including a fixed study
effect. The GLM with a fixed study effect performed well for low inter-study
variation and small numbers of studies, but was outperformed by the
meta-analysis methods for moderate to large inter-study variability and larger
numbers of studies. In conclusion, the p-value combination techniques
illustrated here are a valuable tool to perform differential meta-analyses of
RNA-seq data, as they appropriately account for biological and technical
variability within studies as well as additional study-specific effects. The R
package metaRNASeq is available on R-Forge.
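The abstract does not spell out which p-value combination techniques are used. As a hedged illustration, the sketch below implements two classic ones for a single gene: Fisher's method and the weighted inverse normal method, with weights derived from per-study replicate counts. Treat both the choice of methods and the weighting as assumptions, and the code as a stand-in rather than the metaRNASeq implementation; the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combination(pvalues):
    """Fisher's method: X = -2 * sum(log p_i) follows chi-square(2k) under H0.

    Uses the closed-form chi-square survival function for even degrees of
    freedom: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    P-values must lie strictly between 0 and 1.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def inverse_normal_combination(pvalues, n_replicates):
    """Weighted inverse normal method with weights sqrt(n_s / sum(n)).

    The squared weights sum to 1, so the combined Z is standard normal under
    H0. One-sided p-values strictly between 0 and 1 are assumed.
    """
    std = NormalDist()
    n_total = sum(n_replicates)
    z = sum(math.sqrt(n / n_total) * std.inv_cdf(1.0 - p)
            for p, n in zip(pvalues, n_replicates))
    return 1.0 - std.cdf(z)


# One gene, three hypothetical studies with their replicate counts
pvals, reps = [0.04, 0.10, 0.03], [3, 2, 4]
print(fisher_combination(pvals), inverse_normal_combination(pvals, reps))
```

With a single study both functions return the input p-value unchanged, which is a quick sanity check that the null distributions are calibrated.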
New efficient algorithms for multiple change-point detection with kernels
Several statistical approaches based on reproducing kernels have been
proposed to detect abrupt changes arising in the full distribution of the
observations, and not only in the mean or variance. Some of these approaches
enjoy good statistical properties (oracle inequality, …). Nonetheless, they
have a high computational cost in terms of both time and memory, which makes
their application difficult even for small and medium sample sizes. This
computational issue is addressed by first describing a new efficient and exact
algorithm for kernel multiple change-point detection with an improved
worst-case complexity that is quadratic in time and linear in space. It can
handle medium-size signals. Second, a faster, approximate algorithm is
described. It is based on a low-rank approximation of the Gram matrix and is
linear in time and space. This approximation algorithm can be applied to
large-scale signals. Both the exact and the approximate algorithms have been
implemented in R and C for various kernels. The computational and statistical
performances of these new algorithms have been assessed through empirical
experiments. The new algorithms were observed to run faster than the other
procedures considered. Finally, simulations confirmed the higher statistical
accuracy of kernel-based approaches for detecting changes that are not only in
the mean. These simulations also illustrate the flexibility of kernel-based
approaches to analyze complex biological profiles made of DNA copy number and
B-allele frequencies. An R package implementing the approach will be made
available on GitHub.
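The dynamic programming idea behind exact kernel multiple change-point detection can be sketched as follows. This minimal version stores 2D prefix sums of the Gram matrix, so it needs memory quadratic in the signal length; it is therefore not the linear-space algorithm described above. The Gaussian kernel, the bandwidth and the function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar observations (assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_segmentation(signal, n_segments, bandwidth=1.0):
    """Exact kernel segmentation into n_segments by dynamic programming.

    Segment cost for observations s..t-1:
        sum_i k(x_i, x_i) - (1 / (t - s)) * sum_{i, j} k(x_i, x_j)
    This naive version uses O(n^2) memory and O(n_segments * n^2) time.
    Returns the sorted list of change-point positions (segment starts).
    """
    n = len(signal)
    # S[a][b] = sum of k(x_i, x_j) over i < a and j < b (2D prefix sums)
    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            k = gaussian_kernel(signal[i], signal[j], bandwidth)
            S[i + 1][j + 1] = k + S[i][j + 1] + S[i + 1][j] - S[i][j]
    diag = [0.0] * (n + 1)  # prefix sums of k(x_i, x_i)
    for i in range(n):
        diag[i + 1] = diag[i] + gaussian_kernel(signal[i], signal[i], bandwidth)

    def cost(s, t):
        block = S[t][t] - S[s][t] - S[t][s] + S[s][s]
        return (diag[t] - diag[s]) - block / (t - s)

    inf = float("inf")
    # best[d][t]: minimal cost of splitting signal[:t] into d + 1 segments
    best = [[inf] * (n + 1) for _ in range(n_segments)]
    start = [[0] * (n + 1) for _ in range(n_segments)]
    for t in range(1, n + 1):
        best[0][t] = cost(0, t)
    for d in range(1, n_segments):
        for t in range(d + 1, n + 1):
            for s in range(d, t):  # s = start of the last segment
                c = best[d - 1][s] + cost(s, t)
                if c < best[d][t]:
                    best[d][t], start[d][t] = c, s
    # Backtrack the segment starts from the end of the signal
    change_points, t = [], n
    for d in range(n_segments - 1, 0, -1):
        t = start[d][t]
        change_points.append(t)
    return sorted(change_points)


print(kernel_segmentation([0.0] * 10 + [5.0] * 10, n_segments=2))  # [10]
```

Because the cost compares whole distributions through the kernel, a signal whose two halves share the same mean but differ in spread (for instance constant zeros followed by alternating -3 and 3) is still split at the right place.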
Introduction to statistics for omics data
International audience
MPAgenomics: An R package for multi-patients analysis of genomic markers
MPAgenomics, standing for multi-patients analysis (MPA) of genomic markers,
is an R package dedicated to (i) efficient segmentation and (ii) genomic marker
selection from multi-patient copy number and SNP data profiles. It provides
wrappers for commonly used packages to facilitate their repeated (and sometimes
difficult) use, offering an easy-to-use pipeline for beginners in R. The
segmentation of successive multiple profiles (finding losses and gains) relies
on a new automatic choice of influential parameters, since the default values
in the original packages were misleading. Considering multiple profiles at the
same time, MPAgenomics wraps efficient penalized regression methods to select
relevant markers associated with a given response.
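The abstract does not name the penalized regression methods that MPAgenomics wraps. As a hedged stand-in, the sketch below fits a lasso by cyclic coordinate descent, a common way to select a small subset of markers associated with a response; the objective scaling, function names and toy data are assumptions, not the package's interface.

```python
def soft_threshold(z, lam):
    """Scalar soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    X is a list of n rows with p markers each. No intercept is fitted, so
    centre X and y beforehand if an intercept is needed (hypothetical setup).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero marker: coefficient stays at 0
            # correlation of marker j with the partial residual (beta_j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data: marker 0 drives the response, marker 1 is noise
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.1, 3.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # [0]
```

Markers with a nonzero coefficient are the selected ones; increasing lam selects fewer markers.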
Linking different kinds of Omics data through a model-based clustering approach
International audience
Selection of groups of correlated variables by agglomerative hierarchical clustering and group lasso
National audience
In variable selection, penalized regression can be problematic in the presence of high correlations: only a subset of the correlated variables is selected. Aggregating related variables beforehand can help both selection and interpretation. However, variable clustering methods require the calibration of additional parameters. We will present a new method combining agglomerative hierarchical clustering and group variable selection.
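The key ingredient of group selection is that a whole group of variables is kept or discarded together. The sketch below shows the group soft-thresholding operator (the proximal step of the group lasso) applied to groups that, in the method described above, would come from cutting the hierarchical clustering tree. Here the groups are supplied by hand, and the square-root-of-group-size penalty weighting and the function names are assumptions.

```python
import math


def group_soft_threshold(beta_group, lam):
    """Proximal operator of lam * ||beta_g||_2 (group soft-thresholding).

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every coefficient is shrunk by the same factor.
    """
    norm = math.sqrt(sum(b * b for b in beta_group))
    if norm <= lam:
        return [0.0] * len(beta_group)
    scale = 1.0 - lam / norm
    return [scale * b for b in beta_group]


def threshold_by_groups(beta, groups, lam):
    """Apply group soft-thresholding to each group of variable indices.

    Hypothetical helper: groups would come from cutting a hierarchical
    clustering tree; the penalty is weighted by sqrt(group size).
    """
    out = list(beta)
    for idx in groups:
        shrunk = group_soft_threshold([beta[i] for i in idx], lam * math.sqrt(len(idx)))
        for i, value in zip(idx, shrunk):
            out[i] = value
    return out


# Variables 0-1 form one correlated group, variables 2-3 another;
# the second group is zeroed entirely, the first is shrunk as a block.
print(threshold_by_groups([3.0, 4.0, 0.1, -0.1], [[0, 1], [2, 3]], lam=1.0))
```

Since all coefficients in a retained group are scaled by the same factor, correlated variables are selected together instead of one representative being picked arbitrarily.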
Multi-patient analysis of genomic data
National audience