
Prediction area visualisation on the Small Round Blue Cell Tumors (SRBCT [35]) data, described in the Results section, with respect to the prediction distance.

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

From left to right: ‘maximum distance’, ‘centroid distance’ and ‘Mahalanobis distance’. Sample prediction area plots from a PLS-DA model applied to a microarray data set containing the expression levels of 2,308 genes in 63 samples. Samples are classified into four classes: Burkitt Lymphoma (BL), Ewing Sarcoma (EWS), Neuroblastoma (NB), and Rhabdomyosarcoma (RMS).

FigShare
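The three prediction distances compared in this figure can be illustrated with a small, self-contained sketch. It is a simplified illustration, not the mixOmics implementation, and rests on several assumptions: two components only, class centroids computed from the training component scores, a pooled within-class covariance for the Mahalanobis distance, and a ‘maximum distance’ rule that assigns the class with the largest predicted dummy-variable value. All function names are hypothetical.

```python
import math
from statistics import mean


def max_distance(y_pred_dummy, classes):
    """Assumed 'maximum distance' rule: class with the largest predicted dummy value."""
    k = max(range(len(classes)), key=lambda i: y_pred_dummy[i])
    return classes[k]


def class_centroids(scores, labels):
    """Mean (component 1, component 2) score for each class."""
    cents = {}
    for c in sorted(set(labels)):
        pts = [s for s, lab in zip(scores, labels) if lab == c]
        cents[c] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return cents


def centroid_distance(x, cents):
    """Nearest class centroid in Euclidean distance."""
    return min(cents, key=lambda c: math.dist(x, cents[c]))


def pooled_covariance(scores, labels, cents):
    """Pooled within-class 2x2 covariance (an assumption), returned as (s11, s12, s22)."""
    s11 = s12 = s22 = 0.0
    for (a, b), lab in zip(scores, labels):
        d1, d2 = a - cents[lab][0], b - cents[lab][1]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    dof = len(scores) - len(cents)
    return s11 / dof, s12 / dof, s22 / dof


def mahalanobis_distance(x, cents, cov):
    """Nearest class centroid once distances are scaled by the inverse covariance."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det  # inverse of the 2x2 matrix

    def sq_dist(c):
        d1, d2 = x[0] - cents[c][0], x[1] - cents[c][1]
        return i11 * d1 * d1 + 2 * i12 * d1 * d2 + i22 * d2 * d2

    return min(cents, key=sq_dist)
```

When the classes are spread widely along one component but tightly along the other, a new sample can lie closest to one centroid in Euclidean terms yet closer to the other once the covariance is taken into account (for example, in the assertions below the point (3, 0.2) is assigned to B by the centroid rule and to A by the Mahalanobis rule). This is one reason the prediction areas of the three distances can differ.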

Overview of the mixOmics multivariate methods for single and integrative ‘omics supervised analyses.

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

X denotes a predictor ‘omics data set and y a categorical outcome response (e.g. healthy vs. sick). Integrative analyses include N-integration with DIABLO (the same N samples are measured on different ‘omics platforms) and P-integration with MINT (the same P ‘omics predictors are measured in several independent studies). Sample plots depicted here use the mixOmics functions (from left to right) plotIndiv, plotArrow and plotIndiv in 3D; variable plots use the mixOmics functions network, cim, plotLoadings, plotVar and circosPlot. The graphical output functions are detailed in Supporting Information S1 Text.

FigShare

Illustration of N-integrative supervised analysis with DIABLO.

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

A: sample plot per data set; B: sample scatterplot from plotDiablo displaying the first component of each data set (upper diagonal plots) and the Pearson correlation between the first components of each pair of data sets (lower diagonal plots); C: Clustered Image Map (Euclidean distance, complete linkage) of the multi-omics signature, with samples in rows and the features selected on the first component in columns; D: Circos plot showing the positive (negative) correlations (r > 0.7) between selected features as brown (black) links, with feature names in the quadrants; E: Correlation Circle plot representing each type of selected feature; F: relevance network visualisation of the selected features.

FigShare
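The Circos plot in panel D links selected features whose correlation exceeds a cutoff. The sketch below shows only that filtering step, under stated assumptions: the correlations displayed by circosPlot may be computed differently (for example from the model components), so plain Pearson correlation on the raw feature values is swapped in here as a simplification; the 0.7 cutoff is assumed to apply to |r| so that negative links can also be drawn; and the feature names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circos_links(block1, block2, cutoff=0.7):
    """Return (feature1, feature2, r, sign) for each cross-block pair with |r| > cutoff.

    block1, block2: dicts mapping a feature name to its values over the same samples.
    """
    links = []
    for f1, v1 in block1.items():
        for f2, v2 in block2.items():
            r = pearson(v1, v2)
            if abs(r) > cutoff:
                links.append((f1, f2, r, "positive" if r > 0 else "negative"))
    return links
```

Each returned tuple corresponds to one link: positive pairs would be drawn in one colour and negative pairs in another, as in panel D.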

Example computation times for the data sets presented in the Results section, measured on a 2013 MacBook Pro (2.6 GHz, 16 GB RAM).

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

FigShare

Illustration of a single ‘omics analysis with mixOmics.

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

A) Unsupervised preliminary analysis with PCA. A1: PCA sample plot; A2: percentage of explained variance per component. B) Supervised analysis with PLS-DA. B1: PLS-DA sample plot with confidence ellipses; B2: classification performance per component (overall error rate and balanced error rate, BER) for three prediction distances, using repeated stratified cross-validation (10 × 5-fold CV). C) Supervised analysis and feature selection with sparse PLS-DA. C1: sPLS-DA sample plot with confidence ellipses; C2: arrow plot representing each sample pointing towards its outcome category (more details in Supporting Information S1 Text); C3: Clustered Image Map (Euclidean distance, complete linkage) with samples in rows and selected features in columns (10, 300 and 30 genes selected on components 1, 2 and 3, respectively); C4: ROC curve and AUC averaged over one-vs-all comparisons.

FigShare
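Panel B2 reports both the overall error rate and the balanced error rate (BER) under repeated stratified cross-validation. The BER averages the per-class error rates, so it is not dominated by the larger classes. The sketch below computes both measures for one set of predictions and shows one simple way to build stratified folds (shuffling each class and dealing its members round-robin into the folds). The function names are hypothetical, and mixOmics may assign folds differently.

```python
import random
from collections import defaultdict


def error_rates(truth, pred):
    """Return (overall error rate, balanced error rate) for one set of predictions."""
    overall = sum(t != p for t, p in zip(truth, pred)) / len(truth)
    per_class = []
    for c in sorted(set(truth)):
        idx = [i for i, t in enumerate(truth) if t == c]
        per_class.append(sum(pred[i] != c for i in idx) / len(idx))
    return overall, sum(per_class) / len(per_class)


def stratified_folds(labels, k, seed):
    """Assign each sample index a fold in 0..k-1 so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)  # random order within the class
        for j, i in enumerate(idx):
            fold_of[i] = (offset + j) % k  # deal class members round-robin
        offset += len(idx)
    return fold_of
```

Repeating the fold assignment with ten different seeds and averaging the error rates over the repeats corresponds to the 10 × 5-fold scheme described in the caption.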

Summary of the eighteen multivariate projection-based methods available in mixOmics version 6.0.0 or above for different types of analysis frameworks.

Author: Amrit Singh (429768)
Benoît Gautier (4571203)
Florian Rohart (3574298)
Kim-Anh Lê Cao (159730)

Note that our block.pls/plsda and sparse variants differ from the approaches of [28–31]. The wrappers for rgcca and sgcca originally come from the RGCCA package [32], but their input arguments were further improved for mixOmics.

FigShare

Simulation results.

Author: Alain-Dominique Gorse (728532)
Bevan Emma Huang (788822)
Jasmin Straube (788821)
Kim-Anh Lê Cao (159730)
Publication date: 27/08/2015

Sensitivity of LMMSDE and LIMMA averaged over 100 simulations. Differential expression between groups and/or over time was tested with increasing noise and fold change (FC) levels.

FigShare
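Sensitivity here is the proportion of truly differentially expressed molecules that a method detects, TP / (TP + FN), averaged over the simulation runs. A minimal sketch, assuming each run provides the set of truly differentially expressed molecules and the set the method called significant (the identifiers and function names are hypothetical):

```python
def sensitivity(true_de, called):
    """TP / (TP + FN): share of truly differentially expressed molecules that were called."""
    true_de = set(true_de)
    return len(true_de & set(called)) / len(true_de)


def mean_sensitivity(runs):
    """Average sensitivity over simulation runs given as (true_de, called) pairs."""
    values = [sensitivity(t, c) for t, c in runs]
    return sum(values) / len(values)
```

False positives (called molecules outside the true set) do not affect sensitivity; they would be captured by specificity instead.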

A Linear Mixed Model Spline Framework for Analysing Time Course ‘Omics’ Data

Author: Alain-Dominique Gorse (728532)
Bevan Emma Huang (788822)
Jasmin Straube (788821)
Kim-Anh Lê Cao (159730)
Publication date: 27/08/2015

Time course ‘omics’ experiments are becoming increasingly important to study system-wide dynamic regulation. Despite the high information content of these experiments, their analysis remains challenging. ‘Omics’ technologies capture quantitative measurements on tens of thousands of molecules, and in a time course ‘omics’ experiment these molecules are measured for multiple subjects over multiple time points. This results in a large, high-dimensional dataset, which requires computationally efficient approaches for statistical analysis. Moreover, methods need to be able to handle missing values and various levels of noise. We present a novel, robust and powerful framework to analyse time course ‘omics’ data that consists of three stages: quality assessment and filtering, profile modelling, and analysis. The first step consists of removing molecules for which expression or abundance is highly variable over time. The second step models each molecular expression profile in a linear mixed model framework which takes into account subject-specific variability. The best model is selected through a serial model selection approach, which results in dimension reduction of the time course data. The final step includes two types of analysis of the modelled trajectories, namely clustering analysis to identify groups of correlated profiles over time, and differential expression analysis to identify profiles which differ over time and/or between treatment groups. Through simulation studies we demonstrate the high sensitivity and specificity of our approach for differential expression analysis. We then illustrate how our framework can bring novel insights into two time course ‘omics’ studies in breast cancer and kidney rejection. The methods are publicly available, implemented in the R package lmms on CRAN.

Directory of Open Access Journals

University of Melbourne Institutional Repository

FigShare

Clustering of filter ratios on proteomic datasets.

Author: Alain-Dominique Gorse (728532)
Bevan Emma Huang (788822)
Jasmin Straube (788821)
Kim-Anh Lê Cao (159730)

Scatterplots of the filter ratios RT (x-axis) against RI (y-axis) for A) the iTRAQ breast cancer dataset, and B) and C) the iTRAQ kidney rejection dataset for the Allograft Rejection (AR) and Non-Rejection (NR) groups, respectively. Colors indicate the clusters from a 2-cluster model-based clustering: red squares indicate molecules that cluster as ‘informative’ and remain in the analysis, and blue circles indicate ‘non-informative’ molecules that are removed prior to analysis.

FigShare
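The 2-cluster model-based clustering on the two filter ratios can be approximated with a two-component Gaussian mixture fitted by expectation-maximisation (EM). This is a simplified stand-in, not the procedure used by the authors: model-based clustering software typically also selects among covariance structures, whereas this sketch assumes diagonal covariances, a deterministic initialisation and a fixed number of iterations. The function names and input values are hypothetical, and which of the two clusters counts as ‘informative’ is left to the analyst.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def two_cluster_em(points, n_iter=100):
    """Fit a 2-component Gaussian mixture (diagonal covariance) to 2-D points by EM.

    points: list of (rt, ri) pairs. Returns a cluster label (0 or 1) per point.
    """
    # Initialise the two means at the points with the smallest and largest rt + ri.
    lo = min(points, key=lambda p: p[0] + p[1])
    hi = max(points, key=lambda p: p[0] + p[1])
    means = [list(lo), list(hi)]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    weights = [0.5, 0.5]
    floor = 1e-6  # keeps variances away from zero
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            dens = [weights[k]
                    * normal_pdf(p[0], means[k][0], variances[k][0])
                    * normal_pdf(p[1], means[k][1], variances[k][1])
                    for k in (0, 1)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means and per-dimension variances.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(points)
            for d in (0, 1):
                mu = sum(r[k] * p[d] for r, p in zip(resp, points)) / nk
                var = sum(r[k] * (p[d] - mu) ** 2 for r, p in zip(resp, points)) / nk
                means[k][d] = mu
                variances[k][d] = max(var, floor)
    # Hard assignment from the last E-step.
    return [0 if r[0] >= r[1] else 1 for r in resp]
```

Each molecule receives a hard cluster label; molecules in the cluster judged ‘non-informative’ would then be dropped before modelling.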

Types of models used to summarize profiles.

Author: Alain-Dominique Gorse (728532)
Bevan Emma Huang (788822)
Jasmin Straube (788821)
Kim-Anh Lê Cao (159730)

The number (proportion) of profiles modelled with each model selected by our proposed LMMS approach. Models are abbreviated as linear (LIN), spline (SPL), subject-specific intercept (SSI), and subject-specific intercept and slope (SSIS). Models were applied to cell line breast cancer data (Cell), Saccharomyces paradoxus evolution data (Yeast), Mus musculus chemotherapy data (Mouse), and Homo sapiens kidney rejection data from the Non-Rejection (NR) group (Human). The row ‘Removed’ indicates the percentage of profiles filtered out by the 2-cluster model-based clustering on RT and RI.

FigShare
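For reference, the four model types can be written in standard linear mixed model notation. These are textbook forms given as an assumption, not transcribed from the authors' implementation: $y_{ij}$ is the measurement for subject $i$ at time $t_{ij}$, $\kappa_1,\dots,\kappa_K$ are spline knots, $\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)$, and the spline is shown as one common mixed-model formulation of a penalised spline.

```latex
\begin{aligned}
\text{LIN:}\quad  & y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij} \\
\text{SPL:}\quad  & y_{ij} = \beta_0 + \beta_1 t_{ij} + \sum_{k=1}^{K} u_k \,(t_{ij}-\kappa_k)_+ + \varepsilon_{ij},
                    \qquad u_k \sim \mathcal{N}(0,\sigma_u^2) \\
\text{SSI:}\quad  & y_{ij} = (\beta_0 + U_{0i}) + \beta_1 t_{ij} + \varepsilon_{ij},
                    \qquad U_{0i} \sim \mathcal{N}(0,\sigma_0^2) \\
\text{SSIS:}\quad & y_{ij} = (\beta_0 + U_{0i}) + (\beta_1 + U_{1i})\, t_{ij} + \varepsilon_{ij},
                    \qquad (U_{0i},U_{1i})^\top \sim \mathcal{N}(\mathbf{0},\Sigma_U)
\end{aligned}
```

Here $(x)_+ = \max(x, 0)$ is the truncated line basis; treating the spline coefficients $u_k$ as random effects is what places the spline model in the same mixed-model framework as SSI and SSIS.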