64 research outputs found

    Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

    Background: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. Results: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. Conclusions: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.

    Temporal development of the oral microbiome and prediction of early childhood caries

    Human microbiomes are predicted to assemble in a reproducible and ordered manner, yet there is limited knowledge on the development of the complex bacterial communities that constitute the oral microbiome. The oral microbiome plays major roles in many oral diseases including early childhood caries (ECC), which afflicts up to 70% of children in some countries. Saliva contains oral bacteria that are indicative of the whole oral microbiome and may have the ability to reflect the dysbiosis in supragingival plaque communities that initiates the clinical manifestations of ECC. The aim of this study was to determine the assembly of the oral microbiome during the first four years of life and compare it with the clinical development of ECC. The oral microbiomes of 134 children enrolled in a birth cohort study were determined at six ages between two months and four years of age, and their mothers' oral microbiomes were each determined at a single time point. We identified and quantified 356 operational taxonomic units (OTUs) of bacteria in saliva by sequencing the V4 region of the bacterial 16S rRNA genes. Bacterial alpha diversity increased from a mean of 31 OTUs in the saliva of infants at 1.9 months of age to 84 OTUs at 39 months of age. The oral microbiome showed a distinct shift in composition as the children matured. The microbiome data were compared with the clinical development of ECC in the cohort at 39, 48, and 60 months of age as determined by ICDAS-II assessment. Streptococcus mutans was the most discriminatory oral bacterial species between health and current disease, with an increased abundance in disease. Overall, our study demonstrates an ordered temporal development of the oral microbiome, describes a limited core oral microbiome and indicates that saliva testing of infants may help predict ECC risk.
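
    The abstract above reports alpha diversity as a mean number of OTUs per saliva sample. A minimal sketch of that kind of count (observed richness) follows; the sample dictionaries, OTU names and read counts are invented for illustration and are not data from the study, and the study's actual diversity pipeline is not described in this listing.

```python
def observed_richness(otu_counts):
    """Return the number of OTUs with a non-zero read count in one sample."""
    return sum(1 for count in otu_counts.values() if count > 0)


def mean_richness(samples):
    """Return the mean observed richness across a list of samples."""
    return sum(observed_richness(sample) for sample in samples) / len(samples)


# Hypothetical read counts per OTU, invented for illustration only.
infant_samples = [
    {"OTU_1": 120, "OTU_2": 0, "OTU_3": 15},
    {"OTU_1": 80, "OTU_2": 4, "OTU_3": 0, "OTU_4": 9},
]

print(mean_richness(infant_samples))
```

    Computing the same mean for the samples at each age would give the kind of per-age comparison the abstract summarizes.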

    Integrative analysis of gene expression and copy number alterations using canonical correlation analysis

    Supplementary Figure 1. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0.9, ty = 0.3 (left panel), and PCA+CCA (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by two markers. The filled markers represent the coordinates in the features extracted from the copy number variables, and the open markers represent coordinates in the features extracted from the gene expression variables. Samples with different leukemia subtypes are shown with different colors. The first feature pair distinguishes the HD50 group from the rest, while the second feature pair represents the characteristics of the samples from the E2A/PBX1 subtype. The high canonical correlation obtained for the tuning samples with regularized dual CCA is apparent in the left panel, where the two points for each sample coincide. Nevertheless, the extracted features have a high generalization ability, as can be seen in the left panel of Figure 5, showing the representation of the validation samples. Supplementary Figure 2. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0, ty = 0 (left panel), and tx = 1, ty = 1 (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by tw

    Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

    Background: We generalized penalized canonical correlation analysis for analyzing microarray gene-expression measurements for checking completeness of known metabolic pathways and identifying candidate genes for incorporation in the pathway. We used Wold's method for calculation of the canonical variates, and we applied ridge penalization to the regression of pathway genes on canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes. Results: We performed a small simulation to illustrate the model's capability to identify new candidate genes to incorporate in the pathway: in our simulations it appeared that a gene was correctly identified if the correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma, and we considered genes to incorporate in the glioma-pathway: we identified more than 25 genes that correlated > 0.9 with canonical variates of the pathway genes. Conclusion: We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.

    Protocol for a nested case-control study design for omics investigations in the Environmental Determinants of Islet Autoimmunity cohort

    Background: The Environmental Determinants of Islet Autoimmunity (ENDIA) pregnancy-birth cohort investigates the developmental origins of type 1 diabetes (T1D), with recruitment between 2013 and 2019. ENDIA is the first study in the world with comprehensive data and biospecimen collection during pregnancy, at birth and through childhood from at-risk children who have a first-degree relative with T1D. Environmental exposures are thought to drive the progression to clinical T1D, with pancreatic islet autoimmunity (IA) developing in genetically susceptible individuals. The exposures and key molecular mechanisms driving this progression are unknown. Persistent IA is the primary outcome of ENDIA; it is defined as a positive antibody for at least one of IAA, GAD, ZnT8 or IA2 on two consecutive occasions and signifies high risk of clinical T1D. Method: A nested case-control (NCC) study design with 54 cases and 161 matched controls aims to investigate associations between persistent IA and longitudinal omics exposures in ENDIA. The NCC study will analyse samples obtained from ENDIA children who have either developed persistent IA or progressed to clinical T1D (cases) and matched control children at risk of developing persistent IA. Control children were matched on sex and age, with all four autoantibodies absent within a defined window of the case's onset date. Cases seroconverted at a median of 1.37 years (IQR 0.95, 2.56). Longitudinal omics data generated from approximately 16,000 samples of different biospecimen types will enable evaluation of changes from pregnancy through childhood. Conclusions: This paper describes the ENDIA NCC study, omics platform design considerations and planned univariate and multivariate analyses for its longitudinal data. Methodologies for multivariate omics analysis with longitudinal data are discovery-focused and data driven. There is currently no single multivariate method tailored specifically for the longitudinal omics data that the ENDIA NCC study will generate and therefore omics analysis results will require either cross validation or independent validation. Key messages: The ENDIA nested case-control study will utilize longitudinal omics data on approximately 16,000 samples from 190 unique children at risk of type 1 diabetes (T1D), including 54 who have developed islet autoimmunity (IA), followed during pregnancy, at birth and during early childhood, enabling the developmental origins of T1D to be explored.
    Helena Oakey ... Lynne C. Giles ... Rebecca L. Thomson ... Pat Ashwood ... Emma J. Knight ... Simon C. Barry ... Kelly McGorm ... Jennifer J. Couper ... Megan A. S. Penno ... the ENDIA Study Group ... et al.
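
    The matching rule in the abstract above (controls matched on sex and age, with all four autoantibodies absent within a defined window of the case's onset date) could be sketched roughly as follows. This is an illustrative assumption, not the ENDIA procedure: the `Child` record, the `eligible_controls` helper, the 90-day age tolerance and the 60-day window are all hypothetical, since the abstract does not state the actual values.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    child_id: str
    sex: str
    birth_day: int  # day of birth on a common study calendar
    # Days on which all four autoantibodies were absent (hypothetical field).
    negative_test_days: list = field(default_factory=list)


def eligible_controls(case, onset_day, candidates,
                      max_age_gap_days=90, window_days=60):
    """Return IDs of candidates that satisfy the assumed matching rule.

    Both tolerances are placeholder values chosen for illustration.
    """
    matches = []
    for child in candidates:
        if child.child_id == case.child_id:
            continue
        same_sex = child.sex == case.sex
        similar_age = abs(child.birth_day - case.birth_day) <= max_age_gap_days
        negative_near_onset = any(
            abs(day - onset_day) <= window_days
            for day in child.negative_test_days
        )
        if same_sex and similar_age and negative_near_onset:
            matches.append(child.child_id)
    return matches


# Invented records for illustration only.
case = Child("case1", "F", 0)
candidates = [
    Child("c1", "F", 30, [480]),   # same sex, similar age, negative near onset
    Child("c2", "M", 10, [500]),   # different sex
    Child("c3", "F", 200, [500]),  # birth date too far from the case's
    Child("c4", "F", -20, [300]),  # no negative test near the onset date
]
print(eligible_controls(case, onset_day=500, candidates=candidates))
```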

    Uncoupled Embryonic and Extra-Embryonic Tissues Compromise Blastocyst Development after Somatic Cell Nuclear Transfer

    Somatic cell nuclear transfer (SCNT) is the most efficient cell reprogramming technique available, especially when working with bovine species. Although SCNT blastocysts performed equally well or better than controls in the weeks following embryo transfer at Day 7, elongation and gastrulation defects were observed prior to implantation. To understand the developmental implications of embryonic/extra-embryonic interactions, the morphological and molecular features of elongating and gastrulating tissues were analysed. At Day 18, 30 SCNT conceptuses were compared to 20 controls (AI and IVP: 10 conceptuses each); one-half of the SCNT conceptuses appeared normal while the other half showed signs of atypical elongation and gastrulation. SCNT was also associated with a high incidence of discordance in embryonic and extra-embryonic patterns, as evidenced by morphological and molecular “uncoupling”. Elongation appeared to be secondarily affected; only 3 of 30 conceptuses had abnormally elongated shapes and there were very few differences in gene expression when they were compared to the controls. However, some of these differences could be linked to defects in microvilli formation or extracellular matrix composition and could thus impact extra-embryonic functions. In contrast to elongation, gastrulation stages included embryonic defects that likely affected the hypoblast, the epiblast, or the early stages of their differentiation. When taking into account SCNT conceptus somatic origin, i.e. the reprogramming efficiency of each bovine ear fibroblast (Low: 0029, Med: 7711, High: 5538), we found that embryonic abnormalities or severe embryonic/extra-embryonic uncoupling were more tightly correlated to embryo loss at implantation than were elongation defects. Alternatively, extra-embryonic differences between SCNT and control conceptuses at Day 18 were related to molecular plasticity (high efficiency/high plasticity) and subsequent pregnancy loss. 
Finally, because it alters re-differentiation processes in vivo, SCNT reprogramming highlights temporally and spatially restricted interactions among cells and tissues in a unique way.

    A novel approach for biomarker selection and the integration of repeated measures experiments from two assays

    Background: High throughput 'omics' experiments are usually designed to compare changes observed between different conditions (or interventions) and to identify biomarkers capable of characterizing each condition. We consider the complex structure of repeated measurements from different assays where different conditions are applied on the same subjects