Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
Background: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. Results: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. Conclusions: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
Integrative analysis of gene expression and copy number alterations using canonical correlation analysis
Supplementary Figure 1. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0.9, ty = 0.3 (left panel), and PCA+CCA (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by two markers. The filled markers represent the coordinates in the features extracted from the copy number variables, and the open markers represent coordinates in the features extracted from the gene expression variables. Samples with different leukemia subtypes are shown with different colors. The first feature pair distinguishes the HD50 group from the rest, while the second feature pair represents the characteristics of the samples from the E2A/PBX1 subtype. The high canonical correlation obtained for the tuning samples with regularized dual CCA is apparent in the left panel, where the two points for each sample coincide. Nevertheless, the extracted features have a high generalization ability, as can be seen in the left panel of Figure 5, showing the representation of the validation samples.
Supplementary Figure 2. Representation of the samples from the tuning set by their coordinates in the first two pairs of features (extracted from the tuning set) using regularized dual CCA, with regularization parameters tx = 0, ty = 0 (left panel), and tx = 1, ty = 1 (right panel). We show the representations with respect to both the copy number features and the gene expression features in a superimposed way, where each sample is represented by tw
New Abundant Microbial Groups in Aquatic Hypersaline Environments
We describe the microbiota of two hypersaline saltern ponds, one of intermediate salinity (19%) and a NaCl saturated crystallizer pond (37%) using pyrosequencing. The analyses of these metagenomes (nearly 784 Mb) reaffirmed the vast dominance of Haloquadratum walsbyi but also revealed novel, abundant and previously unsuspected microbial groups. We describe, for the first time, a group of low GC Actinobacteria, related to freshwater Actinobacteria, abundant in low and intermediate salinities. Metagenomic assembly revealed three new abundant microbes: a low-GC euryarchaeon with the lowest GC content described for any euryarchaeon, a high-GC euryarchaeon and a gammaproteobacterium related to Alkalilimnicola and Nitrococcus. Multiple displacement amplification and sequencing of the genome from a single archaeal cell of the new low GC euryarchaeon suggest a photoheterotrophic and polysaccharide-degrading lifestyle and its relatedness to the recently described lineage of Nanohaloarchaea. These discoveries reveal the combined power of an unbiased metagenomic and single cell genomic approach.
Pharmacogenetics Meets Metabolomics: Discovery of Tryptophan as a New Endogenous OCT2 Substrate Related to Metformin Disposition
Genetic polymorphisms of the organic cation transporter 2 (OCT2), encoded by SLC22A2, have been investigated in association with metformin disposition. A decrease in transport function has been shown to be associated with the OCT2 variants. Using metabolomics, our study aims to comprehensively monitor primary metabolite changes in order to understand the biochemical alterations associated with OCT2 polymorphisms and to discover potential endogenous metabolites related to the genetic variation of OCT2. Using GC-TOF MS based metabolite profiling, clear clustering of samples was observed in Partial Least Squares Discriminant Analysis, showing that metabolic profiles were linked to the genetic variants of OCT2. Tryptophan and uridine presented the most significant alteration in SLC22A2-808TT homozygous and the SLC22A2-808G>T heterozygous variants relative to the reference. In particular, tryptophan showed gene-dose effects of transporter activity according to OCT2 genotypes and the greatest linear association with the pharmacokinetic parameters (Clrenal, Clsec, Cl/F/kg, and Vd/F/kg) of metformin. An inhibition assay demonstrated the inhibitory effect of tryptophan on the uptake of 1-methyl-4-phenylpyridinium in a concentration-dependent manner, and a subsequent uptake experiment revealed differential tryptophan-uptake rates in the oocytes expressing OCT2 reference and variant (808G>T). Our results collectively indicate that tryptophan can serve as one of the endogenous substrates for OCT2 as well as a biomarker candidate indicating the variability of the transport activity of OCT2.
'Gut health': a new objective in medicine?
'Gut health' is a term increasingly used in the medical literature and by the food industry. It covers multiple positive aspects of the gastrointestinal (GI) tract, such as the effective digestion and absorption of food, the absence of GI illness, normal and stable intestinal microbiota, effective immune status and a state of well-being. From a scientific point of view, however, it is still extremely unclear exactly what gut health is, how it can be defined and how it can be measured. The GI barrier adjacent to the GI microbiota appears to be the key to understanding the complex mechanisms that maintain gut health. Any impairment of the GI barrier can increase the risk of developing infectious, inflammatory and functional GI diseases, as well as extraintestinal diseases such as immune-mediated and metabolic disorders. Less clear, however, is whether GI discomfort in general can also be related to GI barrier functions. In any case, methods of assessing, improving and maintaining gut health-related GI functions are of major interest in preventive medicine
Genome sequencing reveals Zika virus diversity and spread in the Americas
Although the recent Zika virus (ZIKV) epidemic in the Americas and its link to birth defects have attracted a great deal of attention, much remains unknown about ZIKV disease epidemiology and ZIKV evolution, in part owing to a lack of genomic data. Here we address this gap in knowledge by using multiple sequencing approaches to generate 110 ZIKV genomes from clinical and mosquito samples from 10 countries and territories, greatly expanding the observed viral genetic diversity from this outbreak. We analysed the timing and patterns of introductions into distinct geographic regions; our phylogenetic evidence suggests rapid expansion of the outbreak in Brazil and multiple introductions of outbreak strains into Puerto Rico, Honduras, Colombia, other Caribbean islands, and the continental United States. We find that ZIKV circulated undetected in multiple regions for many months before the first locally transmitted cases were confirmed, highlighting the importance of surveillance of viral infections. We identify mutations with possible functional implications for ZIKV biology and pathogenesis, as well as those that might be relevant to the effectiveness of diagnostic tests