16 research outputs found
STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (and beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.
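A minimal sketch of the non-parametric data combination step mentioned above, assuming per-omics p-values for the same set of features are already available. The STATegRa Bioconductor package implements this step in R; the Python code, function name, and Fisher-style combination below are illustrative assumptions rather than the package's API.

# Illustrative sketch (not the STATegRa API): combine per-omics p-values
# for the same features and adjust for multiple testing across features.
import numpy as np
from scipy.stats import combine_pvalues
from statsmodels.stats.multitest import multipletests

def combine_omics_pvalues(pval_matrix):
    # pval_matrix: features x omics array of per-omics p-values
    combined = np.array([combine_pvalues(row, method="fisher")[1]
                         for row in pval_matrix])
    _, qvals, _, _ = multipletests(combined, method="fdr_bh")  # Benjamini-Hochberg
    return combined, qvals

# Example: three features measured in two omics layers
p = np.array([[0.01, 0.03],
              [0.40, 0.55],
              [0.02, 0.20]])
raw_p, adjusted_q = combine_omics_pvalues(p)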
STATegra, a comprehensive multi-omics dataset of B-cell differentiation in mouse
Multi-omics approaches use a diversity of high-throughput technologies to profile the different molecular layers of living cells. Ideally, the integration of this information should result in comprehensive systems models of cellular physiology and regulation. However, most multi-omics projects still include a limited number of molecular assays and there have been very few multi-omic studies that evaluate dynamic processes such as cellular growth, development and adaptation. Hence, we lack formal analysis methods and comprehensive multi-omics datasets that can be leveraged to develop true multi-layered models for dynamic cellular systems. Here we present the STATegra multi-omics dataset that combines measurements from up to 10 different omics technologies applied to the same biological system, namely the well-studied mouse pre-B-cell differentiation. STATegra include
Semi-automated non-target processing in GC × GC–MS metabolomics analysis: applicability for biomedical studies
Due to the complexity of typical metabolomics samples and the many steps required to obtain quantitative data in GC × GC–MS (deconvolution, peak picking, peak merging, and integration), the unbiased non-target quantification of GC × GC–MS data still poses a major challenge in metabolomics analysis. The feasibility of using commercially available software for non-target processing of GC × GC–MS data was assessed. For this purpose a set of mouse liver samples (24 study samples and five quality control (QC) samples prepared from the study samples) was measured with GC × GC–MS and GC–MS to study the development and progression of insulin resistance, a primary characteristic of diabetes type 2. A total of 170 and 691 peaks were quantified in, respectively, the GC–MS and GC × GC–MS data for all study and QC samples. The quantitative results for the QC samples were compared to assess the quality of semi-automated GC × GC–MS processing against targeted GC–MS processing, which involved time-consuming manual correction of all wrongly integrated metabolites and was considered the gold standard. The relative standard deviations (RSDs) obtained with GC × GC–MS were somewhat higher than with GC–MS, due to less accurate processing. Still, the biological information in the study samples was preserved and the added value of GC × GC–MS was demonstrated; many additional candidate biomarkers were found with GC × GC–MS compared to GC–MS.
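A hedged sketch of the QC-based comparison described above: per-peak relative standard deviations (RSDs) across QC injections are computed for each pipeline and then compared. The Python code, the DataFrame layout (rows = QC injections, columns = quantified peaks), and the 30% cutoff are assumptions for illustration, not the study's actual processing software.

# Illustrative sketch: compare per-peak RSDs in QC samples between two pipelines.
import pandas as pd

def qc_rsd(peak_table: pd.DataFrame) -> pd.Series:
    # Return RSD (%) per peak across QC injections
    return 100 * peak_table.std(ddof=1) / peak_table.mean()

# rsd_gcms    = qc_rsd(qc_peaks_gcms)      # e.g., 170 quantified peaks
# rsd_gcxgcms = qc_rsd(qc_peaks_gcxgcms)   # e.g., 691 quantified peaks
# (rsd_gcxgcms > 30).mean()                # fraction of peaks above a 30% RSD cutoff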
Performance of methods that separate common and distinct variation in multiple data blocks
Improving gene set enrichment analysis (GSEA) by using regulation directionality
To infer the biological meaning from transcriptome data, it is useful to focus on genes that are regulated by the same regulator, i.e., regulons. Unfortunately, current gene set enrichment analysis (GSEA) tools do not consider whether a gene is activated or repressed by a regulator. This distinction is crucial when analyzing regulons since a regulator can work as an activator of certain genes and as a repressor of other genes, yet both sets of genes belong to the same regulon. Therefore, simply averaging expression differences of the genes of such a regulon will not properly reflect the activity of the regulator. What makes it more complicated is the fact that many genes are regulated by different transcription factors, and current transcriptome analysis tools are unable to indicate which regulator is most likely responsible for the observed expression difference of a gene. To address these challenges, we developed the gene set enrichment analysis program GINtool. Additional features of GINtool are novel graphical representations to facilitate the visualization of gene set analyses of transcriptome data, the possibility to include functional categories as gene sets for analysis, and the option to analyze expression differences within operons, which is useful when analyzing prokaryotic transcriptome and proteome data. IMPORTANCE Measuring the activity of all genes in cells is a common way to elucidate the function and regulation of genes. These transcriptome analyses produce large amounts of data since genomes contain thousands of genes. The analysis of these large data sets is challenging. Therefore, we developed a new software tool called GINtool that can facilitate the analysis of transcriptome data by using prior knowledge of gene sets controlled by the same regulator, the so-called regulons. An important novelty of GINtool is that it can take into account the directionality of gene regulation in these analyses, i.e., whether a gene is activated or repressed, which is crucial to assess whether a regulon or functional category is affected. GINtool also includes new graphical methods to facilitate the visual inspection of regulation events in transcriptome data sets. These and additional analysis methods included in GINtool make it a powerful software tool to analyze transcriptome data.
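A minimal sketch of the directionality idea described above, not GINtool's actual implementation: log2 fold-changes of repressed targets are sign-flipped so that activated and repressed genes of the same regulon contribute in a consistent direction before averaging. The Python code, function name, and example values are illustrative assumptions.

# Illustrative sketch (not GINtool's implementation): a sign-aware regulon score.
import numpy as np

def regulon_score(log2fc, regulation_sign):
    # log2fc: dict gene -> log2 fold-change
    # regulation_sign: dict gene -> +1 (activated) or -1 (repressed) by the regulator
    adjusted = [log2fc[g] * regulation_sign[g]
                for g in regulation_sign if g in log2fc]
    return np.mean(adjusted) if adjusted else np.nan

# Example: a regulator that activates geneA and represses geneB
fc = {"geneA": 1.8, "geneB": -2.1}
signs = {"geneA": +1, "geneB": -1}
regulon_score(fc, signs)  # both genes now point consistently to an active regulator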
Matrix Effect Compensation in Small-Molecule Profiling for an LC–TOF Platform Using Multicomponent Postcolumn Infusion
The possible presence of matrix effect is one of the main concerns in liquid chromatography–mass spectrometry (LC–MS)-driven bioanalysis due to its impact on the reliability of the obtained quantitative results. Here we propose an approach to correct for the matrix effect in LC–MS with electrospray ionization using postcolumn infusion of eight internal standards (PCI-IS). We applied this approach to a generic ultraperformance liquid chromatography–time-of-flight (UHPLC–TOF) platform developed for small-molecule profiling with a main focus on drugs. Different urine samples were spiked with 19 drugs with different physicochemical properties and analyzed in order to study the matrix effect (in absolute and relative terms). Furthermore, calibration curves for each analyte were constructed and quality control samples at different concentration levels were analyzed to check the applicability of this approach in quantitative analysis. The matrix effect profiles of the PCI-ISs were different: this confirms that the matrix effect is compound-dependent, and therefore the most suitable PCI-IS has to be chosen for each analyte. Chromatograms were reconstructed using analyte and PCI-IS responses, which were used to develop an optimized method that compensates for variation in ionization efficiency. The approach presented here dramatically improved the results in terms of matrix effect. Furthermore, calibration curves of higher quality are obtained, the dynamic range is enhanced, and the accuracy and precision of QC samples are increased. The use of PCI-ISs is a very promising step toward an analytical platform free of matrix effect, which can make LC–MS analysis even more successful, adding a higher reliability in quantification to its intrinsic high sensitivity and selectivity.
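The core correction described above can be sketched as follows: each analyte response is divided by the response of one postcolumn-infused internal standard, chosen per analyte. The Python code and the correlation-based selection of the "most suitable" PCI-IS are illustrative assumptions; the paper itself reconstructs chromatograms from analyte and PCI-IS responses to derive the optimized method.

# Illustrative sketch of PCI-IS matrix effect correction: divide each analyte
# response by the response of the internal standard whose profile tracks it best.
import numpy as np

def correct_with_pci_is(analyte, pci_is_matrix):
    # analyte: responses across samples (1D array)
    # pci_is_matrix: samples x n_internal_standards response matrix
    corrs = [np.corrcoef(analyte, pci_is_matrix[:, j])[0, 1]
             for j in range(pci_is_matrix.shape[1])]
    best = int(np.argmax(corrs))              # assumed selection rule, for illustration
    return analyte / pci_is_matrix[:, best], best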
Low-density lipoprotein receptor-knockout mice display impaired spatial memory associated with a decreased synaptic density in the hippocampus
The low-density lipoprotein receptor (LDLR) is the first described receptor for apolipoprotein E (apoE). We hypothesize that the absence of the LDLR, similar to the absence of apoE, results in impaired learning and memory processes. Six-month-old homozygous Ldlr-/- and wild-type littermates (Ldlr+/+), maintained on a standard lab chow diet, were used. Unlike humans, Ldlr-/- mice, under these conditions, do not develop atherosclerosis. The results of the Morris water escape task revealed an impaired spatial memory in the Ldlr-/- mice in comparison with Ldlr+/+ mice. Also in a T-maze task, the working memory performance of the Ldlr-/- mice was impaired. Furthermore, Ldlr-/- mice, in comparison with Ldlr+/+ mice, display a decreased number of synaptophysin-immunoreactive presynaptic boutons in the hippocampus CA1. In conclusion, the results show that, in mice, deficiency of the LDLR results in impaired hippocampal-dependent memory functions. A decrease in the number of presynaptic boutons may underlie these behavioral alterations. Therefore, the LDLR may be an important receptor for apoE in the central nervous system.
Adult GAMT deficiency: A literature review and report of two siblings
Guanidinoacetate methyltransferase (GAMT) deficiency is a creatine deficiency disorder and an inborn error of metabolism presenting with progressive intellectual and neurological deterioration. As most cases are identified and treated in early childhood, adult phenotypes that can help in understanding the natural history of the disorder are rare. We describe two adult cases of GAMT deficiency from a consanguineous family in Pakistan that presented with a history of global developmental delay, cognitive impairments, excessive drooling, behavioral abnormalities, contractures and apparent bone deformities initially presumed to be the reason for abnormal gait. Exome sequencing identified a homozygous nonsense variant in GAMT: NM_000156.5:c.134G>A (p.Trp45*). We also performed a literature review and compiled the genetic and clinical characteristics of all adult cases of GAMT deficiency reported to date. When compared to the adult cases previously reported, the musculoskeletal phenotype and the rapidly progressive nature of neurological and motor decline seen in our patients are striking. This study presents an opportunity to gain insights into the adult presentation of GAMT deficiency and highlights the need for in-depth evaluation and reporting of clinical features to expand our understanding of the phenotypic spectrum.
Multi-Omic Approach to Identify Phenotypic Modifiers Underlying Cerebral Demyelination in X-Linked Adrenoleukodystrophy
X-linked adrenoleukodystrophy (ALD) is a peroxisomal metabolic disorder with a highly complex clinical presentation. ALD is caused by mutations in the ABCD1 gene, and is characterized by the accumulation of very long-chain fatty acids in plasma and tissues. Disease-causing mutations are ‘loss of function’ mutations, with no prognostic value with respect to the clinical outcome of an individual. All male patients with ALD develop spinal cord disease and a peripheral neuropathy in adulthood, although age of onset is highly variable. However, the lifetime prevalence of developing progressive white matter lesions, termed cerebral ALD (CALD), is only about 60%. Early identification of the transition to CALD is critical since it can be halted by allogeneic hematopoietic stem cell therapy only at an early stage. The primary goal of this study is to identify molecular markers that may be prognostic of cerebral demyelination from a simple blood sample, with the hope that blood-based assays can replace the current protocols for diagnosis. We collected six well-characterized brother pairs affected by ALD and discordant for the presence of CALD and performed multi-omic profiling of blood samples including genome, epigenome, transcriptome, metabolome/lipidome, and proteome profiling. In our analysis we identify discordant genomic alleles present across all families as well as differentially abundant molecular features across the omics technologies. The analysis was focused on univariate modeling to discriminate the two phenotypic groups, but was unable to identify statistically significant candidate molecular markers. Our study highlights the issues caused by a large amount of inter-individual variation and supports the emerging hypothesis that cerebral demyelination is a complex mix of environmental factors and/or heterogeneous genomic alleles. We confirm previous observations about the role of immune response, specifically auto-immunity, and the potential role of PFN1 protein overabundance in CALD in a subset of the families. We envision that our methodology as well as the dataset will have utility to the field for reproducing previous or enabling future modifier investigations.
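A minimal sketch of the univariate screen described above, assuming each omics layer is summarized as a features-by-pairs matrix with matched measurements for the CALD and non-CALD brother of each pair. The Python code, the paired Wilcoxon test, and the per-layer Benjamini-Hochberg correction are illustrative assumptions, not the study's exact statistical pipeline.

# Illustrative sketch: paired univariate screen per molecular feature across
# the CALD-discordant brother pairs, with FDR correction within an omics layer.
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def paired_feature_screen(cald, no_cald):
    # cald, no_cald: features x pairs matrices of matched measurements
    pvals = np.array([wilcoxon(cald[i], no_cald[i]).pvalue
                      for i in range(cald.shape[0])])
    _, qvals, _, _ = multipletests(pvals, method="fdr_bh")  # Benjamini-Hochberg
    return pvals, qvals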