Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis
Comparison of logistic regression, SVM and random forest performance in the plasma training data set. Table S2. Pathway significance and relative log fold changes in our metabolomics data and TCGA breast cancer RNA-Seq data. Table S3. Detected metabolites and their differential test results among the two models. a All-stage diagnosis model. b Early-stage diagnosis model. Table S4. Single-variate logistic analysis of metabolites or pathways selected as features in the metabolite-based or pathway-based early-stage diagnosis model. Table S5. Comparison of pathway features in the full-size (101 input pathways) and half-size (51 input pathways) pathway-based early-stage diagnosis models. (DOCX 34 kb)
DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data
Abstract
Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. We present DeepImpute, a deep neural network-based imputation algorithm that uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Overall, DeepImpute yields better accuracy than six other publicly available scRNA-seq imputation methods on experimental data, as measured by the mean squared error or Pearson’s correlation coefficient. DeepImpute is an accurate, fast, and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data, and is freely available at https://github.com/lanagarmire/DeepImpute.
https://deepblue.lib.umich.edu/bitstream/2027.42/152237/1/13059_2019_Article_1837.pd
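The two accuracy measures named in the abstract above, mean squared error and Pearson's correlation coefficient, can be illustrated with a minimal sketch. This is not code from DeepImpute; the function names and the toy `truth` and `imputed` vectors are hypothetical and stand in for held-out expression values and their imputed counterparts.

```python
import math


def mean_squared_error(truth, imputed):
    """Average squared difference between true and imputed values."""
    return sum((t - i) ** 2 for t, i in zip(truth, imputed)) / len(truth)


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical held-out expression values and their imputed estimates.
truth = [1.0, 2.0, 3.0, 4.0]
imputed = [1.5, 2.5, 3.5, 4.5]

mse = mean_squared_error(truth, imputed)
r = pearson_correlation(truth, imputed)
```

A constant offset, as in this toy data, leaves the correlation at its maximum while still producing a nonzero squared error, which is one reason the two measures are reported side by side.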
Genome-scale hypomethylation in the cord blood DNAs associated with early onset preeclampsia
Background: Preeclampsia is one of the leading causes of fetal and maternal morbidity and mortality worldwide.
Preterm babies of mothers with early onset preeclampsia (EOPE) are at higher risks for various diseases later on in life,
including cardiovascular diseases. We hypothesized that genome-wide epigenetic alterations occur in cord blood DNAs
in association with EOPE and conducted a case-control study to compare the genome-scale methylome differences in
cord blood DNAs between 12 EOPE-associated and 8 normal births.
Results: Bioinformatics analysis of methylation data from the Infinium HumanMethylation450 BeadChip shows a
genome-scale hypomethylation pattern in EOPE, with 51,486 hypomethylated CpG sites and 12,563 hypermethylated sites
(adjusted P <0.05). A similar trend also exists in the proximal promoters (TSS200) associated with protein-coding genes.
Using summary statistics on the CpG sites in TSS200 regions, promoters of 643 and 389 genes are hypomethylated and
hypermethylated, respectively. Promoter-based differential methylation (DM) analysis reveals that genes in the farnesoid X
receptor and liver X receptor (FXR/LXR) pathway are enriched, indicating dysfunction of lipid metabolism in cord blood
cells. Additional biological functional alterations involve inflammation, cell growth, and hematological system
development. A two-way ANOVA analysis among coupled cord blood and amniotic membrane samples shows
that a group of genes involved in inflammation, lipid metabolism, and proliferation are persistently differentially
methylated in both tissues, including IL12B, FAS, PIK31, and IGF1.
Conclusions: These findings provide, for the first time, evidence of prominent genome-scale DNA methylation
modifications in cord blood DNAs associated with EOPE. They may suggest a connection between inflammation and
lipid dysregulation in EOPE-associated newborns and a higher risk of cardiovascular diseases later in adulthood.
Challenges and perspectives in computational deconvolution in genomics data
Deciphering cell type heterogeneity is crucial for systematically
understanding tissue homeostasis and its dysregulation in diseases.
Computational deconvolution is an efficient approach to estimate cell type
abundances from a variety of omics data. Despite significant methodological
progress in computational deconvolution in recent years, challenges are still
outstanding. Here we list four significant challenges: the availability of
reference data, the generation of simulation data, the limitations of computational
methodologies, and benchmarking design and implementation. Finally, we make
recommendations on reference data generation, new directions of computational
methodologies, and strategies to promote rigorous benchmarking.
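As an illustration of what estimating cell type abundances from reference data can look like, here is a minimal sketch of reference-based deconvolution by non-negative least squares, solved with projected gradient descent. It is not a method from the paper above; the function `estimate_proportions`, the toy `signature` matrix (genes by cell types), and the mixing weights `true_p` are all hypothetical.

```python
def estimate_proportions(signature, bulk, n_iter=500):
    """Estimate cell type proportions p >= 0 minimizing ||signature @ p - bulk||^2.

    signature: list of rows, one per gene, each with one value per cell type.
    bulk: list of bulk expression values, one per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates can be read as proportions.
    return [v / total for v in p] if total > 0 else p


# Hypothetical reference profiles: 4 genes x 2 cell types.
signature = [[10.0, 1.0], [2.0, 8.0], [5.0, 5.0], [1.0, 9.0]]
true_p = [0.3, 0.7]
# Simulated noise-free bulk sample mixed from the reference profiles.
bulk = [sum(s * w for s, w in zip(row, true_p)) for row in signature]

props = estimate_proportions(signature, bulk)
```

The toy bulk sample is noise-free and built from the same reference used for estimation, so the mixing weights are recovered almost exactly. Real data offers neither guarantee, which is where the reference-data and benchmarking challenges listed above come in.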
Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies
Metadata, often termed "data about data," is crucial for organizing,
understanding, and managing vast omics datasets. It aids in efficient data
discovery, integration, and interpretation, enabling users to access,
comprehend, and utilize data effectively. Its significance spans the domains of
scientific research, facilitating data reproducibility, reusability, and
secondary analysis. However, numerous perceptual and technical barriers hinder
the sharing of metadata among researchers. These barriers compromise the
reliability of research results and hinder integrative meta-analyses of omics
studies. This study highlights the key barriers to metadata sharing, including
the lack of uniform standards, privacy and legal concerns, limitations in study
design, limited incentives, inadequate infrastructure, and the dearth of
well-trained personnel for metadata management and reuse. Proposed solutions
include emphasizing the promotion of standardization, educational efforts, the
role of journals and funding agencies, incentives and rewards, and the
improvement of infrastructure. More accurate, reliable, and impactful research
outcomes are achievable if the scientific community addresses these barriers.
A Global Clustering Algorithm to Identify Long Intergenic Non-Coding RNA - with Applications in Mouse Macrophages
Identification of diffuse signals from the chromatin immunoprecipitation and high-throughput massively parallel sequencing (ChIP-Seq) technology poses significant computational challenges, and there are few methods currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER, which was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 out of a total of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharides (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched with biological functions related to metabolic processes under resting conditions but with developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and expected secondary structures by prediction. Last, they are enriched with motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, the GCLS method, based on RNA polymerase II and H3K4Me3 ChIP-Seq, can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in the study.