GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization
Bioinformatics tools have been developed to interpret gene expression data at
the gene set level, and these gene-set-based analyses improve biologists'
ability to discover the functional relevance of their experimental design.
However, gene sets are typically elucidated individually, and associations
between gene sets are rarely taken into consideration. Deep learning, an
emerging machine learning technique in computational biology, can be used to
generate unbiased combinations of gene sets and, by leveraging large genomic
data sets, to determine the biological relevance and analysis consistency of
these combined gene sets. In this study, we proposed a gene superset
autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori
defined gene sets to retain crucial biological features in the latent layer.
We introduced the concept of the gene superset, an unbiased combination of gene
sets with weights trained by the autoencoder, where each node in the latent
layer is a superset. After training the model on genomic data from TCGA and
evaluating it against the accompanying clinical parameters, we showed that gene
supersets can discriminate tumor subtypes and have prognostic capability. We
further demonstrated the biological relevance of the top component gene sets in
the significant supersets. Using an autoencoder model with gene supersets at its
latent layer, we demonstrated that gene supersets retain sufficient biological
information with respect to tumor subtypes and clinical prognostic
significance. Supersets also provide high reproducibility in survival analysis
and accurate prediction of cancer subtypes.
Comment: Presented at the International Conference on Intelligent Biology and
Medicine (ICIBM 2018) in Los Angeles, CA, USA, and published in BMC Systems
Biology 2018, 12(Suppl 8):14
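A minimal sketch of the core idea, assuming a NumPy implementation: the gene-set layer is a dense layer whose weight matrix is multiplied by a fixed binary gene-to-gene-set membership mask, so each gene-set node receives input only from its member genes, and a fully connected layer above it combines gene-set nodes into supersets. The gene names, gene sets, activations, and layer sizes are hypothetical placeholders, training and the decoder are omitted, and the actual GSAE architecture may differ.

```python
import numpy as np

# Hypothetical toy inputs: 6 genes and 3 a priori gene sets (illustrative names only).
GENES = ["G1", "G2", "G3", "G4", "G5", "G6"]
GENE_SETS = {
    "SET_A": {"G1", "G2", "G3"},
    "SET_B": {"G3", "G4"},
    "SET_C": {"G5", "G6"},
}


def membership_mask(genes, gene_sets):
    """Binary mask of shape (n_genes, n_sets); 1 where a gene belongs to a set."""
    mask = np.zeros((len(genes), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for i, gene in enumerate(genes):
            if gene in members:
                mask[i, j] = 1.0
    return mask


def gene_set_layer(x, weights, mask):
    """Masked dense layer: each gene-set node only sees its member genes (ReLU)."""
    return np.maximum(0.0, x @ (weights * mask))


def superset_layer(h, weights):
    """Fully connected layer: each superset is a weighted mix of all gene-set nodes."""
    return np.tanh(h @ weights)


rng = np.random.default_rng(0)
mask = membership_mask(GENES, GENE_SETS)
w_sets = rng.normal(size=mask.shape)                 # would be learned in training
w_super = rng.normal(size=(len(GENE_SETS), 2))       # 2 hypothetical supersets
x = rng.normal(size=(4, len(GENES)))                 # 4 samples x 6 genes

gene_set_nodes = gene_set_layer(x, w_sets, mask)     # shape (4, 3)
supersets = superset_layer(gene_set_nodes, w_super)  # shape (4, 2)
```

Multiplying by the mask in the forward pass also zeroes the gradient of every masked weight, so connections from non-member genes never contribute, however `w_sets` is updated.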
Predicting drug response of tumors from integrated genomic profiles by deep neural networks
The study of high-throughput genomic profiles from a pharmacogenomics
viewpoint has provided unprecedented insights into the oncogenic features
modulating drug response. A recent screening of ~1,000 cancer cell lines to a
collection of anti-cancer drugs illuminated the link between genotypes and
vulnerability. However, due to essential differences between cell lines and
tumors, the translation into predicting drug response in tumors remains
challenging. Here we proposed a DNN model to predict drug response based on
mutation and expression profiles of a cancer cell or a tumor. The model
contains mutation and expression encoders pre-trained using a large
pan-cancer dataset to abstract core representations of high-dimensional data,
followed by a drug response predictor network. Given a pair of mutation and
expression profiles, the model predicts IC50 values of 265 drugs. We trained
and tested the model on a dataset of 622 cancer cell lines and achieved an
overall mean squared error of 1.96 (log-scale IC50 values). The model
outperformed two classical methods and four analog DNNs of our model in
prediction error or stability. We then applied the model to predict drug
response of 9,059 tumors of 33 cancer types. The model recovered known drug
responses, including EGFR inhibitors in non-small cell lung cancer and
tamoxifen in ER+ breast cancer, and predicted novel drug targets. The comprehensive
analysis further revealed the molecular mechanisms underlying the resistance to
a chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer
potential of a novel agent, CX-5461, in treating gliomas and hematopoietic
malignancies. Overall, our model and findings improve the prediction of drug
response and the identification of novel therapeutic options.
Comment: Accepted for presentation at the International Conference on
Intelligent Biology and Medicine (ICIBM 2018) in Los Angeles, CA, USA.
Currently under consideration for publication in a Supplement Issue of BMC
Genomics
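To make the described data flow concrete, here is a minimal NumPy sketch of the forward pass: two encoders compress mutation and expression profiles, their codes are concatenated, and a predictor network outputs one log-scale IC50 per drug, scored by mean squared error. Only the 265-drug output comes from the abstract; the input sizes, code and hidden widths, ReLU activations, and random weights (standing in for pre-trained and fine-tuned ones) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes; only N_DRUGS = 265 is taken from the abstract.
N_MUT, N_EXP, CODE, HIDDEN, N_DRUGS = 1000, 2000, 32, 64, 265


def init_layer(n_in, n_out):
    """Random weights standing in for pre-trained / fine-tuned parameters."""
    return rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out)


def dense(x, w, b, relu=True):
    z = x @ w + b
    return np.maximum(0.0, z) if relu else z


mut_encoder = init_layer(N_MUT, CODE)   # mutation encoder (random here, pre-trained in the study)
exp_encoder = init_layer(N_EXP, CODE)   # expression encoder (random here, pre-trained in the study)
predictor_hidden = init_layer(2 * CODE, HIDDEN)
predictor_out = init_layer(HIDDEN, N_DRUGS)


def predict_log_ic50(mut, exp):
    """Concatenate the two encodings and map them to one log IC50 per drug."""
    code = np.concatenate([dense(mut, *mut_encoder), dense(exp, *exp_encoder)], axis=1)
    return dense(dense(code, *predictor_hidden), *predictor_out, relu=False)


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


mut = rng.integers(0, 2, size=(5, N_MUT)).astype(float)  # binary mutation calls
exp = rng.normal(size=(5, N_EXP))                         # normalized expression
pred = predict_log_ic50(mut, exp)                         # shape (5, 265)
```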
Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads
Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, offering high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that are mapped to non-exonic regions. We developed a novel algorithm, XBSeq, based on a statistical model that assumes the observed signals are the convolution of true expression signals and sequencing noise. Reads mapped to non-exonic regions are treated as sequencing noise, which follows a Poisson distribution. Given the measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to follow a negative binomial distribution, can be delineated, enabling accurate detection of differentially expressed genes.
Results: We implemented the XBSeq algorithm and evaluated it on a set of simulated expression datasets generated under different conditions from a combination of negative binomial and Poisson distributions, with parameters derived from real RNA-seq data. We compared the performance of our method with that of other commonly used differential expression analysis algorithms. We also evaluated how true and false positive rates changed with varying numbers of biological replicates, differential fold changes, and expression levels in non-exonic regions. Finally, we tested the algorithm on a real RNA-seq dataset and reported the detection results that the different algorithms shared and those on which they differed.
Conclusions: In this paper, we proposed XBSeq, a novel differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, our method surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only under very low read-count conditions did XBSeq have a slightly higher false discovery rate, which may be improved by adjusting for the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurements and thus detect differentially expressed genes even in noisy conditions.
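The convolution assumption can be illustrated with a simple method-of-moments sketch, which is not XBSeq's actual estimator: if observed counts are the sum of an independent negative binomial signal and Poisson background, the background mean and variance (estimated from non-exonic reads) can be subtracted from the observed moments to recover the signal's mean and dispersion. The parameterization Var = mu + phi·mu², the function names, and the simulated parameters are assumptions.

```python
import numpy as np


def simulate_counts(mu, phi, lam, n, rng):
    """Observed counts = NB(mean mu, dispersion phi) signal + Poisson(lam) background."""
    size = 1.0 / phi                     # NB "number of successes" parameter
    signal = rng.negative_binomial(size, size / (size + mu), n)
    noise = rng.poisson(lam, n)
    return signal + noise


def deconvolve_moments(observed, background):
    """Moment-based estimate of the true NB mean and dispersion.

    For independent parts, E[X] = mu + lam and Var[X] = (mu + phi*mu**2) + lam,
    with lam's moments taken from separately measured background counts.
    """
    lam_hat = background.mean()
    mu_hat = max(observed.mean() - lam_hat, 1e-8)
    var_signal = observed.var(ddof=1) - background.var(ddof=1)
    phi_hat = max((var_signal - mu_hat) / mu_hat**2, 0.0)
    return mu_hat, phi_hat


rng = np.random.default_rng(42)
obs = simulate_counts(mu=100.0, phi=0.1, lam=30.0, n=200_000, rng=rng)
bg = rng.poisson(30.0, 200_000)          # simulated non-exonic background counts
mu_hat, phi_hat = deconvolve_moments(obs, bg)
```

With 200,000 simulated counts the estimates land close to the generating values (mu = 100, phi = 0.1); with only a few replicates, as is typical in RNA-seq, such moment estimates are much noisier.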
Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms
Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and meta-analyzed the results together with those of 194,427 previously genotyped participants, totaling 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10⁻⁸ in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms.
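The pooling step can be sketched as the standard inverse-variance-weighted fixed-effects model. This is a minimal illustration assuming per-study effect sizes (for example, log odds ratios) and standard errors are available; the consortium's exact pipeline may differ, and the input values below are hypothetical. A SNP is called genome-wide significant when the pooled two-sided P value falls below 5 × 10⁻⁸.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis of per-study effects.

    Returns the pooled effect, its standard error, the z statistic, and a
    two-sided normal P value.
    """
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


# Hypothetical per-study log odds ratios and standard errors for one SNP.
beta, se, z, p = fixed_effects_meta([0.050, 0.060, 0.045], [0.012, 0.010, 0.015])
is_genome_wide = p < GENOME_WIDE_ALPHA
```

More precise studies (smaller standard errors) receive proportionally more weight, and the pooled standard error shrinks as studies are added, which is why combining cohorts can push a suggestive association past the threshold.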
New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.
Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated an association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.
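HOMA-B and HOMA-IR are derived from paired fasting glucose and insulin measurements. The sketch below uses the widely cited original (HOMA1) closed-form approximations; the abstract does not state whether these formulas or the computer-based HOMA2 model were used, and the example inputs are hypothetical.

```python
def homa_ir(glucose_mmol_l, insulin_uu_ml):
    """HOMA1 insulin resistance: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def homa_b(glucose_mmol_l, insulin_uu_ml):
    """HOMA1 beta-cell function (%): 20 x fasting insulin (uU/mL) / (fasting glucose - 3.5)."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-B is undefined for fasting glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (glucose_mmol_l - 3.5)


print(homa_ir(5.0, 9.0))  # 2.0
print(homa_b(5.0, 9.0))   # 120.0
```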
Lineage of origin in rhabdomyosarcoma informs pharmacological response
Lineage or cell of origin of cancers is often unknown and thus is not a consideration in therapeutic approaches. Alveolar rhabdomyosarcoma (aRMS) is an aggressive childhood cancer for which the cell of origin remains debated. We used conditional genetic mouse models of aRMS to activate the pathognomonic Pax3:Foxo1 fusion oncogene and inactivate p53 in several stages of prenatal and postnatal muscle development. We reveal that lineage of origin significantly influences tumor histomorphology and sensitivity to targeted therapeutics. Furthermore, we uncovered differential transcriptional regulation of the Pax3:Foxo1 locus by tumor lineage of origin, which led us to identify the histone deacetylase inhibitor entinostat as a pharmacological agent for the potential conversion of Pax3:Foxo1-positive aRMS to a state akin to fusion-negative RMS through direct transcriptional suppression of Pax3:Foxo1.
Stem Cell and Regenerative Biology
X-chromosome and kidney function: evidence from a multi-trait genetic analysis of 908,697 individuals reveals sex-specific and sex-differential findings in genes regulated by androgen response elements
X-chromosomal genetic variants are understudied but can yield valuable insights into sexually dimorphic human traits and diseases. We performed a sex-stratified cross-ancestry X-chromosome-wide association meta-analysis of seven kidney-related traits (n = 908,697), identifying 23 loci genome-wide significantly associated with two of the traits: 7 for uric acid and 16 for estimated glomerular filtration rate (eGFR), including four novel eGFR loci containing the functionally plausible prioritized genes ACSL4, CLDN2, TSPAN6 and the female-specific DRP2. Further, we identified five novel sex interactions, comprising male-specific effects at FAM9B and AR/EDA2R, and three sex-differential findings with larger genetic effect sizes in males at DCAF12L1 and MST4 and larger effect sizes in females at HPRT1. All prioritized genes in loci showing significant sex interactions were located next to androgen response elements (ARE). Five ARE genes showed sex-differential expression. This study contributes new insights into the sexual dimorphism of kidney traits, along with new prioritized gene targets for further molecular research.
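Sex-differential effects of this kind are commonly assessed by comparing male- and female-stratified estimates with a difference test that treats the two strata as independent samples. The sketch below shows that test; the abstract does not state the exact statistic used, and the effect sizes are hypothetical.

```python
import math


def sex_difference_test(beta_male, se_male, beta_female, se_female):
    """Z test for a difference between independent sex-stratified effect estimates."""
    diff = beta_male - beta_female
    se_diff = math.sqrt(se_male**2 + se_female**2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return z, p


# Hypothetical per-allele effects on log eGFR in males and females.
z, p = sex_difference_test(0.030, 0.005, 0.010, 0.005)
```

A sex-specific finding is the extreme case where one stratum's effect is near zero, while a sex-differential finding has effects in the same direction but of significantly different size.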
Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits
Few genome-wide association studies (GWAS) account for environmental exposures, such as smoking, that can affect overall trait variance when investigating the genetic contribution to obesity-related traits. Here, we use GWAS data from 51,080 current smokers and 190,178 nonsmokers (87% of European descent) to identify loci influencing BMI and central adiposity, measured as waist circumference and waist-to-hip ratio, both adjusted for BMI. We identify 23 novel genetic loci and 9 loci with convincing evidence of gene-smoking interaction (GxSMK) on obesity-related traits. In an independent study sample, we show a consistent direction of effect for all identified loci and significance for 18 novel and 5 interaction loci. These loci highlight novel biological functions, including response to oxidative stress, addictive behaviour, and regulatory functions, emphasizing the importance of accounting for environment in genetic analyses. Our results suggest that tobacco smoking may alter the genetic susceptibility to overall adiposity and body fat distribution.
Peer reviewed
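One common way to model a gene-smoking interaction is a regression with a SNP × smoking product term, where a non-zero interaction coefficient means the per-allele effect differs between smokers and nonsmokers. This ordinary least squares sketch on simulated data illustrates the idea; the effect sizes, allele frequency, and smoking prevalence are invented, and the study's actual models and covariates may differ.

```python
import numpy as np


def fit_gxe(snp, smk, y):
    """OLS of y on SNP, smoking, and SNP x smoking; returns coefficients and SEs.

    Coefficient order: intercept, SNP (effect in nonsmokers), smoking, SNP x smoking.
    """
    X = np.column_stack([np.ones_like(snp), snp, smk, snp * smk])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


rng = np.random.default_rng(7)
n = 100_000
snp = rng.binomial(2, 0.3, n).astype(float)   # allele dosage 0/1/2 (simulated)
smk = (rng.random(n) < 0.2).astype(float)     # 1 = current smoker (simulated)
bmi = 25 + 0.2 * snp + 0.5 * smk + 0.3 * snp * smk + rng.normal(0.0, 1.0, n)

coef, se = fit_gxe(snp, smk, bmi)
effect_in_smokers = coef[1] + coef[3]         # per-allele effect among smokers
interaction_z = coef[3] / se[3]
```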
Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure.
Numerous genetic loci have been associated with systolic blood pressure (SBP) and diastolic blood pressure (DBP) in Europeans. We now report genome-wide association studies of pulse pressure (PP) and mean arterial pressure (MAP). In discovery (N = 74,064) and follow-up studies (N = 48,607), we identified at genome-wide significance (P = 2.7 × 10⁻⁸ to P = 2.3 × 10⁻¹³) four new PP loci (at 4q12 near CHIC2, 7q22.3 near PIK3CG, 8q24.12 in NOV and 11q24.3 near ADAMTS8), two new MAP loci (3p21.31 in MAP4 and 10q25.3 near ADRB1) and one locus associated with both of these traits (2q24.3 near FIGN) that has also recently been associated with SBP in East Asians. For three of the new PP loci, the estimated effect on SBP was opposite to that on DBP, in contrast to the majority of common SBP- and DBP-associated variants, which show concordant effects on both traits. These findings suggest new genetic pathways underlying blood pressure variation, some of which may differentially influence SBP and DBP.
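Pulse pressure and mean arterial pressure are simple linear functions of SBP and DBP. This sketch uses the common approximation MAP ≈ DBP + PP/3 (assumed here; the abstract does not give the study's definition) and shows why a variant with opposite effects on SBP and DBP can have a sizeable effect on PP but almost none on MAP. The variant's effect sizes are hypothetical.

```python
def pulse_pressure(sbp, dbp):
    """Pulse pressure (mmHg): systolic minus diastolic."""
    return sbp - dbp


def mean_arterial_pressure(sbp, dbp):
    """Common approximation: MAP = DBP + PP/3, i.e. (2*DBP + SBP)/3."""
    return dbp + pulse_pressure(sbp, dbp) / 3.0


print(pulse_pressure(120, 80))                    # 40
print(round(mean_arterial_pressure(120, 80), 1))  # 93.3

# Both traits are linear in SBP and DBP, so per-allele effects combine the same way.
# A hypothetical variant raising SBP by 0.5 mmHg and lowering DBP by 0.3 mmHg:
print(round(pulse_pressure(0.5, -0.3), 2))          # 0.8
print(round(mean_arterial_pressure(0.5, -0.3), 3))  # -0.033
```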