170 research outputs found

    Expression quantitative trait loci are highly sensitive to cellular differentiation state

    Blood cell development from multipotent hematopoietic stem cells to specialized blood cells is accompanied by drastic changes in gene expression for which the triggers remain mostly unknown. Genetical genomics is an approach linking natural genetic variation to gene expression variation, thereby allowing the identification of genomic loci containing gene expression modulators (eQTLs). In this paper, we used a genetical genomics approach to analyze gene expression across four developmentally close blood cell types collected from a large number of genetically different but related mouse strains. We found that, while a significant number of eQTLs (365) had a consistent “static” regulatory effect on gene expression, an even larger number were found to be very sensitive to cell stage. As many as 1,283 eQTLs exhibited a “dynamic” behavior across cell types. By looking more closely at these dynamic eQTLs, we show that the sensitivity of eQTLs to cell stage is largely associated with gene expression changes in target genes. These results stress the importance of studying gene expression variation in well-defined cell populations. Only such studies will be able to reveal the important differences in gene regulation between different ce

    A genome-wide association study identifies protein quantitative trait loci (pQTLs)

    There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts (cis effects) and elsewhere in the genome (trans effects). The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach, we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population-based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8×10^-57), CCL4L1 (p = 3.9×10^-21), IL18 (p = 6.8×10^-13), LPA (p = 4.4×10^-10), GGT1 (p = 1.5×10^-7), SHBG (p = 3.1×10^-7), CRP (p = 6.4×10^-6) and IL1RN (p = 7.3×10^-6) genes, all associated with their respective protein products with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect, an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8×10^-40), but this finding was not present when TNF-alpha was measured using a different assay, or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression. These include the presence of strong genetic effects in cis locations.
The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways. © 2008 Melzer et al
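
The abstract above reports effect sizes in standard deviations per allele. As a minimal sketch of what that unit means (not the study's actual pipeline, which the abstract does not describe), the slope below regresses a standardized protein level on allele count (0, 1 or 2). The function name and the toy data are hypothetical, and no covariates or transformations are applied.

```python
from statistics import mean, pstdev


def per_allele_effect_sd(allele_counts, protein_levels):
    """Least-squares slope of standardized protein level on allele count.

    Illustrative only: the result is the change in protein level, in
    standard deviations, per additional copy of the allele.
    """
    if len(allele_counts) != len(protein_levels):
        raise ValueError("inputs must have the same length")
    sd = pstdev(protein_levels)
    if sd == 0:
        raise ValueError("protein levels have zero variance")
    mu = mean(protein_levels)
    # Standardize the phenotype so the slope is expressed in SD units.
    z = [(y - mu) / sd for y in protein_levels]
    g_mean = mean(allele_counts)
    sxx = sum((g - g_mean) ** 2 for g in allele_counts)
    if sxx == 0:
        raise ValueError("allele counts have zero variance")
    # z has mean zero, so the cross-product needs no further centering.
    sxy = sum((g - g_mean) * zi for g, zi in zip(allele_counts, z))
    return sxy / sxx


# Hypothetical toy data: six individuals, two per genotype class.
genotypes = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
effect = per_allele_effect_sd(genotypes, levels)
```

A real analysis would typically adjust for covariates such as age and sex and would test the slope against zero; this sketch only illustrates the unit of the reported effect sizes.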

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods, with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions such as LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties that overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which finds a globally optimal solution more rapidly and more precisely than a fixed grid search. Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers than Elastic Net SVM in terms of median number of features selected, and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that integrating the interval search algorithm with penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
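
The abstract describes Elastic SCAD as a combination of SCAD and ridge penalties. A minimal sketch of such a penalty is given below, assuming the standard SCAD form of Fan and Li with the conventional shape parameter a = 3.7, plus a squared L2 term. The function names, the parameter names `lam1` and `lam2`, and the exact way the two terms are weighted are assumptions for illustration and are not taken from the 'penalizedSVM' package.

```python
def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient w (requires a > 2)."""
    abs_w = abs(w)
    if abs_w <= lam:
        # Small coefficients: behaves like the LASSO (L1) penalty.
        return lam * abs_w
    if abs_w <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return -(abs_w ** 2 - 2 * a * lam * abs_w + lam ** 2) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return (a + 1) * lam ** 2 / 2


def elastic_scad_penalty(weights, lam1, lam2, a=3.7):
    """Assumed form: sum of SCAD penalties plus a ridge (squared L2) term."""
    scad_part = sum(scad_penalty(w, lam1, a) for w in weights)
    ridge_part = lam2 * sum(w * w for w in weights)
    return scad_part + ridge_part


# Hypothetical weight vector with one small and one large coefficient.
total = elastic_scad_penalty([0.5, 10.0], lam1=1.0, lam2=0.1)
```

The SCAD term flattens out for large coefficients, while the ridge term keeps growing with them, which is one way to read the abstract's claim that the combination avoids sparsity limitations on non-sparse data.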

    Height and timing of growth spurt during puberty in young people living with vertically acquired HIV in Europe and Thailand.

    OBJECTIVE: The aim of this study was to describe growth during puberty in young people with vertically acquired HIV. DESIGN: Pooled data from 12 paediatric HIV cohorts in Europe and Thailand. METHODS: One thousand and ninety-four children initiating a nonnucleoside reverse transcriptase inhibitor or boosted protease inhibitor based regimen aged 1-10 years were included. Super Imposition by Translation And Rotation (SITAR) models described growth from age 8 years using three parameters (average height, timing and shape of the growth spurt), dependent on age and height-for-age z-score (HAZ) (WHO references) at antiretroviral therapy (ART) initiation. Multivariate regression explored characteristics associated with these three parameters. RESULTS: At ART initiation, median age and HAZ were 6.4 [interquartile range (IQR): 2.8, 9.0] years and -1.2 (IQR: -2.3 to -0.2), respectively. Median follow-up was 9.1 (IQR: 6.9, 11.4) years. In girls, older age and lower HAZ at ART initiation were independently associated with a growth spurt that occurred 0.41 (95% confidence interval 0.20-0.62) years later in children starting ART at age 6 to 10 years compared with 1 to 2 years and 1.50 (1.21-1.78) years later in those starting with HAZ less than -3 compared with HAZ at least -1. Later growth spurts in girls resulted in continued height growth into later adolescence. In boys starting ART with HAZ less than -1, growth spurts were later in children starting ART in the oldest age group, but for HAZ at least -1, there was no association with age. Girls and boys who initiated ART with HAZ at least -1 maintained a similar height to the WHO reference mean. CONCLUSION: Stunting at ART initiation was associated with later growth spurts in girls. Children with HAZ at least -1 at ART initiation grew in height at the level expected in HIV negative children of a comparable age

    Dissecting Early Differentially Expressed Genes in a Mixture of Differentiating Embryonic Stem Cells

    The differentiation of embryonic stem cells is initiated by a gradual loss of pluripotency-associated transcripts and induction of differentiation genes. Accordingly, the detection of differentially expressed genes at the early stages of differentiation could assist the identification of the causal genes that either promote or inhibit differentiation. The previous methods of identifying differentially expressed genes by comparing different cell types would inevitably include a large portion of genes that respond to, rather than regulate, the differentiation process. We demonstrate through the use of biological replicates and a novel statistical approach that the gene expression data obtained without prior separation of cell types are informative for detecting differentially expressed genes at the early stages of differentiation. Applying the proposed method to analyze the differentiation of murine embryonic stem cells, we identified and then experimentally verified Smarcad1 as a novel regulator of pluripotency and self-renewal. We formalized this statistical approach as a statistical test that is generally applicable to analyze other differentiation processes

    Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature
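
The abstract states that the literature cohesion p-value (LPv) is calculated with a Fisher's exact test but does not say how the contingency table is built. The sketch below therefore shows only a generic one-sided Fisher's exact test on a 2x2 table. The table layout (gene pairs above or below a similarity threshold, inside or outside the gene set) and all counts are hypothetical assumptions, not the GCAT implementation.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing a count of at least
    `a` in the top-left cell when all row and column totals are fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)
    max_a = min(row1, col1)
    # Sum the probabilities of every table at least as extreme as observed.
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / denom


# Hypothetical counts: rows = pairs within / outside the gene set,
# columns = pairs above / below an assumed similarity threshold.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Under this assumed layout, a small p-value would indicate that gene pairs within the set are enriched for high literature similarity relative to pairs outside it.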

    Identification, Replication, and Functional Fine-Mapping of Expression Quantitative Trait Loci in Primary Human Liver Tissue

    The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. 
In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits

    Concordant Gene Expression in Leukemia Cells and Normal Leukocytes Is Associated with Germline cis-SNPs

    The degree to which gene expression covaries between different primary tissues within an individual is not well defined. We hypothesized that expression that is concordant across tissues is more likely influenced by genetic variability than gene expression which is discordant between tissues. We quantified expression of 11,873 genes in paired samples of primary leukemia cells and normal leukocytes from 92 patients with acute lymphoblastic leukemia (ALL). Genetic variation at >500,000 single nucleotide polymorphisms (SNPs) was also assessed. The expression of only 176/11,783 (1.5%) genes was correlated (p<0.008, FDR = 25%) in the two tissue types, but expression of a high proportion (20 of these 176 genes) was significantly related to cis-SNP genotypes (adjusted p<0.05). In an independent set of 134 patients with ALL, 14 of these 20 genes were validated as having expression related to cis-SNPs, as were 9 of 20 genes in a second validation set of HapMap cell lines. Genes whose expression was concordant among tissue types were more likely to be associated with germline cis-SNPs than genes with discordant expression in these tissues; genes affected were involved in housekeeping functions (GSTM2, GAPDH and NCOR1) and purine metabolism

    Post-supereruption recovery at Toba Caldera

    Large calderas, or supervolcanoes, are sites of the most catastrophic and hazardous events on Earth, yet the temporal details of post-supereruption activity, or resurgence, remain largely unknown, limiting our ability to understand how supervolcanoes work and address their hazards. Toba Caldera, Indonesia, caused the greatest volcanic catastrophe of the last 100 kyr, climactically erupting ~74 ka. Since the supereruption, Toba has been in a state of resurgence but its magmatic and uplift history has remained unclear. Here we reveal that new 14C, zircon U-Th crystallization and (U-Th)/He ages show resurgence commenced at 69.7±4.5 ka and continued until at least ~2.7 ka, progressing westward across the caldera, as reflected by post-caldera effusive lava eruptions and uplifted lake sediment. The major stratovolcano north of Toba, Sinabung, shows strong geochemical kinship with Toba, and zircons from recent eruption products suggest Toba's climactic magma reservoir extends beneath Sinabung and is being tapped during eruptions

    Elderly Japanese women with cervical carcinoma show higher proportions of both intermediate-risk human papillomavirus types and p53 mutations

    The p53 mutation has been found in only 0–6% of cervical carcinomas. In light of recent studies demonstrating that mutation of the p53 gene was found in over 20% of patients with vulvar carcinoma, a disease of elderly women and a known human papillomavirus (HPV)-related malignancy, we analysed mutation of the p53 gene in 46 women aged 60 or more with cervical carcinoma (mean, 71 years; range, 60–96 years). The presence of HPV and its type were analysed by a polymerase chain reaction (PCR)-based assay using consensus primers for the L1 region. Mutation of the p53 gene was analysed by PCR-based single-strand conformation polymorphism and DNA sequencing techniques. Point mutation of the p53 gene was detected in 5 out of 46 (11%) cervical carcinomas: 1 of 17 (6%) samples associated with high-risk HPVs (HPV 16 and HPV 18) and 4 of 27 (15%) samples with intermediate-risk HPVs (P = 0.36), whereas no mutation was found in 2 HPV-negative cases. The mutated residues resided in the selective sequence known as a DNA-binding domain. Immunohistochemistry revealed overexpression in cancer tissues positive for p53 mutation. All of the observed mutations of the p53 gene were of the transition type, suggesting that the mutation may be caused by endogenous mutagenesis. Although the lack of statistical significance reduces the strength of the conclusion, the data presented here imply that p53 gene mutation, particularly along with intermediate-risk HPV types, may constitute one pathogenetic factor in cervical carcinoma affecting elderly women. © 1999 Cancer Research Campaig