163 research outputs found

    Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    BACKGROUND: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. RESULTS: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from a study of acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis.
    CONCLUSION: The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable.
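    The dissimilarity measure and cluster prototype described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary layout, the function names, and the weight parameters `w_expr`, `w_chem`, and `w_histo` are assumptions made for illustration, and the toy values are invented. The sketch only shows squared Euclidean distance for the numeric domains, simple matching for the categorical domain, and a prototype built from means and modes.

```python
from collections import Counter
from statistics import mean


def mixed_dissimilarity(sample, prototype, w_expr=1.0, w_chem=1.0, w_histo=1.0):
    """Weighted dissimilarity between one sample and a cluster prototype.

    Both arguments are dicts with keys 'expr' and 'chem' (lists of floats)
    and 'histo' (list of category labels). This layout is hypothetical.
    """
    # Squared Euclidean distance for the two numeric domains.
    d_expr = sum((a - b) ** 2 for a, b in zip(sample["expr"], prototype["expr"]))
    d_chem = sum((a - b) ** 2 for a, b in zip(sample["chem"], prototype["chem"]))
    # Simple matching for categorical values: count the mismatches.
    d_histo = sum(a != b for a, b in zip(sample["histo"], prototype["histo"]))
    # Separate weights control the influence of each data domain.
    return w_expr * d_expr + w_chem * d_chem + w_histo * d_histo


def cluster_prototype(samples):
    """Prototype of a cluster: mean of numeric features, mode of categorical ones."""
    def column_means(key):
        return [mean(col) for col in zip(*(s[key] for s in samples))]

    def column_modes(key):
        return [Counter(col).most_common(1)[0][0]
                for col in zip(*(s[key] for s in samples))]

    return {
        "expr": column_means("expr"),
        "chem": column_means("chem"),
        "histo": column_modes("histo"),
    }


# Toy cluster with invented values: two expression features, one chemistry
# value, and one histopathology category per sample.
samples = [
    {"expr": [1.0, 2.0], "chem": [10.0], "histo": ["none"]},
    {"expr": [3.0, 2.0], "chem": [14.0], "histo": ["none"]},
    {"expr": [2.0, 5.0], "chem": [12.0], "histo": ["moderate"]},
]
proto = cluster_prototype(samples)
d_first = mixed_dissimilarity(samples[0], proto)
d_third = mixed_dissimilarity(samples[2], proto)
```

    In a full k-prototypes style loop, each sample would be assigned to the prototype with the smallest dissimilarity and the prototypes would then be recomputed; that loop and the cluster-number selection step are omitted here.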

    An R2 statistic for fixed effects in the linear mixed model

    Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R2 statistic in the linear univariate model naturally create great interest in extending it to the linear mixed model. We define and describe how to compute a model R2 statistic for the linear mixed model by using only a single model. The proposed R2 statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R2 statistic arises as a 1–1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model to a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R2 statistic leads immediately to a natural definition of a partial R2 statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R2, a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated blood pressure outcomes for the study.
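    The abstract does not give the exact mapping between F and R2. As a hedged illustration, the Python sketch below uses the standard relationship from the univariate linear model, R2 = (df_num * F / df_den) / (1 + df_num * F / df_den), and assumes the mixed-model statistic takes the same form with the degrees of freedom of the mixed-model F test. The function names and the example numbers are hypothetical.

```python
def r2_from_f(f_stat, df_num, df_den):
    """Map an F statistic to R2 using the univariate linear-model relationship.

    Assumption: the mixed-model R2 has the same functional form, with df_num
    and df_den taken from the F test of the fixed effects.
    """
    if f_stat < 0 or df_num <= 0 or df_den <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    ratio = df_num * f_stat / df_den
    return ratio / (1.0 + ratio)


def f_from_r2(r2, df_num, df_den):
    """Inverse mapping, showing that the relationship is one-to-one for 0 <= R2 < 1."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R2 must lie in [0, 1)")
    return (r2 / df_num) / ((1.0 - r2) / df_den)


# Invented numbers: a large F statistic with very many denominator degrees of
# freedom still corresponds to a small R2.
r2_small = r2_from_f(50.0, 1, 10000)
r2_half = r2_from_f(10.0, 2, 20)
```

    With many denominator degrees of freedom, a highly significant F can coexist with a very small R2, which is the contrast the blood pressure example in the abstract describes.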

    Linking Ligand-Induced Alterations in Androgen Receptor Structure to Differential Gene Expression: A First Step in the Rational Design of Selective Androgen Receptor Modulators

    We have previously identified a family of novel androgen receptor (AR) ligands that, upon binding, enable AR to adopt structures distinct from those observed in the presence of canonical agonists. In this report, we describe the use of these compounds to establish a relationship between AR structure and biological activity with a view to defining a rational approach with which to identify useful selective AR modulators (SARMs). To this end, we used combinatorial peptide phage display coupled with molecular dynamic structure analysis to identify the surfaces on AR that are exposed specifically in the presence of selected AR ligands. Subsequently, we used a DNA microarray analysis to demonstrate that differently conformed receptors facilitate distinct patterns of gene expression in LNCaP cells. Interestingly, we observed a complete overlap in the identity of genes expressed after treatment with mechanistically distinct AR ligands. However, it was differences in the kinetics of gene regulation that distinguished these compounds. Follow-up studies, in cell-based assays of AR action, confirmed the importance of these alterations in gene expression. Together, these studies demonstrate an important link between AR structure, gene expression, and biological outcome. This relationship provides a firm underpinning for mechanism-based screens aimed at identifying SARMs with useful clinical profiles.

    SAS/GLM and SAS/MIXED for Trend Analyses Using Fourier and Polynomial Regression for Centered and Non-Centered Variates

    9 pages, 1 article. SAS/GLM and SAS/MIXED for Trend Analyses Using Fourier and Polynomial Regression for Centered and Non-Centered Variates (Federer, Walter T.; Singh, Murari; Wolfinger, Russell D.)

    Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories

    BACKGROUND: The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data has not been available on a scale and in a form best served for data-mining. RESULTS: A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset were evaluated in detail for their contribution to total variability using multivariate statistical and graphical techniques. CONCLUSION: The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.

    Comparison of transcriptional responses in liver tissue and primary hepatocyte cell cultures after exposure to hexahydro-1,3,5-trinitro-1,3,5-triazine

    BACKGROUND: Cell culture systems are useful in studying toxicological effects of chemicals such as hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX); however, little is known about how accurately isolated cells reflect responses of intact organs. In this work, we compare transcriptional responses in livers of Sprague-Dawley rats and primary hepatocyte cells after exposure to RDX to determine how faithfully the in vitro model system reflects in vivo responses. RESULTS: Expression patterns were found to be markedly different between liver tissue and primary cell cultures before exposure to RDX. Liver gene expression was enriched in processes important in toxicology such as metabolism of amino acids, lipids, aromatic compounds, and drugs when compared to cells. Transcriptional responses in cells exposed to 7.5, 15, or 30 mg/L RDX for 24 and 48 hours were different from those of livers isolated from rats 24 hours after exposure to 12, 24, or 48 mg/kg RDX. Most of the differentially expressed genes identified across conditions and treatments could be attributed to differences between cells and tissue. Some similarity was observed in RDX effects on gene expression between tissue and cells, but there were also significant differences that appear to reflect the state of the cell or tissue examined. CONCLUSION: Liver tissue and primary cells express different suites of genes, which suggests that they have fundamental differences in their cell physiology. Expression effects related to RDX exposure in cells reflected a fraction of liver responses, indicating that care must be taken in extrapolating from primary cells to whole animal organ toxicity effects.

    The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

    Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity
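    The recommended baseline practice, fold-change ranking combined with a non-stringent P cutoff, can be sketched in a few lines of Python. This is an illustration rather than the authors' code: the record layout, the function name `select_degs`, the default cutoff of 0.05, and the example values are all assumptions, and the log2 fold changes and P values are taken as precomputed inputs.

```python
def select_degs(records, p_cutoff=0.05, top_n=100):
    """Select differentially expressed genes by fold-change ranking.

    records: iterable of (gene_id, log2_fold_change, p_value) tuples.
    Genes are first filtered by a non-stringent P cutoff, then ranked by
    absolute log2 fold change, and the top_n gene ids are returned.
    """
    # Step 1: keep genes that pass the (deliberately lenient) P cutoff.
    passing = [r for r in records if r[2] < p_cutoff]
    # Step 2: rank by magnitude of fold change, largest first.
    passing.sort(key=lambda r: abs(r[1]), reverse=True)
    # Step 3: report only the gene identifiers of the top-ranked genes.
    return [gene_id for gene_id, _, _ in passing[:top_n]]


# Invented example values for four hypothetical genes.
records = [
    ("g1", 0.2, 0.0001),   # very small P, but tiny fold change
    ("g2", -2.5, 0.03),    # large down-regulation, passes the cutoff
    ("g3", 3.0, 0.2),      # largest fold change, fails the P cutoff
    ("g4", 1.1, 0.04),     # moderate fold change, passes the cutoff
]
top_two = select_degs(records, p_cutoff=0.05, top_n=2)
```

    Ranking by P value alone would place g1 first despite its negligible fold change; ranking by fold change after a lenient P filter places it last among the passing genes.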

    An Integrated Approach for the Analysis of Biological Pathways using Mixed Models

    Gene class, ontology, or pathway testing analysis has become increasingly popular in microarray data analysis. Such approaches allow the integration of gene annotation databases, such as Gene Ontology and KEGG Pathway, to formally test for subtle but coordinated changes at a system level. Higher power in gene class testing is gained by combining weak signals from a number of individual genes in each pathway. We propose an alternative approach for gene-class testing based on mixed models, a class of statistical models that

    Maternal Influences on the Transmission of Leukocyte Gene Expression Profiles in Population Samples from Brisbane, Australia

    Two gene expression profiling studies designed to identify maternal influences on development of the neonate immune system and to address the population structure of the leukocyte transcriptome were carried out in Brisbane, Australia. In the first study, a comparison of 19 leukocyte samples obtained from mothers in the last three weeks of pregnancy with 37 umbilical cord blood samples documented differential expression of 7,382 probes at a false discovery rate of 1%, representing approximately half of the expressed transcriptome. An even larger component of the variation involving 8,432 probes, notably enriched for Vitamin E and methotrexate-responsive genes, distinguished two sets of individuals, with perfect transmission of the two profile types between each of 16 mother-child pairs in the study. A minor profile of variation was found to distinguish the gene expression profiles of obese mothers and children of gestational diabetic mothers from those of children born to obese mothers. The second study was of adult leukocyte profiles from a cross-section of Red Cross blood donors sampled throughout Brisbane. The first two axes in this study are related to the third and fourth axes of variation in the first study and also reflect variation in the abundance of CD4 and CD8 transcripts. One of the profiles associated with the third axis is largely excluded from samples from the central portion of the city. Despite enrichment of insulin signaling and aspects of central metabolism among the differentially expressed genes, there was little correlation between leukocyte expression profiles and body mass index overall. Our data is consistent with the notion that maternal health and cytokine milieu directly impact gene expression in fetal tissues, but that there is likely to be a complex interplay between cultural, genetic, and other environmental factors in the programming of gene expression in leukocytes of newborn children