198 research outputs found

    Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    BACKGROUND: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. RESULTS: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver that causes centrilobular necrosis.
CONCLUSION: The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable

    Linking Ligand-Induced Alterations in Androgen Receptor Structure to Differential Gene Expression: A First Step in the Rational Design of Selective Androgen Receptor Modulators

    We have previously identified a family of novel androgen receptor (AR) ligands that, upon binding, enable AR to adopt structures distinct from that observed in the presence of canonical agonists. In this report, we describe the use of these compounds to establish a relationship between AR structure and biological activity with a view to defining a rational approach with which to identify useful selective AR modulators. To this end, we used combinatorial peptide phage display coupled with molecular dynamic structure analysis to identify the surfaces on AR that are exposed specifically in the presence of selected AR ligands. Subsequently, we used a DNA microarray analysis to demonstrate that differently conformed receptors facilitate distinct patterns of gene expression in LNCaP cells. Interestingly, we observed a complete overlap in the identity of genes expressed after treatment with mechanistically distinct AR ligands. However, it was differences in the kinetics of gene regulation that distinguished these compounds. Follow-up studies, in cell-based assays of AR action, confirmed the importance of these alterations in gene expression. Together, these studies demonstrate an important link between AR structure, gene expression, and biological outcome. This relationship provides a firm underpinning for mechanism-based screens aimed at identifying SARMs with useful clinical profiles

    An R² statistic for fixed effects in the linear mixed model

    Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R² statistic in the linear univariate model naturally create great interest in extending it to the linear mixed model. We define and describe how to compute a model R² statistic for the linear mixed model by using only a single model. The proposed R² statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R² statistic arises as a 1–1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model to a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R² statistic leads immediately to a natural definition of a partial R² statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R², a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated blood pressure outcomes for the study.
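
The abstract does not state the mapping between F and R². In the univariate linear model the familiar identity is R² = (ν₁F/ν₂) / (1 + ν₁F/ν₂), where ν₁ and ν₂ are the numerator and denominator degrees of freedom. The sketch below assumes the mixed-model statistic takes the analogous form. The function name is hypothetical, and the choice of denominator degrees of freedom (which depends on the approximation used when fitting the mixed model) is left to the caller.

```python
def r2_from_f(f_stat, df_num, df_den):
    """Map an F statistic for the fixed effects to an R-squared value.

    Assumes the univariate-style identity R2 = r / (1 + r) with
    r = df_num * F / df_den; treat this as an illustration only.
    """
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    if f_stat < 0:
        raise ValueError("an F statistic cannot be negative")
    ratio = df_num * f_stat / df_den
    # The mapping is monotone in F, so it is one-to-one for fixed degrees of freedom.
    return ratio / (1.0 + ratio)


# Hypothetical numbers: a large F on many denominator degrees of freedom gives
# a tiny p-value, yet the implied R-squared stays close to zero.
small_r2 = r2_from_f(20.0, 1, 10000)
```

With these made-up inputs the association is statistically clear but practically negligible, which is the kind of contrast the blood pressure example in the abstract describes.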

    SAS/GLM and SAS/MIXED for Trend Analyses Using Fourier and Polynomial Regression for Centered and Non-Centered Variates

    9 pages, 1 article (Federer, Walter T.; Singh, Murari; Wolfinger, Russell D.)

    Development and Application of Bovine and Porcine Oligonucleotide Arrays with Protein-Based Annotation

    The design of oligonucleotide sequences for the detection of gene expression in species with disparate volumes of genome and EST sequence information has been broadly studied. However, a congruous strategy has yet to emerge to allow the design of sensitive and specific gene expression detection probes. This study explores the use of a phylogenomic approach to align transcribed sequences to vertebrate protein sequences for the detection of gene families to design genomewide 70-mer oligonucleotide probe sequences for bovine and porcine. The bovine array contains 23,580 probes that target the transcripts of 16,341 genes, about 72% of the total number of bovine genes. The porcine array contains 19,980 probes targeting 15,204 genes, about 76% of the genes in the Ensembl annotation of the pig genome. An initial experiment using the bovine array demonstrates the specificity and sensitivity of the array

    Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories

    BACKGROUND: The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data has not been available on a scale and in a form best served for data-mining. RESULTS: A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset were evaluated in detail for their contribution to total variability using multivariate statistical and graphical techniques. CONCLUSION: The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.

    Comparison of transcriptional responses in liver tissue and primary hepatocyte cell cultures after exposure to hexahydro-1,3,5-trinitro-1,3,5-triazine

    BACKGROUND: Cell culture systems are useful in studying toxicological effects of chemicals such as hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX); however, little is known as to how accurately isolated cells reflect responses of intact organs. In this work, we compare transcriptional responses in livers of Sprague-Dawley rats and primary hepatocyte cells after exposure to RDX to determine how faithfully the in vitro model system reflects in vivo responses. RESULTS: Expression patterns were found to be markedly different between liver tissue and primary cell cultures before exposure to RDX. Liver gene expression was enriched in processes important in toxicology such as metabolism of amino acids, lipids, aromatic compounds, and drugs when compared to cells. Transcriptional responses in cells exposed to 7.5, 15, or 30 mg/L RDX for 24 and 48 hours were different from those of livers isolated from rats 24 hours after exposure to 12, 24, or 48 mg/kg RDX. Most of the differentially expressed genes identified across conditions and treatments could be attributed to differences between cells and tissue. Some similarity was observed in RDX effects on gene expression between tissue and cells, but there were also significant differences that appear to reflect the state of the cell or tissue examined. CONCLUSION: Liver tissue and primary cells express different suites of genes, which suggests they have fundamental differences in their cell physiology. Expression effects related to RDX exposure in cells reflected a fraction of liver responses, indicating that care must be taken in extrapolating from primary cells to whole animal organ toxicity effects.

    An Integrated Approach for the Analysis of Biological Pathways using Mixed Models

    Gene class, ontology, or pathway testing analysis has become increasingly popular in microarray data analysis. Such approaches allow the integration of gene annotation databases, such as Gene Ontology and KEGG Pathway, to formally test for subtle but coordinated changes at a system level. Higher power in gene class testing is gained by combining weak signals from a number of individual genes in each pathway. We propose an alternative approach for gene-class testing based on mixed models, a class of statistical models that