13 research outputs found

    Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    BACKGROUND: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. RESULTS: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from a study of acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis.
    CONCLUSION: The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff, denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable.
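
The dissimilarity and prototype described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary keys (`expr`, `chem`, `histo`), the weight names, and the sample values are all assumptions made up for the example, and the abstract does not say how the weights are chosen.

```python
from collections import Counter


def mixed_dissimilarity(sample, prototype, w_expr=1.0, w_chem=1.0, w_histo=1.0):
    """Weighted dissimilarity between a sample and a cluster prototype.

    Numeric domains use the sum of squared differences (squared Euclidean
    distance); the categorical domain uses simple matching (mismatch count).
    The three weights are hypothetical tuning parameters.
    """
    d_expr = sum((a - b) ** 2 for a, b in zip(sample["expr"], prototype["expr"]))
    d_chem = sum((a - b) ** 2 for a, b in zip(sample["chem"], prototype["chem"]))
    d_histo = sum(a != b for a, b in zip(sample["histo"], prototype["histo"]))
    return w_expr * d_expr + w_chem * d_chem + w_histo * d_histo


def cluster_prototype(samples):
    """Prototype: mean of each numeric feature, mode of each categorical one."""
    def means(key):
        return [sum(col) / len(col) for col in zip(*(s[key] for s in samples))]

    def modes(key):
        return [Counter(col).most_common(1)[0][0]
                for col in zip(*(s[key] for s in samples))]

    return {"expr": means("expr"), "chem": means("chem"), "histo": modes("histo")}


# Made-up samples for illustration only (not data from the study).
samples = [
    {"expr": [1.0, 2.0], "chem": [10.0], "histo": ["none", "mild"]},
    {"expr": [3.0, 2.0], "chem": [14.0], "histo": ["none", "moderate"]},
    {"expr": [2.0, 5.0], "chem": [12.0], "histo": ["none", "mild"]},
]
prototype = cluster_prototype(samples)
score = mixed_dissimilarity(samples[1], prototype, w_expr=1.0, w_chem=0.5, w_histo=2.0)
```

A k-prototypes style loop would alternate between assigning each sample to the prototype with the smallest dissimilarity and recomputing the prototypes; that loop is omitted here.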

    Analysing delays between time course gene expression data and biomarkers

    Associating time course gene expression data with biomarkers can help to understand disease progression or response to therapy. However, detecting associations between these expression profiles is not a trivial task. Expression changes often occur not simultaneously but with a delay in time, and commonly used methods for detecting correlation will fail to identify these associations. We have developed an efficient approach, DynOmics, based on the Fast Fourier Transform to identify coordinated response dynamics between time course 'omics' experiments and specific biomarkers of interest while taking time shift into account. We applied DynOmics to a rat study investigating molecular response dynamics to different dosages of acetaminophen ('paracetamol'). We show how DynOmics can extract relevant molecule expression profiles that enable a better understanding of the molecular pathways related to acetaminophen toxic dosage and renal damage.
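
One generic way to estimate a time shift with the Fast Fourier Transform is circular cross-correlation, sketched below. This is not the DynOmics implementation, whose details the abstract does not give. The function names are hypothetical, the profiles are assumed to be equally spaced with a power-of-two length, and the shift is treated as circular, which real time courses only approximate.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    With inverse=True the result is the unscaled inverse transform.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_delay(reference, profile):
    """Lag (in time points) at which `profile` best matches `reference`.

    A positive result means `profile` follows `reference` with that delay.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(profile) / n
    f = fft([v - mean_r for v in reference])
    g = fft([v - mean_p for v in profile])
    # Cross-correlation theorem: corr[lag] = sum_t ref[t] * prof[(t + lag) % n]
    cross = fft([a.conjugate() * b for a, b in zip(f, g)], inverse=True)
    corr = [c.real / n for c in cross]
    best = max(range(n), key=lambda lag: corr[lag])
    return best if best <= n // 2 else best - n


# Made-up biomarker profile and two shifted copies of it.
reference = [0.0, 1.0, 4.0, 1.0] + [0.0] * 12
delayed = reference[-3:] + reference[:-3]   # same shape, 3 time points later
advanced = reference[2:] + reference[:2]    # same shape, 2 time points earlier
```

Here `estimate_delay(reference, delayed)` recovers the shift of 3 that an ordinary zero-lag correlation would miss.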

    Determining the number of clusters in CROKI2 algorithm

    One of the major problems in clustering is the need to specify the optimal number of clusters in some clustering algorithms. Some block clustering algorithms suffer from the same limitation: the number of clusters needs to be specified by a human user. This problem has been the subject of extensive research. Numerous indices have been proposed in order to find a reasonable number of clusters. In this paper, we aim to extend the use of these indices to block clustering algorithms. Therefore, an examination of some indices for determining the number of clusters in the CROKI2 algorithm is conducted on both real data extracted from the Metz web site and synthetic data sets generated according to a methodology that will be explained later. The purpose of the paper is to test the performance and ability of some indices to detect the proper number of clusters on rows and columns and to compare our new index with other indices.

    A comparison of data mining approaches in the categorization of oral anticoagulation patients

    Oral anticoagulation therapy, largely performed with warfarin-based drugs, is commonly used for patients with a high risk of blood clotting, which can lead to stroke or thrombosis. The state of the patient, with respect to anticoagulation, is captured by the index INR, which is to be kept within a therapeutic range. The patients' response is marked by high inter-individual and inter-temporal variability, which can lead to serious adverse events. Polymorphisms of two genes, CYP2C9 and VKORC1, considered markers of lower dosage requirements, still account for a relatively minor part of this variability. In this work, the authors show that classification methods can identify groups of patients homogeneous with respect to the dynamics of INR. In particular, the authors use classification methods in order to characterize patients according to their warfarin metabolism and hence their sensitivity to different doses. Finally, a Markov model to capture the dynamics of the patient's response over the years is proposed.

    A sparse PLS for variable selection when integrating omics data

    Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for highly dimensional data sets. Copyright ©2008 The Berkeley Electronic Press. All rights reserved.
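
The idea of a Lasso penalty on loading vectors inside a Singular Value Decomposition can be illustrated with a soft-thresholded rank-one power iteration. The sketch below is an assumption about the general technique, not the paper's exact procedure: `M` stands for a cross-product matrix between the two data blocks, the penalties `lam_u` and `lam_v` are hypothetical tuning parameters, and the all-ones starting vector is a simplification chosen for the example.

```python
import math


def soft_threshold(values, lam):
    """Lasso-style shrinkage: pull each entry toward zero by lam, clip at zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def normalize(vec):
    """Scale a vector to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def sparse_rank_one(M, lam_u, lam_v, n_iter=100):
    """Alternating, soft-thresholded power iteration on the matrix M.

    Returns a pair of unit-norm loading vectors (u, v); larger penalties
    drive more loadings exactly to zero, which acts as variable selection.
    """
    p, q = len(M), len(M[0])
    v = normalize([1.0] * q)  # simplified start; an SVD-based start is also common
    u = [0.0] * p
    for _ in range(n_iter):
        mv = [sum(M[i][j] * v[j] for j in range(q)) for i in range(p)]
        u = normalize(soft_threshold(mv, lam_u))
        mtu = [sum(M[i][j] * u[i] for i in range(p)) for j in range(q)]
        v = normalize(soft_threshold(mtu, lam_v))
    return u, v


# Made-up 3 x 2 cross-product matrix: the second row carries little signal.
M = [[5.0, 4.0],
     [0.1, 0.2],
     [3.0, 3.5]]
u, v = sparse_rank_one(M, lam_u=0.5, lam_v=0.0)
```

In this toy case the weakly associated second variable gets a loading of exactly zero, while the other two remain in the model.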

    On the Use of Correlation and MI as a Measure of Metabolite-Metabolite Association for Network Differential Connectivity Analysis

    Metabolite differential connectivity analysis has been successful in investigating potential molecular mechanisms underlying different conditions in biological systems. Correlation and Mutual Information (MI) are two of the most common measures used to quantify association, to build metabolite-metabolite association networks, and to calculate differential connectivity. In this study, we investigated the performance of correlation and MI in identifying significantly differentially connected metabolites. These association measures were compared on (i) 23 publicly available metabolomic data sets and 7 data sets from other fields, (ii) simulated data with known correlation structures, and (iii) data generated using a dynamic metabolic model to simulate real-life observed metabolite concentration profiles. In all cases, we found more differentially connected metabolites when using correlation indices as a measure of association than when using MI. We also observed that different MI estimation algorithms resulted in differences in performance when applied to data generated using a dynamic model. We concluded that there is no significant benefit in using MI as a replacement for standard Pearson's or Spearman's correlation when the application is to quantify and detect differentially connected metabolites.
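
The abstract does not give a formula for differential connectivity. As one simple, assumed definition, the sketch below takes a metabolite's connectivity to be the sum of its absolute Pearson correlations with all other metabolites, and the differential connectivity to be the difference of that quantity between two conditions. The function names and concentration values are made up for illustration, and no significance testing is included.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def connectivity(profiles):
    """Map each metabolite to the sum of |r| with every other metabolite.

    `profiles` maps a metabolite name to its concentrations across samples.
    """
    names = list(profiles)
    return {
        m: sum(abs(pearson(profiles[m], profiles[o])) for o in names if o != m)
        for m in names
    }


def differential_connectivity(cond_a, cond_b):
    """Connectivity in condition A minus connectivity in condition B."""
    conn_a, conn_b = connectivity(cond_a), connectivity(cond_b)
    return {m: conn_a[m] - conn_b[m] for m in conn_a}


# Made-up concentrations: m1 tracks m0 in condition A but not in condition B.
cond_a = {"m0": [1.0, 2.0, 3.0, 4.0],
          "m1": [2.0, 4.0, 6.0, 8.0],
          "m2": [1.0, -1.0, -1.0, 1.0]}
cond_b = {"m0": [1.0, 2.0, 3.0, 4.0],
          "m1": [1.0, -1.0, -1.0, 1.0],
          "m2": [2.0, 4.0, 6.0, 8.0]}
diff = differential_connectivity(cond_a, cond_b)
```

Swapping `pearson` for a rank-based or mutual-information estimator would change only the association step, which is the comparison the study is concerned with.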

    Toxicogenomic Biomarkers for Liver Toxicity

    Toxicogenomics (TGx) is a widely used technique in the preclinical stage of drug development to investigate the molecular mechanisms of toxicity. A number of candidate TGx biomarkers have now been identified and are utilized for both assessing and predicting toxicities. Further accumulation of novel TGx biomarkers will lead to more efficient, appropriate and cost-effective drug risk assessment, reinforcing the paradigm of the conventional toxicology system with a more profound understanding of the molecular mechanisms of drug-induced toxicity. In this paper, we give an overview of some practical strategies as well as obstacles for identifying and utilizing TGx biomarkers based on microarray analysis. Since clinical hepatotoxicity is one of the major causes of drug development attrition, the liver has been the best documented target organ for TGx studies to date, and we therefore focus on information from liver TGx studies. In this review, we summarize the current resources in the literature regarding TGx studies of the liver, from which toxicologists could extract potential TGx biomarker gene sets for better hepatotoxicity risk assessment.