240 research outputs found

    Algorithms for hierarchical clustering of gene expression data

    Genes are the parts of the genome that encode proteins in an organism. Proteins play an important part in many biological processes in any organism. Measuring the expression level of a gene helps biologists estimate the amount of protein produced by that gene. Microarrays can be used to measure the expression levels of thousands of genes in a single experiment. Using additional techniques such as clustering, various correlations among genes of interest can be found. The most commonly used clustering technique for microarray data analysis is hierarchical clustering. Various metrics, such as Euclidean distance, Manhattan distance, and the Pearson correlation coefficient, have been used to measure (dis)similarity between genes. A commonly used software package for hierarchical clustering based on the Pearson correlation coefficient takes O(N^3) time to cluster N genes, even though there are algorithms that can reduce the runtime to O(N^2). In this thesis, we show how the runtime can be reduced to O(N log N) by using a geometric interpretation of the Pearson correlation coefficient, and we show that this bound is optimal.
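    The thesis's O(N log N) algorithm is not reproduced here. As a minimal sketch of the geometric interpretation it builds on, assume each gene profile is centered and scaled to unit Euclidean length: the Pearson coefficient r between two genes then equals the dot product of their normalized vectors, and their squared Euclidean distance equals 2 - 2r, so ranking gene pairs by correlation is the same as ranking them by distance on the unit sphere. The helper names and the two toy profiles are hypothetical, and constant (zero-variance) profiles are not handled.

```python
import math


def normalize(profile):
    """Center a profile and scale it to unit Euclidean norm (assumes non-constant input)."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def pearson(x, y):
    """Textbook Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


gene_a = [2.0, 4.0, 6.0, 9.0]  # hypothetical expression profiles
gene_b = [1.0, 3.0, 2.0, 5.0]
u, v = normalize(gene_a), normalize(gene_b)
r = pearson(gene_a, gene_b)
dot = sum(a * b for a, b in zip(u, v))

# For unit vectors: ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v = 2 - 2r
print(r, dot, squared_distance(u, v), 2 - 2 * r)
```

    Because distance is a monotone function of correlation, nearest-neighbour questions about correlation can be answered with Euclidean geometric tools once profiles are normalized.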

    FGF-trapping hampers cancer stem-like cells in uveal melanoma

    Background: Cancer stem-like cells (CSCs) are a subpopulation of tumor cells responsible for tumor initiation, metastasis, chemoresistance, and relapse. Recently, CSCs have been identified in uveal melanoma (UM), the most common primary tumor of the eye. UM is highly resistant to systemic chemotherapy, and effective therapies that improve patients' overall survival are urgently needed. Methods: Herein, taking advantage of a pan-Fibroblast Growth Factor (FGF)-trap molecule, we singled out and analyzed a UM-CSC subset with marked stem-like properties. Hierarchical clustering of gene expression data publicly available from The Cancer Genome Atlas (TCGA) was performed to identify patient clusters. Results: By disrupting FGF/FGF receptor (FGFR)-mediated signaling, we unmasked an FGF-sensitive UM population characterized by increased expression of numerous stemness-related transcription factors, enhanced aldehyde dehydrogenase (ALDH) activity, and tumor-sphere formation capacity. Moreover, FGF inhibition markedly affected UM-CSC survival in vivo in a chorioallantoic membrane (CAM) tumor graft assay, reducing tumor growth. At the clinical level, hierarchical clustering of TCGA gene expression data revealed a strong correlation between FGFs/FGFRs and stemness-related genes, allowing the identification of three distinct clusters with different clinical outcomes. Conclusions: Our findings support the view that the FGF/FGFR axis is a master regulator of cancer stemness in primary UM tumors and point to anti-FGF treatments as a novel therapeutic strategy to hit the CSC component in UM.
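    The abstract does not specify the distance or linkage used for the TCGA clustering. The sketch below assumes average linkage on the correlation distance 1 - r, a common choice for expression data, and naively merges the closest pair of clusters until k remain. The function names and toy profiles are hypothetical; rows could equally be patients or genes.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Naive agglomerative clustering with average linkage on 1 - r.

    Merges the two closest clusters until k remain; returns lists of row indices.
    """
    n = len(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between members of a and b
            avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


# Six hypothetical profiles over four samples: two rising, two falling, two oscillating
profiles = [
    [1, 2, 3, 4], [2, 3, 4, 6],
    [4, 3, 2, 1], [5, 4, 2, 1],
    [1, 4, 1, 4], [2, 5, 2, 6],
]
clusters = average_linkage(profiles, 3)
print(clusters)
```

    The loop recomputes average distances at every merge, so it only suits small inputs; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` is the practical choice.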

    Phasing of muscle gene expression with fasting-induced recovery growth in Atlantic salmon

    Background: Many fish species experience long periods of fasting in nature, often associated with seasonal reductions in water temperature and prey availability, or with spawning migrations. During periods of nutrient restriction, changes in metabolism occur to provide cellular energy via catabolic processes. Muscle is particularly affected by prolonged fasting, as myofibrillar proteins act as a major energy source. To investigate the mechanisms of metabolic reorganisation with fasting and refeeding in a saltwater stage of Atlantic salmon (Salmo salar L.), we used qPCR to analyse the expression of genes involved in myogenesis, growth signalling, lipid biosynthesis, and myofibrillar protein degradation and synthesis pathways. Results: Hierarchical clustering of gene expression data revealed three clusters. The first cluster comprised genes involved in lipid metabolism and triacylglycerol synthesis (ALDOB, DGAT1 and LPL), which had peak expression 3-14 d after refeeding. The second cluster comprised ADIPOQ, MLC2, IGF-I and TALDO1, with peak expression 14-32 d after refeeding. The third cluster contained genes strongly downregulated as an initial response to feeding, including the ubiquitin ligases MuRF1 and MAFbx, myogenic regulatory factors, and some metabolic genes. Conclusion: Early responses to refeeding in fasted salmon included the synthesis of triacylglycerols and activation of the adipogenic differentiation program. Inhibition of MuRF1 and MAFbx may result in decreased degradation and a concomitant increase in the production of myofibrillar proteins. Both of these processes preceded any increase in the expression of myogenic regulatory factors and IGF-I. These responses could be a necessary strategy for an animal adapted to long periods of food deprivation, whereby energy reserves are replenished before myogenesis resumes.

    Understanding Clusters in Multidimensional Spaces: Making Meaning by Combining Insights from Coordinated Views of Domain Knowledge (2004)

    Cluster analysis of multidimensional data is widely used in many research areas, including financial, economic, sociological, and biological analyses. Finding natural subclasses in a data set not only reveals interesting patterns but also serves as a basis for further analyses. One difficulty with cluster analysis is that evaluating how interesting a clustering result is to researchers is subjective, application-dependent, and even difficult to measure. This problem generally gets worse as the dimensionality and the number of items grow. The remedy is to enable researchers to apply domain knowledge to gain insight into the significance of the clustering result. This article presents a way to better understand a clustering result by combining insights from two interactively coordinated visual displays of domain knowledge. The first is a parallel coordinates view powered by a direct-manipulation search. The second is a domain knowledge view containing well-understood and meaningful tabular or hierarchical information for the same data set. Our examples rely on hierarchical clustering of gene expression data, coordinated with a parallel coordinates view and with the gene annotation and Gene Ontology.
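    The article does not detail how its direct-manipulation search is implemented. A minimal sketch, assuming the search behaves like brushing an interval on each parallel axis: a gene is selected only if its value lies inside every brushed interval, and the selection is then looked up in a second, coordinated annotation view. The `brush` function, the expression values, and the annotation labels are all hypothetical.

```python
from collections import Counter


def brush(items, ranges):
    """Indices of items whose values lie inside every brushed interval.

    items:  equal-length numeric vectors, one per gene (one value per axis)
    ranges: dict mapping axis index -> (low, high), inclusive;
            axes without an entry are unconstrained
    """
    return [idx for idx, values in enumerate(items)
            if all(lo <= values[axis] <= hi for axis, (lo, hi) in ranges.items())]


# Hypothetical expression values for three genes over three conditions
expression = [
    [0.1, 1.2, 2.0],
    [0.3, 1.0, -1.5],
    [1.5, 1.1, 2.2],
]
hits = brush(expression, {0: (0.0, 1.0), 2: (1.0, 3.0)})

# Hypothetical annotation table standing in for the coordinated domain-knowledge view
annotations = {0: "ribosome", 1: "cell cycle", 2: "ribosome"}
print(hits, Counter(annotations[i] for i in hits))
```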

    Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics

    Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each Gaussian component. We tested GBHC on 11 cancer datasets and 3 synthetic datasets. On the cancer datasets, in sample clustering, GBHC on average produces a partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests that GBHC can serve as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc
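    The quantity a Gaussian BHC model needs for each candidate merge is the marginal likelihood of the merged data under a single Gaussian component, which the conjugate normal-gamma prior makes available in closed form. The sketch below assumes one-dimensional data and a Normal-Gamma(mu0, kappa0, alpha0, beta0) prior whose beta is a rate parameter; how GBHC handles multiple dimensions is not described in the abstract. The full BHC merge criterion also weights this term with a Dirichlet-process prior and the likelihoods of the existing subtrees, which is omitted here, so the comparison at the end is only a simplified Bayes factor. Function names, prior values, and data are hypothetical.

```python
import math


def posterior(xs, mu0, kappa0, alpha0, beta0):
    """Normal-Gamma posterior hyperparameters after observing xs."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * mean) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def log_marginal_likelihood(xs, mu0, kappa0, alpha0, beta0):
    """log p(xs) for i.i.d. Gaussian data with the unknown mean and precision
    integrated out under a Normal-Gamma prior (closed form via conjugacy)."""
    n = len(xs)
    _, kappa_n, alpha_n, beta_n = posterior(xs, mu0, kappa0, alpha0, beta0)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * math.log(kappa0 / kappa_n)
            - 0.5 * n * math.log(2 * math.pi))


prior = (0.0, 1.0, 2.0, 1.0)  # hypothetical mu0, kappa0, alpha0, beta0
a = [0.0, 0.1, -0.1]
b = [5.0, 5.1, 4.9]
c = [0.05, -0.05, 0.0]

# Simplified Bayes factor: one shared Gaussian vs. two separate ones
for name, other in (("a+b", b), ("a+c", c)):
    merged = log_marginal_likelihood(a + other, *prior)
    separate = log_marginal_likelihood(a, *prior) + log_marginal_likelihood(other, *prior)
    print(name, "merge favoured" if merged > separate else "keep separate")
```

    Because the prior is conjugate, the same function also satisfies the chain rule: updating the prior on part of the data and scoring the rest gives the same total as scoring all of it at once.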

    HiPart: Hierarchical Divisive Clustering Toolbox

    This paper presents the HiPart package, an open-source native Python library that provides efficient and interpretable implementations of divisive hierarchical clustering algorithms. HiPart supports interactive visualizations for manipulating the execution steps, allowing direct intervention in the clustering outcome. The package is well suited to Big Data applications because it focuses on the computational efficiency of the implemented clustering methods. Its dependencies are either Python built-in packages or well-maintained, stable external packages. The software is provided under the MIT license. The package's source code and documentation can be found at https://github.com/panagiotisanagnostou/HiPart
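    Divisive clustering works top-down: start with all items in one cluster and repeatedly split a cluster in two. The sketch below is a generic bisecting 2-means, not HiPart's API or any specific HiPart algorithm; it assumes Euclidean points and always splits the cluster with the largest within-cluster sum of squares. All names and points are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    m = _mean(points)
    return sum(_dist2(p, m) for p in points)


def bisect(points, iters=20):
    """Split one cluster in two with a deterministically seeded 2-means."""
    center = _mean(points)
    c1 = max(points, key=lambda p: _dist2(p, center))  # farthest from the centroid
    c2 = max(points, key=lambda p: _dist2(p, c1))      # farthest from c1
    for _ in range(iters):
        left = [p for p in points if _dist2(p, c1) <= _dist2(p, c2)]
        right = [p for p in points if _dist2(p, c1) > _dist2(p, c2)]
        if not right:  # degenerate case: the two centres coincide
            raise ValueError("cluster cannot be bisected")
        c1, c2 = _mean(left), _mean(right)
    return left, right


def divisive(points, k):
    """Top-down clustering: bisect the highest-SSE cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)),
                key=lambda idx: _sse(clusters[idx]) if len(clusters[idx]) > 1 else -1.0)
        clusters.extend(bisect(clusters.pop(i)))
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters = divisive(points, 3)
print(clusters)
```

    Recording the sequence of splits instead of only the final clusters would yield the full top-down hierarchy.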

    Comparison of Gene Expression and Genome-Wide DNA Methylation Profiling between Phenotypically Normal Cloned Pigs and Conventionally Bred Controls

    Animal breeding via somatic cell nuclear transfer (SCNT) has enormous potential in agriculture and biomedicine. However, because the efficiency of cloning by SCNT is much lower than that of natural breeding or in vitro fertilization (IVF), concerns have been raised about whether SCNT animals are as healthy and epigenetically normal as conventionally bred ones. We therefore conducted genome-wide gene expression and DNA methylation profiling of phenotypically normal cloned pigs and control pigs in two tissues (muscle and liver), using the Affymetrix Porcine expression array as well as modified methylation-specific digital karyotyping (MMSDK) and Solexa sequencing technology. Typical tissue-specific differences in both gene expression and DNA methylation were observed in muscle and liver from cloned as well as control pigs. Gene expression profiles were highly similar between cloned pigs and controls, although a small set of genes showed altered expression. Cloned pigs showed a more divergent pattern of DNA methylation in unique sequences in both tissues. In particular, a small set of genomic sites had a different DNA methylation status, with a trend towards slightly increased methylation levels in cloned pigs. Molecular network analysis of the genes containing these differentially methylated loci revealed a significant network related to tissue development. In conclusion, our study showed that phenotypically normal cloned pigs were highly similar to conventionally bred pigs in gene expression, but moderate alterations in DNA methylation still exist, especially in certain unique genomic regions.

    Inflammation and oxidative stress transcription profiles due to in vitro supply of methionine with or without choline in unstimulated blood polymorphonuclear leukocytes from lactating Holstein cows.

    Neutrophils are the most important polymorphonuclear leukocytes (PMNL) and represent the front-line defense involved in pathogen clearance upon invasion. As such, they play a pivotal role in immune and inflammatory responses. PMNL isolated from 5 mid-lactating Holstein dairy cows were used to evaluate the in vitro effect of methionine (Met) and choline (Chol) supplementation on mRNA expression of genes related to the Met cycle and innate immunity. The target genes are associated with the Met cycle, cell signaling, inflammation, antimicrobial and killing mechanisms, and pathogen recognition. Treatments were allocated in a 3 × 3 factorial arrangement, comprising 3 Lys-to-Met ratios (L:M; 3.6:1, 2.9:1, or 2.4:1) and 3 levels of supplemental Chol (0, 400, or 800 µg/mL). Three replicates per treatment group were incubated for 2 h at 37°C in a 5% CO2 atmosphere. Both betaine-homocysteine S-methyltransferase and choline dehydrogenase were undetectable, indicating that PMNL (at least in vitro) cannot generate Met from Chol through the betaine pathway. PMNL incubated without Chol experienced a specific state of inflammatory mediation [greater interleukin-1β (IL1B), myeloperoxidase (MPO), IL10, and IL6] and oxidative stress [greater cysteine sulfinic acid decarboxylase (CSAD), cystathionine gamma-lyase (CTH), glutathione reductase (GSR), and glutathione synthase (GSS)]. However, data from the L:M × Chol interaction indicated that this negative state could be overcome by supplementing additional Met. This was reflected in the upregulation of methionine synthase (MTR) and toll-like receptor 2 (TLR2), the latter reflecting pathogen-detection ability. At the lowest level of supplemental Chol, Met downregulated GSS, GSR, IL1B, and IL6, suggesting it could reduce cellular inflammation and enhance antioxidant status. At 400 µg/mL Chol, supplemental Met upregulated PMNL recognition capacity [higher TLR4 and L-selectin (SELL)].
    Overall, enhancing the supply of methyl donors to isolated unstimulated PMNL from mid-lactating dairy cows leads to a low level of PMNL activation and upregulates a cytoprotective mechanism against oxidative stress. Enhancing the supply of Met coupled with adequate Chol levels enhances the gene expression of the PMNL pathogen-recognition mechanism. These data suggest that Chol supply to PMNL exposed to low levels of Met effectively downregulated the entire repertoire of innate inflammatory-responsive genes. Thus, Met availability in PMNL during an inflammatory challenge may be sufficient for mounting an appropriate biological response.
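    The 3 × 3 factorial arrangement crosses every L:M ratio with every Chol level. A short enumeration, assuming the three replicates apply to each of the resulting treatment groups, makes the number of incubations explicit; the variable names are illustrative only.

```python
from itertools import product

lys_to_met_ratios = ["3.6:1", "2.9:1", "2.4:1"]  # L:M ratios from the abstract
choline_ug_per_ml = [0, 400, 800]                # supplemental Chol levels
replicates_per_group = 3

# Every L:M ratio crossed with every Chol level: 3 x 3 = 9 treatment groups
treatment_groups = list(product(lys_to_met_ratios, choline_ug_per_ml))

# One entry per incubation, assuming each group is replicated three times
incubations = [(ratio, chol, rep)
               for ratio, chol in treatment_groups
               for rep in range(1, replicates_per_group + 1)]

print(len(treatment_groups), len(incubations))  # 9 27
```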

    A new molecular breast cancer subclass defined from a large scale real-time quantitative RT-PCR study

    BACKGROUND: Current histopathological prognostic factors are of limited help in predicting the clinical outcome of breast cancer because of the disease's heterogeneity. Molecular profiling using a large panel of genes could help classify breast tumours and define signatures that predict their clinical behaviour. METHODS: To this end, quantitative RT-PCR amplification was used to study the RNA expression levels of 47 genes in 199 primary breast tumours and 6 normal breast tissues. Genes were selected for their potential involvement in the hormonal sensitivity of breast tumours. Normalized RT-PCR data were analysed in an unsupervised manner by pairwise hierarchical clustering, and the statistical relevance of the defined subclasses was assessed by χ² analysis. The robustness of the selected subgroups was evaluated by classifying an external, independent set of tumours using these χ²-defined molecular signatures. RESULTS: Hierarchical clustering of gene expression data allowed us to define a series of tumour subgroups that either resembled previously reported classifications or represented putative new subtypes. The χ² analysis of these subgroups allowed us to define specific molecular signatures for some of them, whose reliability was further demonstrated on the validation data set. One new breast cancer subclass defined in this way, called subgroup 7, was particularly interesting because it grouped tumours with specific bioclinical features, including a low rate of recurrence during a 5-year follow-up. CONCLUSION: The analysis of the expression of 47 genes in 199 primary breast tumours allowed them to be classified into a series of molecular subgroups. Subgroup 7, highlighted by our study, was remarkable because it grouped tumours with specific bioclinical features, including a low rate of recurrence.
    Although this finding should be confirmed in a larger tumour cohort, it suggests that gene expression profiling using a minimal set of genes may allow the discovery of new subclasses of breast cancer that are characterized by specific molecular signatures and exhibit specific bioclinical features.
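    The abstract does not state which variant of the χ² test was used. A minimal sketch of the Pearson χ² test of independence (without continuity correction), applied to a hypothetical 2 × 2 table of subgroup membership against recurrence; the counts are invented for illustration, and the erfc-based p-value is valid only for one degree of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = in subgroup / not in subgroup,
# columns = recurrence / no recurrence during follow-up
table = [[2, 18],
         [40, 60]]
stat, dof = chi_square(table)
# Survival function of chi-square with 1 dof: P(X > stat) = erfc(sqrt(stat / 2))
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
print(stat, dof, p_value)
```

    For larger tables, a general chi-square survival function (for example from SciPy) is needed in place of the one-degree-of-freedom shortcut.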

    Gene ordering in partitive clustering using microarray expressions

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering genes into homogeneous groups and ordering them using gene expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering within the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge, no work has addressed and evaluated the importance of gene ordering in the partitive clustering framework. Outside the hierarchical clustering framework, gene ordering algorithms have been applied to the whole data set, and the domain of partitive clustering remains unexplored by gene ordering approaches. A new hybrid method is proposed for ordering genes within each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with a self-organizing map (SOM) to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summed gene expression distances, and maximizing biological gene ordering according to the MIPS categorization. Moreover, the new hybrid approach finds a comparable or sometimes superior biological gene order in less computation time than optimal leaf ordering of a hierarchical clustering solution.
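    Ordering genes within a cluster can be cast as finding a short open path through their expression profiles, where path length is the summed distance between consecutive genes. The sketch below uses a greedy nearest-neighbour start followed by 2-opt segment reversals; this is a swapped-in heuristic, not FRAG_GALK or Concorde, and the function names and profiles are hypothetical. In the hybrid setting described above, it would be applied separately to the genes of each SOM cluster.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_length(order, profiles):
    """Summed distance between consecutive genes in an ordering."""
    return sum(euclid(profiles[order[i]], profiles[order[i + 1]])
               for i in range(len(order) - 1))


def order_genes(profiles):
    """Heuristic open-path ordering: nearest-neighbour start, then 2-opt."""
    n = len(profiles)
    order, remaining = [0], set(range(1, n))
    while remaining:
        last = profiles[order[-1]]
        nxt = min(remaining, key=lambda g: euclid(last, profiles[g]))
        order.append(nxt)
        remaining.remove(nxt)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                # Reverse the segment order[i..j]; keep it only if the path gets shorter
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(candidate, profiles) < path_length(order, profiles) - 1e-12:
                    order, improved = candidate, True
    return order


# Hypothetical two-condition profiles for the genes of one cluster
profiles = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
order = order_genes(profiles)
print(order, path_length(order, profiles))
```

    2-opt only guarantees a local optimum; an exact solver such as Concorde is what provides optimality guarantees on larger clusters.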