7,228 research outputs found

    Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering

    Hierarchical clustering is a popular method for analyzing data which associates a tree to a dataset. Hartigan consistency has been used extensively as a framework to analyze such clustering algorithms from a statistical point of view. Still, as we show in the paper, a tree which is Hartigan consistent with a given density can look very different from the correct limit tree. Specifically, Hartigan consistency permits two types of undesirable configurations which we term over-segmentation and improper nesting. Moreover, Hartigan consistency is a limit property and does not directly quantify the difference between trees. In this paper we identify two limit properties, separation and minimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency. We proceed to introduce a merge distortion metric between hierarchical clusterings and show that convergence in our distance implies both separation and minimality. We also prove that uniform separation and minimality imply convergence in the merge distortion metric. Furthermore, we show that our merge distortion metric is stable under perturbations of the density. Finally, we demonstrate the applicability of these concepts by proving convergence results for two clustering algorithms. First, we show convergence (and hence separation and minimality) of the recent robust single linkage algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence results on manifolds for topological split tree clustering.
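
The abstract above does not define the merge distortion metric, so the following is only a rough illustration of the general idea of comparing two hierarchies by their merge heights. It assumes the distance is the largest absolute difference, over all point pairs, between the heights at which the pair merges in each tree. It also uses plain single-linkage dendrograms on 1-D points, where the merge height is a distance scale, rather than the density-level trees the paper studies. The function names are hypothetical.

```python
from itertools import combinations


def single_linkage_merge_heights(points):
    """Map each index pair (i, j), i < j, to the distance scale at which
    points i and j first share a single-linkage cluster (1-D points)."""
    n = len(points)
    edges = sorted(
        (abs(points[i] - points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    members = {i: {i} for i in range(n)}  # cluster id -> set of point indices
    owner = list(range(n))                # point index -> cluster id
    heights = {}
    for dist, i, j in edges:
        a, b = owner[i], owner[j]
        if a == b:
            continue  # already in the same cluster
        # Every pair split across the two clusters merges at this scale.
        for p in members[a]:
            for q in members[b]:
                heights[(min(p, q), max(p, q))] = dist
        for q in members[b]:
            owner[q] = a
        members[a] |= members.pop(b)
    return heights


def merge_distortion(heights_1, heights_2):
    """Assumed form of the metric: max over pairs of |height_1 - height_2|."""
    return max(abs(heights_1[pair] - heights_2[pair]) for pair in heights_1)


# Two configurations of the same three labelled points.
h1 = single_linkage_merge_heights([0.0, 1.0, 3.0])
h2 = single_linkage_merge_heights([0.0, 1.0, 5.0])
print(merge_distortion(h1, h2))
```

Moving the third point from 3.0 to 5.0 raises the height at which it joins the other two from 2.0 to 4.0, so the distortion in this toy case is 2.0. A single number of this kind quantifies how far apart two trees are, which the abstract notes Hartigan consistency does not do directly.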

    k-MLE: A fast algorithm for learning statistical mixture models

    We describe k-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique that monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering k-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of k-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of k-MLE can be implemented using any k-means heuristic like the celebrated Lloyd's batched or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting each weight to the relative proportion of points in its cluster, and reiterate the mixture parameter update and mixture weight update processes until convergence. Hard EM is interpreted as a special case of k-MLE when both the component update and the weight update are performed successively in the inner loop. To initialize k-MLE, we propose k-MLE++, a careful initialization of k-MLE guaranteeing probabilistically a global bound on the best possible complete likelihood. Comment: 31 pages. Extends a preliminary paper presented at IEEE ICASSP 201
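
The hard-clustering loop described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes 1-D Gaussian components, Lloyd-style batched updates in the inner loop, unit initial variances, and uniform initial weights. The function names, the variance floor `min_var`, and the tiny weight floor for empty clusters are all hypothetical choices made for the sketch.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def assign(data, weights, means, variances):
    """Hard-assign each point to its most likely weighted component."""
    k = len(means)
    return [
        max(
            range(k),
            key=lambda j: math.log(weights[j]) + log_gauss(x, means[j], variances[j]),
        )
        for x in data
    ]


def k_mle_1d(data, init_means, max_outer=50, max_inner=50, min_var=1e-6):
    k, n = len(init_means), len(data)
    means = list(init_means)
    variances = [1.0] * k        # assumed starting variances
    weights = [1.0 / k] * k      # assumed uniform starting weights
    labels = None
    for _ in range(max_outer):
        # Inner loop: weights stay fixed while assignments and MLEs alternate.
        for _ in range(max_inner):
            new_labels = assign(data, weights, means, variances)
            if new_labels == labels:
                break
            labels = new_labels
            for j in range(k):
                members = [x for x, lab in zip(data, labels) if lab == j]
                if members:  # an empty cluster keeps its old parameters
                    means[j] = sum(members) / len(members)
                    spread = sum((x - means[j]) ** 2 for x in members) / len(members)
                    variances[j] = max(spread, min_var)
        # Weight update: relative proportion of points in each cluster,
        # floored so that log(weight) stays defined for empty clusters.
        new_weights = [max(labels.count(j) / n, 1e-12) for j in range(k)]
        if new_weights == weights:
            break
        weights = new_weights
    return weights, means, variances, labels


data = [0.0, 0.1, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]
weights, means, variances, labels = k_mle_1d(data, init_means=[0.0, 5.0])
print(weights, means, labels)
```

Keeping the weights fixed inside the inner loop and updating them only between inner-loop runs follows the two-stage structure the abstract describes. Updating the weights after every assignment pass instead would correspond to the hard EM special case it mentions.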

    We are Americans, too: Interracial Relations in Detroit's Postwar Auto Industry

    This analysis looks at interracial relations and conflicts within the postwar Detroit auto industry. In doing so, it examines the roles that the UAW, the government, the corporations, and the workers themselves played, and how race and/or gender contributed to interactive negotiations within the employment sector at the time.

    Spartan Daily, May 15, 1995

    Volume 104, Issue 69 https://scholarworks.sjsu.edu/spartandaily/8713/thumbnail.jp

    Single cell measurement of telomerase expression and splicing using microfluidic emulsion cultures.

    Telomerase is a reverse transcriptase that maintains telomeres on the ends of chromosomes, allowing rapidly dividing cells to proliferate while avoiding senescence and apoptosis. Understanding telomerase gene expression and splicing at the single cell level could yield insights into the roles of telomerase during normal cell growth as well as cancer development. Here we use droplet-based single cell culture followed by single cell or colony transcript abundance analysis to investigate the relationship between cell growth and transcript abundance of the telomerase genes encoding the RNA component (hTR) and protein component (hTERT) as well as hTERT splicing. Jurkat and K562 cells were examined under normal cell culture conditions and during exposure to curcumin, a natural compound with anti-carcinogenic and telomerase activity-reducing properties. Individual cells predominantly express single hTERT splice variants, with the α+/β- variant exhibiting significant transcript abundance bimodality that is sustained through cell division. Sub-lethal curcumin exposure results in reduced bimodality of all hTERT splice variants and significant upregulation of alpha splicing, suggesting a possible role in cellular stress response. The single cell culture and transcript abundance analysis method presented here provides the tools necessary for multiparameter single cell analysis, which will be critical for understanding phenotypes of heterogeneous cell populations, disease cell populations, and their drug response.