Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering
Hierarchical clustering is a popular method for analyzing data which
associates a tree to a dataset. Hartigan consistency has been used extensively
as a framework to analyze such clustering algorithms from a statistical point
of view. Still, as we show in the paper, a tree which is Hartigan consistent
with a given density can look very different from the correct limit tree.
Specifically, Hartigan consistency permits two types of undesirable
configurations which we term over-segmentation and improper nesting. Moreover,
Hartigan consistency is a limit property and does not directly quantify the
difference between trees.
In this paper we identify two limit properties, separation and minimality,
which address both over-segmentation and improper nesting and together imply
(but are not implied by) Hartigan consistency. We proceed to introduce a merge
distortion metric between hierarchical clusterings and show that convergence in
our distance implies both separation and minimality. We also prove that uniform
separation and minimality imply convergence in the merge distortion metric.
Furthermore, we show that our merge distortion metric is stable under
perturbations of the density.
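To make the merge distortion metric concrete, here is a minimal Python sketch under stated assumptions. Each hierarchical clustering is stored as a finite list of (level, cluster) pairs; the merge height of two points is taken to be the highest level at which some cluster contains both (0.0 when none does, an assumed base level for density thresholds); and the distance is the largest absolute difference in merge heights over all pairs of sample points, including a point paired with itself. The function names and the list-of-pairs representation are illustrative, not the paper's notation.

```python
from itertools import combinations_with_replacement


def merge_height(tree, x, y):
    """Highest level at which x and y lie in a common cluster of `tree`.

    `tree` is a list of (level, cluster) pairs with clusters given as sets.
    Returns 0.0 when no cluster contains both points (assumed base level).
    """
    levels = [level for level, cluster in tree if x in cluster and y in cluster]
    return max(levels, default=0.0)


def merge_distortion(tree_a, tree_b, sample):
    """Largest absolute gap between merge heights over all pairs in `sample`.

    Pairs are unordered and include (x, x); because merge height is
    symmetric, this equals the maximum over the full product sample x sample.
    """
    return max(
        (abs(merge_height(tree_a, x, y) - merge_height(tree_b, x, y))
         for x, y in combinations_with_replacement(sample, 2)),
        default=0.0,
    )


# Two toy trees that differ only in the level at which {"a", "b"} appears.
tree_a = [(0.0, {"a", "b", "c"}), (1.0, {"a", "b"}), (2.0, {"a"})]
tree_b = [(0.0, {"a", "b", "c"}), (0.5, {"a", "b"}), (2.0, {"a"})]
print(merge_distortion(tree_a, tree_b, ["a", "b", "c"]))  # 0.5
```

In this toy example the only pair whose merge height changes is (a, b), which merges at 1.0 in one tree and 0.5 in the other, so the distance is 0.5.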
Finally, we demonstrate the applicability of these concepts by proving
convergence results for two clustering algorithms. First, we show convergence
(and hence separation and minimality) of the recent robust single linkage
algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence
results on manifolds for topological split tree clustering.
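As a rough illustration of the robust single linkage algorithm mentioned above, the following Python sketch computes the clusters at one scale r, following our reading of Chaudhuri and Dasgupta (2010). A sample point becomes active once the distance to its k-th nearest sample point (counting the point itself, an assumed convention) is at most r; active points within alpha * r of each other are linked; and the clusters are the connected components. The parameter names `k` and `alpha` and both helper functions are assumptions for illustration. The sketch uses brute-force O(n^2) distances and requires Python 3.8+ for `math.dist`. Sweeping r over increasing values would trace out the full hierarchy.

```python
import math


def k_radius(points, i, k):
    # Distance from points[i] to its k-th closest sample point, counting
    # points[i] itself as the first (assumed convention).
    dists = sorted(math.dist(points[i], p) for p in points)
    return dists[k - 1]


def robust_single_linkage_level(points, r, k, alpha):
    """Clusters (as sorted index lists) at scale r; brute-force sketch."""
    radii = [k_radius(points, i, k) for i in range(len(points))]
    active = [i for i, rad in enumerate(radii) if rad <= r]

    # Union-find over the active points.
    parent = {i: i for i in active}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link active points that lie within alpha * r of each other.
    for pos, i in enumerate(active):
        for j in active[pos + 1:]:
            if math.dist(points[i], points[j]) <= alpha * r:
                parent[find(i)] = find(j)

    # Collect connected components.
    groups = {}
    for i in active:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(robust_single_linkage_level(pts, r=1.0, k=2, alpha=math.sqrt(2)))
# [[0, 1, 2], [3, 4, 5]]; the isolated point (50, 50) is not yet active
```

Raising r activates the outlying point and eventually joins everything into one cluster, which is how the nested clusters of the tree arise.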
k-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning
finite statistical mixtures of exponential families such as Gaussian mixture
models. Mixture models are traditionally learned using the
expectation-maximization (EM) soft clustering technique that monotonically
increases the incomplete (expected complete) likelihood. Given prescribed
mixture weights, the hard clustering k-MLE algorithm iteratively assigns data
to the most likely weighted component and updates the component models using
Maximum Likelihood Estimators (MLEs). Using the duality between exponential
families and Bregman divergences, we prove that the local convergence of the
complete likelihood of k-MLE follows directly from the convergence of a dual
additively weighted Bregman hard clustering. The inner loop of k-MLE can be
implemented using any k-means heuristic, such as the celebrated Lloyd's batched
updates or Hartigan's greedy swap updates. We then show how to update the mixture
weights by minimizing a cross-entropy criterion, which amounts to setting each
weight to the relative proportion of points in its cluster, and we repeat the
mixture parameter update and mixture weight update processes until convergence.
Hard EM is interpreted as a special case of k-MLE when both the component update
and the weight update are performed successively in the inner loop. To initialize
k-MLE, we propose k-MLE++, a careful initialization of k-MLE that probabilistically
guarantees a global bound on the best possible complete likelihood.
Comment: 31 pages, extends a preliminary paper presented at IEEE ICASSP 201
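The alternating structure described above can be sketched in Python for the simplest case. This is a minimal illustration, not the authors' implementation: it assumes univariate Gaussian components, uses Lloyd-style batched hard assignments in the inner loop, floors variances at a hypothetical `min_var` to avoid degenerate components, and the function names and iteration caps are assumptions. Weights are held fixed inside the inner loop and then reset to the relative proportion of points in each cluster, as the abstract describes.

```python
import math


def weighted_log_density(x, w, mu, var):
    # log(w * N(x; mu, var)); a zero-weight component can never win a point.
    if w <= 0.0:
        return -math.inf
    return math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def assign(data, means, variances, weights):
    # Hard assignment: each point goes to its most likely weighted component.
    k = len(means)
    return [max(range(k), key=lambda j: weighted_log_density(x, weights[j], means[j], variances[j]))
            for x in data]


def k_mle_1d(data, means, variances, weights, n_outer=50, n_inner=50, min_var=1e-6):
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_outer):
        # Inner loop: Lloyd-style batched updates with the weights held fixed.
        for _ in range(n_inner):
            labels = assign(data, means, variances, weights)
            new_means, new_vars = list(means), list(variances)
            for j in range(k):
                pts = [x for x, lab in zip(data, labels) if lab == j]
                if pts:  # Gaussian MLE: sample mean and biased sample variance
                    m = sum(pts) / len(pts)
                    new_means[j] = m
                    new_vars[j] = max(sum((x - m) ** 2 for x in pts) / len(pts), min_var)
            converged = new_means == means and new_vars == variances
            means, variances = new_means, new_vars
            if converged:
                break
        # Weight update: relative proportion of points in each cluster.
        labels = assign(data, means, variances, weights)
        new_weights = [labels.count(j) / n for j in range(k)]
        if new_weights == weights:
            break
        weights = new_weights
    return means, variances, weights


data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
print(k_mle_1d(data, [1.0, 9.0], [1.0, 1.0], [0.5, 0.5]))
```

On this two-cluster toy data the sketch settles on means near 0 and 10, variances near 0.02 and equal weights. Swapping the inner loop for Hartigan-style single-point moves would be the other heuristic the abstract mentions.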
We are Americans, too: Interracial Relations in Detroit's Postwar Auto Industry
This analysis looks at the interracial relations and conflicts within the postwar Detroit auto industry. In doing so, it examines the roles the UAW, the government, the corporations, and the workers themselves played, and how race and/or gender contributed to interactive negotiations within the employment sector at the time.
Spartan Daily, May 15, 1995
Volume 104, Issue 69
Single cell measurement of telomerase expression and splicing using microfluidic emulsion cultures.
Telomerase is a reverse transcriptase that maintains telomeres on the ends of chromosomes, allowing rapidly dividing cells to proliferate while avoiding senescence and apoptosis. Understanding telomerase gene expression and splicing at the single cell level could yield insights into the roles of telomerase during normal cell growth as well as cancer development. Here we use droplet-based single cell culture followed by single cell or colony transcript abundance analysis to investigate the relationship between cell growth and the transcript abundance of the telomerase genes encoding the RNA component (hTR) and the protein component (hTERT), as well as hTERT splicing. Jurkat and K562 cells were examined under normal cell culture conditions and during exposure to curcumin, a natural compound with anti-carcinogenic and telomerase activity-reducing properties. Individual cells predominantly express single hTERT splice variants, with the α+/β- variant exhibiting significant transcript abundance bimodality that is sustained through cell division. Sub-lethal curcumin exposure results in reduced bimodality of all hTERT splice variants and significant upregulation of alpha splicing, suggesting a possible role in the cellular stress response. The single cell culture and transcript abundance analysis method presented here provides the tools necessary for multiparameter single cell analysis, which will be critical for understanding the phenotypes of heterogeneous cell populations, disease cell populations, and their drug response.