Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
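The multiple-run practice mentioned in the abstract above can be illustrated with a minimal sketch. It assumes uniformly random selection of the initial centers and the sum of squared errors (SSE) as the quality criterion; the function names are hypothetical and are not taken from the chapter, which studies deterministic alternatives to this practice.

```python
import random


def sse_and_labels(points, centers):
    """Assign each point to its nearest center; return total SSE and labels."""
    labels, sse = [], 0.0
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        j = min(range(len(centers)), key=d2.__getitem__)
        labels.append(j)
        sse += d2[j]
    return sse, labels


def lloyd(points, centers, max_iter=100):
    """Batch k-means (Lloyd's algorithm) started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        _, labels = sse_and_labels(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep the old center if a cluster empties
        if new_centers == centers:  # centers unchanged: converged
            break
        centers = new_centers
    sse, _ = sse_and_labels(points, centers)
    return centers, sse


def best_of_restarts(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times from random initial centers; keep the lowest SSE."""
    rng = random.Random(seed)
    best_centers, best_sse = None, float("inf")
    for _ in range(n_runs):
        init = rng.sample(points, k)  # assumption: random points as initial centers
        centers, sse = lloyd(points, init)
        if sse < best_sse:
            best_centers, best_sse = centers, sse
    return best_centers, best_sse
```

The total cost grows roughly in proportion to `n_runs`, which is the overhead the abstract attributes to this practice.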
Expansion in CD39(+) CD4(+) Immunoregulatory T Cells and Rarity of Th17 Cells in HTLV-1 Infected Patients Is Associated with Neurological Complications
HTLV-1 infection is associated with several inflammatory disorders, including the neurodegenerative condition HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP). It is unclear why a minority of infected subjects develops HAM/TSP. CD4(+) T cells are the main target of infection and play a pivotal role in regulating immunity to HTLV and are hypothesized to participate in the pathogenesis of HAM/TSP. The CD39 ectonucleotidase receptor is expressed on CD4(+) T cells and, based on co-expression with CD25, marks T cells with distinct regulatory (CD39(+)CD25(+)) and effector (CD39(+)CD25(-)) function. Here, we investigated the expression of CD39 on CD4(+) T cells from a cohort of HAM/TSP patients, HTLV-1 asymptomatic carriers (AC), and matched uninfected controls. The frequency of CD39(+)CD4(+) T cells was increased in HTLV-1 infected patients, regardless of clinical status. More importantly, the proportion of the immunostimulatory CD39(+)CD25(-) CD4(+) T-cell subset was significantly elevated in HAM/TSP patients compared to AC, and this subset had lower levels of the immunoinhibitory receptor PD-1. We saw no difference in the frequency of CD39(+)CD25(+) regulatory (Treg) cells between AC and HAM/TSP patients. However, these cells transition from being anergic to displaying a polyfunctional cytokine response following HTLV-1 infection. CD39(-)CD25(+) T cell subsets predominantly secreted the inflammatory cytokine IL-17. We found that HAM/TSP patients had significantly fewer IL-17-secreting CD4(+) T cells compared to uninfected controls. Taken together, we show that the expression of CD39 is upregulated on CD4(+) T cells in HAM/TSP patients. This upregulation may play a role in the development of the proinflammatory milieu through pathways that are distinct among the different CD39 T cell subsets.
CD39 upregulation may therefore serve as a surrogate diagnostic marker of progression and could potentially be a target for interventions to reduce the development of HAM/TSP.
Funding: National Institute of Allergies and Infectious Diseases; National Institutes of Health; University of California, San Francisco-Gladstone Institute of Virology & Immunology Center for AIDS Research; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); John E. Fogarty International Center; National Center for Research Resources; National Institute of General Medical Sciences from the National Institutes of Health
Affiliations: Univ Calif San Francisco, Dept Med, Div Expt Med, San Francisco, CA 94143 USA; Univ Hawaii, John A Burns Sch Med, Dept Trop Med, Hawaii Ctr AIDS, Honolulu, HI 96822 USA; Univ São Paulo, Sch Med, Department Infect Dis, São Paulo, Brazil; Univ São Paulo, Sch Med, Div Clin Immunol & Allergy, São Paulo, Brazil; Funcacao Prosangue, Hemoctr São Paulo, Mol Biol Lab, São Paulo, Brazil; Universidade Federal de São Paulo, Dept Translat Med, São Paulo, Brazil
Grants: San Francisco-Gladstone Institute of Virology & Immunology Center for AIDS Research: P30 AI027763; FAPESP: 04/15856-9/Kallas; FAPESP: 2010/05845-0/Kallas; FAPESP: 11/12297-2/Sanabani; John E. Fogarty International Center: D43 TW00003; National Center for Research Resources: 5P20RR016467-11; National Institute of General Medical Sciences from the National Institutes of Health: 8P20GM103466-11
Web of Scienc
The Excess Mass Approach and the Analysis of Multi-Modality
Summary: The excess mass approach is a general approach to statistical analysis. It can be used to formulate a probabilistic model for clustering and can be applied to the analysis of multi-modality. Intuitively, a mode is present where an excess of probability mass is concentrated. This intuitive idea can be formalized directly by means of the excess mass functional. There is no need for intervening steps like initial density estimation. The excess mass measures the local difference of a given distribution to a reference model, usually the uniform distribution. The excess mass defines a functional which can be estimated efficiently from the data and can be used to test for multi-modality.
1. The problem of multi-modality
We want to find the number of modes of a distribution in R^k, based on a sample of n independent observations. There are many approaches to this problem. Any approach has to face an inherent difficulty of the modality-problem: the functional which associates the number of modes to a distribution is only semi-continuous. In any neighbourhood (with respect to the testing topology) of a give