13 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
    Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
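
The "multiple runs, keep the best" practice mentioned in the abstract can be illustrated with a minimal sketch. This is not one of the six methods studied in the chapter. It assumes plain Lloyd-style k-means with uniformly random initial centers and uses the sum of squared errors (SSE) as the quality criterion. The names `sse`, `kmeans`, and `best_of_runs` are hypothetical and chosen only for this illustration.

```python
import random


def sse(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centers[lab]))
        for pt, lab in zip(points, labels)
    )


def kmeans(points, k, rng, max_iter=100):
    """One k-means run with randomly sampled initial centers (assumed init scheme)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels, sse(points, centers, labels)


def best_of_runs(points, k, n_runs=10, seed=0):
    """Repeat random-init k-means n_runs times and return the lowest-SSE run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda r: r[2]), [r[2] for r in runs]
```

The total cost grows linearly with `n_runs`, which is the overhead the abstract refers to; a deterministic initialization needs only a single run.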

    Multidimensional group analysis

    The Results of the “Positive Action for Today’s Health” (PATH) Trial for Increasing Walking and Physical Activity in Underserved African-American Communities

    BACKGROUND: The “Positive Action for Today’s Health” (PATH) trial tested an environmental intervention to increase walking in underserved communities. METHODS: Three matched communities were randomized to a police-patrolled walking plus social marketing, a police-patrolled walking-only, or a no-walking intervention. The 24-month intervention addressed safety and access for physical activity (PA) and utilized social marketing to enhance environmental supports for PA. African-Americans (N=434; 62% female; aged 51±16 years) provided accelerometry and psychosocial measures at baseline and at 12, 18, and 24 months. Walking attendance and trail use were obtained over 24 months. RESULTS: There were no significant differences across communities over 24 months for moderate-to-vigorous PA. Walking attendance in the social marketing community increased from 40 to 400 walkers per month at 9 months and was sustained at ~200 walkers per month through 24 months. No change in attendance was observed in the walking-only community. CONCLUSIONS: Findings support integrating social marketing strategies to increase walking in underserved African-Americans (ClinicalTrials.gov #NCT01025726).