51 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
    Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
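To make the three properties named in the abstract concrete (linear in the number of points, deterministic, order-invariant), here is a minimal sketch of a maximin-style seeding procedure. It is an illustration only and is not claimed to be one of the six methods the chapter evaluates; the function names `sq_dist` and `maximin_init`, the tie-breaking rule, and the toy data are all assumptions made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Pick k initial centers with a deterministic maximin-style rule.

    Hypothetical illustration: the first center is the point farthest
    from the data mean; each later center is the point farthest from its
    nearest already-chosen center. Ties are broken by comparing the
    coordinate tuples, so the result does not depend on input order.
    Cost is O(n * k) distance evaluations, i.e. linear in n.
    """
    n = len(points)
    dim = len(points[0])
    # math.fsum returns a correctly rounded sum, so the mean does not
    # change when the points are presented in a different order.
    mean = tuple(math.fsum(p[d] for p in points) / n for d in range(dim))
    first = max(points, key=lambda p: (sq_dist(p, mean), p))
    centers = [first]
    # nearest[i] = squared distance from points[i] to its closest center.
    nearest = [sq_dist(p, first) for p in points]
    while len(centers) < k:
        idx = max(range(n), key=lambda i: (nearest[i], points[i]))
        new_center = points[idx]
        centers.append(new_center)
        nearest = [min(d, sq_dist(p, new_center))
                   for d, p in zip(nearest, points)]
    return centers


# Toy data: three small groups of 2-D points (made up for this example).
toy_points = [(0, 0), (1, 0), (0, 1),
              (10, 10), (11, 10), (10, 11),
              (20, 0), (21, 0), (20, 1)]
seeds = maximin_init(toy_points, 3)

# Shuffling the input should not change the chosen centers.
shuffled = toy_points[:]
random.Random(0).shuffle(shuffled)
seeds_shuffled = maximin_init(shuffled, 3)
```

Because there is no random draw and no dependence on input order, a single run suffices, which is the practical advantage the abstract contrasts with repeated runs of random initializers.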

    Twelve-month observational study of children with cancer in 41 countries during the COVID-19 pandemic

    Introduction: Childhood cancer is a leading cause of death. It is unclear whether the COVID-19 pandemic has impacted childhood cancer mortality. In this study, we aimed to establish all-cause mortality rates for childhood cancers during the COVID-19 pandemic and determine the factors associated with mortality. Methods: Prospective cohort study in 109 institutions in 41 countries. Inclusion criteria: children <18 years who were newly diagnosed with or undergoing active treatment for acute lymphoblastic leukaemia, non-Hodgkin's lymphoma, Hodgkin lymphoma, retinoblastoma, Wilms tumour, glioma, osteosarcoma, Ewing sarcoma, rhabdomyosarcoma, medulloblastoma and neuroblastoma. Of 2327 cases, 2118 patients were included in the study. The primary outcome measure was all-cause mortality at 30 days, 90 days and 12 months. Results: All-cause mortality was 3.4% (n=71/2084) at 30-day follow-up, 5.7% (n=113/1969) at 90-day follow-up and 13.0% (n=206/1581) at 12-month follow-up. The median time from diagnosis to multidisciplinary team (MDT) plan was longest in low-income countries (7 days, IQR 3-11). Multivariable analysis revealed several factors associated with 12-month mortality, including low-income (OR 6.99 (95% CI 2.49 to 19.68); p<0.001), lower middle income (OR 3.32 (95% CI 1.96 to 5.61); p<0.001) and upper middle income (OR 3.49 (95% CI 2.02 to 6.03); p<0.001) country status and chemotherapy (OR 0.55 (95% CI 0.36 to 0.86); p=0.008) and immunotherapy (OR 0.27 (95% CI 0.08 to 0.91); p=0.035) within 30 days from MDT plan. Multivariable analysis revealed laboratory-confirmed SARS-CoV-2 infection (OR 5.33 (95% CI 1.19 to 23.84); p=0.029) was associated with 30-day mortality. Conclusions: Children with cancer are more likely to die within 30 days if infected with SARS-CoV-2. However, timely treatment reduced odds of death. This report provides crucial information to balance the benefits of providing anticancer therapy against the risks of SARS-CoV-2 infection in children with cancer.
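The reported mortality percentages can be reproduced from the death counts and follow-up denominators quoted in the abstract. The short sketch below only redoes that arithmetic; the names `follow_up` and `mortality_pct` are illustrative and do not come from the study.

```python
# Deaths and patients with follow-up data, as quoted in the abstract.
follow_up = {
    "30-day": (71, 2084),
    "90-day": (113, 1969),
    "12-month": (206, 1581),
}


def mortality_pct(deaths, at_risk):
    """All-cause mortality as a percentage, rounded to one decimal."""
    return round(100 * deaths / at_risk, 1)


rates = {label: mortality_pct(d, n) for label, (d, n) in follow_up.items()}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

Note that the denominators shrink at each time point (2084, 1969, 1581), so each percentage is relative to the patients with follow-up data at that point rather than to all 2118 included patients.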

    Fine-mapping of common genetic variants associated with colorectal tumor risk identified potential functional variants

    Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) associated with colorectal cancer risk. These SNPs may tag correlated variants with biological importance. Fine-mapping around GWAS loci can facilitate detection of functional candidates and additional independent risk variants. We analyzed 11,900 cases and 14,311 controls in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. To fine-map genomic regions containing all known common risk variants, we imputed high-density genetic data from the 1000 Genomes Project. We tested single-variant associations with colorectal tumor risk for all variants spanning genomic regions 250-kb upstream or downstream of 31 GWAS-identified SNPs (index SNPs). We queried the University of California, Santa Cruz Genome Browser to examine evidence for biological function. Index SNPs did not show the strongest association signals with colorectal tumor risk in their respective genomic regions. Bioinformatics analysis of SNPs showing smaller P-values in each region revealed 21 functional candidates in 12 loci (5q31.1, 8q24, 11q13.4, 11q23, 12p13.32, 12q24.21, 14q22.2, 15q13, 18q21, 19q13.1, 20p12.3, and 20q13.33). We did not observe evidence of additional independent association signals in GWAS-identified regions. Our results support the utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up. Such efforts may aid the eventual discovery of disease-causing variant(s).
    National Institutes of Health; National Cancer Institute; U.S. Department of Health and Human Services