468 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
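
    The order sensitivity mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy data set of four 2-D points and a naive "use the first k points as centers" seeding, which is an illustrative baseline and not one of the six methods the chapter evaluates. The function names (`dist2`, `mean`, `kmeans_sse_first_k`) are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_sse_first_k(points, k, max_iters=100):
    """Run Lloyd's k-means seeded with the first k points (an
    order-sensitive seeding, assumed here for illustration) and
    return the final sum of squared errors (SSE)."""
    centers = list(points[:k])
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [mean(cl) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)


# The same four points in two different orders.
order_a = [(0, 0), (0, 1), (4, 0), (4, 1)]
order_b = [(0, 0), (4, 0), (0, 1), (4, 1)]

sse_a = kmeans_sse_first_k(order_a, k=2)
sse_b = kmeans_sse_first_k(order_b, k=2)
print(sse_a, sse_b)
```

    On this toy data the two orderings converge to different partitions with different SSE values, which is the kind of unpredictability that motivates deterministic, order-invariant initialization.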

    Competing risk and heterogeneity of treatment effect in clinical trials

    It has been demonstrated that patients enrolled in clinical trials frequently have a large degree of variation in their baseline risk for the outcome of interest. Thus, some have suggested that clinical trial results should routinely be stratified by outcome risk using risk models, since the summary results may otherwise be misleading. However, variation in competing risk is another dimension of risk heterogeneity that may also underlie treatment effect heterogeneity. Understanding the effects of competing risk heterogeneity may be especially important for pragmatic comparative effectiveness trials, which seek to include traditionally excluded patients, such as the elderly or complex patients with multiple comorbidities. Indeed, the observed effect of an intervention depends on the ratio of outcome risk to competing risk, and these risks, which may or may not be correlated, may vary considerably among patients enrolled in a trial. Further, the effects of competing risk on treatment effect heterogeneity can be amplified by even a small degree of treatment-related harm. Stratification of trial results along both the competing and the outcome risk dimensions may be necessary if pragmatic comparative effectiveness trials are to provide the clinically useful information their advocates intend.

    Genome-Wide Analysis of Neuroblastomas using High-Density Single Nucleotide Polymorphism Arrays

    BACKGROUND: Neuroblastomas are characterized by chromosomal alterations with biological and clinical significance. We analyzed paired blood and primary tumor samples from 22 children with high-risk neuroblastoma for loss of heterozygosity (LOH) and DNA copy number change using the Affymetrix 10K single nucleotide polymorphism (SNP) array. FINDINGS: Multiple areas of LOH and copy number gain were seen. The most commonly observed area of LOH was on chromosome arm 11q (15/22 samples; 68%). Chromosome 11q LOH was highly associated with occurrence of chromosome 3p LOH: 9 of the 15 samples with 11q LOH had concomitant 3p LOH (P = 0.016). Chromosome 1p LOH was seen in one-third of cases. LOH events on chromosomes 11q and 1p were generally accompanied by copy number loss, indicating hemizygous deletion within these regions. The one exception was on chromosome 11p, where LOH in all four cases was accompanied by normal copy number or diploidy, implying uniparental disomy. Gain of copy number was most frequently observed on chromosome arm 17q (21/22 samples; 95%) and was associated with allelic imbalance in six samples. Amplification of MYCN was also noted, as was amplification of a second gene, ALK, in a single case. CONCLUSIONS: This analysis demonstrates the power of SNP arrays for high-resolution determination of LOH and DNA copy number change in neuroblastoma, a tumor in which specific allelic changes drive clinical outcome and selection of therapy.

    Genetic identification of cytomegaloviruses in a rural population of Côte d'Ivoire.

    BACKGROUND: Cytomegaloviruses (CMVs) are herpesviruses that infect many mammalian species, including humans. Infection generally passes undetected, but the virus can cause serious disease in individuals with impaired immune function. Human CMV (HCMV) circulates with high seroprevalence (60-100%) on all continents. However, little information is available on HCMV genoprevalence and genetic diversity in sub-Saharan Africa, especially in rural areas of West Africa that are at high risk of human-to-human HCMV transmission. In addition, there is a potential for zoonotic spillover of pathogens through bushmeat hunting and handling in these areas, as shown for various retroviruses. Although HCMV and nonhuman CMVs are regarded as species-specific, potential human infection with CMVs of non-human primate (NHP) origin, shown to circulate in the local NHP population, has not been studied. FINDINGS: Analysis of 657 human oral swabs and fecal samples collected from 518 individuals living in 8 villages of Côte d'Ivoire with generic PCR for identification of human and NHP CMVs revealed shedding of HCMV in 2.5% of the individuals. Determination of glycoprotein B sequences showed identity with strains Towne, AD169 and Toledo, respectively. NHP CMV sequences were not detected. CONCLUSIONS: HCMV is actively circulating in a proportion of the rural Côte d'Ivoire human population, with circulating strains being closely related to those previously identified in non-African countries. The lack of NHP CMVs in human populations in an environment conducive to cross-species infection suggests that zoonotic transmission of CMVs to humans is at most a rare event.

    A Biomedically Enriched Collection of 7000 Human ORF Clones

    We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in ~15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.

    CCL5 regulation of mucosal chlamydial immunity and infection

    BACKGROUND: Following genital chlamydial infection, an early T helper type 1 (Th1)-associated immune response precedes the activation and recruitment of specific Th1 cells bearing distinct chemokine receptors, subsequently leading to the clearance of Chlamydia. We have shown that CCR5, a receptor for CCL5, is crucial for protective chlamydial immunity. Our laboratory and others have also demonstrated that CCL5 deficiencies found in man and animals can increase the susceptibility and progression of infectious diseases by modulating mucosal immunity. These findings suggest the CCR5-CCL5 axis is necessary for optimal chlamydial immunity. We hypothesized CCL5 is required for protective humoral and cellular immunity against Chlamydia. RESULTS: The present study revealed that CCR5 and CCL5 mRNAs are elevated in the spleen, iliac lymph nodes (ILNs), and genital mucosa following Chlamydia muridarum challenge. Antibody (Ab)-mediated inhibition of CCL5 during genital chlamydial infection suppressed humoral and Th1 > Th2 cellular responses by splenic-, ILN-, and genital mucosa-derived lymphocytes. Antigen (Ag)-specific proliferative responses of CD4+ T cells from spleen, ILNs, and genital organs also declined after CCL5 inhibition. CONCLUSION: The suppression of these responses correlated with delayed clearance of C. muridarum, which indicates that chlamydial immunity is mediated by Th1 immune responses driven in part by CCL5. Taken together with other studies, the data show that CCL5 mediates the temporal recruitment and activation of leukocytes to mitigate chlamydial infection through enhancing adaptive mucosal humoral and cellular immunity.