51 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
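
The multiple-run practice that the abstract describes can be sketched as follows. This is only a minimal illustration under stated assumptions, not any of the six methods evaluated in the chapter: the function names (`kmeans`, `best_of_runs`), the random selection of initial centers from the data, and the use of the sum of squared errors (SSE) as the quality criterion are all choices made here for the example.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Batch k-means with randomly chosen initial centers (assumed scheme).

    Returns (centers, labels, sse).
    """
    centers = rng.sample(points, k)  # random, hence run-dependent, start
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def best_of_runs(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        result = kmeans(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Three synthetic, well-separated 2-D blobs (illustrative data only).
    data = [
        (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 0), (5, 8)]
        for _ in range(30)
    ]
    centers, labels, sse = best_of_runs(data, k=3, n_runs=20, seed=1)
    print("best SSE over 20 runs:", round(sse, 3))
```

The cost of the practice is visible in `best_of_runs`: the total work grows linearly with `n_runs`, which is the overhead a deterministic initialization method avoids by needing only a single run.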

    Crystal Structures of Cif from Bacterial Pathogens Photorhabdus luminescens and Burkholderia pseudomallei

    A prerequisite for bacterial pathogenesis is the successful interaction of a pathogen with a host. One mechanism used by a broad range of Gram-negative bacterial pathogens is to deliver effector proteins directly into host cells through a dedicated type III secretion system where they modulate host cell function. The cycle inhibiting factor (Cif) family of effector proteins, identified in a growing number of pathogens that harbour functional type III secretion systems and have a wide host range, arrest the eukaryotic cell cycle. Here, the crystal structures of Cifs from the insect pathogen/nematode symbiont Photorhabdus luminescens (a γ-proteobacterium) and human pathogen Burkholderia pseudomallei (a β-proteobacterium) are presented. Both of these proteins adopt an overall fold similar to the papain sub-family of cysteine proteases, as originally identified in the structure of a truncated form of Cif from Enteropathogenic E. coli (EPEC), despite sharing only limited sequence identity. The structure of an N-terminal region, referred to here as the ‘tail-domain’ (absent in the EPEC Cif structure), suggests a surface likely to be involved in host-cell substrate recognition. The conformation of the Cys-His-Gln catalytic triad is retained, and the essential cysteine is exposed to solvent and addressable by small molecule reagents. These structures and biochemical work contribute to the rapidly expanding literature on Cifs, and direct further studies to better understand the molecular details of the activity of these proteins.

    Enriching news events with meta-knowledge information

    Given the vast amounts of data available in digitised textual form, it is important to provide mechanisms that allow users to extract nuggets of relevant information from the ever growing volumes of potentially important documents. Text mining techniques can help, through their ability to automatically extract relevant event descriptions, which link entities with situations described in the text. However, correct and complete interpretation of these event descriptions is not possible without considering additional contextual information often present within the surrounding text. This information, which we refer to as meta-knowledge, can include (but is not restricted to) the modality, subjectivity, source, polarity and specificity of the event. We have developed a meta-knowledge annotation scheme specifically tailored for news events, which includes six aspects of event interpretation. We have applied this annotation scheme to the ACE 2005 corpus, which contains 599 documents from various written and spoken news sources. We have also identified and annotated the words and phrases evoking the different types of meta-knowledge. Evaluation of the annotated corpus shows high levels of inter-annotator agreement for five meta-knowledge attributes, and a moderate level of agreement for the sixth attribute. Detailed analysis of the annotated corpus has revealed further insights into the expression mechanisms of different types of meta-knowledge, their relative frequencies and mutual correlations.

    Mineral Type and Solution Chemistry Affect the Structure and Composition of Actively Growing Bacterial Communities as Revealed by Bromodeoxyuridine Immunocapture and 16S rRNA Pyrosequencing

    © 2016, Springer Science+Business Media New York. Understanding how minerals affect bacterial communities and their in situ activities in relation to environmental conditions is a central issue in soil microbial ecology, as minerals represent essential reservoirs of inorganic nutrients for the biosphere. To determine the impact of mineral type and solution chemistry on soil bacterial communities, we compared the diversity, composition, and functional abilities of a soil bacterial community incubated in the presence/absence of different mineral types (apatite, biotite, obsidian). Microcosms were prepared containing different liquid culture media devoid of particular essential nutrients, these nutrients being provided only in the introduced minerals and therefore only available to the microbial community through mineral dissolution by biotic and/or abiotic processes. By combining functional screening of bacterial isolates and community analysis by bromodeoxyuridine DNA immunocapture and 16S rRNA gene pyrosequencing, we demonstrated that bacterial communities were mainly impacted by the solution chemistry at the taxonomic level and by the mineral type at the functional level. Metabolically active bacterial communities varied with solution chemistry and mineral type. Burkholderia were significantly enriched in the obsidian treatment compared to the biotite treatment and were the most effective isolates at solubilizing phosphorus or mobilizing iron in all the treatments. A detailed analysis revealed that the 16S rRNA gene sequences of the OTUs or isolated strains assigned as Burkholderia in our study showed high homology with effective mineral-weathering bacteria previously recovered from the same experimental site.