196 research outputs found

    An efficient Particle Swarm Optimization approach to cluster short texts

    Get PDF
    This is the author’s version of a work that was accepted for publication in Information Sciencies. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, VOL 265, MAY 1 2014 DOI 10.1016/j.ins.2013.12.010.Short texts such as evaluations of commercial products, news, FAQ's and scientific abstracts are important resources on the Web due to the constant requirements of people to use this on line information in real life. In this context, the clustering of short texts is a significant analysis task and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown a promising performance in this type of problems. CLUDIPSO obtained high quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO*, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web, show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size. (C) 2013 Elsevier Inc. All rights reserved.The research work is partially funded by the European Commission as part of the WIQ-EI IRSES research project (Grant No. 269180) within the FP 7 Marie Curie People Framework and it has been developed in the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. The research work of the first author is partially funded by the program PAID-02-10 2257 (Universitat Politecnica de Valencia) and CONICET (Argentina).Cagnina, L.; Errecalde, M.; Ingaramo, D.; Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences. 265:36-49. https://doi.org/10.1016/j.ins.2013.12.010S364926

    k-Means

    Get PDF
    The k-means clustering algorithm (k-means for short) provides a method offinding structure in input examples. It is also called the Lloyd–Forgy algorithm as it was independently introduced by both Stuart Lloyd and Edward Forgy. k-means, like other algorithms you will study in this part of the book, is an unsupervised learning algorithm and, as such, does not require labels associated with input examples. Recall that unsupervised learning algorithms provide a way of discovering some inherent structure in the input examples. This is in contrast with supervised learning algorithms, which require input examples and associated labels so as to fit a hypothesis function that maps input examples to one or more output variables

    k-Means

    Get PDF
    The k-means clustering algorithm (k-means for short) provides a method offinding structure in input examples. It is also called the Lloyd–Forgy algorithm as it was independently introduced by both Stuart Lloyd and Edward Forgy. k-means, like other algorithms you will study in this part of the book, is an unsupervised learning algorithm and, as such, does not require labels associated with input examples. Recall that unsupervised learning algorithms provide a way of discovering some inherent structure in the input examples. This is in contrast with supervised learning algorithms, which require input examples and associated labels so as to fit a hypothesis function that maps input examples to one or more output variables

    Bioinformatics Applications Based On Machine Learning

    Get PDF
    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and has considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created from the adaptation and application of existing techniques to the creation of new methodologies to solve existing problems

    A field-based computing approach to sensing-driven clustering in robot swarms

    Get PDF
    Swarm intelligence leverages collective behaviours emerging from interaction and activity of several “simple” agents to solve problems in various environments. One problem of interest in large swarms featuring a variety of sub-goals is swarm clustering, where the individuals of a swarm are assigned or choose to belong to zero or more groups, also called clusters. In this work, we address the sensing-based swarm clustering problem, where clusters are defined based on both the values sensed from the environment and the spatial distribution of the values and the agents. Moreover, we address it in a setting characterised by decentralisation of computation and interaction, and dynamicity of values and mobility of agents. For the solution, we propose to use the field-based computing paradigm, where computation and interaction are expressed in terms of a functional manipulation of fields, distributed and evolving data structures mapping each individual of the system to values over time. We devise a solution to sensing-based swarm clustering leveraging multiple concurrent field computations with limited domain and evaluate the approach experimentally by means of simulations, showing that the programmed swarms form clusters that well reflect the underlying environmental phenomena dynamics

    A review of quantum-inspired metaheuristic algorithms for automatic clustering

    Get PDF
    In real-world scenarios, identifying the optimal number of clusters in a dataset is a difficult task due to insufficient knowledge. Therefore, the indispensability of sophisticated automatic clus tering algorithms for this purpose has been contemplated by some researchers. Several automatic clustering algorithms assisted by quantum-inspired metaheuristics have been developed in recent years. However, the literature lacks definitive documentation of the state-of-the-art quantum-inspired metaheuristic algorithms for automatically clustering datasets. This article presents a brief overview of the automatic clustering process to establish the importance of making the clustering process automatic. The fundamental concepts of the quantum computing paradigm are also presented to highlight the utility of quantum-inspired algorithms. This article thoroughly analyses some algo rithms employed to address the automatic clustering of various datasets. The reviewed algorithms were classified according to their main sources of inspiration. In addition, some representative works of each classification were chosen from the existing works. Thirty-six such prominent algorithms were further critically analysed based on their aims, used mechanisms, data specifications, merits and demerits. Comparative results based on the performance and optimal computational time are also presented to critically analyse the reviewed algorithms. As such, this article promises to provide a detailed analysis of the state-of-the-art quantum-inspired metaheuristic algorithms, while highlighting their merits and demerits.Web of Science119art. no. 201

    An Adjectival Interface for procedural content generation

    Get PDF
    Includes abstract.Includes bibliographical references.In this thesis, a new interface for the generation of procedural content is proposed, in which the user describes the content that they wish to create by using adjectives. Procedural models are typically controlled by complex parameters and often require expert technical knowledge. Since people communicate with each other using language, an adjectival interface to the creation of procedural content is a natural step towards addressing the needs of non-technical and non-expert users. The key problem addressed is that of establishing a mapping between adjectival descriptors, and the parameters employed by procedural models. We show how this can be represented as a mapping between two multi-dimensional spaces, adjective space and parameter space, and approximate the mapping by applying novel function approximation techniques to points of correspondence between the two spaces. These corresponding point pairs are established through a training phase, in which random procedural content is generated and then described, allowing one to map from parameter space to adjective space. Since we ultimately seek a means of mapping from adjective space to parameter space, particle swarm optimisation is employed to select a point in parameter space that best matches any given point in adjective space. The overall result, is a system in which the user can specify adjectives that are then used to create appropriate procedural content, by mapping the adjectives to a suitable set of procedural parameters and employing the standard procedural technique using those parameters as inputs. In this way, none of the control offered by procedural modelling is sacrificed â although the adjectival interface is simpler, it can at any point be stripped away to reveal the standard procedural model and give users access to the full set of procedural parameters. As such, the adjectival interface can be used for rapid prototyping to create an approximation of the content desired, after which the procedural parameters can be used to fine-tune the result. The adjectival interface also serves as a means of intermediate bridging, affording users a more comfortable interface until they are fully conversant with the technicalities of the underlying procedural parameters. Finally, the adjectival interface is compared and contrasted to an interface that allows for direct specification of the procedural parameters. Through user experiments, it is found that the adjectival interface presented in this thesis is not only easier to use and understand, but also that it produces content which more accurately reflects usersâ intentions
    corecore