
    Evaluation of classification quality and comparative analysis of clustering and self-organization

    Clustering is a way of classifying a multi-dimensional dataset by the similarities of its dimensions. The results from clustering must be analyzed to test the accuracy of the algorithm and its implementation. This analysis is sometimes done by a visual representation of the clustered dataset. However, it is impossible to visually represent a dataset with more than four dimensions. Statistical analysis makes this feasible. The analysis performed on the output calculates the centroid of each cluster and the cluster's relation to that centroid. We have investigated two modes of hierarchical clustering and spectral clustering. The standard deviation of each dimension from the centroid, the maximum Euclidean distance from the centroid, and the dimensions that elements of each cluster have in common are also computed. The experiments performed demonstrate which clustering algorithm gives the most accurate results under certain circumstances, using a synthesis of visual representation and the statistical analysis proposed above.
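
    The per-cluster statistics described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `cluster_statistics` is hypothetical, the standard deviation is assumed to be the population form (dividing by the cluster size), and the "dimensions in common" measure is left out because the abstract does not define it.

```python
import math
from typing import Dict, List, Sequence


def cluster_statistics(points: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> Dict[int, dict]:
    """Hypothetical helper: centroid, per-dimension standard deviation about
    the centroid, and maximum Euclidean distance to the centroid, per cluster."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    stats: Dict[int, dict] = {}
    for label, members in groups.items():
        n = len(members)
        dims = len(members[0])
        # Centroid: mean of each dimension over the cluster's members.
        centroid = [sum(m[d] for m in members) / n for d in range(dims)]
        # Assumed population standard deviation of each dimension about the centroid.
        std = [math.sqrt(sum((m[d] - centroid[d]) ** 2 for m in members) / n)
               for d in range(dims)]
        # Largest straight-line distance from any member to the centroid.
        max_dist = max(math.dist(m, centroid) for m in members)
        stats[label] = {"size": n, "centroid": centroid,
                        "std": std, "max_dist": max_dist}
    return stats
```

    For example, `cluster_statistics([[0, 0], [2, 0], [10, 10], [10, 12]], [0, 0, 1, 1])` gives cluster 0 a centroid of `[1.0, 0.0]`, per-dimension standard deviations of `[1.0, 0.0]` and a maximum distance of `1.0`. Because none of these quantities needs a plot, they remain available when the data has too many dimensions to draw.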

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm-intelligence system, and on how the choice of dissimilarity metric used during classification affects the final clustering: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in these new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well compared with other techniques.
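
    The three dissimilarity measures named above, and the point at which a measure enters ant-based clustering, can be sketched as below. This is an illustrative sketch, not the paper's modified algorithm: it follows the commonly cited Lumer and Faieta formulation, in which an item's local density f is the sum over neighbouring items of 1 - d/alpha, divided by s squared and clipped at zero, and the picking and dropping probabilities are derived from f. The function names, the Gower variant (range-normalised numeric features plus 0/1 categorical mismatch), the zero-vector convention for cosine, and the constants k1, k2, alpha and s are assumptions for illustration.

```python
import math
from typing import Callable, Sequence

Dissimilarity = Callable[[Sequence, Sequence], float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two numeric vectors."""
    return math.dist(x, y)


def cosine_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    norm = math.hypot(*x) * math.hypot(*y)
    if norm == 0.0:
        return 1.0  # assumed convention: a zero vector is maximally dissimilar
    return 1.0 - sum(a * b for a, b in zip(x, y)) / norm


def gower(x: Sequence, y: Sequence, ranges: Sequence[float],
          categorical: Sequence[bool]) -> float:
    """Mean per-feature dissimilarity: |x_k - y_k| / range_k for numeric
    features, 0/1 mismatch for categorical ones."""
    total = 0.0
    for a, b, r, is_cat in zip(x, y, ranges, categorical):
        if is_cat:
            total += 0.0 if a == b else 1.0
        elif r > 0:
            total += abs(a - b) / r  # a constant feature (r == 0) contributes 0
    return total / len(x)


def local_density(item: Sequence, neighbours: Sequence[Sequence],
                  dissim: Dissimilarity, alpha: float, s: int) -> float:
    """Lumer-Faieta style f(i); `neighbours` are the items currently lying
    in the s x s grid patch around the ant."""
    total = sum(1.0 - dissim(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))


def pick_probability(f: float, k1: float) -> float:
    """An unladen ant tends to pick up items with few similar neighbours."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float) -> float:
    """A laden ant tends to drop its item where similar items are dense."""
    return 2.0 * f if f < k2 else 1.0
```

    Swapping `euclidean` for `cosine_dissimilarity` or a `gower` closure in `local_density` is the only change needed to compare metrics, which mirrors the kind of comparison the abstract describes.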

    Benchmarking in cluster analysis: A white paper

    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, the theoretical and conceptual underpinnings of benchmarking in the field of cluster analysis are discussed by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

    The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

    The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field-specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub-communities within a research specialty, and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields, we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks, thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences. Comment: Accepted for publication in JASIS.

    Application of remote sensing to state and regional problems

    The methods and procedures used, accomplishments, current status, and future plans are discussed for each of the following applications of LANDSAT in Mississippi: (1) land use planning in Lowndes County; (2) strip mine inventory and reclamation; (3) white-tailed deer habitat evaluation; (4) remote sensing data analysis support systems; (5) discrimination of unique forest habitats in potential lignite areas; (6) changes in gravel operations; and (7) determining freshwater wetlands for inventory and monitoring. The documentation of all existing software and the integration of the image analysis and database software into a single package are now considered very high priority items.

    Weakening organizational ties? A classification of styles of volunteering in the Flemish red cross

    This article presents an initial empirical assessment of a new analytical framework of styles of volunteering (SOV). The framework suggests that volunteering can be categorized in terms of a multidimensional set of cultural and structural indicators that cohere in systematic and varying ways. With data drawn from a survey of 652 Flemish Red Cross volunteers, a multivariate analysis reveals five different SOV categories of volunteers: episodic contributors, established administrators, reliable coworkers, service-oriented core volunteers, and critical key figures. The research findings indicate that the volunteer reality is far more complex than suggested by conventional approaches to the study of volunteering.

    Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

    This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria. Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliography.
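
    One common way to compare a clustering with an official theme structure is the Rand index: the fraction of document pairs on which the two partitions agree, either by placing both documents together or by keeping them apart. The abstract does not say which comparison measure was used, so the Rand index here is an assumed stand-in and `rand_index` is a hypothetical helper.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(clusters: Sequence[Hashable], themes: Sequence[Hashable]) -> float:
    """Share of item pairs on which two partitions agree.

    clusters[i] and themes[i] are the labels given to the same item (e.g. one
    project) by the clustering and by the reference structure respectively.
    """
    if len(clusters) != len(themes):
        raise ValueError("both partitions must label the same items")
    pairs = list(combinations(range(len(clusters)), 2))
    if not pairs:
        return 1.0  # fewer than two items: trivially identical partitions
    # A pair agrees when "same cluster" and "same theme" have the same truth value.
    agree = sum((clusters[i] == clusters[j]) == (themes[i] == themes[j])
                for i, j in pairs)
    return agree / len(pairs)
```

    A score of 1.0 means the clustering reproduces the reference grouping exactly, up to the names of the groups. Running the helper once per feature selection gives one comparable number per variant.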

    Energy Regulation, Roll Call Votes and Regional Resources: Evidence from Russia

    This paper investigates the relative impact of regional energy production on the legislative choices of Russian Duma deputies on energy regulation between 1994 and 2003. We apply Poole’s optimal classification method of roll call votes using an ordered probit model to explain energy law reform in the first decade of Russia’s democratic transition. Our goal is to analyze the relative importance of home energy on deputies’ behavior, controlling for other factors such as party affiliation, electoral mandate, committee membership and socio-demographic parameters. We observe that energy resource factors have a considerable effect on deputies’ voting behavior. At the same time, we find that regional economic preferences are constrained by the public policy priorities of the federal center that continue to set the tone in energy law reform in post-Soviet Russia.
    Keywords: Energy Regulation, Energy Roll Law Reform, Energy Resources, Roll Call Votes, Legislative Politics, State Duma, Russia
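
    The ordered probit model mentioned above assigns probabilities to ordered outcome categories through cutpoints on a latent scale: P(y = j | x) = Phi(tau_j - x·beta) - Phi(tau_(j-1) - x·beta), with the outer cutpoints at minus and plus infinity. The sketch below only evaluates those category probabilities for given coefficients and cutpoints. It does not estimate them, does not implement Poole's optimal classification, and the function names and inputs are illustrative assumptions rather than the paper's specification.

```python
import math
from typing import List, Sequence


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x: Sequence[float], beta: Sequence[float],
                         cutpoints: Sequence[float]) -> List[float]:
    """Probabilities of the len(cutpoints) + 1 ordered categories given x."""
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # latent linear index x.beta
    bounds = [-math.inf, *cutpoints, math.inf]     # tau_0 = -inf, tau_J = +inf
    return [std_normal_cdf(hi - eta) - std_normal_cdf(lo - eta)
            for lo, hi in zip(bounds, bounds[1:])]
```

    With one covariate, `x = [0.0]`, `beta = [1.0]` and cutpoints `[-1.0, 1.0]`, the three probabilities are symmetric, roughly 0.159, 0.683 and 0.159, and they always sum to one.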