60,562 research outputs found

Evaluation of classification quality and comparative analysis of clustering and self-organization

Author: Larocque Aaron
Valova Iren
Publication venue: Published by Elsevier B.V.
Publication date: 31/12/2011

Abstract: Clustering is a way of classifying a multi-dimensional dataset by the similarities of its dimensions. The results from clustering must be analyzed to test the accuracy of the algorithm and its implementation. This analysis is sometimes done by a visual representation of the clustered dataset. However, it is impossible to visually represent a dataset with more than four dimensions. Statistical analysis makes this feasible. The analysis performed on the output calculates the centroid of each cluster and the cluster's relation to that centroid. We have investigated two modes of hierarchical clustering and spectral clustering. The standard deviation of each dimension from the centroid, the maximum Euclidean distance from the centroid, and the dimensions that elements of each cluster have in common are also computed. The performed experiments demonstrate which clustering algorithm presents the most accurate results under certain circumstances through the use of a synthesis of visual representation and the statistical analysis proposed above.
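
The per-cluster statistics named in this abstract (centroid, standard deviation of each dimension about the centroid, and maximum Euclidean distance from the centroid) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name cluster_statistics, the input layout (a list of points plus a parallel list of cluster labels), and the choice of population rather than sample standard deviation are all assumptions.

```python
import math
from collections import defaultdict


def cluster_statistics(points, labels):
    """Summarize each cluster by centroid, per-dimension std, and max distance.

    Hypothetical helper: `points` is a list of equal-length numeric tuples and
    `labels` is a parallel list of cluster identifiers.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    stats = {}
    for label, members in groups.items():
        n = len(members)
        dims = len(members[0])
        # Centroid: the mean of the members along each dimension.
        centroid = [sum(p[d] for p in members) / n for d in range(dims)]
        # Population standard deviation of each dimension about the centroid
        # (an assumption; the abstract does not say which variant is used).
        std = [
            math.sqrt(sum((p[d] - centroid[d]) ** 2 for p in members) / n)
            for d in range(dims)
        ]
        # Largest Euclidean distance from any member to the centroid.
        max_distance = max(math.dist(p, centroid) for p in members)
        stats[label] = {
            "centroid": centroid,
            "std": std,
            "max_distance": max_distance,
        }
    return stats
```

A tight cluster shows small standard deviations and a small maximum distance, so these numbers can stand in for a plot when the data has too many dimensions to visualize.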

Elsevier - Publisher Connector

Finding groups in data: Cluster analysis with ants

Author: Berger
Bonabeau
Bonabeau
Brito
Brucker
Chu
Deneubourg
Deneubourg
Dorigo
Dubes
Ester
Franks
Ganti
Gibson
Guha
Halkidi
Handl
Hansen
Jain
Karypis
Kaufman
Kennedy
Lee
Lumer
MacQueen
Ng
Oprisan
Rijsbergen
Urszula Boryczka
Welch
Zait
Publication venue: Elsevier BV
Publication date: 01/01/2009

We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligent system, and on how the final clustering is affected by the dissimilarity metric used during classification: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well when compared to other techniques.
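
The three dissimilarity measures named in this abstract can be illustrated with a short sketch. It uses the common textbook definitions, which may differ from the exact forms in the paper. The function names are hypothetical, and the Gower variant shown covers numeric attributes only, with the per-attribute ranges supplied by the caller as an assumed input.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine dissimilarity is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def gower_numeric(x, y, ranges):
    """Gower dissimilarity restricted to numeric attributes.

    `ranges` holds the (assumed positive) observed range of each attribute,
    so each attribute contributes a value between 0 and 1.
    """
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)
```

Euclidean distance is sensitive to attribute scale, cosine dissimilarity depends only on direction, and the Gower measure normalizes each attribute by its range, which is one reason the choice of measure can change the resulting clusters.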

Crossref

Bournemouth University Research Online

Benchmarking in cluster analysis: A white paper

Author: Boulesteix Anne-Laure
Dangl Rainer
Dean Nema
Guyon Isabelle
Hennig Christian
Leisch Friedrich
Steinley Douglas
Van Mechelen Iven
Publication date: 01/10/2018

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made

arXiv.org e-Print Archive

Proceedings - University of Groningen

ARTS repository - University of Groningen

Enlighten

Dissertations of the University of Groningen

The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

Author: Baus
Beaulieu
Birnholtz
Boyack
Börner
Cambrosio
Crane
Cronin
Fry
Fry
Galison
Geels
Gläser
Gläser
Gläser
Guimera
Guimera
Hellsten
Hine
Howard
Huang
Jansen
Kling
Kling
Kling
Knorr Cetina
Kretschmer
Lambiotte
Lancichinetti
Laurens
Lievrouw
Lievrouw
Melin
Mogoutov
Moran-Ellis
Morris
Mulkay
Nentwich
Rafols
Rosvall
Seglen
Shibata
Small
Strotmann
Van den Besselaar
Van House
Velden
Veugelers
Walsh
Whitley
Zitt
Zitt
Zuccala
Publication date: 09/01/2013

The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field-specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub-communities within a research specialty and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks, thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences. Comment: Accepted for publication in JASIS.

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan

Application of remote sensing to state and regional problems

Author: Clark J. R.
Duffy B.
Miller W. F.
Minchew K.
Solomon J. L.
Wright L. H.

The methods and procedures used, accomplishments, current status, and future plans are discussed for each of the following applications of LANDSAT in Mississippi: (1) land use planning in Lowndes County; (2) strip mine inventory and reclamation; (3) white-tailed deer habitat evaluation; (4) remote sensing data analysis support systems; (5) discrimination of unique forest habitats in potential lignite areas; (6) changes in gravel operations; and (7) determining freshwater wetlands for inventory and monitoring. The documentation of all existing software and the integration of the image analysis and data base software into a single package are now considered very high priority items

NASA Technical Reports Server

Weakening organizational ties? A classification of styles of volunteering in the Flemish red cross

Author: Hustinx Lesley
Publication date: 01/01/2005

This article presents an initial empirical assessment of a new analytical framework of styles of volunteering (SOV). The framework suggests that volunteering can be categorized in terms of a multidimensional set of cultural and structural indicators that cohere in systematic and varying ways. With data drawn from a survey of 652 Flemish Red Cross volunteers, a multivariate analysis reveals five different SOV categories of volunteers: episodic contributors, established administrators, reliable coworkers, service-oriented core volunteers, and critical key figures. The research findings indicate that the volunteer reality is far more complex than suggested by conventional approaches to the study of volunteering.

Lirias

Ghent University Academic Bibliography

Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

Author: Despeyroux Thierry
Lechevallier Yves
Trousse Brigitte
Vercoustre Anne-Marie
Publication date: 01/01/2005

This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria. Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliography.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Energy Regulation, Roll Call Votes and Regional Resources: Evidence from Russia

Author: Benno Torgler
Theocharis N. Grigoriadis

This paper investigates the relative impact of regional energy production on the legislative choices of Russian Duma deputies on energy regulation between 1994 and 2003. We apply Poole’s optimal classification method of roll call votes using an ordered probit model to explain energy law reform in the first decade of Russia’s democratic transition. Our goal is to analyze the relative importance of home energy on deputies’ behavior, controlling for other factors such as party affiliation, electoral mandate, committee membership and socio-demographic parameters. We observe that energy resource factors have a considerable effect on deputies’ voting behavior. At the same time, we find that regional economic preferences are constrained by the public policy priorities of the federal center, which continue to set the tone in energy law reform in post-Soviet Russia. Keywords: Energy Regulation, Energy Roll Law Reform, Energy Resources, Roll Call Votes, Legislative Politics, State Duma, Russia

Research Papers in Economics