60,562 research outputs found
Evaluation of classification quality and comparative analysis of clustering and self-organization
AbstractClustering is a way of classifying a multi-dimensional dataset by the similarities of its dimensions. The results from clustering must be analyzed to test the accuracy of the algorithm and its implementation. This analysis is sometimes done by a visual representation of the clustered dataset. However, it is impossible to visually represent a dataset with more than four dimensions. Statistical analysis makes this feasible. The analysis performed on the output calculates the centroid of each cluster and the cluster's relation to that centroid. We have investigated two modes of hierarchical clustering and spectral clustering. The standard deviation of each dimension from the centroid, the maximum Euclidean distance from the centroid, and the dimensions that elements of each cluster have in common are also computed. The performed experiments demonstrate which clustering algorithm presents most accurate results under certain circumstances through the use of a synthesis of visual representation and the statistical analysis proposed above
Finding groups in data: Cluster analysis with ants
Wepresent in this paper a modification of Lumer and Faietaās algorithm for data clustering. This approach
mimics the clustering behavior observed in real ant colonies. This algorithm discovers automatically
clusters in numerical data without prior knowledge of possible number of clusters. In this paper we focus
on ant-based clustering algorithms, a particular kind of a swarm intelligent system, and on the effects on
the final clustering by using during the classification differentmetrics of dissimilarity: Euclidean, Cosine,
and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more
conventional clustering methods, such as e.g. k-means, etc. Among the many bio-inspired techniques, ant
clustering algorithms have received special attention, especially because they still require much
investigation to improve performance, stability and other key features that would make such algorithms
mature tools for data mining.
As a case study, this paper focus on the behavior of clustering procedures in those new approaches.
The proposed algorithm and its modifications are evaluated in a number of well-known benchmark
datasets. Empirical results clearly show that ant-based clustering algorithms performs well when
compared to another techniques
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made
The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication
The scientific community of researchers in a research specialty is an
important unit of analysis for understanding the field specific shaping of
scientific communication practices. These scientific communities are, however,
a challenging unit of analysis to capture and compare because they overlap,
have fuzzy boundaries, and evolve over time. We describe a network analytic
approach that reveals the complexities of these communities through examination
of their publication networks in combination with insights from ethnographic
field studies. We suggest that the structures revealed indicate overlapping
sub- communities within a research specialty and we provide evidence that they
differ in disciplinary orientation and research practices. By mapping the
community structures of scientific fields we aim to increase confidence about
the domain of validity of ethnographic observations as well as of collaborative
patterns extracted from publication networks thereby enabling the systematic
study of field differences. The network analytic methods presented include
methods to optimize the delineation of a bibliographic data set in order to
adequately represent a research specialty, and methods to extract community
structures from this data. We demonstrate the application of these methods in a
case study of two research specialties in the physical and chemical sciences.Comment: Accepted for publication in JASIS
Application of remote sensing to state and regional problems
The methods and procedures used, accomplishments, current status, and future plans are discussed for each of the following applications of LANDSAT in Mississippi: (1) land use planning in Lowndes County; (2) strip mine inventory and reclamation; (3) white-tailed deer habitat evaluation; (4) remote sensing data analysis support systems; (5) discrimination of unique forest habitats in potential lignite areas; (6) changes in gravel operations; and (7) determining freshwater wetlands for inventory and monitoring. The documentation of all existing software and the integration of the image analysis and data base software into a single package are now considered very high priority items
Weakening organizational ties? A classification of styles of volunteering in the Flemish red cross
This article presents an initial empirical assessment of a new analytical framework of styles of volunteering (SOV). The framework suggests that volunteering can be categorized in terms of a multidimensional set of cultural and structural indicators that cohere in systematic and varying ways. With data drawn from a survey of 652 Flemish Red Cross volunteers, a multivariate analysis reveals ļ¬ve different SOV categories of volunteers: episodic contributors, established administrators, reliable coworkers, service-oriented core volunteers, and critical key ļ¬gures. The research ļ¬ndings indicate that the volunteer reality is far more complex than suggested by conventional approaches to the study of volunteering
Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
This paper presents some experiments in clustering homogeneous XMLdocuments
to validate an existing classification or more generally anorganisational
structure. Our approach integrates techniques for extracting knowledge from
documents with unsupervised classification (clustering) of documents. We focus
on the feature selection used for representing documents and its impact on the
emerging classification. We mix the selection of structured features with fine
textual selection based on syntactic characteristics.We illustrate and evaluate
this approach with a collection of Inria activity reports for the year 2003.
The objective is to cluster projects into larger groups (Themes), based on the
keywords or different chapters of these activity reports. We then compare the
results of clustering using different feature selections, with the official
theme structure used by Inria.Comment: (postprint); This version corrects a couple of errors in authors'
names in the bibliograph
Energy Regulation, Roll Call Votes and Regional Resources: Evidence from Russia
This paper investigates the relative impact of regional energy production on the legislative choices of Russian Duma deputies on energy regulation between 1994 and 2003. We apply Pooleās optimal classification method of roll call votes using an ordered probit model to explain energy law reform in the first decade of Russiaās democratic transition. Our goal is to analyze the relative importance of home energy on deputiesā behavior, controlling for other factors such as party affiliation, electoral mandate, committee membership and socio-demographic parameters. We observe that energy resource factors have a considerable effect on deputiesā voting behavior. On the other hand, we concurrently find that regional economic preferences are constrained by the public policy priorities of the federal center that continue to set the tone in energy law reform in post-Soviet Russia.Energy Regulation, Energy Roll Law Reform, Energy Resources, Roll Call Votes, Legislative Politics, State Duma, Russia
- ā¦