A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays
© 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays.
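The Jensen-Shannon distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes each play is represented as a dictionary of word counts and uses the base-2 logarithm, so the distance lies in [0, 1]; the function name and the toy counts are hypothetical and are not taken from the paper, which may normalise or preprocess the frequencies differently.

```python
import math


def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the base-2 Jensen-Shannon divergence between two
    word-count dictionaries (illustrative helper, not the paper's code)."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for word in vocab:
        p = p_counts.get(word, 0) / p_total
        q = q_counts.get(word, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical toy word counts for three "plays".
play_a = {"thou": 4, "king": 2, "love": 2}
play_b = {"thou": 2, "king": 1, "love": 1}   # same proportions as play_a
play_c = {"ship": 3, "storm": 5}             # no shared vocabulary

d_same = jensen_shannon_distance(play_a, play_b)
d_disjoint = jensen_shannon_distance(play_a, play_c)
```

Pairwise distances of this kind could then serve as edge weights when building a proximity graph, which is the step the abstract describes before community detection.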
Graph ambiguity
In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived from fuzzy set and fuzzy information theories. Our aim is to show that a formulation capturing the same semantic and mathematical concept can also be derived in the domain of graphs. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results show that graph ambiguity is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved.
Spanning Trees and bootstrap reliability estimation in correlation based networks
We introduce a new technique to associate a spanning tree to the average
linkage cluster analysis. We term this tree as the Average Linkage Minimum
Spanning Tree. We also introduce a technique to associate a value of
reliability to links of correlation based graphs by using bootstrap replicas of
data. Both techniques are applied to the portfolio of the 300 most capitalized
stocks traded at New York Stock Exchange during the time period 2001-2003. We
show that the Average Linkage Minimum Spanning Tree recognizes economic sectors
and sub-sectors as communities in the network slightly better than the Minimum
Spanning Tree does. We also show that the average reliability of links in the
Minimum Spanning Tree is slightly greater than the average reliability of links
in the Average Linkage Minimum Spanning Tree. Comment: 17 pages, 3 figures
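As a rough illustration of the baseline Minimum Spanning Tree that this abstract compares against, the sketch below builds a tree from pairwise Pearson correlations with Kruskal's algorithm. The distance d = sqrt(2(1 - rho)) is a common choice for correlation-based networks and is an assumption here, since the abstract does not state the metric; the function names and the toy return series are hypothetical. The Average Linkage variant and the bootstrap reliability estimate are not reproduced.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_mst(series):
    """Return MST edges (a, b, distance) for a dict name -> return series.

    Assumed distance: d = sqrt(2 * (1 - rho)). Uses Kruskal's algorithm
    with a simple union-find structure.
    """
    names = sorted(series)
    edges = []
    for a, b in combinations(names, 2):
        rho = pearson(series[a], series[b])
        edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()  # shortest distance (highest correlation) first

    parent = {name: name for name in names}

    def find(node):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


# Hypothetical toy series: A and B move together, as do C and D.
toy_returns = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.0, 3.2, 3.9, 5.1],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
    "D": [5.2, 2.9, 4.1, 1.0, 2.2],
}
mst_edges = correlation_mst(toy_returns)
```

In this toy case the tree links the two strongly correlated pairs first and then joins them with a single cross edge, which mirrors how sector-like groups can appear as branches of the tree.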
A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system
Advances in technology have encouraged the design and creation of useful
equipment and devices for a wide range of applications. A pulp mill is a
heavy industry that consumes a large amount of electricity in production, so
any equipment malfunction can cause heavy losses to the company. In
particular, the breakdown of one generator causes the remaining generators to
be overloaded. Loads are then shed in sequence until the remaining generators
can supply the loads that are still connected. Once the fault has been fixed,
the load shedding scheme can be deactivated. A load shedding scheme is
therefore the best way to handle such a condition: selected loads are shed in
order to protect the generators from damage. Multi-Criteria Decision Making
(MCDM) can be applied to determine the load shedding scheme in an electric
power system. In this thesis, two MCDM methods, the Analytic Hierarchy Process
(AHP) and the Technique for Order Preference by Similarity to Ideal Solution
(TOPSIS), are introduced and applied. A series of analyses is conducted, and
the results show that, of the two methods, TOPSIS is the better MCDM method
for the load shedding scheme in the pulp mill system, because it achieves the
higher percentage effectiveness of load shedding. The results of applying AHP
and TOPSIS to the pulp mill system are very promising.
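For readers unfamiliar with TOPSIS, the following is a minimal sketch of the standard procedure: vector normalisation, weighting, distances to the ideal and anti-ideal solutions, and a closeness score. The candidate loads, criteria, and weights are hypothetical placeholders and are not taken from the thesis, whose criteria and data are not given in the abstract.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for each alternative (row of `matrix`).

    `benefit[j]` is True when larger values of criterion j are better and
    False when smaller values are better. Assumes the alternatives are not
    all identical, otherwise the closeness ratio is undefined.
    """
    n_criteria = len(weights)
    # Vector normalisation: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col)
                  for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidate loads scored on two made-up criteria:
# column 0 = power relieved if shed (larger is better),
# column 1 = operational importance (smaller is better to shed).
candidate_loads = [
    [10.0, 1.0],   # load 0
    [5.0, 5.0],    # load 1
    [1.0, 10.0],   # load 2
]
scores = topsis(candidate_loads, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Here the load with the highest closeness score would be the first candidate for shedding; AHP, by contrast, derives priorities from pairwise comparison matrices and is not shown.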
Information based clustering
In an age of increasingly large data sets, investigators in many different
disciplines have turned to clustering as a tool for data analysis and
exploration. Existing clustering methods, however, typically depend on several
nontrivial assumptions about the structure of data. Here we reformulate the
clustering problem from an information theoretic perspective which avoids many
of these assumptions. In particular, our formulation obviates the need for
defining a cluster "prototype", does not require an a priori similarity metric,
is invariant to changes in the representation of the data, and naturally
captures non-linear relations. We apply this approach to different domains and
find that it consistently produces clusters that are more coherent than those
extracted by existing algorithms. Finally, our approach provides a way of
clustering based on collective notions of similarity rather than the
traditional pairwise measures. Comment: To appear in Proceedings of the National Academy of Sciences USA, 11
pages, 9 figures