249,022 research outputs found

    A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays

    Full text link
    © 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays

    Graph ambiguity

    Get PDF
    In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived starting from fuzzy set and fuzzy information theories. Our aim is to show that also in the domain of the graphs it is possible to derive a formulation able to capture the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results prove that the graph ambiguity concept is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved

    Spanning Trees and bootstrap reliability estimation in correlation based networks

    Get PDF
    We introduce a new technique to associate a spanning tree to the average linkage cluster analysis. We term this tree as the Average Linkage Minimum Spanning Tree. We also introduce a technique to associate a value of reliability to links of correlation based graphs by using bootstrap replicas of data. Both techniques are applied to the portfolio of the 300 most capitalized stocks traded at New York Stock Exchange during the time period 2001-2003. We show that the Average Linkage Minimum Spanning Tree recognizes economic sectors and sub-sectors as communities in the network slightly better than the Minimum Spanning Tree does. We also show that the average reliability of links in the Minimum Spanning Tree is slightly greater than the average reliability of links in the Average Linkage Minimum Spanning Tree.Comment: 17 pages, 3 figure

    A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system

    Get PDF
    The advancement of technology had encouraged mankind to design and create useful equipment and devices. These equipment enable users to fully utilize them in various applications. Pulp mill is one of the heavy industries that consumes large amount of electricity in its production. Due to this, any malfunction of the equipment might cause mass losses to the company. In particular, the breakdown of the generator would cause other generators to be overloaded. In the meantime, the subsequence loads will be shed until the generators are sufficient to provide the power to other loads. Once the fault had been fixed, the load shedding scheme can be deactivated. Thus, load shedding scheme is the best way in handling such condition. Selected load will be shed under this scheme in order to protect the generators from being damaged. Multi Criteria Decision Making (MCDM) can be applied in determination of the load shedding scheme in the electric power system. In this thesis two methods which are Analytic Hierarchy Process (AHP) and Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) were introduced and applied. From this thesis, a series of analyses are conducted and the results are determined. Among these two methods which are AHP and TOPSIS, the results shown that TOPSIS is the best Multi criteria Decision Making (MCDM) for load shedding scheme in the pulp mill system. TOPSIS is the most effective solution because of the highest percentage effectiveness of load shedding between these two methods. The results of the AHP and TOPSIS analysis to the pulp mill system are very promising

    Information based clustering

    Full text link
    In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here we reformulate the clustering problem from an information theoretic perspective which avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster "prototype", does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures non-linear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures.Comment: To appear in Proceedings of the National Academy of Sciences USA, 11 pages, 9 figure
    • …
    corecore