8 research outputs found

    Concept Learning from Triadic Data

    We propose extensions of the classical JSM-method and the Naïve Bayesian classifier for the case of triadic relational data. We performed a series of experiments on various types of data (both real and synthetic) to estimate the quality of the classification techniques and compare them with other classification algorithms that generate hypotheses, e.g. ID3 and Random Forest. In addition to classification precision and recall, we also evaluated the time performance of the proposed methods.

    Clustering Boolean Tensors

    Tensor factorizations are computationally hard problems and, in particular, are often significantly harder than their matrix counterparts. In the case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences this partitioning has on the computational complexity of Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets.
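
The setting described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows how a Boolean CP product combines binary factors with a logical OR, what it means for one mode to be partitioned (each row of that factor has exactly one 1), and how similarity (agreeing cells) relates to dissimilarity (disagreeing cells). The function names, the factor matrices `A`, `B`, `C`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def boolean_cp_reconstruct(A, B, C):
    """Boolean CP product: cell (i, j, k) is True if some component r
    has A[i, r] = B[j, r] = C[k, r] = 1."""
    # einsum counts the components covering each cell; "> 0" turns the sum into a Boolean OR.
    counts = np.einsum("ir,jr,kr->ijk", A.astype(int), B.astype(int), C.astype(int))
    return counts > 0

def similarity(X, Y):
    """Number of cells on which two Boolean tensors agree."""
    return int(np.sum(X == Y))

def dissimilarity(X, Y):
    """Number of cells on which two Boolean tensors disagree."""
    return int(np.sum(X != Y))

# Hypothetical binary factors with two components.
A = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # mode 1: 4 rows
B = np.array([[1, 0], [1, 1], [0, 1]])          # mode 2: 3 rows, components may overlap
C = np.array([[1, 0], [0, 1]])                  # mode 3: exactly one 1 per row (a partition)

X_hat = boolean_cp_reconstruct(A, B, C)

# A toy "observed" tensor: the reconstruction with one cell flipped as noise.
X = X_hat.copy()
X[0, 2, 0] = ~X[0, 2, 0]

print(similarity(X, X_hat), dissimilarity(X, X_hat))
```

On any fixed tensor the two measures sum to the number of cells, so they share the same optimal solutions; they differ in how approximation guarantees are stated, which is the distinction the abstract draws.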

    Proceedings of the ECAI Workshop on Formal Concept Analysis for Artificial Intelligence (FCA4AI)

    Formal Concept Analysis (FCA) is aimed at data analysis and classification. FCA proposes various efficient tools for concept lattice design and visualization, and is related to many research fields and application domains, including several fields of Artificial Intelligence (AI), e.g. knowledge discovery, knowledge representation and reasoning. In recent years, a series of works has emerged that extend the possibilities of FCA w.r.t. knowledge processing, e.g. pattern structures and relational context analysis. Such extensions should allow FCA to deal with complex data from both the knowledge discovery and the knowledge representation points of view. Moreover, these extensions of the capabilities of FCA offer new possibilities for AI activities in the framework of FCA. Accordingly, this workshop is interested in two main issues: (i) how FCA can support AI activities, especially knowledge processing, and (ii) how FCA can be extended to solve new and complex problems in AI.

    FCAIR 2012 Formal Concept Analysis Meets Information Retrieval Workshop co-located with the 35th European Conference on Information Retrieval (ECIR 2013) March 24, 2013, Moscow, Russia

    Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classification. The area came into being in the early 1980s and has since spawned over 10000 scientific publications and a variety of practically deployed tools. From a data table with objects in rows and attributes in columns, FCA allows one to build a taxonomic data structure called a concept lattice, which can be used for many purposes, especially for Knowledge Discovery and Information Retrieval. The Formal Concept Analysis Meets Information Retrieval (FCAIR) workshop, co-located with the 35th European Conference on Information Retrieval (ECIR 2013), was intended, on the one hand, to attract researchers from the FCA community to a broad discussion of FCA-based research on information retrieval and, on the other hand, to promote the ideas, models, and methods of FCA in the Information Retrieval community.
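
As a rough illustration of how a concept lattice arises from an object-attribute table, the sketch below enumerates all formal concepts of a tiny context by brute force. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute of the intent, and the intent is exactly the set of attributes shared by every object of the extent. The document/term table and all function names are assumptions made for this example, and the exhaustive enumeration is only practical for very small tables.

```python
from itertools import combinations

def common_attributes(context, objects, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if `objects` is empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared

def common_objects(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    all_attributes = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(context, subset, all_attributes)
            extent = common_objects(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical context: documents as objects, index terms as attributes.
context = {
    "doc1": {"lattice", "retrieval"},
    "doc2": {"lattice"},
    "doc3": {"retrieval", "ranking"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the concept lattice; in a retrieval setting, moving down the lattice corresponds to refining a query with additional terms.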

    Structural building blocks in graph data : characterised by hyperbolic communities and uncovered by Boolean tensor clustering

    Graph data nowadays easily become so large that it is infeasible to study the underlying structures manually. Thus, computational methods are needed to uncover large-scale structural information. In this thesis, we present methods to understand and summarise large networks. We propose the hyperbolic community model to describe groups of more densely connected nodes within networks using very intuitive parameters. The model accounts for a frequent connectivity pattern in real data: a few community members are highly interconnected; most members mainly have ties to this core. Our model fits real data much better than previously proposed models. Our corresponding random graph generator, HyGen, creates graphs with realistic intra-community structure. Using the hyperbolic model, we conduct a large-scale study of the temporal evolution of communities on online question–answer sites. We observe that the user activity within a community is constant with respect to its size throughout its lifetime, and a small group of users is responsible for the majority of the social interactions. We propose an approach for Boolean tensor clustering. This special tensor factorisation is restricted to binary data and assumes that one of the tensor directions has only non-overlapping factors. These assumptions – valid for many real-world data, in particular time-evolving networks – enable the use of bitwise operators and lift much of the computational complexity from the task.

    Networks today are often so large and complex that manual analysis is not enough to understand them. Identifying underlying structures at large scale requires computational methods. Our model for hyperbolic communities describes the inner structure of densely connected groups of nodes in networks using very intuitive parameters. It is based on the observation that often a small part of a group's nodes is densely interconnected, while the majority of the group members only have connections to this core. Our model fits real data better than previous models. The corresponding random graph generator, HyGen, creates graphs with realistic intra-community structures. Using our model, we analyse the formation of communities in online question-and-answer networks. We observe that the activity of the members is constant over time relative to the size of the respective community. Moreover, a small group of members is always responsible for the majority of the activity. We propose a method for Boolean tensor clustering. This special tensor factorisation is restricted to binary data, and we assume that there is no significant overlap of the factors along one direction of the tensor. These assumptions enable the use of bit operations, considerably reduce the computational effort, and fit well with what is observed in real data.

    Max-Planck-Institut für Informati