    Generalized pattern extraction from concept lattices

    Expressive generalized itemsets

    Generalized itemset mining is a powerful tool to discover multiple-level correlations among the analyzed data. A taxonomy is used to aggregate data items into higher-level concepts and to discover frequent recurrences among data items at different granularity levels. However, since traditional high-level itemsets may also represent the knowledge covered by their lower-level frequent descendant itemsets, the expressiveness of high-level itemsets can be rather limited. To overcome this issue, this article proposes two novel itemset types, called Expressive Generalized Itemset (EGI) and Maximal Expressive Generalized Itemset (Max-EGI), in which the frequency of occurrence of a high-level itemset is evaluated only on the portion of data not yet covered by any of its frequent descendants. Specifically, EGIs represent, at a high level of abstraction, the knowledge associated with sets of infrequent itemsets, while Max-EGIs compactly represent all the infrequent descendants of a generalized itemset. Furthermore, we also propose an algorithm to discover Max-EGIs on top of the traditionally mined itemsets. Experiments, performed on both real and synthetic datasets, demonstrate the effectiveness, efficiency, and scalability of the proposed approach.
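
The core EGI idea, measuring a high-level itemset's frequency only on the data its frequent descendants do not already cover, can be sketched in Python under simplifying assumptions: a hypothetical one-level taxonomy (`TAXONOMY`), an absolute support threshold, and a reading of "not yet covered" as "the transaction supports no frequent descendant". The helper names (`covers`, `descendants`, `expressive_support`) are illustrative only; this does not reproduce the article's mining algorithm or its Max-EGI extraction.

```python
from itertools import product

# Hypothetical one-level taxonomy (assumption): leaf item -> higher-level concept.
TAXONOMY = {
    "apple": "fruit", "pear": "fruit",
    "milk": "dairy", "cheese": "dairy",
}

def covers(transaction, itemset):
    """True if every (possibly generalized) item of `itemset` occurs in the
    transaction, either directly or through its taxonomy parent."""
    expanded = set(transaction) | {TAXONOMY[i] for i in transaction if i in TAXONOMY}
    return set(itemset) <= expanded

def support(transactions, itemset):
    """Classic generalized support: number of transactions covering the itemset."""
    return sum(covers(t, itemset) for t in transactions)

def descendants(itemset):
    """All itemsets obtained by specializing at least one generalized item
    into one of its children (one-level taxonomy assumed)."""
    options = []
    for item in sorted(itemset):
        children = [c for c, p in TAXONOMY.items() if p == item]
        options.append([item] + children)
    return [frozenset(combo) for combo in product(*options)
            if set(combo) != set(itemset)]

def expressive_support(transactions, itemset, min_sup):
    """Support counted only on transactions that no frequent descendant covers
    (a simplified reading of the EGI definition, not the paper's exact one)."""
    frequent_desc = [d for d in descendants(itemset)
                     if support(transactions, d) >= min_sup]
    return sum(covers(t, itemset) and not any(covers(t, d) for d in frequent_desc)
               for t in transactions)

T = [{"apple", "milk"}] * 3 + [{"pear", "cheese"}, {"apple", "cheese"}, {"pear", "milk"}]
print(support(T, {"fruit", "dairy"}), expressive_support(T, {"fruit", "dairy"}, 3))  # 6 1
```

In this toy data, {fruit, dairy} appears in all six transactions, but five of them are already explained by frequent descendants such as {apple, dairy} or {fruit, milk}, so only one transaction contributes to its expressive support at a threshold of 3.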

    Data mining by means of generalized patterns

    The thesis is mainly focused on the study and the application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections, and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.

    ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

    Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets whose clusters have a constant density of data points. One of its significant attributes is that it separates noise points from clusters. However, DBSCAN performs worse on clusters with different densities. Therefore, this paper proposes an adaptive DBSCAN that works well for identifying clusters with varying densities. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018).
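
For reference, the baseline the paper adapts, standard DBSCAN, can be sketched in Python. This is a minimal brute-force version (quadratic neighbor search) for 2-D points and is not the adaptive variant proposed here, whose details the abstract does not give; `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself) are the usual DBSCAN parameters, and the function names are illustrative.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (including itself)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points, eps, min_pts):
    """Classic DBSCAN: returns one cluster id per point, or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because a single global `eps` and `min_pts` apply to every point, points in clusters much sparser than `eps` assumes tend to fail the core-point test and end up labeled as noise, which is the limitation the abstract targets.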

    On the Representation and Use of Semantic Categories: A Survey and Prospectus

    This report describes research conducted at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the Laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract number N00014-75-C-0643. This paper is intended as a brief introduction to several issues concerning semantic categories: the everyday, factual groupings of world knowledge according to some similarity in characteristics. Some psychological data concerning the structure, formation, and use of categories are surveyed. Then several psychological models (set-theoretic and network) are considered. Various artificial intelligence representations dealing with similar issues (concerning the symbol mapping and recognition problems) are also reviewed. It is argued that these data and representations approach semantic categories at too abstract a level, and a set of guidelines that may be helpful in constructing a microworld is given. MIT Artificial Intelligence Laboratory; Department of Defense Advanced Research Projects Agency

    Exploring Data Hierarchies to Discover Knowledge in Different Domains

    The abstract is provided in the attachment.

    Making RBAC Work in Dynamic, Fast-Changing Corporate Environments

    In large organizations with tens of thousands of employees, managing individual people's permissions is tedious and error-prone, and thus a possible source of security risks. Role-Based Access Control (RBAC) addresses this problem by grouping users into roles, which reflect job functions in the corporation. Permissions are assigned to roles instead of directly to users, which means that all users assigned to a role have the same set of permissions with respect to that role. However, adoption of RBAC in organizations such as investment banks is hindered by two main factors: first, defining roles is costly and time-consuming; second, certain job functions (such as consultant) cannot be expressed as RBAC roles, because their users need different permission sets. This thesis investigates whether roles can be applied to domains that exhibit the peculiarities of the investment bank example. We introduce a new framework for roles that separately represents what a role means as a job function and what permissions its individual users have. In this way we keep the key property of RBAC, namely that the number of roles is small, while allowing for variations among users. We have also investigated machine learning approaches to determine whether roles are concepts that can be learned or approximated by a function. Our findings indicate that certain learning schemes, such as Probably Approximately Correct (PAC) learning and instance-based learning, are not applicable to roles, while others, such as decision-tree learning, might be useful.
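
The core RBAC model described above (permissions attached to roles, users attached to roles, and a user's effective permissions derived from their roles) can be sketched in Python. The `RBAC` class and the role, user, and permission names are hypothetical illustrations of standard RBAC, not the thesis's extended role framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class RBAC:
    """Minimal core-RBAC sketch: users never hold permissions directly."""
    role_perms: dict[str, set[str]] = field(default_factory=dict)
    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, perm: str) -> None:
        """Attach a permission to a role."""
        self.role_perms.setdefault(role, set()).add(perm)

    def assign(self, user: str, role: str) -> None:
        """Make a user a member of a role."""
        self.user_roles.setdefault(user, set()).add(role)

    def permissions(self, user: str) -> set[str]:
        """Union of the permissions of every role the user holds."""
        perms: set[str] = set()
        for role in self.user_roles.get(user, set()):
            perms |= self.role_perms.get(role, set())
        return perms

    def check(self, user: str, perm: str) -> bool:
        return perm in self.permissions(user)

# Hypothetical roles and permissions, for illustration only.
r = RBAC()
r.grant("trader", "read_positions")
r.grant("trader", "submit_order")
r.grant("auditor", "read_positions")
r.assign("alice", "trader")
r.assign("bob", "auditor")
print(sorted(r.permissions("alice")))  # ['read_positions', 'submit_order']
```

Every user assigned to `trader` receives exactly the same permission set, which is why job functions such as the consultant example are hard to express: two consultants needing different permissions would each require a separate role, working against the goal of keeping the number of roles small.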

    Relational clustering models for knowledge discovery and recommender systems

    Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into homogeneous clusters so as to reflect the natural hidden structure of the data. Various heuristic and statistical approaches have been developed for analyzing propositional datasets. In relational clustering, however, the presence of multi-type relationships greatly degrades the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms for cluster analysis on relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating the data distribution, and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency. The second task considered in this thesis is to provide better data models that allow both people and machines to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation and maintenance cost and high scalability. So far, taxonomy generation techniques have mainly been used to organize document corpora. We investigate applying them to relational datasets and then propose several algorithmic improvements. Another non-trivial problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize the content of each node. To the best of our knowledge, this problem has not been sufficiently investigated, so we attempt to fill the gap by proposing novel approaches. The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems, which are developed to tackle the problem of information overload.
Recent research in recommender systems exploits domain knowledge to improve recommendation quality, which, however, also reduces the scalability of the whole system. We address this issue by using the automatically derived taxonomy to preserve the pairwise similarities between items, and then modeling user visits with another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced, thus improving system scalability.