5,082 research outputs found
Expressive generalized itemsets
Generalized itemset mining is a powerful tool to discover multiple-level correlations among the analyzed data. A taxonomy is used to aggregate data items into higher-level concepts and to discover frequent recurrences among data items at different granularity levels. However, since traditional high-level itemsets may also represent the knowledge covered by their lower-level frequent descendant itemsets, the expressiveness of high-level itemsets can be rather limited. To overcome this issue, this article proposes two novel itemset types, called Expressive Generalized Itemset (EGI) and Maximal Expressive Generalized Itemset (Max-EGI), in which the frequency of occurrence of a high-level itemset is evaluated only on the portion of data not yet covered by any of its frequent descendants. Specifically, EGIs represent, at a high level of abstraction, the knowledge associated with sets of infrequent itemsets, while Max-EGIs compactly represent all the infrequent descendants of a generalized itemset. Furthermore, we also propose an algorithm to discover Max-EGIs on top of the traditionally mined itemsets. Experiments, performed on both real and synthetic datasets, demonstrate the effectiveness, efficiency, and scalability of the proposed approach.
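The idea of counting a high-level item only on data not covered by its frequent descendants can be illustrated with a small sketch. This is not the algorithm from the article: it is a simplification to single items and a one-level taxonomy, and the names `TAXONOMY`, `TRANSACTIONS`, `support`, and `residual_support` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: generalized item -> leaf items it aggregates.
TAXONOMY: Dict[str, Set[str]] = {"dairy": {"milk", "cheese", "yogurt"}}

# Hypothetical toy dataset: each transaction is a set of leaf items.
TRANSACTIONS: List[Set[str]] = [
    {"milk", "bread"},
    {"milk"},
    {"milk", "cheese"},
    {"cheese"},
    {"yogurt"},
    {"bread"},
]


def support(item: str, transactions: List[Set[str]],
            taxonomy: Dict[str, Set[str]]) -> int:
    """Count transactions containing the item or any leaf it generalizes."""
    leaves = taxonomy.get(item, {item})
    return sum(1 for t in transactions if t & leaves)


def residual_support(item: str, transactions: List[Set[str]],
                     taxonomy: Dict[str, Set[str]], min_sup: int) -> int:
    """Count transactions matching the generalized item that contain
    none of its frequent leaf descendants (simplified reading of the idea)."""
    leaves = taxonomy[item]
    frequent = {leaf for leaf in leaves
                if support(leaf, transactions, taxonomy) >= min_sup}
    return sum(1 for t in transactions
               if (t & leaves) and not (t & frequent))


traditional = support("dairy", TRANSACTIONS, TAXONOMY)
residual = residual_support("dairy", TRANSACTIONS, TAXONOMY, min_sup=3)
```

In this toy data, `milk` is the only frequent leaf at `min_sup=3`, so the residual count for `dairy` reflects only the transactions explained by the infrequent items `cheese` and `yogurt`.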
Data mining by means of generalized patterns
The thesis focuses mainly on the study and application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.
ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities
Density-based spatial clustering of applications with noise (DBSCAN) is a
data clustering algorithm that performs well on datasets whose
clusters have a constant density of data points. One of the significant
attributes of this algorithm is noise cancellation. However, DBSCAN
performs worse on clusters with different densities.
Therefore, this paper proposes an adaptive DBSCAN that works
well for identifying clusters with varying densities.
Comment: To be published in the 4th IEEE International Conference on
Electrical Engineering and Information & Communication Technology (iCEEiCT
2018).
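The limitation with varying densities can be seen in a minimal sketch of classic DBSCAN on one-dimensional points. This is the standard algorithm, not the adaptive variant proposed in the paper, and the function name `dbscan_1d` and the sample data are assumptions made for illustration.

```python
from typing import List, Optional

NOISE = -1


def dbscan_1d(points: List[float], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN on 1-D data; returns a cluster id per point, -1 for noise.
    A point's neighborhood includes the point itself."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return [label if label is not None else NOISE for label in labels]


# One dense cluster, one sparse cluster, and one outlier (hypothetical data).
data = [0.0, 0.1, 0.2, 0.3, 10.0, 11.0, 12.0, 13.0, 50.0]
tight = dbscan_1d(data, eps=0.15, min_pts=3)
loose = dbscan_1d(data, eps=1.5, min_pts=3)
```

With the small `eps`, only the dense group is recognized and the sparse group is labeled as noise; a single global `eps` has to be enlarged before the sparse group is found, which is the kind of sensitivity an adaptive scheme aims to reduce.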
On the Representation and Use of Semantic Categories: A Survey and Prospectus
This report describes research conducted at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the Laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract number N00014-75-C-0643.
This paper is intended as a brief introduction to several issues concerning semantic categories. These are the everyday, factual groupings of world knowledge according to some similarity in characteristics. Some psychological data concerning the structure, formation, and use of categories are surveyed. Then several psychological models (set-theoretic and network) are considered. Various artificial intelligence representations (concerning the symbol mapping and recognition problems) dealing with similar issues are also reviewed. It is argued that these data and representations approach semantic categories at too abstract a level, and a set of guidelines that may be helpful in constructing a microworld is given.
MIT Artificial Intelligence Laboratory
Department of Defense Advanced Research Projects Agency
Exploring Data Hierarchies to Discover Knowledge in Different Domains
The abstract is in the attachment.
Machine learning : techniques and foundations
The field of machine learning studies computational methods for acquiring new knowledge, new skills, and new ways to organize existing knowledge. In this paper we present some of the basic techniques and principles that underlie AI research on learning, including methods for learning from examples, learning in problem solving, learning by analogy, grammar acquisition, and machine discovery. In each case, we illustrate the techniques with paradigmatic examples.
Making RBAC Work in Dynamic, Fast-Changing Corporate Environments
In large organizations with tens of thousands of employees, managing individual people's permissions is tedious and error-prone, and thus a possible source of security risks. Role-Based Access Control (RBAC) addresses this problem by grouping users into roles, which reflect job functions in the corporation. Permissions are assigned to roles instead of directly to users, which means that all users assigned to a role have the same set of permissions with respect to that role. However, adoption of RBAC in organizations such as investment banks is hindered by two main factors. First, it is costly and time-consuming to define roles. Second, there are certain job functions (such as consultant) that cannot be expressed as RBAC roles, because their users need to have different permission sets. The topic of this thesis is to investigate whether roles can be applied to domains that exhibit the peculiarities of the investment bank example. We introduce a new framework for roles that allows us to separately represent what the role means as a job function and what permissions its individual users have. That way we maintain the key property of RBAC, that the number of roles is small, while allowing for variations among users. We have also investigated machine learning approaches in order to figure out whether roles are concepts that can be learned or approximated by a function. We present our findings that certain learning schemes, such as Probably Approximately Correct (PAC) learning and instance-based learning, are not applicable to roles, while others, such as decision-tree learning, might be useful.
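The basic RBAC indirection described above, where permissions attach to roles rather than to users, can be sketched as follows. This shows only plain RBAC, not the framework introduced in the thesis, and all role, user, and permission names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical role -> permission assignments.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "trader": {"view_positions", "submit_order"},
    "auditor": {"view_positions", "read_audit_log"},
}

# Hypothetical user -> role assignments.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"trader"},
    "bob": {"trader", "auditor"},
}


def permissions_of(user: str) -> Set[str]:
    """Union of the permissions of every role assigned to the user."""
    perms: Set[str] = set()
    for role in USER_ROLES.get(user, set()):
        perms |= ROLE_PERMISSIONS.get(role, set())
    return perms


def is_allowed(user: str, permission: str) -> bool:
    """A user holds a permission only through one of their roles."""
    return permission in permissions_of(user)
```

Because every user in a role receives the same permissions from it, this plain model cannot express a job function whose members need different permission sets, which is the gap the abstract points to.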
Relational clustering models for knowledge discovery and recommender systems
Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining
(KDD). It aims at partitioning a given dataset into some homogeneous clusters so as
to reflect the natural hidden data structure. Various heuristic or statistical approaches
have been developed for analyzing propositional datasets. Nevertheless, in relational
clustering the existence of multi-type relationships will greatly degrade the performance
of traditional clustering algorithms. This issue motivates us to find more effective algorithms
to conduct the cluster analysis upon relational datasets. In this thesis we
comprehensively study the idea of Representative Objects for approximating data distribution
and then design a multi-phase clustering framework for analyzing relational
datasets with high effectiveness and efficiency.
The second task considered in this thesis is to provide some better data models for
people as well as machines to browse and navigate a dataset. The hierarchical taxonomy
is widely used for this purpose. Compared with manually created taxonomies, automatically
derived ones are more appealing because of their low creation/maintenance cost
and high scalability. Up to now, taxonomy generation techniques have mainly been used
to organize document corpora. We investigate the possibility of applying them to relational
datasets and then propose some algorithmic improvements. Another non-trivial
problem is how to assign suitable labels to the taxonomic nodes so as to credibly summarize
the content of each node. To the best of our knowledge, this problem has not been investigated
sufficiently, so we attempt to fill the gap by proposing
some novel approaches.
The final goal of our cluster analysis and taxonomy generation techniques is
to improve the scalability of recommender systems that are developed to tackle the
problem of information overload. Recent research in recommender systems integrates
the exploitation of domain knowledge to improve the recommendation quality, which
however reduces the scalability of the whole system at the same time. We address this
issue by applying the automatically derived taxonomy to preserve the pair-wise similarities
between items, and then modeling the user visits by another hierarchical structure.
Experimental results show that the computational complexity of the recommendation
procedure can be greatly reduced, thereby improving system scalability.