Discovering Knowledge from Local Patterns with Global Constraints
It is well known that local patterns are at the core of much of the
knowledge that can be discovered from data. Nevertheless, the use of local
patterns is limited by their huge number and by computational costs.
Several approaches (e.g.,
condensed representations, pattern set discovery) aim at grouping or
synthesizing local patterns to provide a global view of the data. A
global pattern is a pattern which is a set or a synthesis of local
patterns coming from the data. In this paper, we propose the idea of
global constraints to write queries addressing global patterns. A key
point is the ability to bias the design of global patterns according
to the expectations of the user. For instance, a global pattern can be
oriented towards the search for exceptions or towards a clustering. This
requires writing queries that take such biases into account. Open issues
are the design of a generic framework to express powerful global
constraints and of solvers to mine them. We think that global constraints
are a promising way to discover relevant global patterns.
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, and thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods. Comment: PAKDD 2017, extended version
Knowledge data discovery and data mining in a design environment
Designers, in the process of satisfying design requirements, generally encounter difficulties in, firstly, understanding the problem and, secondly, finding a solution [Cross 1998]. Often the understanding of the problem and a feasible solution are developed simultaneously, by proposing a solution to gauge the extent to which it satisfies the specific requirements. Support for future design activities has long been recognised to exist in the form of past design cases. However, the varying degrees of similarity and dissimilarity between previous and current design requirements and solutions have restrained the effectiveness of utilising past design solutions. The knowledge embedded within past designs provides a source of experience with the potential to be utilised in future developments, provided that the ability to structure and manipulate that knowledge can be made a reality. Providing the ability to manipulate past design knowledge allows the ranging viewpoints experienced by a designer during a design process to be reflected and supported. Data Mining systems are gaining acceptance in several domains but to date remain largely unrecognised in terms of their potential to support design activities. The focus of this paper is to introduce the functionality of Data Mining tools, and to evaluate the level of support that may be achieved in manipulating and utilising experiential knowledge to satisfy designers' ranging perspectives throughout a product's development.
Query Rewriting in Itemset Mining
In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced, constraint-based languages for data mining, where constraints have been specifically introduced to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a technique for query answering that consists of rewriting the query in terms of unions and intersections of the result sets of other queries, previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We then present sufficient conditions for the optimization to apply and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. Experiments on an initial prototype of an optimizer demonstrate that this approach to query answering is not only viable but, in many practical cases, absolutely necessary, since it drastically reduces the execution time.
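The rewriting idea can be illustrated with a minimal Python sketch. It is not the paper's optimizer: the brute-force miner `frequent_itemsets`, the toy `transactions`, and the "contains item" constraints are all assumptions made for illustration. The rewriting is valid here only because both earlier queries ran on the same data with the same support threshold, and each constraint is evaluated on one itemset at a time. The abstract notes that such conditions do not always hold.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only): map each frequent itemset to its support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

# Hypothetical toy data set.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
all_frequent = frequent_itemsets(transactions, min_support=2)

# Two earlier queries whose results are assumed to be materialized.
q1 = {s: n for s, n in all_frequent.items() if "a" in s}  # itemsets containing a
q2 = {s: n for s, n in all_frequent.items() if "b" in s}  # itemsets containing b

# New query "contains a AND b": intersection of the stored result sets.
q_and = {s: q1[s] for s in q1.keys() & q2.keys()}

# New query "contains a OR b": union of the stored result sets.
q_or = {**q1, **q2}
```

Neither rewritten query touches `transactions` again, which is where the saving in execution time would come from.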
Mining Patterns in Networks using Homomorphism
In recent years many algorithms have been developed for finding patterns in
graphs and networks. A disadvantage of these algorithms is that they use
subgraph isomorphism to determine the support of a graph pattern; subgraph
isomorphism is a well-known NP-complete problem. In this paper, we propose an
alternative approach which mines tree patterns in networks by using subgraph
homomorphism. The advantage of homomorphism is that it can be computed in
polynomial time, which allows us to develop an algorithm that mines tree
patterns in arbitrary graphs in incremental polynomial time. Homomorphism,
however, entails two problems not found when using isomorphism: (1) two patterns
of different size can be equivalent; (2) patterns of unbounded size can be
frequent. In this paper we formalize these problems and study solutions that
easily fit within our algorithm.
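To make the polynomial-time claim and problem (2) concrete, here is a minimal Python sketch of a homomorphism test for a rooted, node-labelled tree pattern. It is an assumption-laden illustration, not the paper's mining algorithm: the function `has_homomorphism`, the helper `alternating_path`, and the dictionary encoding of trees and graphs are all hypothetical. Because a homomorphism need not be injective, each (pattern node, graph node) pair can be decided independently and memoized.

```python
from functools import lru_cache

def has_homomorphism(tree_children, tree_labels, root, graph_adj, graph_labels):
    """Return True if the rooted tree pattern maps homomorphically into the graph."""

    @lru_cache(maxsize=None)
    def maps_to(p, g):
        # Pattern node p may map to graph node g if the labels agree and every
        # child of p can map to some neighbour of g (neighbours may be reused).
        if tree_labels[p] != graph_labels[g]:
            return False
        return all(
            any(maps_to(c, h) for h in graph_adj[g])
            for c in tree_children.get(p, ())
        )

    return any(maps_to(root, g) for g in graph_adj)

def alternating_path(n):
    """Hypothetical helper: a path pattern with n nodes labelled x, y, x, y, ..."""
    labels = {i: "x" if i % 2 == 0 else "y" for i in range(n)}
    children = {i: [i + 1] for i in range(n - 1)}
    return children, labels

# A graph with a single undirected edge between an x-node and a y-node.
graph_adj = {0: [1], 1: [0]}
graph_labels = {0: "x", 1: "y"}

# Paths of any length fold back and forth over the one edge, so patterns
# larger than the graph itself still match.
matches = [
    has_homomorphism(*alternating_path(n), 0, graph_adj, graph_labels)
    for n in (2, 5, 10)
]
```

Each pair is evaluated once and scans the neighbours of one graph node per child, so the work is bounded by roughly the pattern size times the number of graph edges.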
07181 Abstracts Collection -- Parallel Universes and Local Patterns
From 1 May 2007 to 4 May 2007 the Dagstuhl Seminar 07181 "Parallel
Universes and Local Patterns"
was held in the International Conference and Research Center (IBFI),
Schloss Dagstuhl. During the seminar, several participants
presented their current research, and ongoing work and open problems
were discussed. Abstracts of the presentations given during the
seminar as well as abstracts of seminar results and ideas are put
together in this paper. The first section describes the seminar
topics and goals in general. Links to extended abstracts or full
papers are provided, if available.
Probabilistic Inductive Querying Using ProbLog
We study how probabilistic reasoning and inductive querying can be combined within ProbLog, a recent probabilistic extension of Prolog. ProbLog can be regarded as a database system that supports both probabilistic and inductive reasoning through a variety of querying mechanisms. After a short introduction to ProbLog, we provide a survey of the different types of inductive queries that ProbLog supports, and show how it can be applied to the mining of large biological networks. Peer reviewed