16,177 research outputs found

    Pattern Set Mining with Schema-based Constraint

    Get PDF
    Pattern set mining entails discovering groups of frequent itemsets that represent potentially relevant knowledge. Global constraints are commonly enforced to focus the analysis on most interesting pattern sets. However, these constraints evaluate and select each pattern set individually based on its itemset characteristics. This paper extends traditional global constraints by proposing a novel constraint, called schema-based constraint, tailored to relational data. When coping with relational data itemsets consist of sets of items belonging to distinct data attributes, which constitute the itemset schema. The schema-based constraint allows us to effectively combine all the itemsets that are semantically correlated with each other into a unique pattern set, while filtering out those pattern sets covering a mixture of different data facets or giving a partial view of a single facet. Specifically, it selects all the pattern sets that are (i) composed only of frequent itemsets with the same schema and (ii) characterized by maximal size among those corresponding to that schema. Since existing approaches are unable to select one representative pattern set per schema in a single extraction, we propose a new Apriori-based algorithm to efficiently mine pattern sets satisfying the schema-based constraint. The experimental results achieved on both real and synthetic datasets demonstrate the efficiency and effectiveness of our approach

    Direct mining of subjectively interesting relational patterns

    Get PDF
    Data is typically complex and relational. Therefore, the development of relational data mining methods is an increasingly active topic of research. Recent work has resulted in new formalisations of patterns in relational data and in a way to quantify their interestingness in a subjective manner, taking into account the data analyst's prior beliefs about the data. Yet, a scalable algorithm to find such most interesting patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in a notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search, and (2), the direct search for the most interesting patterns only, instead of exhaustive enumeration of patterns before ranking them. Through empirical evaluation, we find that our novel bounds yield speedups up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively-interesting relational patterns present in databases where this was previously impractical or impossible

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    Get PDF
    With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

    A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

    Get PDF
    Since the emergence in the popularity of XML for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm PCXSS that keeps the heterogeneous XML documents into various groups according to their similar structural and semantic representations. We develop a global criterion function CPSim that progressively measures the similarity between a XML document and existing clusters, ignoring the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate

    Constraining the Search Space in Temporal Pattern Mining

    Get PDF
    Agents in dynamic environments have to deal with complex situations including various temporal interrelations of actions and events. Discovering frequent patterns in such scenes can be useful in order to create prediction rules which can be used to predict future activities or situations. We present the algorithm MiTemP which learns frequent patterns based on a time intervalbased relational representation. Additionally the problem has also been transfered to a pure relational association rule mining task which can be handled by WARMR. The two approaches are compared in a number of experiments. The experiments show the advantage of avoiding the creation of impossible or redundant patterns with MiTemP. While less patterns have to be explored on average with MiTemP more frequent patterns are found at an earlier refinement level

    Ontology of core data mining entities

    Get PDF
    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines themost essential datamining entities in a three-layered ontological structure comprising of a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend

    Change Support in Process-Aware Information Systems - A Pattern-Based Analysis

    Get PDF
    In today's dynamic business world the economic success of an enterprise increasingly depends on its ability to react to changes in its environment in a quick and flexible way. Process-aware information systems (PAIS) offer promising perspectives in this respect and are increasingly employed for operationally supporting business processes. To provide effective business process support, flexible PAIS are needed which do not freeze existing business processes, but allow for loosely specified processes, which can be detailed during run-time. In addition, PAIS should enable authorized users to flexibly deviate from the predefined processes if required (e.g., by allowing them to dynamically add, delete, or move process activities) and to evolve business processes over time. At the same time PAIS must ensure consistency and robustness. The emergence of different process support paradigms and the lack of methods for comparing existing change approaches have made it difficult for PAIS engineers to choose the adequate technology. In this paper we suggest a set of changes patterns and change support features to foster the systematic comparison of existing process management technology with respect to process change support. Based on these change patterns and features, we provide a detailed analysis and evaluation of selected systems from both academia and industry. The identified change patterns and change support features facilitate the comparison of change support frameworks, and consequently will support PAIS engineers in selecting the right technology for realizing flexible PAIS. In addition, this work can be used as a reference for implementing more flexible PAIS
    corecore