
    Constraint-based Sequential Pattern Mining with Decision Diagrams

    Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram (MDD) representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. Comment: AAAI201

    Extending the state-of-the-art of constraint-based pattern discovery

    The constraint-based pattern discovery paradigm was introduced to give the user a tool for driving the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In this paper we review and extend the state-of-the-art of the constraints that can be pushed in a frequent pattern computation. We introduce novel data reduction techniques which are able to exploit convertible anti-monotone constraints (e.g., constraints on average or median) as well as tougher constraints (e.g., constraints on variance or standard deviation). A thorough experimental study confirms that our framework outperforms previous algorithms for convertible constraints, and exploits the tougher ones with the same effectiveness. Finally, we highlight that the main advantage of our approach, i.e., pushing constraints by means of data reduction in a level-wise framework, is that different properties of different constraints can be exploited all together, and the total benefit is always greater than the sum of the individual benefits. This consideration leads to the definition of a general Apriori-like algorithm which is able to exploit all possible kinds of constraints studied so far.
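
    As a rough illustration of what "convertible anti-monotone" means for an average constraint (a sketch only, not the data reduction technique or algorithm of the paper above): if items are ordered by descending value, then once a prefix of that order violates avg >= threshold, every longer prefix violates it too, so the constraint can prune like an anti-monotone one along that order. The item names, prices, and the helper `prefix_checks` are hypothetical and invented for this example.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def prefix_checks(prices, min_avg):
    """Return (prefix, satisfies) pairs for prefixes in descending-price order.

    Each prefix is tested against the constraint avg(price) >= min_avg.
    """
    # Order items from most to least expensive; this is the "conversion" order.
    ordered = sorted(prices, key=prices.get, reverse=True)
    checks = []
    for k in range(1, len(ordered) + 1):
        prefix = ordered[:k]
        ok = average([prices[item] for item in prefix]) >= min_avg
        checks.append((prefix, ok))
    return checks


# Hypothetical item prices, invented for this example.
prices = {"a": 10, "b": 6, "c": 2, "d": 1}
flags = [ok for _, ok in prefix_checks(prices, min_avg=6)]
print(flags)
```

    Along this order the flags never switch from False back to True, because each added item is no larger than those already in the prefix and so cannot raise the average. A miner could therefore stop extending a prefix at the first violation.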

    gPrune: A Constraint Pushing Framework for Graph Pattern Mining

    In graph mining applications, there has been an increasingly strong demand for imposing user-specified constraints on the mining results. However, unlike most traditional itemset constraints, structural constraints, such as the density and diameter of a graph, are very hard to push deep into the mining process. In this paper, we give the first comprehensive study on the pruning properties of both traditional and structural constraints, aiming to reduce not only the pattern search space but the data search space as well. A new general framework, called gPrune, is proposed to incorporate all the constraints in such a way that they recursively reinforce each other through the entire mining process. A new concept, Pattern-inseparable Data-antimonotonicity, is proposed to handle the structural constraints unique to the graph context, which, combined with known pruning properties, provides a comprehensive and unified classification framework for structural constraints. The exploration of these antimonotonicities in the context of graph pattern mining is a significant extension to the known classification of constraints, and deepens our understanding of the pruning properties of structural graph constraints.

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature extraction is one of the important techniques in data reduction to discover the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, and also the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (remove and replace duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace the duplicate data, so as to deliver fine-grained knowledge to the user.

    A Constraint-based Querying System for Exploratory Pattern Discovery

    In this article we present CONQUEST, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language, which allows the discovery process to be effectively driven toward potentially interesting patterns. Such constraints are also exploited to reduce the cost of pattern mining computation. CONQUEST is a comprehensive mining system that can access real-world relational databases from which to extract data. Through a friendly graphical user interface (GUI), the user can define complex mining queries with a few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern mining engine which incorporates state-of-the-art data and search space reduction techniques. Resulting patterns are then presented to the user in a pattern browsing window, and possibly stored back in the underlying database as relations.

    Fouille de règles d'annotation pour la reconnaissance d'entités nommées

    As with many other NLP problems, named entity recognition involves both knowledge-based systems and data-driven systems. In this article, we propose a middle-ground approach that adapts methods from knowledge discovery. Our system, mXS, integrates hierarchical sequential mining techniques to detect named entities. The system takes a data-centric approach to extract symbolic patterns. It also relies on an original strategy that searches separately for the beginning and the end of entities. This approach has the advantage of remaining fairly robust to noise and disfluencies. It is suited to the system's target application: detecting named entities in automatically transcribed conversational speech streams. In this capacity, mXS took part in the ETAPE evaluation campaign, where it achieved good results. This article presents how mXS works and its performance on the datasets from two French-language evaluation campaigns (ESTER 2 and ETAPE).

    ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery

    The contribution of this thesis is the design and development of a Knowledge Discovery system called ConQueSt. Based on the constraint-guided Pattern Discovery paradigm, ConQueSt follows the Inductive Database vision: (i) mining is seen as a more complex form of querying; (ii) the system is therefore equipped with a data mining query language and is tightly coupled with a DBMS; (iii) the patterns extracted by mining queries become first-class citizens and, following the closure principle, are materialized alongside the data in the DBMS. ConQueSt has already been presented successfully at the international workshop of the IDB community and at the prestigious IEEE International Conference on Data Engineering (ICDE 2006). In June it will be presented at the Italian database conference (SEBD 2006). Submission to a prestigious journal is currently under way.

    WHO SETS THE AGENDA? ANALYZING ATTENTION DYNAMICS OF ECONOMIC DIVERSIFICATION AND VIOLENT CRIME ISSUES IN CANADA AND AUSTRALIA OVER THE PERIOD OF 2008-2015

    The thesis poses the key question: who sets the agenda for two policy issues, economic diversification and violent crime, in Canada and Australia? This question remains critical in current scholarly debates. Among the major actors, the media seems to exert predominant influence, though the public has grown in influence with the emergence of the internet. Finally, academia and think tanks are also found to exert agenda-setting influence for some issues, often socially controversial issues and those with scientific uncertainty. This research analyzes the contexts of Canada and Australia for these two policy issues over the period from 2008 to 201