36,934 research outputs found

Mining Representative Frequent Patterns in a Hierarchy of Contexts

Author: B. Bringmann
H. Mannila
J. Han
J. Rabatel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

On mining complex sequential data by means of FCA and pattern structures

Author: Buzmakov Aleksey
Egho Elias
Jay Nicolas
Kuznetsov Sergei O.
Napoli Amedeo
Raïssi Chedy
Publication date: 09/04/2015

Data sets are now available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work.

Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.

Comment: Accepted for publication in the International Journal of General Systems. The paper was written in the wake of the conference on Concept Lattices and Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
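
The abstract of "On mining complex sequential data by means of FCA and pattern structures" refers to a subsumption operation defined through the partial order on sequences, and to projections that reduce the data. The following is only a minimal sketch of those two ideas, assuming sequences of plain items ordered by the subsequence relation. The function names `is_subsequence` and `drop_short` and the minimum-length rule are hypothetical illustrations, not the authors' definitions.

```python
from typing import Hashable, Iterable, List, Sequence


def is_subsequence(small: Sequence[Hashable], large: Iterable[Hashable]) -> bool:
    """Return True if `small` can be obtained from `large` by deleting items.

    This is one possible partial order on sequences (assumption): the items
    of `small` must occur in `large` in the same relative order.
    """
    it = iter(large)
    # `x in it` advances the iterator, so matches are found left to right.
    return all(x in it for x in small)


def drop_short(patterns: List[Sequence[Hashable]], min_len: int) -> List[Sequence[Hashable]]:
    """Illustrative data reduction (assumption): keep only patterns with at
    least `min_len` items, so fewer patterns need to be enumerated."""
    return [p for p in patterns if len(p) >= min_len]


if __name__ == "__main__":
    stay = ["surgery", "chemo", "followup"]
    print(is_subsequence(["surgery", "followup"], stay))
    print(is_subsequence(["followup", "surgery"], stay))
    print(drop_short([("surgery",), ("surgery", "chemo")], min_len=2))
```

In this sketch a pattern subsumes a sequence when it is one of its subsequences, and the length filter stands in for the kind of reduction a projection might perform.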

Context Trees: Augmenting Geospatial Trajectories with Context

Author: Griffiths Nathan
Sanchez Victor
Thomason Alasdair
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2016

Exposing latent knowledge in geospatial trajectories has the potential to provide a better understanding of the movements of individuals and groups. Motivated by this, the work presents the context tree, a new hierarchical data structure that summarises the context behind user actions in a single model. We propose a method for context tree construction that augments geospatial trajectories with land usage data to identify such contexts. Through evaluation of the construction method and analysis of the properties of generated context trees, we demonstrate the foundation they afford for understanding and modelling behaviour. Summarising user contexts into a single data structure gives easy access to information that would otherwise remain latent, providing the basis for better understanding and predicting the actions and behaviours of individuals and groups. Finally, we also present a method for pruning context trees, for use in applications where it is desirable to reduce the size of the tree while retaining useful information.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Twitter data analysis by means of Strong Flipping Generalized Itemsets

Author: Aggarwal
Agrawal
Agrawal
Baralis
Barsky
Benevenuto
Bird
Brin
Cagliero
Cagliero
Cagliero
Cagliero
Cagliero
Cheong
DBDMG
Dean
Gharib
Glance
Guo
Han
Han
Han
Heymann
Hilderman
Kasneci
Kimball
Kumar Pal
Kunkle
Li
Li
Li
Lin
Luca Cagliero
Luigi Grimaudo
Mathioudakis
Pagano
Paolo Garza
Pasquier
Savasere
Srikant
Sriphaew
T.A.H. Project
T.A.M. Project
Tan
Tania Cerquitelli
Tian
Wu
Yin
Publication venue: Elsevier
Publication date: 01/01/2014

Twitter data has recently been used to perform a wide variety of advanced analyses. Analysis of Twitter data imposes new challenges because the data distribution is intrinsically sparse, due to the large number of messages posted every day using a wide vocabulary. To address this issue, generalized itemsets - sets of items at different abstraction levels - can be effectively mined and used to discover interesting multiple-level correlations among data supplied with taxonomies. Each generalized itemset is characterized by a correlation type (positive, negative, or null) according to the strength of the correlation among its items.

This paper presents a novel data mining approach to supporting different and interesting targeted analyses - topic trend analysis, context-aware service profiling - by analyzing Twitter posts. We aim at discovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparing itemsets discovered at different abstraction levels and we select large subsets of specific (descendant) itemsets that show correlation type changes with respect to their common ancestor. To this aim, a novel kind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter messages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequent generalized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the proposed approach in discovering interesting and hidden knowledge from Twitter data.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
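
The abstract of "Twitter data analysis by means of Strong Flipping Generalized Itemsets" describes descendant itemsets whose correlation type (positive, negative, or null) changes with respect to a common ancestor. The sketch below illustrates that idea only loosely. It assumes lift with a hypothetical tolerance `eps` as the correlation measure, which the abstract does not specify, and every function name and the toy taxonomy are illustrative inventions rather than the paper's algorithm.

```python
from math import prod
from typing import Dict, Iterable, List, Set, Tuple


def extend(transaction: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Add every taxonomy ancestor of each item to the transaction."""
    out = set(transaction)
    for item in transaction:
        while item in parent:          # walk up the taxonomy
            item = parent[item]
            out.add(item)
    return out


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain all items of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def correlation_type(itemset: Tuple[str, ...], transactions: List[Set[str]],
                     eps: float = 0.1) -> str:
    """Classify by lift (assumed measure): joint support over the product
    of the single-item supports, with a tolerance band around 1."""
    indep = prod(support([i], transactions) for i in itemset)
    if indep == 0:
        return "null"
    lift = support(itemset, transactions) / indep
    if lift > 1 + eps:
        return "positive"
    if lift < 1 - eps:
        return "negative"
    return "null"


def flipping_descendants(ancestor: Tuple[str, ...],
                         descendants: List[Tuple[str, ...]],
                         transactions: List[Set[str]],
                         eps: float = 0.1) -> List[Tuple[str, ...]]:
    """Descendants whose correlation type differs from the ancestor's."""
    base = correlation_type(ancestor, transactions, eps)
    return [d for d in descendants
            if correlation_type(d, transactions, eps) != base]


if __name__ == "__main__":
    # Hypothetical taxonomy and posts, for illustration only.
    parent = {"rome": "italy", "milan": "italy", "pizza": "food", "pasta": "food"}
    posts = [{"rome", "pizza"}, {"rome", "pizza"}, {"milan", "pasta"}, {"milan", "pasta"}]
    data = [extend(p, parent) for p in posts]
    print(correlation_type(("italy", "food"), data))
    print(flipping_descendants(("italy", "food"),
                               [("rome", "pizza"), ("rome", "pasta")], data))
```

The "strong" condition mentioned in the abstract (selecting large subsets of flipping descendants) and the frequency constraint on X are not modelled here.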

Itemset generalization with cardinality-based constraints

Author: Agrawal
Agrawal
Agrawal
Agrawal
Ayubi
Baldi
Baralis
Baralis
Baralis
Barsky
Bayardo
Brin
Cagliero
Chen
Chen
Crémilleux
Han
Hitzler
Jaroszewicz
Kunkle
Li
Luca Cagliero
Mampaey
Mansingh
Molloy
Paolo Garza
Pasquier
Sriphaew
Sriphaew
Tan
Tatti
Tseng
Uno
Zaki
Publication venue: ELSEVIER
Publication date: 01/01/2013

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Data mining by means of generalized patterns

Author: Cagliero Luca
Publication date: 01/01/2012

The thesis focuses mainly on the study and application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections, and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino