Graph-based Modelling of Concurrent Sequential Patterns

Author: Chen Weiru
Keech Malcolm
Lu Jing
Publication venue: Idea Group Inc
Publication date: 19/04/2010

Structural relation patterns have been introduced recently to extend the search for complex patterns often hidden behind large sequences of data. This has motivated a novel approach to sequential pattern post-processing, and a corresponding data mining method was proposed for Concurrent Sequential Patterns (ConSP). This article refines the approach in the context of ConSP modelling, where a companion graph-based model is devised as an extension of previous work. Two new modelling methods are presented here, together with a construction algorithm, to complete the transformation of concurrent sequential patterns to a ConSP-Graph representation. Customer order data is used to demonstrate the effectiveness of ConSP mining, while synthetic sample data highlights the strength of the modelling technique, illuminating the theories developed.

Mining XML Documents

Author: Candillier Laurent
Denoyer Ludovic
Gallinari Patrick
Rousset Marie-Christine
Termier Alexandre
Vercoustre Anne-Marie
Publication venue: IGI Global
Publication date: 01/01/2007

XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluation on a variety of large XML collections.
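
As a rough illustration of treating XML documents as trees and simplifying that structure for mining, the sketch below flattens each document into its set of root-to-node tag paths and keeps the paths that occur in at least `min_support` documents. This is an assumed simplification chosen for brevity, not one of the chapter's algorithms; the helper names `tag_paths` and `frequent_paths` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        # Iterating an Element yields its direct children.
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(docs, min_support):
    """Count, per path, how many documents contain it; keep the frequent ones."""
    counts = Counter()
    for doc in docs:
        counts.update(tag_paths(doc))  # each path counted once per document
    return {path: n for path, n in counts.items() if n >= min_support}


# Hypothetical toy collection (not from the chapter's experiments).
docs = [
    "<article><title/><body><sec/></body></article>",
    "<article><title/><body><fig/></body></article>",
    "<article><abstract/></article>",
]
result = frequent_paths(docs, min_support=2)
```

Tag paths discard sibling order and branching, so they are much cheaper to count than full subtrees, at the cost of losing part of the structure.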

HAL - Lille 3

INRIA a CCSD electronic archive server

A survey of frequent subgraph mining algorithms

Author: Coenen Frans
Jiang Chuntao
Zito Michele
Publication venue: Cambridge University Press (CUP)
Publication date: 01/03/2013

University of Liverpool Repository

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
J. Hidders
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and study the main current systems that implement them.
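
The definition above can be made concrete with a minimal in-memory sketch: instance data stored as a labeled, directed graph, queried with one graph-oriented operation (reachability along edges carrying a given label). The class `LabeledGraph`, its method names, and the sample data are illustrative assumptions, not the data model of any system surveyed in the article.

```python
from collections import defaultdict, deque


class LabeledGraph:
    """A tiny labeled, directed graph: nodes and edges both carry labels."""

    def __init__(self):
        self.node_labels = {}
        # source node -> list of (edge_label, target node)
        self.out_edges = defaultdict(list)

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_edge(self, source, edge_label, target):
        self.out_edges[source].append((edge_label, target))

    def reachable(self, start, edge_label):
        """Nodes reachable from `start` by following only `edge_label` edges."""
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for label, target in self.out_edges[current]:
                if label == edge_label and target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)  # report only the nodes reached, not the origin
        return seen


# Hypothetical sample data.
g = LabeledGraph()
for name in ("alice", "bob", "carol"):
    g.add_node(name, "Person")
g.add_node("acme", "Company")
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("carol", "works_at", "acme")

friends_of_friends = g.reachable("alice", "knows")
```

Path queries of this kind are awkward to express over flat tables without recursive joins, which is one commonly cited motivation for graph-oriented query operations.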

arXiv.org e-Print Archive

Crossref

A log mining approach for process monitoring in SCADA

Author: Bolzoni Damiano
Hadziosmanovic Dina
Hartel Pieter
Publication venue: Springer
Publication date: 01/01/2012

SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions that look legitimate but are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach to log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow.

Springer - Publisher Connector

University of Twente Research Information

Tree model guided (TMG) enumeration as the basis for mining frequent patterns from XML documents

Author: Setiawan HT
Publication date: 01/01/2005

University of Technology, Sydney. Faculty of Information Technology.

Association mining consists of two important problems, namely frequent pattern discovery and rule construction. The former is considered the more challenging problem to solve. Because of its importance and application in a number of data mining tasks, it has become the focus of many studies. A substantial amount of research has gone into the development of efficient algorithms for mining patterns from large structured or relational data. Compared with the fruitful achievements in mining structured data, mining in the semi-structured world remains at a preliminary stage. The most popular representative of semi-structured data is XML. Mining frequent patterns from XML poses more challenges than mining frequent patterns from relational data because XML is tree-structured data and has an ordered data context. Moreover, XML data is in general larger, due to richer content and more metadata. Dealing with XML thus involves greater complexity than mining relational data. Mining frequent patterns from XML can be recast as mining frequent tree structures from a database of XML documents. The increase of XML data and the need for mining semi-structured data have sparked a lot of interest in finding frequent rooted trees in forests. In this thesis, we aim to develop a framework to mine frequent patterns from XML documents. The framework uses a structure-guided enumeration approach, Tree Model Guided (TMG), for efficient enumeration of tree structures, and it makes use of novel structures for fast enumeration and frequency counting. By using a novel array-based structure, an embedded list (EL), the framework offers a simple sequence-like tree enumeration technique.
The effectiveness and extensibility of the framework are demonstrated in that it can be used not only for enumerating ordered subtrees but also for enumerating unordered subtrees and subsequences. Furthermore, the framework tackles the complexity of mining frequent tree-structured patterns by generating only valid candidates with a non-zero frequency count and by employing a constraint-driven approach. Our experimental studies comparing the proposed framework with state-of-the-art algorithms demonstrate the effectiveness and efficiency of the proposed framework.

OPUS - University of Technology Sydney

Feature Extraction and Duplicate Detection for Text Mining: A Survey

Author: Ramya R S
Venugopal K R
Publication venue: Global Journals Inc. (US)
Publication date: 22/04/2016

Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature extraction is one of the important techniques in data reduction for discovering the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (remove and replace duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace the duplicate data, in order to deliver fine-grained knowledge to the user.

Global Journal of Computer Science and Technology (GJCST)