Search CORE

14,548 research outputs found

Discovering Restricted Regular Expressions with Interleaving

Author: A. Ignatiev
E.M. Gold
G.J. Bex
J. Pei
R.W. Bailey
S. Tsukiyama
Publication venue
Publication date: 01/04/2015
Field of study

Discovering a concise schema from given XML documents is an important problem in XML applications. In this paper, we focus on the problem of learning an unordered schema from a given set of XML examples, which is actually a problem of learning a restricted regular expression with interleaving using positive example strings. Schemas with interleaving could present meaningful knowledge that cannot be disclosed by previous inference techniques. Moreover, inference of the minimal schema with interleaving is challenging. The problem of finding a minimal schema with interleaving is shown to be NP-hard. Therefore, we develop an approximation algorithm and a heuristic solution to tackle the problem using techniques different from known inference algorithms. We do experiments on real-world data sets to demonstrate the effectiveness of our approaches. Our heuristic algorithm is shown to produce results that are very close to optimal.Comment: 12 page

arXiv.org e-Print Archive

Crossref

Datamining for Web-Enabled Electronic Business Applications

Author: Nayak Richi
Publication venue: Idea Group
Publication date: 01/01/2003
Field of study

Web-Enabled Electronic Business is generating massive amount of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected for obtaining useful information. This chapter attempts to present issues associated with data mining for web-enabled electronic-business

Queensland University of Technology ePrints Archive

Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

Author: Zhou Zheng
Publication venue: 'Massey University'
Publication date: 01/01/2006
Field of study

1.1 Background Following the advent of the web, there has been a great demand for data interchange between applications using internet infrastructure. XML (extensible Markup Language) provides a structured representation of data empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004], XML is becoming the prevalent data exchange format on the World Wide Web and increasingly significant in storing semi-structured data. After its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. As with the growing popularity of XML, the issue of functional dependency in XML has recently received well deserved attention. The driving force for the study of dependencies in XML is it is as crucial to XML schema design, as to relational database(RDB) design [Abiteboul et al., 1995]

Massey Research Online

Efficient Incremental Breadth-Depth XML Event Mining

Author: Boussaïd Omar
Darmont Jérôme
Salem Rashed
Publication venue
Publication date: 01/01/2011
Field of study

Many applications log a large amount of events continuously. Extracting interesting knowledge from logged events is an emerging active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two-main phases: I) constructing a novel tree structure called Frequency XML-based Tree (FXT), which contains the frequency of events to be mined; II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed with a single-pass over logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient, for both constructing the FXT and discovering association rules

arXiv.org e-Print Archive

Crossref

HAL Descartes

HAL

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

Crossref

Queensland University of Technology ePrints Archive

A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

Author: Nayak Richi
Tran Tien
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2007
Field of study

Since the emergence in the popularity of XML for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm PCXSS that keeps the heterogeneous XML documents into various groups according to their similar structural and semantic representations. We develop a global criterion function CPSim that progressively measures the similarity between a XML document and existing clusters, ignoring the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate

CiteSeerX

Queensland University of Technology ePrints Archive

Contextualized B2B Registries

Author: Boniface M.J.
Radetzki U
Surridge M.
Publication venue
Publication date: 01/01/2007
Field of study

Abstract. Service discovery is a fundamental concept underpinning the move towards dynamic service-oriented business partnerships. The business process for integrating service discovery and underlying registry technologies into business relationships, procurement and project management functions has not been examined and hence existing Web Service registries lack capabilities required by business today. In this paper we present a novel contextualized B2B registry that supports dynamic registration and discovery of resources within management contexts to ensure that the search space is constrained to the scope of authorized and legitimate resources only. We describe how the registry has been deployed in three case studies from important economic sectors (aerospace, automotive, pharmaceutical) showing how contextualized discovery can support distributed product development processes

Southampton (e-Prints Soton)