9,774 research outputs found

A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

Author: Nayak Richi
Tran Tien
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2007

Since XML became popular for data representation and exchange over the Web, the distribution of XML documents has increased rapidly. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that organises heterogeneous XML documents into groups according to their structural and semantic similarity. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and existing clusters, removing the need to compute the similarity between pairs of individual documents. The experimental analysis shows the method to be fast and accurate.

CiteSeerX

Queensland University of Technology ePrints Archive
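
The progressive idea in the PCXSS abstract above (compare each incoming document with existing clusters rather than with every other document) can be illustrated with a minimal sketch. This is not the authors' CPSim function: it assumes each document is reduced to a set of element paths and uses plain Jaccard similarity as a stand-in. The names `jaccard`, `progressive_cluster`, the `threshold` parameter, and the sample documents are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of element paths (stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def progressive_cluster(documents, threshold=0.5):
    """Assign each document to its most similar cluster, or open a new cluster.

    documents: iterable of (name, set_of_paths) pairs (hypothetical input format).
    Each cluster keeps the union of its members' paths as a representative,
    so a new document is compared with clusters only, never with documents.
    """
    clusters = []  # each cluster: {"paths": set, "members": list}
    for name, paths in documents:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(paths, cluster["paths"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(name)
            best["paths"] |= paths  # grow the cluster representative
        else:
            clusters.append({"paths": set(paths), "members": [name]})
    return clusters


# Toy documents, each described by a set of element paths (assumed data).
docs = [
    ("book1.xml", {"book/title", "book/author", "book/year"}),
    ("cd1.xml", {"cd/title", "cd/artist"}),
    ("book2.xml", {"book/title", "book/author", "book/isbn"}),
    ("cd2.xml", {"cd/title", "cd/artist", "cd/label"}),
]
groups = [c["members"] for c in progressive_cluster(docs)]
print(groups)
```

With one pass over the documents, the number of similarity computations grows with the number of clusters rather than with the number of document pairs, which is the saving the abstract points to.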

Streaming Algorithms for Submodular Function Maximization

Author: A Badanidiyuru Varadaraja
A Chakrabarti
A Goyal
A Gupta
A Kulik
G Calinescu
GL Nemhauser
J Feigenbaum
J Lee
J Vondrák
M Bateni
M Feldman
ML Fisher
N Bansal
Y Filmus
Publication date: 29/04/2015

We consider the problem of maximizing a nonnegative submodular set function $f:2^{\mathcal{N}} \rightarrow \mathbb{R}^+$ subject to a $p$-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered streaming algorithms for modular functions and monotone submodular functions. The main result is for submodular functions that are non-monotone. We describe deterministic and randomized algorithms that obtain a $\Omega(\frac{1}{p})$-approximation using $O(k \log k)$-space, where $k$ is an upper bound on the cardinality of the desired set. The model assumes value oracle access to $f$ and membership oracles for the matroids defining the $p$-matchoid constraint.

Comment: 29 pages, 7 figures, extended abstract to appear in ICALP 201

arXiv.org e-Print Archive

Crossref
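
To make the single-pass, value-oracle setting of the streaming abstract above concrete, here is a minimal sketch of a much simpler technique: a fixed-threshold rule for a monotone coverage function under a plain cardinality constraint. It is not the paper's algorithm, which handles non-monotone functions and $p$-matchoid constraints. The function names, the threshold `tau`, and the sample stream are assumptions made for illustration.

```python
def coverage(sets_chosen):
    """Value oracle: number of distinct items covered by the chosen sets."""
    covered = set()
    for s in sets_chosen:
        covered |= s
    return len(covered)


def threshold_stream(stream, k, tau, f=coverage):
    """Single pass: keep an element if its marginal gain is at least tau.

    At most k elements are stored, so memory does not grow with the stream.
    tau is a hypothetical, user-supplied threshold on the marginal gain.
    """
    solution = []
    value = f(solution)
    for element in stream:
        if len(solution) >= k:
            break  # cardinality bound reached
        gain = f(solution + [element]) - value
        if gain >= tau:
            solution.append(element)
            value += gain
    return solution, value


# Toy stream of sets; f counts the distinct items they cover (assumed data).
stream = [{1, 2}, {2, 3}, {4, 5, 6}, {1}, {7, 8}]
chosen, total = threshold_stream(stream, k=2, tau=2)
print(chosen, total)
```

Each arriving element is examined once and either kept or discarded for good, using only oracle calls to `f`, which mirrors the access model the abstract describes.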

Correcting Knowledge Base Assertions

Author: Arndt Dörthe
Auer Sören
Chen Jiaoyan
De Melo Gerard
Dimou Anastasia
Lertvittayakumjorn Piyawat
Melo André
Niklaus Christina
Omran Pouya Ghiasnezhad
Trouillon Théo
Vrandečić Denny
Zhang Wen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The usefulness and usability of knowledge bases (KBs) are often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.

arXiv.org e-Print Archive

City Research Online

Crossref

NIVA Open Access Archive

NORA - Norwegian Open Research Archives

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic meaning and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

A review of associative classification mining

Author: Thabtah Fadi
Publication date: 01/01/2007

Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, including CPAR, CMAR, MCAR and MMAC. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.

CiteSeerX

University of Huddersfield Repository
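
The rule discovery, rule ranking and rule prediction steps named in the associative classification abstract above can be sketched in a few lines. This is a generic illustration, not an implementation of CPAR, CMAR, MCAR or MMAC: it enumerates small antecedents by brute force, filters them by support and confidence, and classifies with the first matching rule. The thresholds, function names and toy data set are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, min_support=0.3, min_confidence=0.7, max_len=2):
    """Return class association rules as (antecedent, label, support, confidence)."""
    n = len(rows)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for items, label in rows:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                antecedent_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(combo), label, support, confidence))
    # Rank: higher confidence, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules, items, default):
    """Classify with the highest-ranked rule whose antecedent is contained in items."""
    for antecedent, label, _, _ in rules:
        if antecedent <= items:
            return label
    return default


# Toy training data: (set of attribute values, class label), assumed for illustration.
rows = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rainy", "mild"}, "yes"),
    ({"rainy", "cool"}, "yes"),
    ({"overcast", "hot"}, "yes"),
]
rules = mine_rules(rows)
default = Counter(label for _, label in rows).most_common(1)[0][0]
print(predict(rules, {"sunny", "cool"}, default))
```

Falling back to the majority class when no rule matches is one common design choice; the surveyed algorithms differ precisely in how they discover, rank, prune and apply such rules.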