
    Bayesian Network and Network Pruning Strategy for XML Duplicate Detection

    Data duplication causes excess use of redundant storage, excess processing time, and inconsistency. Duplicate detection helps ensure accurate data by identifying and preventing identical or similar records. There is a long line of work on identifying duplicates in relational data, but only a few solutions have focused on duplicate detection in more complex hierarchical structures such as XML data. Hierarchical data are defined as a set of data items related to each other by hierarchical relationships, as in XML. In the world of XML there are not necessarily uniform and clearly defined structures like tables, and methods devised for duplicate detection in a single relation do not directly apply to XML data. Therefore, a method is needed to detect duplicate objects in nested XML data. In the proposed system, duplicates are detected by a duplicate detection algorithm called XMLDup. The proposed XMLDup method uses a Bayesian network: it determines the probability of two XML elements being duplicates by considering both the information within the elements and the structure of that information. To improve the Bayesian network evaluation time, a pruning strategy is used. Finally, the work is evaluated by measuring precision and recall values.
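    A minimal Python sketch of the idea (not the authors' XMLDup network): each shared field contributes a similarity-based factor to an overall duplicate probability, and evaluation stops early once the decision threshold can no longer be reached, mirroring the pruning strategy. The per-field factor (0.5 + 0.5 * sim) and the flattened element representation are illustrative assumptions.

        from difflib import SequenceMatcher

        def value_similarity(a: str, b: str) -> float:
            """Similarity of two leaf values, in [0, 1]."""
            return SequenceMatcher(None, a, b).ratio()

        def duplicate_probability(elem_a: dict, elem_b: dict, threshold: float = 0.5) -> float:
            """Combine per-field similarities into a duplicate probability,
            pruning the evaluation once the threshold is unreachable."""
            shared = sorted(set(elem_a) & set(elem_b))
            if not shared:
                return 0.0
            prob = 1.0
            for field in shared:
                sim = value_similarity(str(elem_a[field]), str(elem_b[field]))
                prob *= 0.5 + 0.5 * sim      # naive per-field factor
                if prob < threshold:         # remaining factors are at most 1.0,
                    return prob              # so the threshold is unreachable: prune
            return prob

        # two <movie> elements flattened to field/value pairs
        a = {"title": "The Matrix", "year": "1999", "director": "Wachowski"}
        b = {"title": "Matrix, The", "year": "1999", "director": "Wachowski"}
        print(duplicate_probability(a, b))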

    Duplicate Detection in Probabilistic Data

    Collected data often contain uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); there is no work on the integration of uncertain (especially probabilistic) source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
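    As a rough illustration of matching uncertain records (a sketch of the general intuition, not the techniques of this paper): each entity is a set of weighted alternatives, and the match probability is the probability mass of alternative pairs that are similar enough, assuming independent alternatives. All names and thresholds below are hypothetical.

        from itertools import product
        from difflib import SequenceMatcher

        def sim(a: str, b: str) -> float:
            return SequenceMatcher(None, a, b).ratio()

        def match_probability(rep_a, rep_b, threshold: float = 0.8) -> float:
            """rep_a, rep_b: lists of (value, probability) alternatives describing
            one uncertain entity each. Returns the probability that a pair of
            alternatives drawn from them is similar enough to match."""
            p = 0.0
            for (va, pa), (vb, pb) in product(rep_a, rep_b):
                if sim(va, vb) >= threshold:
                    p += pa * pb             # alternatives assumed independent
            return p

        customer_a = [("Jon Smith", 0.7), ("John Smith", 0.3)]
        customer_b = [("John Smith", 0.9), ("J. Smith", 0.1)]
        print(match_probability(customer_a, customer_b))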

    Coreference detection in XML metadata

    Preserving data quality is an important issue in data collection management. One of the crucial issues here is the detection of duplicate objects (called coreferent objects), which describe the same entity but in different ways. In this paper we present a method for detecting coreferent objects in metadata, in particular in XML schemas. Our approach consists of comparing the paths from a root element to a given element in the schema. Each path precisely defines the context and location of a specific element in the schema. Path matching is based on the comparison of the different steps of which the paths are composed. The uncertainty about the matching of steps is expressed with possibilistic truth values and aggregated using the Sugeno integral. The discovered coreference of paths can help in determining the coreference of different XML schemas.
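    A small sketch of path-based matching with Sugeno-integral aggregation. The step similarity scores and the cardinality-based fuzzy measure below are assumptions made for illustration, not the paper's actual definitions.

        def step_similarity(step_a: str, step_b: str) -> float:
            """Possibilistic truth degree that two path steps (element names) match."""
            a, b = step_a.lower(), step_b.lower()
            if a == b:
                return 1.0
            if a in b or b in a:
                return 0.7
            return 0.0

        def sugeno_integral(values, measure):
            """Sugeno integral of `values` w.r.t. a fuzzy measure given as
            measure(k) = weight of the k largest criteria, k = 1..n."""
            vals = sorted(values, reverse=True)
            return max(min(v, measure(k)) for k, v in enumerate(vals, start=1))

        def path_match(path_a, path_b):
            """Aggregate step-wise matching degrees along two root-to-element paths."""
            n = min(len(path_a), len(path_b))
            degrees = [step_similarity(a, b) for a, b in zip(path_a, path_b)]
            return sugeno_integral(degrees, lambda k: k / n)   # cardinality-based measure

        print(path_match(["schema", "customer", "address", "city"],
                         ["schema", "client", "address", "city"]))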

    Automatic test cases generation from software specifications modules

    A new technique is proposed in this paper to extend the Integrated Classification Tree Methodology (ICTM) developed by Chen et al. [13]. This software assists testers in constructing test cases from functional specifications. A Unified Modelling Language (UML) class diagram and the Object Constraint Language (OCL) are used in this paper to represent the software specifications. Each classification and associated class in the software specification is represented by classes and attributes in the class diagram. Software specification relationships are represented by association and hierarchical relationships in the class diagram. To ensure that relationships are consistent, an automatic methodology is proposed to capture and control the class relationships in a systematic way. This helps to reduce duplicate and illegitimate test cases, which improves testing efficiency and minimises the time and cost of testing. The methodology introduced in this paper extracts only the legitimate test cases by removing duplicate test cases and those incompatible with the software specifications. Executing all of the generated test cases would require a large amount of time; therefore, a methodology is proposed that selects a best testing path. This path guarantees the highest coverage of system units, avoids using all generated test cases, and reduces the time and cost of testing.
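    A minimal sketch of the filtering step described above, assuming hypothetical classifications and an OCL-like constraint (neither is taken from the paper): candidate test cases are generated as combinations of classes, and duplicates or combinations that violate the constraint are dropped.

        from itertools import product

        # hypothetical classifications extracted from a UML class diagram
        classifications = {
            "payment":  ["card", "cash"],
            "customer": ["new", "registered"],
            "amount":   ["small", "large"],
        }

        def is_legitimate(case: dict) -> bool:
            """Hypothetical OCL-like constraint: new customers cannot pay large amounts in cash."""
            return not (case["customer"] == "new"
                        and case["payment"] == "cash"
                        and case["amount"] == "large")

        keys = list(classifications)
        seen, test_cases = set(), []
        for combo in product(*(classifications[k] for k in keys)):
            case = dict(zip(keys, combo))
            signature = tuple(sorted(case.items()))
            if signature in seen or not is_legitimate(case):
                continue                     # drop duplicate and illegitimate cases
            seen.add(signature)
            test_cases.append(case)

        print(len(test_cases), "legitimate test cases")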

    Measuring the similarity of PML documents with RFID-based sensors

    The Electronic Product Code (EPC) Network is an important part of the Internet of Things. The Physical Mark-Up Language (PML) is used to represent and describe data related to objects in the EPC Network. The PML documents that components exchange in an EPC Network system are XML documents based on the PML Core schema. To manage the huge number of PML documents for tags captured by Radio Frequency Identification (RFID) readers, it is inevitable to develop high-performance technology for filtering and integrating these tag data. In this paper we therefore propose an approach for measuring the similarity of PML documents from several sensors based on a Bayesian network. With respect to the features of PML, before measuring the similarity we first reduce the redundant data, retaining only the EPC information. On this basis, the Bayesian network model derived from the structure of the PML documents being compared is constructed.
    Comment: International Journal of Ad Hoc and Ubiquitous Computing
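    The redundancy-reduction step can be pictured as keeping only the EPC identifiers before comparing documents. The PML snippet and namespace below are invented stand-ins (not the actual PML Core schema), and the Jaccard overlap merely stands in for the Bayesian network comparison of the reduced documents.

        import xml.etree.ElementTree as ET

        PML = """<pml:Sensor xmlns:pml="urn:example:pml">
          <pml:ID>urn:epc:id:sgtin:0614141.107346.2017</pml:ID>
          <pml:Observation>
            <pml:DateTime>2024-01-01T12:00:00</pml:DateTime>
            <pml:Tag><pml:ID>urn:epc:id:sgtin:0614141.107346.2018</pml:ID></pml:Tag>
          </pml:Observation>
        </pml:Sensor>"""

        def epc_ids(doc: str) -> set:
            """Keep only EPC identifiers; everything else is treated as redundant."""
            root = ET.fromstring(doc)
            return {el.text for el in root.iter() if el.tag.endswith("ID") and el.text}

        def reduced_similarity(doc_a: str, doc_b: str) -> float:
            """Jaccard overlap of the retained EPC identifiers."""
            a, b = epc_ids(doc_a), epc_ids(doc_b)
            return len(a & b) / len(a | b) if a | b else 0.0

        print(sorted(epc_ids(PML)))
        print(reduced_similarity(PML, PML))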

    Pairwise similarity of TopSig document signatures

    This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances is compared to that of purely random signatures. This explains why TopSig is only competitive with state-of-the-art retrieval models at early precision: only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.
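    To see the baseline such distances are compared against, one can compute the pairwise Hamming distances of purely random binary signatures, which concentrate tightly around half the signature length. This is only an illustration of the random baseline, not the TopSig model itself; the signature length and collection size are arbitrary.

        import numpy as np

        rng = np.random.default_rng(0)
        bits, n_docs = 1024, 100
        random_sigs = rng.integers(0, 2, size=(n_docs, bits), dtype=np.uint8)

        def pairwise_hamming(sig: np.ndarray) -> np.ndarray:
            """Hamming distances between all pairs of rows of a 0/1 signature matrix."""
            n = len(sig)
            return np.array([np.count_nonzero(sig[i] ^ sig[j])
                             for i in range(n) for j in range(i + 1, n)])

        d = pairwise_hamming(random_sigs)
        # random signatures cluster around bits / 2 (~512 here); a retrieval-quality
        # signature must place true neighbours well below this band
        print(d.mean(), d.std())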

    Cardinality heterogeneities in Web service composition: Issues and solutions

    Data exchanges between Web services engaged in a composition raise several heterogeneities. In this paper, we address the problem of data cardinality heterogeneity in a composition. Firstly, we build a theoretical framework to describe the different aspects of Web services that relate to data cardinality; secondly, we solve this problem by developing a solution for cardinality mediation based on constraint logic programming.

    Biochemical network matching and composition

    This paper looks at biochemical network matching and composition.