    A Comparative Analysis of ASCII and XML Logging Systems

    This research compares XML and ASCII based event logging systems in terms of their storage and processing efficiency. XML has been an emerging technology, including in security applications, and is therefore studied here as a logging format together with mitigation of its verbosity. Each system consists of source content, network transmission, database storage, and querying, each of which is studied individually. The ASCII logging system consists of a text file as the source, FTP as transport, and a relational database for storage and querying. The XML system uses XML files as well as binary XML files encoded with Efficient XML Interchange (EXI), FTP as transport for both forms, and an XML database for storage and querying. Further comparisons of file size and transmission efficiency are made between plain XML and binary XML, and between binary XML and ASCII text. Plain XML is a poor choice with respect to disk space and network transport time compared to ASCII; in binary form, however, it uses less disk space and fewer network resources. Because no XML database supports binary XML, the data is loaded without that optimization. The ASCII data loads into the relational database in less time than the XML data loads into its database. When querying, however, neither database outperforms the other: one query runs faster on one system and another query runs faster on the other. Therefore, XML, and/or its binary form, is a viable candidate for use as a comprehensive logging system.
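
    As a rough, purely illustrative sketch of the verbosity issue the study addresses, the following Python snippet compares the size of a plain ASCII log, an equivalent XML encoding, and a compressed form of that XML. The log content is invented, and gzip is used only as a stand-in for a binary XML encoding such as EXI, which is not assumed to be available as a library.

        import gzip

        # Invented sample records: one plain ASCII log line and an equivalent XML event.
        ascii_log = b"2013-05-01 12:00:03 host1 sshd 214 Accepted password for admin\n" * 1000
        xml_log = (b"<event time='2013-05-01 12:00:03' host='host1' proc='sshd' pid='214'>"
                   b"Accepted password for admin</event>\n") * 1000

        print("ASCII bytes:          ", len(ascii_log))
        print("XML bytes:            ", len(xml_log))
        print("compressed XML bytes: ", len(gzip.compress(xml_log)))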

    Duplicate Detection in Probabilistic Data

    Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); there is so far no work on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
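
    The following Python sketch illustrates the general shape of the idea only; the data model, helper names, and threshold are invented for illustration and are not the paper's algorithm. Each probabilistic tuple carries alternative values with probabilities, a blocking key derived from the most probable alternative reduces the search space, and the remaining candidate pairs are scored by a probability-weighted similarity.

        from difflib import SequenceMatcher

        def blocking_key(alternatives):
            """Search space reduction: key on a prefix of the most probable alternative."""
            value, _ = max(alternatives, key=lambda vp: vp[1])
            return value[:3].lower()

        def expected_similarity(alts_a, alts_b):
            """Probability-weighted string similarity over all pairs of alternatives."""
            return sum(pa * pb * SequenceMatcher(None, va.lower(), vb.lower()).ratio()
                       for va, pa in alts_a for vb, pb in alts_b)

        # Two autonomous probabilistic sources; each tuple is a list of (value, probability).
        source_a = {"a1": [("John Smith", 0.8), ("Jon Smith", 0.2)]}
        source_b = {"b1": [("John Smyth", 0.6), ("J. Smith", 0.4)],
                    "b2": [("Mary Jones", 1.0)]}

        # Build blocks, then compare only tuples that fall into the same block.
        blocks = {}
        for source in (source_a, source_b):
            for tid, alts in source.items():
                blocks.setdefault(blocking_key(alts), []).append((tid, alts))

        for members in blocks.values():
            for i, (id_a, alts_a) in enumerate(members):
                for id_b, alts_b in members[i + 1:]:
                    if expected_similarity(alts_a, alts_b) > 0.7:
                        print("possible duplicate:", id_a, id_b)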

    XML data integrity based on concatenated hash function

    Data integrity is fundamental to data authentication. A major problem for XML data authentication is that signed XML data can be copied into another document while the signature remains valid. This is caused by the way XML data integrity is protected. Through investigation, the paper finds that besides data content integrity, XML data integrity should also protect element location information and context referential integrity in fine-grained security situations. The aim of this paper is to propose a model for XML data integrity that takes XML data features into account. The paper presents an XML data integrity model named CSR (content integrity, structure integrity, context referential integrity) based on a concatenated hash function. XML data content integrity is ensured using an iterative hash process, structure integrity is protected by hashing the absolute path string from the root node, and context referential integrity is ensured by protecting context-related elements. The presented XML data integrity model satisfies integrity requirements in fine-grained security situations and is compatible with XML Signature. Evaluation shows that the presented integrity model generates digest values more efficiently than the Merkle hash tree-based integrity model for XML data.
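
    A minimal Python sketch of the two hash components the abstract makes explicit, content and structure (context referential integrity is omitted). The function names, the use of SHA-256, and the example document are assumptions for illustration, not the paper's specification.

        import hashlib
        import xml.etree.ElementTree as ET

        def content_digest(elem):
            """Content integrity: iteratively hash an element's text and its children's digests."""
            h = hashlib.sha256()
            h.update((elem.text or "").strip().encode())
            for child in elem:                      # fold child digests in document order
                h.update(content_digest(child))
            return h.digest()

        def structure_digest(path):
            """Structure integrity: hash the absolute path string from the root node."""
            return hashlib.sha256(path.encode()).digest()

        def csr_digest(elem, path):
            """Concatenate the content and structure digests and hash the result."""
            return hashlib.sha256(content_digest(elem) + structure_digest(path)).hexdigest()

        root = ET.fromstring("<log><event id='1'>login</event><event id='2'>logout</event></log>")
        print(csr_digest(root[1], "/log/event[2]"))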

    Archiving scientific data

    We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
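
    A minimal Python sketch of the timestamping idea under invented names and data; it is not the paper's tool and ignores the hierarchical key structure. Each distinct keyed element is stored once together with the compact list of version numbers in which it appears, so any version can be retrieved and any element's history queried without undoing deltas.

        def merge_version(archive, version_no, snapshot):
            """Merge one database version: store each (key, value) element once with its versions."""
            for key, value in snapshot.items():
                archive.setdefault((key, value), []).append(version_no)
            return archive

        def retrieve_version(archive, version_no):
            """Reconstruct a specific version directly from the timestamps."""
            return {key: value for (key, value), versions in archive.items()
                    if version_no in versions}

        def element_history(archive, key):
            """Query the temporal history of one element as (version, value) pairs."""
            return sorted((v, value) for (k, value), versions in archive.items()
                          if k == key for v in versions)

        archive = {}
        merge_version(archive, 1, {"gene:BRCA1": "chr17", "gene:TP53": "chr17"})
        merge_version(archive, 2, {"gene:BRCA1": "chr17q21"})   # TP53 dropped, BRCA1 changed
        print(retrieve_version(archive, 1))
        print(element_history(archive, "gene:BRCA1"))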

    Using Element Clustering to Increase the Efficiency of XML Schema Matching

    Schema matching attempts to discover semantic mappings between the elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem of exponential complexity, which makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and creates a possibility to trade efficiency against effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation, and outline directions for future research.
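
    The following Python sketch shows only the shape of the idea; the clustering key, similarity measure, and threshold are invented for illustration and are far cruder than the techniques in the paper. Elements of both schemas are first grouped by a cheap cluster key, and the expensive pairwise comparison is performed only within corresponding clusters instead of across all element pairs.

        from difflib import SequenceMatcher

        def cluster(element_names):
            """Partition schema element names by a cheap key (here: their first letter)."""
            groups = {}
            for name in element_names:
                groups.setdefault(name[0].lower(), []).append(name)
            return groups

        def match(schema_a, schema_b, threshold=0.7):
            """Compare elements cluster-by-cluster instead of across all |A| x |B| pairs."""
            clusters_a, clusters_b = cluster(schema_a), cluster(schema_b)
            mappings = []
            for key, names_a in clusters_a.items():
                for a in names_a:
                    for b in clusters_b.get(key, []):
                        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                            mappings.append((a, b))
            return mappings

        print(match(["price", "productName", "quantity"],
                    ["Price", "ProductTitle", "qty"]))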