Search CORE

5,339 research outputs found

A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

Author: Lampesberger Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments

Author: El-Khouly Mahmoud
Mostafa Mahmoud
Moussa Bishoy
Publication venue: 'The Science and Information Organization'
Publication date: 01/09/2014
Field of study

XML-based communication governs most of today's systems communication, due to its capability of representing complex structural and hierarchical data. However, XML document structure is considered a huge and bulky data that can be reduced to minimize bandwidth usage, transmission time, and maximize performance. This contributes to a more efficient and utilized resource usage. In cloud environments, this affects the amount of money the consumer pays. Several techniques are used to achieve this goal. This paper discusses these techniques and proposes a new XML Schema-based Minification technique. The proposed technique works on XML Structure reduction using minification. The proposed technique provides a separation between the meaningful names and the underlying minified names, which enhances software/code readability. This technique is applied to Intrusion Detection Message Exchange Format (IDMEF) messages, as part of Security Information and Event Management (SIEM) system communication hosted on Microsoft Azure Cloud. Test results show message size reduction ranging from 8.15% to 50.34% in the raw message, without using time-consuming compression techniques. Adding GZip compression to the proposed technique produces 66.1% shorter message size compared to original XML messages.Comment: XML, JSON, Minification, XML Schema, Cloud, Log, Communication, Compression, XMill, GZip, Code Generation, Code Readability, 9 pages, 12 figures, 5 tables, Journal Articl

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

Crossref

Queensland University of Technology ePrints Archive

HUDDL for description and archive of hydrographic binary data

Author: Calder Brian R.
Masetti Giuseppe
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/04/2014
Field of study

Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specification based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will be easy to access in the future. Intending to provide a relatively low-effort solution to index the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of the adoption of such a hydrographic data format catalogue

UNH Scholars' Repository

BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

Author: Bae Kyung Kyoon
He Jian
Jiang Jing
Ramakrishnan Naren
Rappaport Theodore S.
Shaffer Clifford A.
Tranter William H.
Verstak Alex
Watson Layne T.
Publication venue
Publication date: 18/02/2002
Field of study

We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

Huddl: the Hydrographic Universal Data Description Language

Author: Calder Brian R.
Masetti Giuseppe
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/05/2015
Field of study

Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access

University of New Brunswick: Centre for Digital Scholarship Journals

UNH Scholars' Repository

Type-Based Detection of XML Query-Update Independence

Author: Bidoit-Tollu Nicole
Colazzo Dario
Ulliana Federico
Publication venue
Publication date: 01/01/2012
Field of study

This paper presents a novel static analysis technique to detect XML query-update independence, in the presence of a schema. Rather than types, our system infers chains of types. Each chain represents a path that can be traversed on a valid document during query/update evaluation. The resulting independence analysis is precise, although it raises a challenging issue: recursive schemas may lead to infer infinitely many chains. A sound and complete approximation technique ensuring a finite analysis in any case is presented, together with an efficient implementation performing the chain-based analysis in polynomial space and time.Comment: VLDB201

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

Information Integration - the process of integration, evolution and versioning

Author: Keijzer Ander de
Keulen Maurice van
Publication venue: University of Twente, Centre for Telematica and Information Technology (CTIT)
Publication date: 01/01/2005
Field of study

At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those information sources. Gathering this information is a tedious and time consuming job. Automating this process would assist the user in its task. Integration of the information sources provides a global information source with all information needed present. All of these information sources also change over time. With each change of the information source, the schema of this source can be changed as well. The data contained in the information source, however, cannot be changed every time, due to the huge amount of data that would have to be converted in order to conform to the most recent schema.\ud In this report we describe the current methods to information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data. We also show some key issues when integrating XML data sources

University of Twente Research Information

Recommended from our members

XML-based genetic rules for scene boundary detection in a parallel processing environment

Author: Angelides MC
Parmar MJ
Publication venue
Publication date: 01/01/2007
Field of study

Genetic programming is based on Darwinian evolutionary theory that suggests that the best solution for a problem can be evolved by methods of natural selection of the fittest organisms in a population. These principles are translated into genetic programming by populating the solution space with an initial number of computer programs that can possibly solve the problem and then evolving the programs by means of mutation, reproduction and crossover until a candidate solution can be found that is close to or is the optimal solution for the problem. The computer programs are not fully formed source code but rather a derivative that is represented as a parse tree. The initial solutions are randomly generated and set to a certain population size that the system can compute efficiently. Research has shown that better solutions can be obtained if 1) the population size is increased and 2) if multiple runs are performed of each experiment. If multiple runs are initiated on many machines the probability of finding an optimal solution are increased exponentially and computed more efficiently. With the proliferation of the web and high speed bandwidth connections genetic programming can take advantage of grid computing to both increase population size and increasing the number of runs by utilising machines connected to the web. Using XML-Schema as a global referencing mechanism for defining the parameters and syntax of the evolvable computer programs all machines can synchronise ad-hoc to the ever changing environment of the solution space. Another advantage of using XML is that rules are constructed that can be transformed by XSLT or DOM tree viewers so they can be understood by the GP programmer. This allows the programmer to experiment by manipulating rules to increase the fitness of a rule and evaluate the selection of parameters used to define a solution

Brunel University Research Archive