XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments
XML-based communication governs most communication between today's systems, due to
its capability of representing complex structured and hierarchical data.
However, XML documents are large and bulky, and their structure can be
reduced to minimize bandwidth usage and transmission time and to maximize
performance. This leads to more efficient resource utilization.
In cloud environments, this affects the amount of money the consumer pays.
Several techniques are used to achieve this goal. This paper discusses these
techniques and proposes a new XML Schema-based Minification technique. The
proposed technique reduces the XML structure through minification. It keeps
the meaningful names separate from the underlying minified names, which
enhances software/code readability. This
technique is applied to Intrusion Detection Message Exchange Format (IDMEF)
messages, as part of Security Information and Event Management (SIEM) system
communication hosted on Microsoft Azure Cloud. Test results show message size
reductions ranging from 8.15% to 50.34% in the raw message, without using
time-consuming compression techniques. Adding GZip compression to the proposed
technique produces messages 66.1% shorter than the original XML
messages.
Comment: XML, JSON, Minification, XML Schema, Cloud, Log, Communication,
Compression, XMill, GZip, Code Generation, Code Readability, 9 pages, 12
figures, 5 tables, Journal Article
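A minimal Python sketch of name minification, for illustration only. It assumes a simplified, namespace-free, IDMEF-like message and builds the long-to-short name table from the document itself, whereas the paper derives the names from the XML Schema. The helper names (`short_name`, `build_mapping`, `rename`, `minify`, `expand`) and the sample message are hypothetical. The mapping table is the separation the abstract describes: code keeps the meaningful names, and only the transmitted message carries the short ones.

```python
import gzip
import string
import xml.etree.ElementTree as ET

# Hypothetical, simplified IDMEF-like alert (real IDMEF uses a namespace).
SAMPLE = (
    '<Alert messageid="abc123">'
    '<Analyzer analyzerid="hq-dmz-analyzer01"/>'
    '<CreateTime ntpstamp="0xbc723b45.0xef449129">2000-03-09T10:01:25.93464-05:00</CreateTime>'
    '<Classification text="Teardrop detected"/>'
    '</Alert>'
)


def short_name(i: int) -> str:
    """Map 0, 1, ..., 25, 26, ... to a, b, ..., z, aa, ... (bijective base 26)."""
    name, i = "", i + 1
    while i:
        i, r = divmod(i - 1, 26)
        name = string.ascii_lowercase[r] + name
    return name


def build_mapping(root: ET.Element) -> dict:
    """Assign a short name to every distinct tag and attribute name, in document order."""
    mapping = {}
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:
            if name not in mapping:
                mapping[name] = short_name(len(mapping))
    return mapping


def rename(root: ET.Element, mapping: dict) -> None:
    """Rewrite tag and attribute names in place, preserving attribute order."""
    for elem in root.iter():
        elem.tag = mapping[elem.tag]
        items = list(elem.attrib.items())
        elem.attrib.clear()
        for key, value in items:
            elem.set(mapping[key], value)


def minify(xml_text: str):
    """Return the minified XML string and the long-to-short name table."""
    root = ET.fromstring(xml_text)
    mapping = build_mapping(root)
    rename(root, mapping)
    return ET.tostring(root, encoding="unicode"), mapping


def expand(min_text: str, mapping: dict) -> str:
    """Restore the meaningful names using the inverted table."""
    root = ET.fromstring(min_text)
    rename(root, {short: long for long, short in mapping.items()})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    minified, mapping = minify(SAMPLE)
    print(minified)
    print("raw bytes:", len(SAMPLE), "->", len(minified))
    print("gzip bytes:", len(gzip.compress(SAMPLE.encode())),
          "->", len(gzip.compress(minified.encode())))
```

Element text and attribute values are left untouched, so the saving comes only from shorter names. That is why the achievable reduction depends on how name-heavy a given message is.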
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data usually contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved, and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.
Modelling airport and airline choice behaviour with the use of stated preference survey data
The majority of studies of air travel choice behavior make use of revealed preference (RP) data, generally in the form of survey data collected from departing passengers. While the use of RP data has certain methodological advantages over the use of stated preference (SP) data, major issues arise because of the often low quality of the data relating to the unchosen alternatives, in terms of explanatory variables as well as availability. As such, studies using RP survey data often fail to recover a meaningful fare coefficient, and are generally not able to offer a treatment of the effects of airline allegiance. In this paper, we make use of SP data for airport and airline choice collected in the US in 2001. The analysis retrieves significant effects relating to factors such as airfare, access time, flight time, and airline and airport allegiance, illustrating the advantages of SP data in this context. Additionally, the analysis explores the use of non-linear transforms of the explanatory variables, as well as the treatment of continuous variations in choice behavior across respondents.
Monitoring land use changes using geo-information : possibilities, methods and adapted techniques
Monitoring land use with geographical databases is widely used in decision-making. This report presents the possibilities, methods and adapted techniques for using geo-information to monitor land use changes. The municipality of Soest was chosen as the study area, and three national land use databases, viz. Top10Vector, CBS land use statistics and LGN, were used. The restrictions of geo-information for monitoring land use changes are indicated. New methods and adapted techniques improve the monitoring results considerably. Providers of geo-information, however, should coordinate on update frequencies, semantic content and spatial resolution to allow better possibilities of monitoring land use by combining data sets.
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate data representations of the same external entities
and merging them into single representations. Relatively recently, declarative
rules called "matching dependencies" (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating four
components of ER: (a) building a classifier for duplicate/non-duplicate record
pairs using machine learning (ML) techniques; (b) using MDs to support the
blocking phase of ML; (c) merging records on the basis of the classifier
results; and (d) using the declarative language "LogiQL" (an extended form of
Datalog supported by the "LogicBlox" platform) for all activities related to
data processing, and for the specification and enforcement of MDs.
Comment: Final journal version, with some minor technical corrections.
Extended version of arXiv:1508.0601
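A rough Python sketch of components (a) to (c), under clearly stated assumptions: `difflib` string similarity plays the role of the MD similarity operator, a fixed-threshold scoring rule stands in for the trained ML classifier, and the record fields, thresholds, "keep the longest value" merge policy, and all helper names are hypothetical. It does not use LogiQL or LogicBlox (component (d)), and for brevity the blocking step compares all name pairs, which a real system would avoid.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical author records with an id, a name, and an affiliation.
RECORDS = [
    {"id": 1, "name": "J. Smith", "affil": "Carleton University"},
    {"id": 2, "name": "John Smith", "affil": "Carleton Univ."},
    {"id": 3, "name": "Mary Jones", "affil": "University of Toronto"},
]


class UnionFind:
    """Disjoint sets over record ids, used both for blocks and for duplicate clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def groups(self):
        out = {}
        for x in self.parent:
            out.setdefault(self.find(x), []).append(x)
        return list(out.values())


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for the MD operator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_duplicate(r: dict, s: dict, threshold: float = 0.75) -> bool:
    """Stand-in for the trained classifier (a): average of two feature similarities."""
    return (sim(r["name"], s["name"]) + sim(r["affil"], s["affil"])) / 2 >= threshold


def merge(cluster: list) -> dict:
    """(c) Merge a duplicate cluster, keeping the longest value per attribute."""
    merged = {"ids": sorted(r["id"] for r in cluster)}
    for attr in ("name", "affil"):
        merged[attr] = max((r[attr] for r in cluster), key=len)
    return merged


def resolve(records, block_threshold=0.6, dup_threshold=0.75):
    by_id = {r["id"]: r for r in records}
    # (b) MD-style blocking: records with similar names end up in the same block.
    blocks = UnionFind(by_id)
    for r, s in combinations(records, 2):
        if sim(r["name"], s["name"]) >= block_threshold:
            blocks.union(r["id"], s["id"])
    # (a) Classify only pairs that share a block; cluster the positive pairs.
    dups = UnionFind(by_id)
    for block_ids in blocks.groups():
        for a, b in combinations(block_ids, 2):
            if is_duplicate(by_id[a], by_id[b], dup_threshold):
                dups.union(a, b)
    # (c) One merged record per duplicate cluster.
    return [merge([by_id[i] for i in g]) for g in dups.groups()]


if __name__ == "__main__":
    for rec in resolve(RECORDS):
        print(rec)
```

Blocking keeps the classifier away from obviously unrelated pairs (here, "Mary Jones" is never compared with the Smith records), which is the role the abstract assigns to MDs in the ML pipeline.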