Search CORE

XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments

Author: El-Khouly Mahmoud
Mostafa Mahmoud
Moussa Bishoy
Publication venue: The Science and Information Organization
Publication date: 01/09/2014

XML-based communication governs most of today's system-to-system communication, due to its capability of representing complex structural and hierarchical data. However, XML documents are bulky, and their structure can be reduced to minimize bandwidth usage and transmission time and to maximize performance. This leads to more efficient resource usage, and in cloud environments it also affects the amount of money the consumer pays. Several techniques are used to achieve this goal. This paper discusses these techniques and proposes a new XML Schema-based Minification technique, which reduces the XML structure through minification. The proposed technique separates the meaningful names from the underlying minified names, which enhances software/code readability. The technique is applied to Intrusion Detection Message Exchange Format (IDMEF) messages, as part of Security Information and Event Management (SIEM) system communication hosted on Microsoft Azure Cloud. Test results show message size reductions ranging from 8.15% to 50.34% in the raw message, without using time-consuming compression techniques. Adding GZip compression to the proposed technique produces messages 66.1% smaller than the original XML messages. Comment: XML, JSON, Minification, XML Schema, Cloud, Log, Communication, Compression, XMill, GZip, Code Generation, Code Readability, 9 pages, 12 figures, 5 tables, Journal Article

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals
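The XML Schema-based minification abstract describes keeping meaningful names on the code side while transmitting short ones. A minimal Python sketch of that name-mapping idea follows, using only the standard `xml.etree.ElementTree` and `gzip` modules. It is not the authors' implementation: the paper derives its minified names from the XML Schema and uses code generation, whereas here the `NAME_MAP` table, the namespace-free IDMEF-like sample message, and the `rename`/`minify`/`expand` helpers are hypothetical, hard-coded assumptions.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical long-to-short name table. The paper derives its minified names
# from the XML Schema; here a few IDMEF-like names are hard-coded instead.
NAME_MAP = {
    "Alert": "A",
    "Analyzer": "B",
    "CreateTime": "C",
    "Classification": "D",
    "analyzerid": "a",
    "text": "t",
}
# Inverse table so the receiver can restore the meaningful names.
REVERSE_MAP = {short: long for long, short in NAME_MAP.items()}

# Hypothetical sample message (IDMEF-like, namespaces omitted for brevity).
SAMPLE = (
    '<Alert><Analyzer analyzerid="hq-dmz-analyzer01" />'
    "<CreateTime>2000-03-09T10:01:25.93464-05:00</CreateTime>"
    '<Classification text="Teardrop detected" /></Alert>'
)


def rename(elem: ET.Element, mapping: dict) -> None:
    """Recursively rename element tags and attribute names via mapping."""
    elem.tag = mapping.get(elem.tag, elem.tag)
    renamed = {mapping.get(k, k): v for k, v in elem.attrib.items()}
    elem.attrib.clear()
    elem.attrib.update(renamed)
    for child in elem:
        rename(child, mapping)


def minify(xml_text: str) -> str:
    """Replace meaningful names with their short forms."""
    root = ET.fromstring(xml_text)
    rename(root, NAME_MAP)
    return ET.tostring(root, encoding="unicode")


def expand(xml_text: str) -> str:
    """Restore the meaningful names from the short forms."""
    root = ET.fromstring(xml_text)
    rename(root, REVERSE_MAP)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    small = minify(SAMPLE)
    packed = gzip.compress(small.encode("utf-8"))  # optional extra step
    print("original bytes:", len(SAMPLE.encode("utf-8")))
    print("minified bytes:", len(small.encode("utf-8")))
    print("minified + gzip bytes:", len(packed))
```

Because `expand` applies the inverse table, application code can keep working with the readable names while only the short names travel over the wire; GZip compression, if used, stacks on top of the renaming rather than replacing it.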

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008

The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data usually contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

Research Commons@Waikato

Modelling airport and airline choice behaviour with the use of stated preference survey data

Author: Algers
Basar
Bradley
Hess
John W. Polak
Lijesen
Louviere
Ortúzar
Pels
Proussaloglou
Stephane Hess
Thomas Adler
Publication venue: Elsevier BV
Publication date: 01/05/2007

The majority of studies of air travel choice behavior make use of revealed preference (RP) data, generally in the form of survey data collected from departing passengers. While the use of RP data has certain methodological advantages over the use of stated preference (SP) data, major issues arise because of the often low quality of the data relating to the unchosen alternatives, in terms of both explanatory variables and availability. As such, studies using RP survey data often fail to recover a meaningful fare coefficient and are generally unable to treat the effects of airline allegiance. In this paper, we make use of SP data for airport and airline choice collected in the US in 2001. The analysis retrieves significant effects relating to factors such as airfare, access time, flight time, and airline and airport allegiance, illustrating the advantages of SP data in this context. Additionally, the analysis explores the use of non-linear transforms of the explanatory variables, as well as the treatment of continuous variations in choice behavior across respondents.

Crossref

White Rose Research Online

Monitoring land use changes using geo-information : possibilities, methods and adapted techniques

Author: Hazeu G.W.
Zeeuw C.J., de
Publication venue: Alterra
Publication date: 01/01/2001

Monitoring land use with geographical databases is widely used in decision-making. This report presents the possibilities, methods and adapted techniques for using geo-information to monitor land use changes. The municipality of Soest was chosen as the study area, and three national land use databases, viz. Top10Vector, CBS land use statistics and LGN, were used. The limitations of geo-information for monitoring land use changes are identified. New methods and adapted techniques improve the monitoring result considerably. Providers of geo-information, however, should coordinate on update frequencies, semantic content and spatial resolution to allow better monitoring of land use by combining data sets.

Wageningen University & Research Publications

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Author: Bahmani Zeinab
Bertossi Leopoldo
Vasiloglou Nikolaos
Publication date: 18/01/2017

Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL" (an extended form of Datalog supported by the "LogicBlox" platform) for all activities related to data processing, and for the specification and enforcement of MDs. Comment: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.0601

arXiv.org e-Print Archive

Carleton University's Institutional Repository
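The ERBlox abstract lists blocking, pairwise duplicate classification and merging as integrated ER components. The Python sketch below mirrors only that overall flow and swaps in simpler techniques, so it should not be read as the paper's method: key-based blocking on author surname stands in for MD-based blocking, a fixed `difflib.SequenceMatcher` similarity threshold stands in for the trained ML classifier, union-find clustering with a longest-value rule stands in for the paper's merging, and plain Python replaces LogiQL on LogicBlox. The records, the 0.85 threshold, and all function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical bibliographic records containing one duplicate pair.
RECORDS = [
    {"id": 1, "title": "Entity Resolution with Matching Dependencies", "author": "J. Smith"},
    {"id": 2, "title": "Entity resolution with matching dependencies.", "author": "John Smith"},
    {"id": 3, "title": "Blocking Strategies for Record Linkage", "author": "A. Jones"},
]


def block_key(rec: dict) -> str:
    """Blocking key (assumption): lower-cased last token of the author field."""
    return rec["author"].lower().split()[-1]


def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Stand-in for a trained ML classifier: thresholded title similarity."""
    score = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return score >= threshold


def resolve(records: list) -> list:
    # 1. Blocking: only records sharing a key are compared pairwise.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # 2. Classification: union-find links every pair judged a duplicate.
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_duplicate(records[i], records[j]):
                parent[find(i)] = find(j)

    # 3. Merging: collapse each cluster into one representative record.
    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[find(idx)].append(rec)

    merged = []
    for group in clusters.values():
        merged.append({
            "ids": sorted(r["id"] for r in group),
            # Keep the longest (assumed most complete) value per attribute.
            "title": max((r["title"] for r in group), key=len),
            "author": max((r["author"] for r in group), key=len),
        })
    return merged


if __name__ == "__main__":
    for rec in resolve(RECORDS):
        print(rec)
```

Blocking keeps the number of comparisons down: records 1 and 2 share the key `smith` and are compared, while record 3 is never compared with them. Union-find makes the duplicate relation transitive before merging, so a chain of pairwise matches ends up in a single cluster.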