Search CORE

22 research outputs found

Efficient main memory-based XML stream processing

Author: Scherzinger Stefanie
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2008
Field of study

Applications that process XML documents as files or streams are naturally main-memory based. This makes main memory the bottleneck for scalability. This doctoral thesis addresses this problem and presents a toolkit for effective buffer management in main memory-based XML stream processors. XML document projection is an established technique for reducing the buffer requirements of main memory-based XML processors, where only data relevant to query evaluation is loaded into main memory buffers. We present a novel implementation of this task, where we use string matching algorithms designed for efficient keyword search in flat strings to navigate in tree-structured data. We then introduce an extension of the XQuery language, called FluX, that supports event-based query processing. Purely event-based queries of this language can be executed on streaming XML data in a very direct way. We develop an algorithm to efficiently rewrite XQueries into FluX. This algorithm is capable of exploiting order constraints derived from schemata to reduce the amount of buffering in query evaluation. During streaming query evaluation, we continuously purge buffers from data that is no longer relevant. By combining static query analysis with a dynamic analysis of the buffer contents, we effectively reduce the size of memory buffers. We have confirmed the efficacy of these techniques by extensive experiments and by publication at international venues. To compare our contributions to related work in a systematic manner, we contribute an abstract framework for XML stream processing. This framework allows us to gain a greater-picture view over the factors influencing the main memory consumption.Anwendungen, die XML-Dokumente als Dateien oder Ströme verarbeiten, sind natürlicherweise hauptspeicherbasiert. Für die Skalierbarkeit wird der Hauptspeicher damit zu einem Engpass. Diese Doktorarbeit widmet sich diesem Problem, zu dessen Lösung sie Werkzeuge für eine effektive Pufferverwaltung in hauptspeicherbasierten Prozessoren für XML-Datenströme vorstellt. Die Projektion von XML-Dokumenten ist eine etablierte Methode, um den Pufferverbrauch von hauptspeicherbasierten XML-Prozessoren zu reduzieren. Dabei werden nur jene Daten in den Hauptspeicherpuffer geladen, die für die Anfrageauswertung auch relevant sind. Wir präsentieren eine neue Implementierung dieser Aufgabe, wobei wir Algorithmen zur effizienten Suche in flachen Zeichenketten einsetzen, um in baumartig strukturierten Daten zu navigieren. Danach stellen wir eine Erweiterung der XQuery-Sprache vor, genannt FluX, welche eine ereignisbasierte Anfragebearbeitung erlaubt. Anfragen, die nur ereignisbasierte Konstrukte benutzen, können direkt über XML-Datenströmen ausgewertet werden. Dazu entwickeln wir einen Algorithmus, mit dessen Hilfe sich XQuery-Anfragen effizient in FluX übersetzen lassen. Dieser benutzt Ordnungsinformationen aus Datenschemata, womit das Puffern in der Anfragebearbeitung reduziert werden kann. Während der Verarbeitung des Datenstroms bereinigen wir laufend den Hauptspeicherpuffer von solchen Daten, die nicht länger relevant sind. Eine nachhaltige Reduzierung der Größe von Hauptspeicherpuffern gelingt durch die Kombination der statischen Anfrageanalyse mit einer dynamischen Analyse der Pufferinhalte. Die Effektivität dieser Puffermanagement-Techniken erfährt ihre Bestätigung in umfangreichen Experimenten und internationalen Publikationen. Für einen systematischen Vergleich unserer Beiträge mit der aktuellen Literatur entwickeln wir ein abstraktes System zur Modellierung von Prozessoren zur XML-Stromverarbeitung. So können wir die spezifischen Faktoren herausgreifen, die den Hauptspeicherverbrauch beeinflussen

Universaar

Acronym

Algorithms for XML filtering

Author: Silvasti Panu
Publication venue: Aalto University, School of Arts, Design and Architecture, Department of Arts
Publication date: 01/01/2011
Field of study

In a publish-subscribe system based on XML filtering, the subscriber profiles are usually specified by filters written in the XPath language. The system processes the stream of XML documents and delivers to subscribers a notification or the content of those documents that match the filters. The number of interested subscribers and their stored profiles can be very large, thousands or even millions. In this case, the scalability of the system is critical. In this thesis, we develop several algorithms for XML filtering with linear XPath expressions. The algorithms are based on a backtracking Aho-Corasick pattern-matching automaton (PMA) built from "keywords" extracted from the filters, where a keyword is a maximal substring consisting only of XML element names. The output function of the PMA indicates which keyword occurrences of which filter are recognized at a given state. Our best results have been obtained by using a dynamically changing output function, which is dynamically updated during the processing of the input document. We have conducted an extensive performance study in which we compared our filtering algorithms with YFilter and the lazy DFA, two well-known automata-based filtering methods. With a non-recursive XML data set, PMA-based filtering is tens of times more efficient than YFilter and also significantly more efficient than the lazy DFA. With a slightly recursive data set PMA-based filtering has the same performance as the lazy DFA and it is significantly more efficient than YFilter. We have also developed an optimization method called filter pruning. This method improves the performance of filtering by utilizing knowledge about the XML document type definition (DTD) to simplify the filters. The optimization algorithm takes as input a DTD and a set of linear XPath filters and produces a set of pruned linear XPath filters that contain as few wildcards and descendant operators as possible. With a non-recursive data set and with a slightly recursive data set the filter-pruning method yielded a tenfold increase in the filtering speed of the PMA-based algorithms and a hundredfold increase with YFilter and the lazy DFA. Filter pruning can also increase the filtering speed in the case of highly recursive data sets

Aaltodoc Publication Archive

Seventh Biennial Report : June 2003 - March 2005

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2005
Field of study

MPG.PuRe

Online network intrusion detection system using temporal logic and stream data processing

Author: Ahmed Abdulbasit
Publication venue
Publication date
Field of study

These days, the world are becoming more interconnected, and the Internet has dominated the ways to communicate or to do business. Network security measures must be taken to protect the organization environment. Among these security measures are the intrusion detection systems. These systems aim to detect the actions that attempt to compromise the confidentiality, availability, and integrity of a resource by monitoring the events occurring in computer systems and/or networks. The increasing amounts of data that are transmitted at higher and higher speed networks created a challenging problem for the current intrusion detection systems. Once the traffic exceeds the operational boundaries of these systems, packets are dropped. This means that some attacks will not be detected. In this thesis, we propose developing an online network based intrusion detection system by the combined use of temporal logic and stream data processing. Temporal Logic formalisms allow us to represent attack patterns or normal behaviour. Stream data processing is a recent database technology applied to flows of data. It is designed with high performance features for data intensive applications processing. In this work we develop a system where temporal logic specifications are automatically translated into stream queries that run on the stream database server and are continuously evaluated against the traffic to detect intrusions. The experimental results show that this combination was efficient in using the resources of the running machines and was able to detect all the attacks in the test data. Additionally, the proposed solution provides a concise and unambiguous way to formally represent attack signatures and it is extensible allowing attacks to be added. Also, it is scalable as the system can benefit from using more CPUs and additional memory on the same machine, or using distributed servers

University of Liverpool Repository

Confidentiality-Preserving Publish/Subscribe: A Survey

Author: Felber Pascal
Mercier Hugues
Onica Emanuel
Rivière Etienne
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Publish/subscribe (pub/sub) is an attractive communication paradigm for large-scale distributed applications running across multiple administrative domains. Pub/sub allows event-based information dissemination based on constraints on the nature of the data rather than on pre-established communication channels. It is a natural fit for deployment in untrusted environments such as public clouds linking applications across multiple sites. However, pub/sub in untrusted environments lead to major confidentiality concerns stemming from the content-centric nature of the communications. This survey classifies and analyzes different approaches to confidentiality preservation for pub/sub, from applications of trust and access control models to novel encryption techniques. It provides an overview of the current challenges posed by confidentiality concerns and points to future research directions in this promising field

arXiv.org e-Print Archive

DIAL UCLouvain

Data Service Outsourcing and Privacy Protection in Mobile Internet

Author: Deng Fuhu
Ding Yi
Qin Zhen
Xiong Hu
Zhao Yang
Zhou Erqiang
Publication venue: 'IntechOpen'
Publication date: 01/01/2018
Field of study

Mobile Internet data have the characteristics of large scale, variety of patterns, and complex association. On the one hand, it needs efficient data processing model to provide support for data services, and on the other hand, it needs certain computing resources to provide data security services. Due to the limited resources of mobile terminals, it is impossible to complete large-scale data computation and storage. However, outsourcing to third parties may cause some risks in user privacy protection. This monography focuses on key technologies of data service outsourcing and privacy protection, including the existing methods of data analysis and processing, the fine-grained data access control through effective user privacy protection mechanism, and the data sharing in the mobile Internet

IntechOpen

Crossref

Directory of Open Access Books (DOAB)

Accelerating Event Stream Processing in On- and Offline Systems

Author: Körber Michael
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2021
Field of study

Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

Privacy-preserving framework for context-aware mobile applications

Author: Butter Thomas
Publication venue
Publication date: 01/01/2009
Field of study

In recent years, the pervasiveness of mobile devices, especially mobile phones and personal digital assistants (PDAs), has increased rapidly. At the same time, wireless communication networks have improved considerably and the usage of mobile devices to access the internet is, with decreasing costs, possible almost everywhere and at any time in industrialized countries. However, the usage of mobile technology and mobile applications to support business processes, trans- actions, and personal tasks is still low compared to their potential. The improved capabilities resulted in the introduction of many applications for mobile devices by network operators and software vendors. These services were meant to increase the average revenue per user (ARPU) on top of the voice call income. But many of these services have failed and none of them has led to an improved usage of mobile services today, besides e-mail. A new kind of application, the context-aware application, exploits the ubiquity of the mobile devices in order to fit the personal need or task the user is about to execute satisfactorily. Context-aware systems try to improve the communication with the user by adding information about the current context to the explicit user input and by adapting the output to the current setting of the user. While those applications are seen as important steps to a widespread usage, there are strong factors inhibiting their development and adoption. First of all, the lack of common frameworks handling context data and improv- ing software development increases the cost to build context-aware applications. Each application currently implements its own sensors and logic to handle its data. Furthermore, service providers need to offer tailored services for every con- text of the user. Since no single provider is able to be an expert for all kinds of applications and will not have the necessary number of developers, a common service which finds services of multiple providers for the current situation of the user is needed. All services need to utilize the context attributes which are locally determined by the user’s situation. Development costs are further boosted by the difficulty of developing applications for multiple devices with varying input/output (IO) capabilities like speech output, small and big screens, full qwerty-keyboards, touchscreens, or numeric keypads. From the user’s perspective, privacy also endangers the adoption of mobile services. Context information may include very private data and expose the user’s preferences and habits. While the user may trust a single, well-known, provider to secure the private data and to respect the user’s privacy concerns, the problem increases with more and more smaller service providers

MAnnheim DOCument Server

Modulating application behaviour for closely coupled intrusion detection

Author: Welz Marc
Publication venue: Department of Computer Science
Publication date: 01/01/2003
Field of study

Includes bibliographical references.This thesis presents a security measure that is closely coupled to applications. This distinguishes it from conventional security measures which tend to operate at the infrastructure level (network, operating system or virtual machine). Such lower level mechanisms exhibit a number of limitations, amongst others they are poorly suited to the monitoring of applications which operate on encrypted data or the enforcement of security policies involving abstractions introduced by applications. In order to address these problems, the thesis proposes externalising the security related analysis functions performed by applications. These otherwise remain hidden in applications and so are likely to be underdeveloped, inflexible or insular. It is argued that these deficiencies have resulted in an over-reliance on infrastructure security components

Cape Town University OpenUCT

Sixth Biennial Report : August 2001 - May 2003

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2003
Field of study

MPG.PuRe