
    Observation of current approaches to utilize the elastic cloud for big data stream processing

    This paper conducts a systematic literature map to collect information about current approaches to utilizing the elastic cloud for data stream processing in the big data context. The literature map covers both the collection of all publications relevant to the field under study and the evaluation and presentation of the collected data; research questions were defined beforehand to guide the evaluation. First, the scientific methodology is described and set up, adhering to generally accepted methods for systematic literature maps. After a reference set of relevant publications was assembled, search queries for the data collection were constructed on its basis. The data was then exported from the online databases of well-known publishers and duplicates were removed. To determine the final set of relevant publications, irrelevant publications were first filtered out automatically based on keywords, and the remainder was then reviewed manually one by one. The collected data was partly evaluated automatically and partly classified manually, and the results were visualized to help answer the previously defined research questions. Finally, the results of the thesis are discussed and its limitations and implications are addressed.
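    The automatic keyword-based filtering step described above can be sketched roughly as follows. This is a hypothetical illustration, not the thesis's actual pipeline; the record format and the keyword lists are assumptions made for the example.

```python
# Hypothetical sketch of automatic keyword filtering of publications.
# The record format and keyword lists are illustrative assumptions.

def filter_publications(publications, required, excluded):
    """Keep publications whose title or abstract mentions at least one
    required keyword and none of the excluded ones; the survivors would
    then proceed to individual manual review."""
    kept = []
    for pub in publications:
        text = (pub["title"] + " " + pub["abstract"]).lower()
        if any(k in text for k in required) and not any(k in text for k in excluded):
            kept.append(pub)
    return kept

pubs = [
    {"title": "Elastic scaling for stream processing",
     "abstract": "cloud elasticity for streaming workloads"},
    {"title": "Batch ETL pipelines",
     "abstract": "offline batch warehousing"},
]
relevant = filter_publications(pubs, required=["stream", "elastic"], excluded=["batch"])
# only the first publication survives the filter
```

    In a real literature map the keyword pass only narrows the candidate set; borderline cases are decided by the subsequent manual review.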

    Task Scheduling in Data Stream Processing Systems

    In the era of big data, with streaming applications such as social media, surveillance monitoring, and real-time search generating large volumes of data, efficient Data Stream Processing Systems (DSPSs) have become essential. When designing an efficient DSPS, a number of challenges need to be considered, including task allocation, scalability, fault tolerance, QoS, degree of parallelism, and state management, among others. In our research, we focus on task allocation, as it has a significant impact on performance metrics such as data processing latency and system throughput. An application processed by a DSPS is represented as a Directed Acyclic Graph (DAG), where each vertex represents a task and the edges represent the dataflow between the tasks. Task allocation can be defined as the assignment of the vertices in the DAG to the physical compute nodes such that the data movement between the nodes is minimised. Finding an optimal task placement for stream processing systems is NP-hard; thus, approximate scheduling approaches are required to improve the performance of DSPSs. In this thesis, we present three proposed schedulers, each using a different heuristic partitioning approach to minimise inter-node communication for either homogeneous or heterogeneous clusters. We demonstrate how each scheduler can efficiently assign groups of highly communicating tasks to compute nodes. Our schedulers outperform two state-of-the-art schedulers on three micro-benchmarks and two real-world applications, increasing throughput and reducing data processing latency as a result of better task placement.
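    The thesis's schedulers are not reproduced here, but the general idea of communication-aware task placement can be illustrated with a minimal greedy heuristic: process DAG edges in order of decreasing traffic and co-locate the endpoint tasks on one node whenever capacity allows. The function name and the simple per-node capacity model are assumptions for this sketch.

```python
# Minimal greedy sketch of communication-aware task placement on a DAG.
# This only illustrates the general heuristic idea, not the thesis's
# actual schedulers; names and the capacity model are assumptions.

def greedy_placement(edges, num_nodes, capacity):
    """edges: list of (src_task, dst_task, traffic) tuples.
    Returns {task: node}, trying to keep heavy edges intra-node."""
    placement, load = {}, [0] * num_nodes

    def least_loaded():
        return min(range(num_nodes), key=load.__getitem__)

    def place(task, preferred):
        # Co-locate on the preferred node if it still has room, else spill.
        node = preferred if preferred is not None and load[preferred] < capacity else least_loaded()
        placement[task] = node
        load[node] += 1

    # Heaviest edges first, so the largest flows are kept intra-node.
    for src, dst, _ in sorted(edges, key=lambda e: -e[2]):
        if src not in placement and dst not in placement:
            place(src, None)
            place(dst, placement[src])
        elif src not in placement:
            place(src, placement[dst])
        elif dst not in placement:
            place(dst, placement[src])
    return placement

# A<->B and C<->D communicate heavily, so each pair lands on one node;
# only the light B->C edge crosses nodes.
edges = [("A", "B", 100), ("C", "D", 90), ("B", "C", 1)]
plan = greedy_placement(edges, num_nodes=2, capacity=2)
```

    The published schedulers solve a much harder version of this problem, accounting for heterogeneous nodes and overall load balance rather than a single greedy pass.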

    On Design and Applications of Practical Concurrent Data Structures

    The proliferation of multicore processors is having an enormous impact on software design and development. In order to exploit the parallelism available in multicores, there is a need to design and implement abstractions that programmers can use for general-purpose application development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement, as they are required to be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures, and propose new design techniques and improvements to existing implementations. Additionally, we explore the utilization of concurrent data structures in demanding application contexts such as data stream processing.

    In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks. We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking; our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities.

    In the second part of the thesis, we shift our focus to concurrent search data structures. In order to offer strong progress guarantees, typical implementations of non-blocking search data structures employ a "helping" mechanism. However, helping may result in performance degradation. We propose help-optimality, which expresses optimization in the amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked-list and a binary search tree and present improved algorithms. We design the algorithms without using any language- or platform-specific constructs; we do not use bit-stealing or runtime type introspection of objects. Thus, our algorithms are portable. We further delve into multi-dimensional data and similarity search, presenting the first lock-free multi-dimensional data structure and a linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures.

    In the last part of the thesis, we explore the utilization of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing on cloud as well as edge devices, and (2) deterministic data-parallel processing at high throughput and low latency. As a first step, we present a methodology for the customization of streaming aggregation on low-power multicore embedded platforms. Then we introduce Viper, a communication module that can be integrated into stream processing engines for the coordination of threads analyzing data in parallel.
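    The determinism challenge in the last part can be illustrated with a minimal sketch (this is not the Viper module itself): if each parallel worker emits timestamp-ordered results, a coordinator can merge the per-worker streams into one output whose order does not depend on thread scheduling. The function name and tuple format are assumptions for this example.

```python
# Hypothetical sketch of deterministic merging of data-parallel results.
# Each worker emits (timestamp, value) tuples in timestamp order; merging
# by timestamp yields an output independent of which worker ran first.
# This illustrates the coordination problem only, not Viper itself.
import heapq

def deterministic_merge(worker_outputs):
    """Merge per-worker, timestamp-sorted streams into one stream whose
    order is fully determined by the timestamps."""
    return list(heapq.merge(*worker_outputs, key=lambda t: t[0]))

w0 = [(1, "a"), (4, "d")]   # output of worker 0
w1 = [(2, "b"), (3, "c")]   # output of worker 1
merged = deterministic_merge([w0, w1])
# merged is ordered by timestamp regardless of worker scheduling
```

    Real engines must additionally bound buffering and handle workers that fall idle, which is where a dedicated communication module earns its keep.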