108 research outputs found

Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

Author: Al Jawarneh Isam Mashhour Hasan <1981>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 02/04/2020

Huge amounts of georeferenced data streams arrive daily at data stream management systems that are deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require an interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness. Those are the two predominant factors that greatly impact the overall quality of service. This requires data stream management systems to be attuned to those factors, in addition to the spatial shape of the data, which may exaggerate their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, that comprises several subsystems covering those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of georeferenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system optimizer compiles them down into query plans with an embedded quality guarantee and leaves logistic handling to the underlying layers.
We have developed standard-compliant prototypes for all the subsystems that constitute SpatialDSMS

AMS Tesi di Dottorato

Optimal Control of Distributed Computing Networks with Mixed-Cast Traffic Flows

Author: Llorca Jaime
Modiano Eytan
Sinha Abhishek
Tulino Antonia
Zhang Jianan
Publication date: 01/01/2018

Distributed computing networks, tasked with both packet transmission and processing, require the joint optimization of communication and computation resources. We develop a dynamic control policy that determines both routes and processing locations for packets upon their arrival at a distributed computing network. The proposed policy, referred to as Universal Computing Network Control (UCNC), guarantees that packets i) are processed by a specified chain of service functions, ii) follow cycle-free routes between consecutive functions, and iii) are delivered to their corresponding set of destinations via proper packet duplications. UCNC is shown to be throughput-optimal for any mix of unicast and multicast traffic, and is the first throughput-optimal policy for non-unicast traffic in distributed computing networks with both communication and computation constraints. Moreover, simulation results suggest that UCNC yields substantially lower average packet delay compared with existing control policies for unicast traffic

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

DSpace@MIT

Storage and Ingestion Systems in Support of Stream Processing: A Survey

Author: Antoniu Gabriel
Bortoli Stefano
Costan Alexandru
Marcu Ovidiu-Cristian
Nicolae Bogdan
Pérez-Hernández María
Tudoran Radu
Publication venue: HAL CCSD
Publication date: 29/11/2018

Under the pressure of massive, exponentially increasing amounts of heterogeneous data that are generated faster and faster, Big Data analytics applications have seen a shift from batch processing to stream processing, which can reduce the time needed to obtain meaningful insight dramatically. Stream processing is particularly well suited to address the challenges of fog/edge computing: much of this massive data comes from Internet of Things (IoT) devices and needs to be continuously funneled through an edge infrastructure towards centralized clouds. Thus, it is only natural to process data on their way as much as possible rather than wait for streams to accumulate on the cloud. Unfortunately, state-of-the-art stream processing systems are not well suited for this role: the data are accumulated (ingested), processed and persisted (stored) separately, often using different services hosted on different physical machines/clusters. Furthermore, there is only limited support for advanced data manipulations, which often forces application developers to introduce custom solutions and workarounds. In this survey article, we characterize the main state-of-the-art stream storage and ingestion systems. We identify the key aspects and discuss limitations and missing features in the context of stream processing for fog/edge and cloud computing. The goal is to help practitioners understand and prepare for potential bottlenecks when using such state-of-the-art systems. In particular, we discuss both functional (partitioning, metadata, search support, message routing, backpressure support) and non-functional aspects (high availability, durability, scalability, latency vs. throughput). As a conclusion of our study, we advocate for a unified stream storage and ingestion system to speed-up data management and reduce I/O redundancy (both in terms of storage space and network utilization)

INRIA a CCSD electronic archive server

Big Data Analytics in Static and Streaming Provenance

Author: Chen Peng
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/04/2016

Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016
With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rules mining; and the temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making

IUScholarWorks (University of Indiana)

A Survey on the Evolution of Stream Processing Systems

Author: Carbone Paris
Fragkoulis Marios
Kalavri Vasiliki
Katsifodimos Asterios
Publication date: 03/08/2020

Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.
Comment: 34 pages, 15 figures, 5 tables

arXiv.org e-Print Archive

Destination-based Routing and Circuit Allocation for Future Traffic Growth

Author: Yin Ping
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Internet traffic continues to grow relentlessly, driven largely by increasingly high-resolution video streaming, the increasing adoption of cloud computing, the emergence of 5G networks, and the ever-growing reach of social media and social networks. Existing networks use packet switching to route packets on a hop-by-hop basis from the source to the destination. However, they suffer from two shortcomings. First, in existing networks, packets are routed along a fixed shortest path using the Open Shortest Path First (OSPF) protocol or obliviously load-balanced across equal-cost paths using the Equal-Cost Multi-Path (ECMP) protocol. These routing protocols do not fully utilize the network capacity because they do not adapt to network congestion in their routing decisions. Second, although studies have shown that the majority of packets processed by Internet routers are pass-through traffic, packets nonetheless have to be queued and routed at every hop in existing networks, which unnecessarily adds substantial delays and processing costs. In this thesis, we present two new approaches to overcome these shortcomings. First, we propose new backpressure-based routing algorithms which use only shortest-path routes when they are sufficient to accommodate the given traffic load, but will incrementally expand routing choices as needed to accommodate increasing traffic loads. This avoids the poor delay performance inherent in backpressure-based routing algorithms, where packets may take long detours under light or moderate loads, and still retains their notable advantage, network-wide optimal throughput, because packets are adaptively routed along less congested paths. Second, we propose a unified packet and circuit switched network in which the underlying optical transport is used to circuit-switch pass-through traffic by means of pre-established circuits. This avoids unnecessary packet queuing delays and processing costs at each hop.
We propose a novel convex optimization framework based on a new destination-based multicommodity flow formulation for the allocation of circuits in such unified networks
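
For readers unfamiliar with the backpressure routing mentioned in this abstract, the following is a minimal sketch of the classic textbook rule only, not the thesis's shortest-path-restricted variant. For each link, it picks the destination whose queue backlog exceeds the downstream backlog by the largest margin. The function name `backpressure_choice` and the nested-dictionary queue layout are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: node -> destination -> number of queued packets.
Backlog = Dict[str, Dict[str, int]]


def backpressure_choice(
    queues: Backlog, link: Tuple[str, str]
) -> Tuple[Optional[str], int]:
    """Return (destination, weight) with the largest positive differential
    backlog across the link, or (None, 0) if no differential is positive."""
    src, dst = link
    best_dest, best_weight = None, 0
    for dest, backlog in queues.get(src, {}).items():
        # Packets that reach their destination leave the network, so the
        # downstream backlog is treated as zero when dst is the destination.
        downstream = 0 if dst == dest else queues.get(dst, {}).get(dest, 0)
        diff = backlog - downstream
        if diff > best_weight:
            best_dest, best_weight = dest, diff
    return best_dest, best_weight


if __name__ == "__main__":
    queues = {"A": {"D": 5, "E": 2}, "B": {"D": 1, "E": 4}}
    print(backpressure_choice(queues, ("A", "B")))
    print(backpressure_choice(queues, ("B", "A")))
```

Under light load the differentials are small and noisy, so packets can wander along long detours, which is the delay problem the abstract describes.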

eScholarship - University of California