Run Time Approximation of Non-blocking Service Rates for Streaming Systems
Stream processing is a compute paradigm that promises safe and efficient
parallelism. Modern big-data problems are often well suited for stream
processing's throughput-oriented nature. Realization of efficient stream
processing requires monitoring and optimization of multiple communications
links. Most techniques to optimize these links use queueing network models or
network flow models, which require some idea of the actual execution rate of
each independent compute kernel within the system. What we want to know is how
fast each kernel can process data independently of other communicating kernels.
This is known as the "service rate" of the kernel within the queueing
literature. Current approaches to divining service rates are static. Modern
workloads, however, are often dynamic. Shared cloud systems also present
applications with highly dynamic execution environments (multiple users,
hardware migration, etc.). It is therefore desirable to continuously re-tune an
application during run time (online) in response to changing conditions. Our
approach enables online service rate monitoring under most conditions,
obviating the need for reliance on steady state predictions for what are
probably non-steady state phenomena. First, some of the difficulties associated
with online service rate determination are examined. Second, the algorithm to
approximate the online non-blocking service rate is described. Lastly, the
algorithm is implemented within the open source RaftLib framework for
validation using a simple microbenchmark as well as two full streaming
applications.
Comment: technical repor
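As a rough illustration of the idea of a non-blocking service rate, the sketch below averages a kernel's output only over observation windows in which it never blocked. It is not the algorithm from the paper or anything from RaftLib; the `Sample` record and the function name are hypothetical, and it assumes some monitor can already report per-window item counts and whether blocking occurred.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    """One observation window of a kernel (hypothetical record format)."""
    items: int       # items the kernel emitted during the window
    seconds: float   # length of the window
    blocked: bool    # True if the kernel waited on an empty input or a full output


def non_blocking_service_rate(samples: Iterable[Sample]) -> Optional[float]:
    """Estimate items/second using only windows in which the kernel never blocked."""
    items = 0
    seconds = 0.0
    for s in samples:
        if s.blocked or s.seconds <= 0:
            continue  # a blocked window understates how fast the kernel can run
        items += s.items
        seconds += s.seconds
    # None signals that no usable (non-blocked) window was observed.
    return items / seconds if seconds > 0 else None
```

Discarding blocked windows is the design choice that matters here: a window in which the kernel starved or stalled measures the neighbouring kernels rather than the kernel itself.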
Resource optimization of edge servers dealing with priority-based workloads by utilizing service level objective-aware virtual rebalancing
IoT enables profitable communication between sensor/actuator devices and the cloud. Slow networks leave Edge data without Cloud analytics, which hinders the adoption of real-time analytics. VRebalance addresses the performance of priority-based workloads for stream processing at the Edge. VRebalance uses BO to prioritize workloads and find optimal resource configurations for efficient resource management. The Apache Storm platform was used with the RIoTBench IoT benchmark tool for real-time stream processing. Tools were used to evaluate VRebalance. The study shows that VRebalance is more effective than traditional methods, meeting SLO targets despite system changes. VRebalance decreased SLO violation rates by almost 30% for static priority-based workloads and by 52.2% for dynamic priority-based workloads compared to a hill climbing algorithm. Using VRebalance decreased SLO violations by 66.1% compared to Apache Storm's default allocation
Combining edge and cloud computing for mobility analytics
Mobility analytics using data generated from the Internet of Mobile Things
(IoMT) is facing many challenges which range from the ingestion of data streams
coming from a vast number of fog nodes and IoMT devices to avoiding overflowing
the cloud with useless massive data streams that can trigger bottlenecks [1].
Managing data flow is becoming an important part of the IoMT because it will
dictate in which platform analytical tasks should run in the future. Data flows
are usually a sequence of out-of-order tuples with a high data input rate, and
mobility analytics requires a real-time flow of data in both directions, from
the edge to the cloud, and vice-versa. Before pulling the data streams to the
cloud, edge data stream processing is needed for detecting missing, broken, and
duplicated tuples in addition to recognizing tuples whose arrival time is out of
order. Analytical tasks such as data filtering, data cleaning and low-level
data contextualization can be executed at the edge of a network. In contrast,
more complex analytical tasks such as graph processing can be deployed in the
cloud, and the results of ad-hoc queries and streaming graph analytics can be
pushed to the edge as needed by a user application. Graphs are efficient
representations used in mobility analytics because they unify knowledge about
connectivity, proximity and interaction among moving things. This poster
describes the preliminary results from our experimental prototype developed for
supporting transit systems, in which edge and cloud computing are combined to
process transit data streams forwarded from fog nodes into a cloud. The
motivation of this research is to understand how to perform meaningful
mobility analytics on transit feeds by combining cloud and fog computing
architectures in order to improve fleet management, mass transit and remote
asset monitoring.
Comment: Edge Computing, Cloud Computing, Mobility Analytics, Internet of
Mobile Things, Edge Fog Fabri
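The edge-side checks mentioned in this abstract (duplicated, out-of-order, and missing tuples) can be sketched as follows. This is only an illustration and not the poster's prototype: it assumes every tuple carries an integer sequence number from its source, which the abstract does not state, and it leaves the detection of broken tuples aside. The function name is hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple


def classify_tuples(seq_numbers: Iterable[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (duplicates, out_of_order, missing) for a stream of sequence numbers."""
    seen: Set[int] = set()
    duplicates: List[int] = []
    out_of_order: List[int] = []
    highest: Optional[int] = None
    for n in seq_numbers:
        if n in seen:
            duplicates.append(n)  # same sequence number arrived again
            continue
        if highest is not None and n < highest:
            out_of_order.append(n)  # arrived after a later-numbered tuple
        seen.add(n)
        highest = n if highest is None else max(highest, n)
    if highest is None:
        return duplicates, out_of_order, []
    # Anything between the lowest and highest number that never arrived is missing.
    missing = [n for n in range(min(seen), highest + 1) if n not in seen]
    return duplicates, out_of_order, missing
```

On an unbounded stream the `seen` set would have to be limited to a sliding window; the sketch keeps everything for clarity.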
Fog Computing: Issues, Challenges and Future Directions
In Cloud Computing, all processing of the data collected by a node is done on a central server. This takes considerable time, because data has to be transferred from the node to the central server before it can be processed there. Also, it is not practical to stream terabytes of data from the node to the cloud and back. To overcome these disadvantages, an extension of cloud computing, known as fog computing, is introduced. Here, data is processed entirely in the node if it does not require high computing power; if it does, it is processed partially in the node and then transferred to the central server for the remaining computations. This greatly reduces the time involved in the process and is more efficient as the central server is not overloaded. Fog is quite useful in geographically dispersed areas where connectivity can be irregular. The ideal use case requires intelligence near the edge, where ultra-low latency is critical, which fog computing promises to deliver. The concepts of cloud computing and fog computing will be explored and their features will be contrasted to understand which is more efficient and better suited for real-time application
The Intersection of Function-as-a-Service and Stream Computing
With recent advancements in the field of computing including the emergence of cloud computing, the consumption and accessibility of computational resources have increased drastically. Although there have been significant movements towards more sustainable computing, there are many more steps to be taken to decrease the amount of energy consumed and greenhouse gases released from the computing sector. Historically, the switch from on-premises computing to cloud computing has led to less energy consumption through the design of efficient data centers. By releasing direct control of the hardware that their software is run on, an organization can also increase efficiency and reduce costs. A new development in cloud computing has been serverless computing. Even though the term "serverless" is a misnomer because all applications are still executed on servers, serverless lets an organization resign another level of control, managing instances of virtual machines, to their cloud provider in order to reduce their cost. The cloud provider then provisions resources on-demand enabling less idle time. This reduction of idle time is a direct reduction of computing resources used, therefore resulting in a decrease in energy consumption. One form of serverless computing, Function-as-a-Service (FaaS), may have a promising future replacing some stream computing applications in order to increase efficiency and reduce waste. To explore these possibilities, the development of a stream processing application using traditional methods through Kafka Streams and FaaS through AWS Lambda was completed in order to demonstrate that FaaS can be used for stateless stream processing
Curracurrong: a stream processing system for distributed environments
Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of things. There are many emerging applications, some of which include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amounts of data as an unbounded sequence of events and process them to generate new sequences of events. Such applications need an adequate programming model that can process large amounts of data with minimal latency; for this purpose, stream programming, among other paradigms, is ideal. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for a modern domain-specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that achieves an easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., end-to-end delay
Decentralized Control of Distributed Cloud Networks with Generalized Network Flows
Emerging distributed cloud architectures, e.g., fog and mobile edge
computing, are playing an increasingly important role in the efficient delivery
of real-time stream-processing applications such as augmented reality,
multiplayer gaming, and industrial automation. While such applications require
processed streams to be shared and simultaneously consumed by multiple
users/devices, existing technologies lack efficient mechanisms to deal with
their inherent multicast nature, leading to unnecessary traffic redundancy and
network congestion. In this paper, we establish a unified framework for
distributed cloud network control with generalized (mixed-cast) traffic flows
that allows optimizing the distributed execution of the required packet
processing, forwarding, and replication operations. We first characterize the
enlarged multicast network stability region under the new control framework
(with respect to its unicast counterpart). We then design a novel queuing
system that allows scheduling data packets according to their current
destination sets, and leverage Lyapunov drift-plus-penalty theory to develop
the first fully decentralized, throughput- and cost-optimal algorithm for
multicast cloud network flow control. Numerical experiments validate analytical
results and demonstrate the performance gain of the proposed design over
existing cloud network control techniques
Incremental Processing and Optimization of Update Streams
In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring, which monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. Generally, in order to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in the applications.
Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved by various incremental view maintenance techniques if we cast the problems as incremental view maintenance problems over data streams. We focus on two challenges in the incremental processing of update streams that are not addressed in existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we also develop a working prototype system, Aspen, which serves as an end-to-end stream processing system and has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of our prototype system Aspen, over a variety of benchmark workloads such as the TPC-H and LinearRoad benchmarks
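To make "incremental processing of transitive closure queries" concrete, here is a minimal sketch of the textbook insertion-only update: when an edge u -> v arrives, everything that already reaches u now also reaches everything reachable from v. This is an assumption-laden illustration, not the technique used in the thesis or in Aspen; it does not handle deletions, and the function name is hypothetical.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def insert_edge(closure: Set[Edge], u: Hashable, v: Hashable) -> Set[Edge]:
    """Update a transitive closure in place for a new edge u -> v; return the new pairs."""
    # Nodes that can already reach u, plus u itself.
    sources = {x for (x, y) in closure if y == u} | {u}
    # Nodes already reachable from v, plus v itself.
    targets = {y for (x, y) in closure if x == v} | {v}
    # Only pairs not yet in the closure are genuinely new (the "delta").
    added = {(x, y) for x in sources for y in targets} - closure
    closure |= added
    return added
```

Returning only the newly derived pairs mirrors the view-maintenance framing in the abstract: downstream operators see a delta rather than a recomputed result.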