608 research outputs found
Model-driven Scheduling for Distributed Stream Processing Systems
Distributed Stream Processing frameworks are being commonly used with the
evolution of Internet of Things(IoT). These frameworks are designed to adapt to
the dynamic input message rate by scaling in/out.Apache Storm, originally
developed by Twitter is a widely used stream processing engine while others
includes Flink, Spark streaming. For running the streaming applications
successfully there is need to know the optimal resource requirement, as
over-estimation of resources adds extra cost.So we need some strategy to come
up with the optimal resource requirement for a given streaming application. In
this article, we propose a model-driven approach for scheduling streaming
applications that effectively utilizes a priori knowledge of the applications
to provide predictable scheduling behavior. Specifically, we use application
performance models to offer reliable estimates of the resource allocation
required. Further, this intuition also drives resource mapping, and helps
narrow the estimated and actual dataflow performance and resource utilization.
Together, this model-driven scheduling approach gives a predictable application
performance and resource utilization behavior for executing a given DSPS
application at a target input stream rate on distributed resources.Comment: 54 page
Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources
The growing deployment of sensors as part of Internet of Things (IoT) is
generating thousands of event streams. Complex Event Processing (CEP) queries
offer a useful paradigm for rapid decision-making over such data sources. While
often centralized in the Cloud, the deployment of capable edge devices on the
field motivates the need for cooperative event analytics that span Edge and
Cloud computing. Here, we identify a novel problem of query placement on edge
and Cloud resources for dynamically arriving and departing analytic dataflows.
We define this as an optimization problem to minimize the total makespan for
all event analytics, while meeting energy and compute constraints of the
resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for
such dynamic dataflows, and validate them using detailed simulations for 100 -
1000 edge devices and VMs. The results show that our heuristics offer
O(seconds) planning time, give a valid and high quality solution in all cases,
and reduce the number of query migrations. Furthermore, rebalance strategies
when applied in these heuristics have significantly reduced the makespan by
around 20 - 25%.Comment: 11 pages, 7 figure
Running deep learning applications on resource constrained devices
The high accuracy of Deep Neural Networks (DNN) come at the expense of high computational cost and memory requirements. During inference, the data is often collected on the edge device which are resource-constrained. The existing solutions for edge deployment include i) executing the entire DNN on the edge (EDGE-ONLY), ii) sending the input from edge to cloud where the DNN is processed (CLOUD-ONLY), and iii) splitting the DNN to execute partially on the edge and partially on the cloud (SPLIT). The choice of deployment between EDGE-ONLY, CLOUD-ONLY and SPLIT is determined by several operating constraints such as device resources and network speed, and application constraints such as latency and accuracy. The EDGE-ONLY approach requires compact DNN with low compute and memory requirements. Thus, the emerging class of DNNs employ low-rank convolutions (LRCONVs) which reduce one or more dimensions compared to the spatial convolutions (CONV). Prior research in hardware accelerators has largely focused on CONVs. The LRCONVs such as depthwise and pointwise convolutions exhibit lower arithmetic intensity and lower data reuse. Thus, LRCONVs result in low hardware utilization and high latency. In our first work, we systematically explore the design space of Cross-layer dataflows to exploit data reuse across layers for emerging DNNs in EDGE-ONLY scenarios. We develop novel fine-grain cross-layer dataflows for LRCONVs that support partial loop dimension completion. Our tool, X-Layer decouples the nested loops in a pipeline and combines them to create a common outer dataflow and several inner dataflows. The CLOUD-ONLY approach can suffer from high latency due to the high transmission cost of large input data from the edge to the cloud. This could be a problem, especially for latency-critical applications. Thankfully, the SPLIT approach reduces latency compared to the CLOUD-ONLY approach. However, existing solutions only split the DNN in floating-point precision. Executing floating-point precision on the edge device can occupy large memory and reduce the potential options for SPLIT solutions. In our second work, we expand and explore the search space of SPLIT solutions by jointly applying mixed-precision post-training quantization and DNN graph split. Our work, Auto-Split finds a balance in the trade-off among the model accuracy, edge device capacity, transmission cost, and the overall latency
Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
more sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Although selecting and tuning autoscaling
policies is a challenging task for datacenter operators, so far relatively few
studies investigate the performance of autoscaling for workloads of workflows.
Complementing previous knowledge, in this work we propose the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong and quantitative
evidence about previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains and by how
much they differ.Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based
Performance Study of Autoscaling Workloads of Workflows in Datacenters
- …