    Automatic Latency Management for ROS 2: Benefits, Challenges, and Open Problems

    A Service-Defined Approach for Orchestration of Heterogeneous Applications in Cloud/Edge Platforms

    Edge Computing is moving resources toward the network borders, thus enabling the deployment of a pool of new applications that benefit from the new distributed infrastructure. However, due to the heterogeneity of such applications, specific orchestration strategies need to be adopted for each deployment request. Each application can potentially require different optimization criteria and may prefer particular reactions upon the occurrence of the same event. This paper presents a Service-Defined approach for orchestrating cloud/edge services in a distributed fashion, where each application can define its own orchestration strategy by means of declarative statements, which are parsed into a Service-Defined Orchestrator (SDO). Moreover, to coordinate the coexistence of a variety of SDOs on the same infrastructure while preserving the resource assignment optimality, we present DRAGON, a Distributed Resource AssiGnment and OrchestratioN algorithm that seeks optimal partitioning of shared resources between different actors. We evaluate the advantages of our novel Service-Defined orchestration approach over some representative edge use cases, as well as measure the convergence and performance of DRAGON on a prototype implementation, assessing the benefits compared to conventional orchestration approaches.
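    As a rough illustration of the Service-Defined idea, the sketch below has each SDO declare a resource demand and a utility, and a coordinator partitions a shared budget among them. The dataclass, the greedy policy, and all numbers are illustrative assumptions; DRAGON itself is a distributed algorithm and is not reproduced here.

        # Hypothetical sketch: a greedy, centralized stand-in for distributed
        # resource partitioning among Service-Defined Orchestrators (SDOs).
        from dataclasses import dataclass

        @dataclass
        class SDO:
            name: str
            demand: int      # resource units requested by this orchestrator
            utility: float   # value gained per unit granted

        def partition(sdos, capacity):
            """Grant units to SDOs in order of declared utility."""
            grants = {}
            for sdo in sorted(sdos, key=lambda s: s.utility, reverse=True):
                grants[sdo.name] = max(0, min(sdo.demand, capacity))
                capacity -= grants[sdo.name]
            return grants

        print(partition([SDO("cdn", 4, 0.9), SDO("iot", 3, 0.5)], capacity=5))
        # -> {'cdn': 4, 'iot': 1}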

    Task allocation in distributed multimedia systems based on the host-satellite model

    Multimedia applications require intermediate processing between media sources and sinks. In addition to end-user machines, intermediate computers can be used for performing media processing. This possibility leads to the problem of allocating processing components to various computers. In this paper, we study this problem in the context of star-shaped application graphs which have to be allocated between given end-user machines (satellites) and a central computer (host). The problem is formulated in terms of the best achievable bottleneck resource usage. Several approaches are considered, including an approximate scheme and two fast heuristics. Performance measurements show the efficiency of the considered approaches. A discussion of our approach shows important differences to solutions provided for related problems of graph partitioning and mapping.
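    A minimal, brute-force version of the allocation problem described above: each component of a star-shaped graph runs either on its own satellite or on the central host, and the objective is the smallest achievable bottleneck (maximum) utilization. Exhaustive search stands in for the paper's approximation scheme and heuristics, and the cost numbers are made up.

        # Each component i costs sat_cost[i] on its satellite, host_cost[i] on
        # the host; the host accumulates load from every component placed on it.
        from itertools import product

        def bottleneck(assign, sat_cost, host_cost):
            host_load = sum(h for h, a in zip(host_cost, assign) if a == "host")
            sat_loads = [s for s, a in zip(sat_cost, assign) if a == "sat"]
            return max([host_load] + sat_loads)

        def best_allocation(sat_cost, host_cost):
            return min(product(("sat", "host"), repeat=len(sat_cost)),
                       key=lambda a: bottleneck(a, sat_cost, host_cost))

        print(best_allocation(sat_cost=[0.8, 0.5, 0.9], host_cost=[0.3, 0.4, 0.35]))
        # -> ('host', 'sat', 'host'), bottleneck utilization 0.65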

    Load shedding in network monitoring applications

    Monitoring and mining real-time network data streams are crucial operations for managing and operating data networks. The information that network operators desire to extract from the network traffic is of different size, granularity and accuracy depending on the measurement task (e.g., relevant data for capacity planning and intrusion detection are very different). To satisfy these different demands, a new class of monitoring systems is emerging to handle multiple and arbitrary monitoring applications. Such systems must inevitably cope with the effects of continuous overload situations due to the large volumes, high data rates and bursty nature of the network traffic. These overload situations can severely compromise the accuracy and effectiveness of monitoring systems, precisely when their results are most valuable to network operators. In this thesis, we propose a technique called load shedding as an effective and low-cost alternative to over-provisioning in network monitoring systems. It allows these systems to efficiently handle overload situations in the presence of multiple, arbitrary and competing monitoring applications. We present the design and evaluation of a predictive load shedding scheme that can shed excess load in the face of extreme traffic conditions and maintain the accuracy of the monitoring applications within bounds defined by end users, while assuring a fair allocation of computing resources to non-cooperative applications. The main novelty of our scheme is that it considers monitoring applications as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Without any explicit knowledge of the application internals, the proposed scheme extracts a set of features from the traffic streams to build an on-line prediction model of the resource requirements of each monitoring application, which is used to anticipate overload situations and control the overall resource usage by sampling the input packet streams. This way, the monitoring system preserves a high degree of flexibility, increasing the range of applications and network scenarios where it can be used. Since not all monitoring applications are robust against sampling, we then extend our load shedding scheme to support custom load shedding methods defined by end users, in order to provide a generic solution for arbitrary monitoring applications. Our scheme allows the monitoring system to safely delegate the task of shedding excess load to the applications and still guarantee fairness of service with non-cooperative users. We implemented our load shedding scheme in an existing network monitoring system and deployed it in a research ISP network. We present experimental evidence of the performance and robustness of our system with several concurrent monitoring applications during long-lived executions and using real-world traffic traces.
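    The following sketch illustrates the predictive load-shedding loop in miniature: the monitoring application is treated as a black box, its CPU cost is predicted from lightweight per-interval traffic features with a least-squares fit, and the packet sampling rate is lowered when the prediction exceeds the CPU budget. The feature set, the linear model, and the warm-up threshold are assumptions for illustration, not the thesis's exact design.

        import numpy as np

        history_X, history_y = [], []   # per-interval features, measured CPU cost

        def record(features, measured_cost):
            """Log one measurement interval (e.g. features = [pkts, bytes, flows])."""
            history_X.append(features)
            history_y.append(measured_cost)

        def sampling_rate(features, cpu_budget):
            """Fraction of packets to keep so the predicted cost fits the budget."""
            if len(history_y) < 8:      # warm-up: too little data to fit a model
                return 1.0
            X, y = np.array(history_X), np.array(history_y)
            w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares cost model
            predicted = float(np.array(features) @ w)
            return min(1.0, cpu_budget / max(predicted, 1e-9))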

    Scalable and responsive real time event processing using cloud computing

    Cloud computing provides the potential for scalability and adaptability in a cost-effective manner. However, when it comes to achieving scalability for real-time applications, response time cannot be high. Many applications require good performance and low response time, which need to be matched with dynamic resource allocation. The real-time processing requirements can also be characterized by unpredictable rates of incoming data streams and dynamic outbursts of data. This raises the issue of processing the data streams across multiple cloud computing nodes. This research analyzes possible methodologies to process real-time data in which applications can be structured as multiple event processing networks and be partitioned over the set of available cloud nodes. The approach is based on queuing theory principles to encompass cloud computing. The transformation of the raw data into useful outputs occurs in various stages of processing networks which are distributed across the multiple computing nodes in a cloud. A set of valid options is created to understand the response time requirements for each application. Under a given valid set of conditions to meet the response time criteria, multiple instances of event processing networks are distributed in the cloud nodes. A generic methodology to scale up and scale down the event processing networks in accordance with the response time criteria is defined. The real-time applications that support sophisticated decision support mechanisms need to comply with response time criteria consisting of interdependent data flow paradigms, making it harder to improve the performance. Consideration is given to ways to reduce the latency, improve response time and increase throughput of the real-time applications by distributing the event processing networks in multiple computing nodes.
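    One way to make the queuing-theory angle concrete: if each event-processing node is modeled as an M/M/1 queue with mean response time T = 1/(mu − lambda), then a response-time target fixes the maximum arrival rate a node can absorb, and the number of node instances follows. The M/M/1 assumption and the numbers below are illustrative, not taken from the thesis.

        import math

        def nodes_needed(total_rate, service_rate, target_response):
            """Instances of an event-processing network needed to meet the target."""
            # M/M/1: T = 1 / (mu - lambda), so a node meets the target only
            # while its arrival rate stays below mu - 1/T.
            max_rate_per_node = service_rate - 1.0 / target_response
            if max_rate_per_node <= 0:
                raise ValueError("target unreachable even on an idle node")
            return math.ceil(total_rate / max_rate_per_node)

        # e.g. 900 events/s in total, 250 events/s per node, 20 ms target:
        print(nodes_needed(900, 250, 0.020))   # -> 5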

    A predictive fault-tolerance framework for IoT systems

    As Internet of Things (IoT) systems scale, attributes such as availability, reliability, safety, maintainability, security, and performance become increasingly important. A key challenge in realising IoT is how to provide a dependable infrastructure for the billions of expected IoT devices. A dependable IoT system is one that can defensibly be trusted to deliver its intended service within a given time period. To define a fault-tolerance (FT) support solution that is applicable to all IoT systems, it is important that error definition is a generic, language-agnostic process, so that FT can be applied as a software pattern. It must also be interoperable, so that FT support can be easily 'plugged into' any existing IoT system, which is facilitated by an adherence to standards and protocols. Lastly, it is important that FT support is, itself, fault tolerant, so that it can be depended on to provide correct support for IoT systems. The work in this thesis considers how real-time and historical data analysis techniques can be combined to monitor an IoT environment and analyse its short- and long-term data to make the system as resilient to failure as possible. Specifically, complex event processing (CEP) is proposed for real-time error detection based on the analysis of stream data in an IoT system, where errors are defined as nondeterministic finite automata (NFA). For long-term error analysis, machine learning (ML) is proposed to predict when an error is likely to occur and mitigate imminent system faults based on previous experience of erroneous system behaviour in the IoT system. The contribution is threefold: (1) a language-agnostic approach to error definition using NFAs, designed to provide 'FT as a service' for easy deployment and integration into existing IoT systems; (2) an implementation of NFAs on a bespoke CEP system, BoboCEP, that provides distributed, resilient event processing at the network edge via active replication; and (3) a ML approach to intelligent FT that can learn from system errors over time to ensure correct long-term FT support. The proposed solution was evaluated using two vertical-farming testbeds and a dataset from a real-world vertical farm. Results showed that the proposed solution could successfully detect, predict, and recover from erroneous system behaviours. A performance analysis of BoboCEP was conducted with favourable results.
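    To give a flavour of language-agnostic error definition over an event stream, the sketch below encodes one error pattern as a small finite automaton: raise an error when a reading stays above a threshold for three consecutive events. It is a deterministic simplification for illustration only and does not use BoboCEP's actual API; the threshold and event shape are made up.

        def make_error_detector(threshold=30.0, runs=3):
            """States 0..runs-1 count consecutive hot events; state runs = error."""
            def step(state, event):
                return min(state + 1, runs) if event["temp"] > threshold else 0
            return step

        step, state = make_error_detector(), 0
        for event in [{"temp": 31.2}, {"temp": 33.0}, {"temp": 35.1}]:
            state = step(state, event)
            if state == 3:
                print("error: sustained over-temperature")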

    Runtime Management of Multiprocessor Systems for Fault Tolerance, Energy Efficiency and Load Balancing

    Efficiency of modern multiprocessor systems is hurt by unpredictable events: aging causes permanent faults that disable components; application spawnings and terminations, taking place at arbitrary times, affect energy proportionality, causing energy waste; load imbalances reduce resource utilization, penalizing performance. This thesis demonstrates how runtime management can mitigate the negative effects of unpredictable events, making decisions guided by a combination of static information known in advance and parameters that only become known at runtime. We propose techniques for three different objectives: graceful degradation of aging-prone systems; energy efficiency of heterogeneous adaptive systems; and load balancing by means of work stealing. Managing aging-prone systems for graceful efficiency degradation is based on a high-level system description that encapsulates hardware reconfigurability and workload flexibility and allows system efficiency to be quantified and used as an objective function. Different custom heuristics, as well as simulated annealing and a genetic algorithm, are proposed to optimize this objective function as a response to component failures. Custom heuristics are one to two orders of magnitude faster, provide better efficiency for the first 20% of system lifetime, and are less than 13% worse than a genetic algorithm at the end of this lifetime. Custom heuristics occasionally fail to satisfy reconfiguration cost constraints. As all algorithms' execution time scales well with respect to system size, a genetic algorithm can be used as backup in these cases. Managing heterogeneous multiprocessors capable of Dynamic Voltage and Frequency Scaling is based on a model that accurately predicts performance and power: performance is predicted by combining static, application-specific profiling information and dynamic, runtime performance monitoring data; power is predicted using the aforementioned performance estimations and a set of platform-specific, static parameters, determined only once and used for every application mix. Three runtime heuristics are proposed that make use of this model to perform a partial search of the configuration space, evaluating a small set of configurations and selecting the best one. When best-effort performance is adequate, the proposed approach achieves 3% higher energy efficiency compared to the powersave governor and 2x better compared to the interactive and ondemand governors. When individual applications' performance requirements are considered, the proposed approach is able to satisfy them, giving away 18% of the system's energy efficiency compared to powersave, which however misses the performance targets by 23%; at the same time, the proposed approach maintains an efficiency advantage of about 55% compared to the other governors, which also satisfy the requirements. Lastly, to improve load balancing of multiprocessors, a partial and approximate view of the current load distribution among system cores is proposed, which consists of lightweight data structures and is maintained by each core through cheap operations. A runtime algorithm is developed that uses this view whenever a core becomes idle to select a victim core for work stealing, also considering system topology and memory hierarchy. Among 12 diverse imbalanced workloads, the proposed approach achieves better performance than random, hierarchical and local stealing for six workloads. Furthermore, it is at most 8% slower on the other six workloads, while competing strategies incur a penalty of at least 89% on some workload.
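    The following sketch illustrates the topology-aware victim selection idea from the load-balancing part: each core keeps an approximate view of the other cores' queue lengths plus a distance matrix (shared cache / NUMA hops), and an idle core steals from the most loaded core among the nearest candidates. The data structures and the tie-breaking rule are illustrative assumptions, not the thesis's algorithm.

        def pick_victim(me, approx_load, distance):
            """Pick a steal victim: nearest cores first, then highest approximate load."""
            candidates = [c for c in range(len(approx_load))
                          if c != me and approx_load[c] > 0]
            if not candidates:
                return None          # nothing to steal anywhere
            return min(candidates, key=lambda c: (distance[me][c], -approx_load[c]))

        # two pairs of cores sharing a cache: cores 0-1 are close, 2-3 are close
        distance = [[0, 1, 2, 2], [1, 0, 2, 2], [2, 2, 0, 1], [2, 2, 1, 0]]
        print(pick_victim(0, approx_load=[0, 3, 9, 9], distance=distance))  # -> 1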

    Characterizing and controlling program behavior using execution-time variance

    Immersive applications, such as computer gaming, computer vision and video codecs, are an important emerging class of applications with QoS requirements that are difficult to characterize and control using traditional methods. This thesis proposes new techniques reliant on execution-time variance to both characterize and control program behavior. The proposed techniques are intended to be broadly applicable to a wide variety of immersive applications and easy for programmers to apply without needing to gain specialized expertise. First, we create new QoS controllers that programmers can easily apply to their applications to achieve desired application-specific QoS objectives on any platform or application data-set, provided the programmers verify that their applications satisfy some simple domain requirements specific to immersive applications. The controllers adjust programmer-identified knobs every application frame to effect desired values for programmer-identified QoS metrics. The control techniques are novel in that they do not require the user to provide any kind of application behavior model, and are effective for immersive applications that defy the traditional requirements for feedback controller construction. Second, we create new profiling techniques that provide visibility into the behavior of a large complex application, inferring behavior relationships across application components based on the execution-time variance observed at all levels of granularity of the application functionality. Additionally, for immersive applications, some of the most important QoS requirements relate to managing the execution-time variance of key application components, for example, the frame rate. The profiling techniques not only identify and summarize behavior directly relevant to the QoS aspects related to timing, but also indirectly reveal non-timing-related properties of behavior, such as the identification of components that are sensitive to data, or those whose behavior changes based on the call context.
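    A minimal sketch of the per-frame knob-adjustment idea: a proportional update nudges a programmer-identified quality knob so that the measured frame time tracks a target, with no application behavior model required. The gain, bounds, and multiplicative update rule are illustrative assumptions; the thesis's controllers are more elaborate.

        def make_controller(target_ms, gain=0.05, lo=0.1, hi=1.0):
            """Per-frame proportional controller for one quality knob."""
            knob = hi                        # start at full quality
            def update(measured_ms):
                nonlocal knob
                error = (measured_ms - target_ms) / target_ms
                knob = max(lo, min(hi, knob * (1.0 - gain * error)))
                return knob                  # apply to the next frame
            return update

        control = make_controller(target_ms=33.3)    # roughly 30 frames per second
        # each frame: quality = control(last_frame_time_ms)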

    Building the Future Internet through FIRE

    The Internet as we know it today is the result of a continuous activity for improving network communications, end-user services, computational processes and information technology infrastructures. The Internet has become a critical infrastructure for human beings by offering complex networking services and end-user applications that together have transformed all aspects, mainly economical, of our lives. Recently, with the advent of new paradigms and the progress in wireless technology, sensor networks and information systems, and also the inexorable shift towards the everything-connected paradigm, first known as the Internet of Things and lately envisioned as the Internet of Everything, a data-driven society has been created. In a data-driven society, productivity, knowledge, and experience are dependent on increasingly open, dynamic, interdependent and complex Internet services. The challenge for the design of the Future Internet is to build robust enabling technologies, implement and deploy adaptive systems, and create business opportunities, considering increasing uncertainties and emergent systemic behaviors where humans and machines seamlessly cooperate.