549 research outputs found

    Markov Decision Process Based Energy-Efficient On-Line Scheduling for Slice-Parallel Video Decoders on Multicore Systems

    We consider the problem of energy-efficient on-line scheduling for slice-parallel video decoders on multicore systems. We assume that each processor is Dynamic Voltage Frequency Scaling (DVFS) enabled, so that it can independently trade off performance for power while taking the video decoding workload into account. In the past, scheduling and DVFS policies in multicore systems have been formulated heuristically due to the inherent complexity of the on-line multicore scheduling problem. The key contribution of this report is that we rigorously formulate the problem as a Markov decision process (MDP), which simultaneously takes into account the on-line scheduling and per-core DVFS capabilities; the power consumption of the processor cores and caches; and the loss-tolerant and dynamic nature of the video decoder's traffic. In particular, we model the video traffic using a Directed Acyclic Graph (DAG) to capture the precedence constraints among frames in a Group of Pictures (GOP) structure, while also accounting for the fact that frames have different display/decoding deadlines and non-deterministic decoding complexities. The objective of the MDP is to minimize long-term power consumption subject to a minimum Quality of Service (QoS) constraint related to the decoder's throughput. Although MDPs notoriously suffer from the curse of dimensionality, we show that, with appropriate simplifications and approximations, the complexity of the MDP can be mitigated. We implement a slice-parallel version of H.264 on a multiprocessor ARM (MPARM) virtual platform simulator, which provides cycle-accurate and bus signal-accurate simulation for different processors. We use this platform to generate realistic video decoding traces with which we evaluate the proposed on-line scheduling algorithm in Matlab.
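    As a rough illustration of the kind of formulation this abstract describes, the sketch below solves a toy single-core DVFS problem by value iteration. It is not the report's model: the frequency levels, the cubic power model, the arrival distribution, the backlog cap, and the drop penalty (standing in for the QoS constraint as a Lagrangian-style weight) are all assumptions chosen for illustration, and the names `step`, `q_value`, and `value_iteration` are hypothetical.

```python
# Toy single-core DVFS MDP solved by value iteration.
# Every constant below is an illustrative assumption, not a value from the report.

FREQS = [0, 1, 2, 3]                 # work units served per slot at each DVFS level
POWER = [f ** 3 for f in FREQS]      # assumed cubic power-versus-frequency model
ARRIVALS = {1: 0.5, 2: 0.5}          # assumed P(work units arriving per slot)
MAX_BACKLOG = 6                      # work beyond this is dropped (loss-tolerant traffic)
DROP_PENALTY = 50.0                  # assumed cost per dropped unit (QoS weight)
GAMMA = 0.9                          # discount factor


def step(backlog, freq_idx, arrival):
    """Return (next_backlog, cost) for one time slot."""
    remaining = max(backlog - FREQS[freq_idx], 0) + arrival
    dropped = max(remaining - MAX_BACKLOG, 0)
    cost = POWER[freq_idx] + DROP_PENALTY * dropped
    return min(remaining, MAX_BACKLOG), cost


def q_value(values, backlog, freq_idx):
    """Expected discounted cost of choosing freq_idx in state backlog."""
    total = 0.0
    for arrival, prob in ARRIVALS.items():
        nxt, cost = step(backlog, freq_idx, arrival)
        total += prob * (cost + GAMMA * values[nxt])
    return total


def value_iteration(tol=1e-6):
    """Iterate the Bellman operator until the value function stops changing."""
    states = range(MAX_BACKLOG + 1)
    actions = range(len(FREQS))
    values = [0.0] * (MAX_BACKLOG + 1)
    while True:
        new = [min(q_value(values, b, k) for k in actions) for b in states]
        converged = max(abs(x - y) for x, y in zip(new, values)) < tol
        values = new
        if converged:
            break
    # Greedy policy: index of the cheapest frequency level for each backlog.
    policy = [min(actions, key=lambda k: q_value(values, b, k)) for b in states]
    return values, policy


values, policy = value_iteration()
print("frequency level per backlog:", [FREQS[k] for k in policy])
```

    The state here is only a backlog counter; the report's state additionally covers multiple cores, frame dependencies, and deadlines, which is where the dimensionality problem it mentions comes from.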

    Power-Aware HEVC Decoding with Tunable Image Quality

    Mobile devices are under strong pressure to support increasingly advanced applications that require more processing capability. Among those, the emerging High Efficiency Video Coding (HEVC) standard provides better video quality at the same bit rate than the previous H.264 standard. A limitation in the usability of a mobile video playing device is the lack of support for guaranteeing stand-by time and up time for battery-driven devices. The Green Metadata initiative within the MPEG standard was launched to address the power saving issues of the decoder and defines the technology requirements. In this paper, we propose an HEVC decoder with tunable decoding quality levels for maximum power savings, as suggested in the scope of the Green Metadata initiative. Our experiments reveal that the modified HEVC video decoder can save up to 28% of power consumption on real-world platforms while keeping better quality than decoding with H.264.

    Online Energy-Efficient Task-Graph Scheduling for Multicore Platforms

    Numerous Directed Acyclic Graph (DAG) schedulers have been developed to improve the energy efficiency of various multicore platforms. However, these schedulers make a priori assumptions about the relationship between the task dependencies, and they are unable to adapt online to the characteristics of each application without offline profiling data. Therefore, we propose a novel energy-efficient online scheduling solution for the general DAG model to address these two problems. Our proposed scheduler is able to adapt at runtime to the characteristics of each application by making foresighted decisions, which take into account the impact of current scheduling decisions on the present and future deadline miss rates and energy efficiency. Moreover, our scheduler is able to efficiently handle execution with very limited resources by avoiding scheduling tasks that are expected to miss their deadlines and do not have an impact on future deadlines. We validate our approach against state-of-the-art solutions. In our first set of experiments, our results with the H.264 video decoder demonstrate that the proposed low-complexity solution for the general DAG model reduces the energy consumption by up to 15% compared to an existing sophisticated and complex scheduler that was specifically built for the H.264 video decoder application. In our second set of experiments, our results with different configurations of synthetic DAGs demonstrate that our proposed solution is able to reduce the energy consumption by up to 55% and the deadline miss rates by up to 99% compared to a second existing scheduling solution. Finally, we show that our DFM and scheduler have low complexities on a real mobile platform, and we show that our solution is resilient to workload prediction errors by using different estimator accuracies.

    Stochastic Performance Throttling for Multicore Architectures under Spatial and Temporal Dependencies


    Energy and Computing Ressource Aware Feedback Control Strategies for H.264 Video Decoding

    A shortened version of this report has been submitted for publication in the International Journal of Systems Science. Embedded devices using highly integrated chips must cope with conflicting constraints while executing computationally demanding applications under limited energy storage. Automatic control and feedback loops appear to be an effective solution to simultaneously accommodate performance uncertainties due to the variability of tiny-scale gates, varying and poorly predictable computing demands, and limited energy storage constraints. This report presents the practical example of an embedded video decoder controlled by several cascaded feedback loops to carry out the trade-off between decoding quality and energy consumption, exploiting the frequency and voltage scaling capabilities of the chip. The inner loop controls the Dynamic Voltage and Frequency Scaling (DVFS) through a fast predictive control strategy to adapt the computing speed of the chip to the demands of the video flow decoder. The outer loop is fed back with measurements from the decoding of the current frame, and computes the scheduling set-points needed by the inner loop to decode the next frame. The feedback loops have been implemented on a standard PC and some experimental results are provided. It is shown that a noticeable reduction of the energy consumption can be achieved with a very small execution overhead while preserving a requested decoding quality, and that the robustness of feedback loops accommodates the uncertainty coming both from the silicon's variability and from the demanded computing burden.

    Learning Augmented Optimization for Network Softwarization in 5G

    The rapid uptake of mobile devices and applications is placing unprecedented traffic burdens on existing networking infrastructure. In order to maximize both user experience and investment return, networking and communications systems are evolving to the next generation, 5G, which is expected to support more flexibility, agility, and intelligence in provisioned services and infrastructure management. Fulfilling these tasks is challenging, as today's networks are increasingly heterogeneous, dynamic, and large. Network softwarization is one of the critical enabling technologies for implementing these requirements in 5G. In addition to the problems investigated in preliminary research on this technology, many emerging application requirements and advanced optimization and learning technologies are introducing further challenges and opportunities for its full application in practical production environments. This motivates this thesis to develop a new learning augmented optimization technology, which merges advanced optimization and learning techniques to meet the distinct characteristics of the new application environment. More specifically, the key contents of this thesis are as follows:

    • We first develop a stochastic solution to augment the optimization of Network Function Virtualization (NFV) services in dynamical networks. The dominant NFV solutions target deterministic networking environments, whereas the inherent network dynamics and uncertainties of 5G infrastructure are impeding the rollout of NFV in many emerging networking applications. Therefore, Chapter 3 investigates network utility degradation when implementing NFV in dynamical networks, and proposes a robust NFV solution that fully respects the underlying stochastic features. By exploiting the hierarchical decision structures in this problem, a distributed computing framework with two-level decomposition is designed to facilitate a distributed implementation of the proposed model in large-scale networks.

    • Next, Chapter 4 aims to intertwine traditional optimization and learning technologies. In order to reap the merits of both while avoiding their limitations, promising integrative approaches are investigated that combine traditional optimization theories with advanced learning methods. Subsequently, an online optimization process is designed to learn the system dynamics for the network slicing problem, another critical challenge for network softwarization. Specifically, we first present a two-stage slicing optimization model with time-averaged constraints and objective to safeguard network slicing operations in time-varying networks. Directly solving for an off-line solution to this problem is intractable, since the future system realizations are unknown before decisions are made. To address this, we combine historical learning and Lyapunov stability theories, and develop a learning augmented online optimization approach. This enables the system to learn a safe slicing solution from both historical records and real-time observations. We prove that the proposed solution is always feasible and nearly optimal, up to a constant additive factor. Finally, simulation experiments are provided to demonstrate the considerable improvement achieved by the proposals.

    • Traditional solutions for optimizing stochastic systems often require solving a base optimization program repeatedly until convergence. In each iteration, the base program exhibits the same model structure, differing only in its input data. Such properties of stochastic optimization systems motivate the work of Chapter 5, in which we apply the latest deep learning technologies to abstract the core structures of an optimization model and then use the learned deep learning model to directly generate the solutions to the equivalent optimization model. In this respect, an encoder-decoder based learning model is developed in Chapter 5 to improve the optimization of network slices. In order to facilitate solving the constrained combinatorial optimization program in a deep learning manner, we design a problem-specific decoding process by integrating program constraints and problem context information into the training process. The deep learning model, once trained, can be used to directly generate the solution to any specific problem instance. This avoids the extensive computation of traditional approaches, which re-solve the whole combinatorial optimization problem from scratch for every instance. With the help of the REINFORCE gradient estimator, the deep learning model obtained in the experiments achieves significantly reduced computation time and optimality loss.
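    The online optimization with time-averaged constraints mentioned for Chapter 4 can be illustrated with the standard Lyapunov drift-plus-penalty rule. The sketch below is only that generic rule on a toy allocation problem, without the historical-learning component the thesis adds; the function name `drift_plus_penalty`, the demand and price sequences, the choice set, and the weight `v` are all assumptions made for illustration.

```python
import random


def drift_plus_penalty(demands, prices, choices, v=10.0):
    """Allocate capacity online so that, on average, allocation covers demand.

    A virtual queue accumulates unmet demand (demand - allocation). In each slot
    the rule picks the allocation minimizing v * cost + queue * (demand - allocation),
    using only the current observation and the queue length.
    """
    queue = 0.0
    allocations = []
    for demand, price in zip(demands, prices):
        x = min(choices, key=lambda c: v * price * c + queue * (demand - c))
        queue = max(queue + demand - x, 0.0)   # virtual queue update
        allocations.append(x)
    return allocations, queue


# Hypothetical input data: integer demands and per-slot unit prices.
rng = random.Random(0)
demands = [rng.randint(0, 2) for _ in range(200)]
prices = [rng.uniform(1.0, 3.0) for _ in range(200)]
allocations, backlog = drift_plus_penalty(demands, prices, choices=[0, 1, 2, 3])
print("total demand:", sum(demands), "total allocated:", sum(allocations))
```

    A larger `v` puts more weight on cost and lets the virtual queue grow longer before capacity is bought, which is the usual cost-versus-backlog trade-off of this rule.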

    Error tolerant multimedia stream processing: There's plenty of room at the top (of the system stack)

    There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer reliability to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput image and video processing systems already feel the strain of inadequate system-level scaling and robustness under the always-increasing user demand. While such applications can tolerate certain imprecision in their results, today's MSP systems do not support a systematic way to exploit this aspect for cross-layer system resilience. However, research is currently emerging that attempts to utilize the error-tolerant nature of MSP applications for this purpose. This is achieved by modifications to all layers of the system stack, from algorithms and software to the architecture and device layer, and even the IC digital logic synthesis itself. Unlike conventional processing that aims for worst-case performance and accuracy guarantees, error-tolerant MSP attempts to provide guarantees for the expected performance and accuracy. In this paper we review recent advances in this field from an MSP and a system (layer-by-layer) perspective, and attempt to foresee some of the components of future cross-layer error-tolerant system design that may influence the multimedia and the general computing landscape within the next ten years. © 1999-2012 IEEE