
    A model-based approach for multiple QoS in scheduling: from models to implementation

    Meeting multiple Quality of Service (QoS) requirements is an important factor in the success of complex software systems. This paper presents an automated, model-based scheduler synthesis approach for scheduling application software tasks to meet multiple QoS requirements. As a first step, it shows how designers can meet deadlock-freedom and timeliness requirements in a manner that (i) does not over-provision resources, (ii) does not require architectural changes to the system, and (iii) leaves enough degrees of freedom to pursue further properties. A major benefit of our synthesis methodology is that it increases traceability by linking each scheduling constraint to a specific pair of QoS property and underlying platform execution model, so as to facilitate the validation of the scheduling constraints and the understanding of the overall system behaviour required to meet further QoS properties. The paper shows how the methodology is applied in practice and also presents a prototype implementation infrastructure for executing an application on top of common operating systems, without requiring modifications of the latter.
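
    For a concrete flavor of a timeliness constraint, the sketch below applies the classical single-processor EDF utilization test. This is textbook schedulability analysis, not the paper's model-based synthesis procedure, and the task set shown is hypothetical.

```python
# Minimal sketch: classical EDF schedulability test on one processor.
# Standard real-time theory (Liu & Layland), shown only to illustrate
# the kind of timeliness constraint a synthesized scheduler must
# satisfy; NOT the paper's model-based synthesis method.

def edf_schedulable(tasks):
    """tasks: list of (wcet, period) pairs with implicit deadlines.
    EDF schedules the set on one core iff total utilization <= 1."""
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= 1.0

# Hypothetical task set: (worst-case execution time, period) in ms.
tasks = [(2, 10), (3, 15), (5, 40)]
print(edf_schedulable(tasks))  # True: U = 0.2 + 0.2 + 0.125 = 0.525
```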

    STCP: A New Transport Protocol for High-Speed Networks

    Transmission Control Protocol (TCP) is the dominant transport protocol today and is likely to be adopted in future high-speed and optical networks. Numerous studies have modified or tuned the Additive Increase Multiplicative Decrease (AIMD) principle in TCP to enhance network performance. In this work, to efficiently exploit the bandwidth available from high-speed and optical infrastructures, we propose Stratified TCP (STCP), which employs parallel virtual transmission layers in high-speed networks. In this technique, the AIMD principle of TCP is modified to probe the available link bandwidth more aggressively and efficiently, which in turn increases performance. Simulation results show that STCP offers a considerable improvement in performance compared with other TCP variants such as conventional TCP and Layered TCP (LTCP).
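
    To make the modified-AIMD idea concrete, the following sketch contrasts a single AIMD loop with a stratified flow that aggregates several virtual layers, so the aggregate window grows roughly K times faster. The parameters and the aggregation rule are illustrative assumptions, not STCP's exact specification.

```python
# Minimal sketch of AIMD congestion control and a "stratified" variant
# that sums K virtual AIMD layers. Parameters and the aggregation rule
# are illustrative assumptions, not the exact STCP specification.

class AIMDFlow:
    def __init__(self, alpha=1.0, beta=0.5, cwnd=1.0):
        self.alpha, self.beta, self.cwnd = alpha, beta, cwnd

    def on_ack(self):
        self.cwnd += self.alpha / self.cwnd   # additive increase per ACK

    def on_loss(self):
        self.cwnd *= self.beta                # multiplicative decrease

class StratifiedFlow:
    """K parallel virtual layers; the effective window is their sum, so
    growth is ~K times faster while each layer's loss response stays
    multiplicative."""
    def __init__(self, k=4):
        self.layers = [AIMDFlow() for _ in range(k)]

    def on_ack(self):
        for layer in self.layers:
            layer.on_ack()

    def on_loss(self):
        for layer in self.layers:
            layer.on_loss()

    @property
    def cwnd(self):
        return sum(layer.cwnd for layer in self.layers)
```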

    An assembly and offset assignment scheme for self-similar traffic in optical burst switching

    Optical Burst Switching (OBS) is a viable technology for the next-generation core network. We propose an FEC-based assembly scheme that efficiently assembles self-similar traffic, together with a Pareto offset assignment rather than a constant offset assignment. Two buffers, a packet buffer and a burst buffer, are implemented at the Label Edge Router (LER), buffering traffic in the electronic domain. The assembler sits between the two buffers: it is served by the packet queue and in turn serves the burst queue. We outline why burst assembly cannot be implemented independently of offset assignment: the two schemes must be implemented in a complementary way if QoS is to be realized in an OBS network. We show that OBS network performance is directly related to both burst assembly and offset assignment. We present simulation results for the assembly and offset assignment proposals using the ns2 network simulator. Our results show that combining the proposed FEC-based assembly scheme with the proposed Pareto offset assignment scheme gives better network performance in terms of burst drop, resource contention, and delay. Key to any traffic shaping is the nature of the traffic being shaped; this work therefore also compares the performance of traditional exponential traffic with that of realistic self-similar Internet traffic under the proposed assembly and offset assignment schemes. In our simulations, we assume that all Label Switch Routers (LSRs) have wavelength converters and no optical buffers, and we use the Latest Available Unused Channel with Void Filling (LAUC-VF) scheduling scheme and the Just Enough Time (JET) reservation scheme.
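
    The difference between constant and Pareto offset assignment can be sketched in a few lines. The shape parameter and minimum offset below are illustrative assumptions, not the values evaluated in the paper.

```python
import random

# Minimal sketch: constant vs. Pareto-distributed burst offsets.
# alpha (shape) and min_offset_us are illustrative assumptions, not
# the values tuned in the paper.

def constant_offset(min_offset_us=50.0):
    return min_offset_us

def pareto_offset(min_offset_us=50.0, alpha=1.5):
    # random.paretovariate(alpha) returns heavy-tailed samples >= 1,
    # so scaling by min_offset_us yields offsets >= min_offset_us.
    return min_offset_us * random.paretovariate(alpha)

offsets = [pareto_offset() for _ in range(5)]
print([round(o, 1) for o in offsets])  # heavy-tailed spread of offsets
```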

    Exploring time related issues in data stream processing

    Ph.D. (Doctor of Philosophy) thesis.

    Online Modeling and Tuning of Parallel Stream Processing Systems

    Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and re-factored for months, specifically for the hardware on which it is to run. Consumer application code doesn't get the endless massaging that benefits high performance code, even though heterogeneous processor environments are beginning to resemble those in more performance-oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where it would otherwise be cost-prohibitive. This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without necessitating manual end-user management of the non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high-throughput processing, necessitating the use of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mismatched with reality (i.e., the model is incorrectly applied). Often it is unclear whether the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions; this, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive and unfortunately transient (with workload and hardware). In an ideal world, a parallel run-time would re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low-overhead instrumentation, modeling, and optimization. Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate. Online modeling and optimization of a stream processing system first requires a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables usage of the stream processing paradigm for C++ applications (it is the run-time that underpins almost all the work within this dissertation). An application topology is specified by the user; however, almost everything else is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains of our approaches and to compare performance with other leading stream processing frameworks. Information is essential to any modeling task; to that end, a low-overhead instrumentation framework has been developed that is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite-capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap the process under certain conditions. Any modeling effort requires that a model be selected, often a highly manual task involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches is incorporated into the open source RaftLib framework.
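
    RaftLib itself is a C++ template library; purely to illustrate the kernel-and-stream paradigm it implements, here is a minimal Python sketch of a three-kernel pipeline connected by bounded queues. The kernels and queue sizes are hypothetical, and the bounded streams stand in for the kind of buffers a run-time could resize online.

```python
# Minimal sketch of the kernel-and-stream (data-flow) paradigm:
# logical kernels connected by streams. Names here are hypothetical;
# RaftLib itself is a C++ template library.
import threading
import queue

SENTINEL = object()  # end-of-stream marker

def source(out_q):
    for i in range(10):
        out_q.put(i)
    out_q.put(SENTINEL)

def square(in_q, out_q):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(item * item)

def sink(in_q):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            return
        print(item)

# Streams are bounded queues; a run-time could reallocate their
# capacities online as the workload changes.
a, b = queue.Queue(maxsize=8), queue.Queue(maxsize=8)
threads = [threading.Thread(target=source, args=(a,)),
           threading.Thread(target=square, args=(a, b)),
           threading.Thread(target=sink, args=(b,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```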

    DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

    Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations, based on the unsupervised multi-view setting. Specifically, we introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle, which quantifies the amount of task-irrelevant information and encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions. This enables us to train high-performance policies that are robust to visual distractions and can generalize to unseen environments. We demonstrate that our approach achieves state-of-the-art performance on diverse visual control tasks from the DeepMind Control Suite, even when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/JmfanBU/DRIBO.
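
    As a point of reference for the multi-view objective, the sketch below computes an InfoNCE loss between two views' representations, a standard lower bound on their mutual information. It is a simplified stand-in, not DRIBO's exact MIB objective (which additionally penalizes view-specific, task-irrelevant information); the shapes and temperature are illustrative.

```python
import numpy as np

# Minimal sketch of an InfoNCE-style objective between two views'
# representations; a common mutual-information lower bound used by
# multi-view methods. Simplified stand-in, NOT DRIBO's exact objective.

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) representations of two views of the same
    observations; matching rows are the positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # cross-entropy on diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))
noise = 0.05 * rng.normal(size=(32, 16))
print(info_nce(z, z + noise))  # small loss: the two views nearly agree
```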

    PFPS: Priority-First Packet Scheduler for IEEE 802.15.4 Heterogeneous Wireless Sensor Networks

    This paper presents a priority-first packet scheduling approach for heterogeneous traffic flows in low-data-rate heterogeneous wireless sensor networks (HWSNs). The occurrence of a delay-sensitive or emergency event demands that data be delivered on a priority basis, ahead of regular monitoring and sensing applications. In addition, handling sudden multi-event data and meeting each event's reliability requirements becomes both a challenge and a necessity in critical situations. To address this problem, the paper presents a distributed approach for managing data transmission over simultaneous traffic flows in a multi-hop topology, which reduces the load on the sink node and helps prolong the life of the network. To this end, the classification of heterogeneous traffic flows (CHTF) algorithm classifies each incoming packet, whether from a source node or a downstream hop node, by packet priority and stores it in the corresponding queue. The PFPS-EDF and PFPS-FCFS algorithms then schedule each data packet using priority weights. Furthermore, the reporting rate is updated in a timely manner based on queue level, taking the fairness index and processing rate into account. The work reported in this paper is validated in the ns2 simulator (ns2.32 allinone) by exercising the network under each distinct scenario, as well as on a real-time testbed. The protocol evaluation shows that the distributed queue-based PFPS scheduling mechanism works efficiently with the CSMA/CA MAC protocol of IEEE 802.15.4 sensor networks.
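
    As an illustration of two-level, priority-first service, here is a minimal sketch with one queue per traffic class and earliest-deadline-first order within a class. The class names, deadlines, and two-level policy are illustrative assumptions rather than the exact PFPS-EDF specification.

```python
import heapq
import itertools

# Minimal sketch of priority-first packet scheduling: per-class queues,
# EDF order within the highest-priority non-empty class. Class names
# and deadlines are illustrative, not the exact PFPS-EDF specification.

PRIORITIES = {"emergency": 0, "delay_sensitive": 1, "regular": 2}

class PFPSScheduler:
    def __init__(self):
        # one heap per class, ordered by deadline (EDF within a class)
        self.queues = {p: [] for p in PRIORITIES.values()}
        self.counter = itertools.count()  # tie-breaker for equal deadlines

    def enqueue(self, packet, traffic_class, deadline):
        prio = PRIORITIES[traffic_class]
        heapq.heappush(self.queues[prio],
                       (deadline, next(self.counter), packet))

    def dequeue(self):
        # serve the highest-priority class that has packets waiting
        for prio in sorted(self.queues):
            if self.queues[prio]:
                return heapq.heappop(self.queues[prio])[2]
        return None

sched = PFPSScheduler()
sched.enqueue("temp-reading", "regular", deadline=90)
sched.enqueue("fire-alarm", "emergency", deadline=10)
print(sched.dequeue())  # fire-alarm is served first
```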

    An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

    By the middle of this decade, uniprocessor architecture performance had hit a roadblock due to a combination of factors: excessive power dissipation at high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and a saturation of the available instruction-level parallelism in applications. An attractive and viable alternative, embraced by all the processor vendors, was the multi-core architecture, in which throughput is improved through micro-architectural features such as multiple processor cores, interconnects, and low-latency shared caches integrated on a single chip. The individual cores are often simpler than their uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and hide latency, and typically achieve better performance-power figures. The overwhelming success of multi-core microprocessors in both high performance and embedded computing platforms has motivated chip architects to scale multi-core processors dramatically into many-cores, which will include hundreds of cores on-chip to further improve throughput. With such complex large-scale architectures, however, several key design issues must be addressed. First, a wide range of micro-architectural parameters, such as L1 caches, load/store queues, shared cache structures, and interconnection topologies, together with the non-linear interactions between them, define a vast non-linear multivariate micro-architectural design space for many-core processors; the traditional method of using extensive in-loop simulation to explore the design space is simply not practical. Second, accurately evaluating the performance (measured in cycles per instruction (CPI)) of a candidate design requires accounting for contention at the shared cache in addition to the cycle-by-cycle behavior of the large number of cores, which superlinearly increases the number of simulation cycles per iteration of the design exploration. Third, single-thread performance does not scale linearly with the number of hardware threads per core or the number of cores, due to the memory wall effect. This means that at every step of the design process designers must ensure that single-thread performance is not unacceptably slowed down while overall throughput is increased. While all these factors affect design decisions in both high performance and embedded many-core processors, the design of embedded processors required for complex embedded applications, such as networking, smart power grids, battlefield decision-making, consumer electronics, and biomedical devices, is fundamentally different from its high performance counterpart because of the need to consider (i) low power and (ii) real-time operations. This implies that the design objective for embedded many-core processors cannot be simply to maximize performance, but to improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This necessitates additional power estimation models right at the design stage to accurately measure the cost and reliability of all candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented which employs an execution-driven, cycle-accurate simulator to accurately measure the power and performance of embedded many-core processors. The embedded many-core processor domain is network processors (NePs) used to process network IP packets. Future-generation NePs, required to operate at terabit-per-second network speeds, capture all the aspects of a complex embedded application: shared data structures, a large volume of compute-intensive and data-intensive real-time tasks, and a high level of task (packet) level parallelism. Statistical machine learning is used to efficiently model the performance and power of candidate designs over wide ranges of micro-architectural parameters. The method inherently minimizes the number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters, used to optimize single-core architectures subject to the real-time constraints, and (ii) shared-memory-level micro-architectural parameters, used to explore the shared interconnection network and shared cache memory architectures, while achieving overall optimality. The cost function of our exploration algorithm is the total power dissipation, which is minimized subject to the constraint of the real-time throughput (as determined by the terabit optical network router line speed) required in the IP packet processing embedded application.
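
    To give a flavor of surrogate-assisted exploration, the sketch below trains support vector regressors on a small simulated sample and then ranks unsimulated candidate designs by predicted power, subject to a predicted-CPI budget. The design features, the synthetic "simulator", and the budget are illustrative assumptions, not the dissertation's actual framework.

```python
# Minimal sketch of surrogate-assisted design space exploration:
# fit regressors on a few simulated designs, then rank the remaining
# candidates by predicted power under a predicted-CPI budget.
# Features, the synthetic simulator, and the budget are illustrative
# assumptions, not the dissertation's actual framework.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Hypothetical design points: (L1 size in KB, threads per core, cores).
designs = rng.integers(low=[16, 1, 8], high=[128, 8, 64], size=(200, 3))

def simulate(d):
    # Stand-in for an expensive execution-driven cycle-accurate simulation.
    power = 0.02 * d[0] + 0.5 * d[1] * d[2] + rng.normal(0.0, 0.5)
    cpi = 4.0 / d[1] + 64.0 / d[0] + rng.normal(0.0, 0.05)
    return power, cpi

# Simulate only a small training sample; in-loop simulations are the
# expensive step the surrogate is meant to minimize.
train = designs[:30]
power_y, cpi_y = zip(*(simulate(d) for d in train))
power_model = SVR().fit(train, power_y)
cpi_model = SVR().fit(train, cpi_y)

# Rank the remaining candidates without simulating them.
cand = designs[30:]
p_hat = power_model.predict(cand)
c_hat = cpi_model.predict(cand)
budget = np.quantile(c_hat, 0.5)   # stand-in real-time CPI budget
mask = c_hat <= budget             # designs predicted to meet the budget
best = cand[mask][np.argmin(p_hat[mask])]
print("lowest-power design within the CPI budget:", best)
```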