
    Providing Insight into the Performance of Distributed Applications Through Low-Level Metrics

    The field of high-performance computing (HPC) has always dealt with the bleeding edge of computational hardware and software to achieve the maximum possible performance for a wide variety of workloads. When dealing with brand-new technologies, it can be difficult to understand how they work and why they work the way they do. One of the more prevalent approaches to providing insight into modern hardware and software is to provide tools that allow developers to access low-level metrics about their performance. The modern HPC ecosystem supports a wide array of technologies, but in this work, I will focus on two particularly influential technologies: the Message Passing Interface (MPI) and Graphics Processing Units (GPUs).

    For many years, MPI has been the dominant programming paradigm in HPC. Indeed, over 90% of applications that are part of the U.S. Exascale Computing Project plan to use MPI in some fashion. The MPI Standard provides programmers with a wide variety of methods to communicate between processes, along with several other capabilities. The high-level MPI Profiling Interface has been the primary method for profiling MPI applications since the inception of the MPI Standard; more recently, the low-level MPI Tool Information Interface was introduced.

    Accelerators like GPUs have been increasingly adopted as the primary computational workhorse of modern supercomputers. GPUs provide more parallelism than traditional CPUs through a hierarchical grid of lightweight processing cores. NVIDIA provides profiling tools for its GPUs that give access to low-level hardware metrics.

    In this work, I propose research in applying low-level metrics to both the MPI and GPU paradigms, in the form of an implementation of low-level metrics for MPI and a new method for analyzing GPU load imbalance with a synthetic efficiency metric. I introduce Software-based Performance Counters (SPCs) to expose internal metrics of the Open MPI implementation, along with a new interface for exposing these counters to users and tool developers. I also analyze a modified load imbalance formula for GPU-based applications that uses low-level hardware metrics provided through nvprof in a hierarchical approach to take the internal load imbalance of the GPU into account.
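    One way to read the hierarchical load-imbalance idea is to weight each GPU's measured kernel time by an internal utilization metric reported by nvprof (for example, SM efficiency) before applying the usual max-over-mean imbalance formula. The Python sketch below uses that interpretation; the exact formula, metric names, and values are assumptions for illustration and may differ from those in the dissertation.

        # Hierarchical GPU load-imbalance sketch: the usual imbalance formula
        # (max over mean, minus one) applied to per-GPU "effective" times, where
        # each GPU's kernel time is scaled by an internal utilization metric
        # such as the SM efficiency reported by nvprof.  The formula and the
        # sample numbers are illustrative assumptions, not the thesis's exact method.
        def imbalance(times):
            mean = sum(times) / len(times)
            return max(times) / mean - 1.0

        # Per-GPU kernel time in seconds and SM efficiency (fraction of time the
        # streaming multiprocessors were busy), e.g. taken from nvprof output.
        kernel_time = [1.00, 1.05, 0.95, 1.40]
        sm_efficiency = [0.92, 0.88, 0.95, 0.60]

        naive = imbalance(kernel_time)
        hierarchical = imbalance([t * e for t, e in zip(kernel_time, sm_efficiency)])
        print(f"naive imbalance:        {naive:.2%}")
        print(f"hierarchical imbalance: {hierarchical:.2%}")

    Scaling each GPU's time by its internal efficiency makes a GPU that is busy but poorly utilized look less loaded, which is the sense in which internal imbalance is folded into the across-device metric.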

    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia Tesla, and an Intel Phi reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to differ largely between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.
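    The multi-stage idea described above, kernels composed on a buffer that stays resident on the GPU between stages, can be illustrated outside of CAF with a minimal pyopencl sketch. The kernels and sizes below are made up for illustration; this is not CAF's actor API.

        import numpy as np
        import pyopencl as cl

        ctx = cl.create_some_context()
        queue = cl.CommandQueue(ctx)
        prog = cl.Program(ctx, """
        __kernel void scale(__global float *v) { int i = get_global_id(0); v[i] *= 2.0f; }
        __kernel void shift(__global float *v) { int i = get_global_id(0); v[i] += 1.0f; }
        """).build()

        host = np.arange(16, dtype=np.float32)
        mf = cl.mem_flags
        buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)

        # Two stages composed on the same device buffer; the data stays on the
        # GPU between kernels, which is the property the OpenCL actors exploit.
        prog.scale(queue, host.shape, None, buf)
        prog.shift(queue, host.shape, None, buf)

        out = np.empty_like(host)
        cl.enqueue_copy(queue, out, buf)
        print(out)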

    Reducing Power Consumption and Latency in Mobile Devices using a Push Event Stream Model, Kernel Display Server, and GUI Scheduler

    The power consumed by mobile devices can be dramatically reduced by improving how mobile operating systems handle events and display management. Currently, mobile operating systems use a pull model that employs a polling loop to constantly ask the operating system if an event exists. This constant querying prevents the CPU from entering a deep sleep, which unnecessarily consumes power. We’ve improved this process by switching to a push model which we refer to as the event stream model (ESM). This model leverages modern device interrupt controllers, which automatically notify an application when events occur, thus removing the need to constantly rouse the CPU in order to poll for events. Since the CPU rests while no events are occurring, power consumption is reduced. Furthermore, an application is immediately notified when an event occurs, as opposed to waiting for a polling loop to recognize that an event has occurred. This immediate notification reduces latency, which is the elapsed time between the occurrence of an event and the beginning of its processing by an application. We further improved the benefits of the ESM by moving the display server, a central piece of the graphical user interface (GUI), into the kernel. Existing display servers duplicate some of the kernel code. They contain important information about an application that can assist the kernel with scheduling, such as whether the application is visible and able to receive events. However, they do not share such information with the kernel. Our new kernel-level display server (KDS) interacts directly with the process scheduler to determine when applications are allowed to use the CPU. For example, when an application is idle and not visible on the screen, the KDS prevents that application from using the CPU, thus conserving power. These combined improvements have reduced power consumption by up to 31.2% and latency by up to 17.1 milliseconds in our experimental applications. This improvement in power consumption roughly increases battery life by one to four hours when the device is being actively used, or by fifty to three hundred hours when the device is idle.
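    The pull-versus-push distinction can be sketched in user space with standard Python sockets and selectors; the abstract's ESM lives inside the mobile OS kernel, so the snippet below is only an analogy for why blocking on an event source lets the CPU sleep while a polling loop does not.

        import selectors
        import socket

        def handle(data):
            print("event:", data)

        # Pull model: a polling loop repeatedly asks whether an event exists,
        # keeping the CPU busy even when nothing is happening.
        def pull_loop(sock: socket.socket):
            sock.setblocking(False)
            while True:
                try:
                    handle(sock.recv(4096))      # usually raises BlockingIOError
                except BlockingIOError:
                    pass                         # spin again; the CPU never rests

        # Push model (event stream): block until the kernel reports an event,
        # so the CPU can sleep in the meantime and the application is woken
        # immediately when input arrives, reducing both power use and latency.
        def push_loop(sock: socket.socket):
            sel = selectors.DefaultSelector()
            sel.register(sock, selectors.EVENT_READ)
            while True:
                for key, _ in sel.select():      # sleeps until an event is pushed
                    handle(key.fileobj.recv(4096))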

    Unlocking the deployment of spectrum sharing with a policy enforcement framework

    Spectrum sharing has been proposed as a promising way to increase the efficiency of spectrum usage by allowing incumbent operators (IOs) to share their allocated radio resources with licensee operators (LOs), under a set of agreed rules. The goal is to maximize a common utility, such as the sum rate throughput, while maintaining the level of service required by the IOs. However, this is only guaranteed under the assumption that all “players” respect the agreed sharing rules. In this paper, we propose a comprehensive framework for licensed shared access (LSA) networks that discourages LO misbehavior. Our framework is built around three core functions: misbehavior detection via the employment of a dedicated sensing network; a penalization function; and a behavior-driven resource allocation. To the best of our knowledge, this is the first time that these components are combined for the monitoring and policing of the spectrum under the LSA framework. Moreover, a novel simulator for LSA is provided as an open-access tool, serving the purpose of testing and validating our proposed techniques via a set of extensive system-level simulations in the context of mobile network operators, where IOs and several competing LOs are considered. The results demonstrate that violation of the agreed sharing rules can lead to a great loss of resources for the misbehaving LOs, the amount of which is controlled by the system. Finally, we argue that including a policy enforcement function as part of the spectrum sharing system can be beneficial for the LSA system, since it can guarantee compliance with the spectrum sharing rules and limit the short-term benefits arising from misbehavior.
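    As a hypothetical sketch of how a penalization function and behavior-driven allocation could fit together, the snippet below reduces a trust score for each detected violation and then weights each LO's share of resources by that score. The linear penalty, the proportional weighting, and all names are illustrative assumptions, not the paper's actual functions.

        # Hypothetical behavior-driven resource allocation for an LSA system.
        # Each licensee operator (LO) has a trust score that drops whenever the
        # sensing network detects a violation of the sharing rules; the share of
        # resources an LO receives is weighted by that score.  The penalty and
        # weighting rules are illustrative, not the exact ones from the paper.

        def update_trust(trust, violations_detected, penalty=0.2):
            """Reduce an LO's trust score for each detected violation."""
            return max(0.0, trust - penalty * violations_detected)

        def allocate(resource_blocks, trust_scores):
            """Split available resource blocks in proportion to trust scores."""
            total = sum(trust_scores.values())
            if total == 0:
                return {lo: 0 for lo in trust_scores}
            return {lo: int(resource_blocks * t / total) for lo, t in trust_scores.items()}

        trust = {"LO1": 1.0, "LO2": 1.0}
        trust["LO2"] = update_trust(trust["LO2"], violations_detected=3)  # misbehaving LO
        print(allocate(100, trust))  # the misbehaving LO loses resources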

    Research on routing and MAC based on LEACH and S-MAC for energy efficiency and QoS in wireless sensor networks

    The wireless sensor is a micro-embedded device with weak data-processing capability and small storage space. These nodes need to complete complex jobs, including data monitoring, acquisition and conversion, and data processing. Energy efficiency should be considered one of the most important aspects of the Wireless Sensor Network (WSN) throughout architecture and protocol design. At the same time, supporting Quality of Service (QoS) in WSNs is an active research field, because time-sensitive and important information is expected to be transmitted to the sink node immediately. The thesis is supported by the projects entitled “The information and control system for preventing forest fires” and “The Erhai information management system”, funded by the Chinese Government. Energy consumption and QoS are the two main objectives of these projects. The thesis discusses the two aspects at the routing and Medium Access Control (MAC) layers.

    For energy efficiency, the research is based on the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol. LEACH is a benchmark clustering routing protocol in which cluster heads perform a large amount of aggregation and relay messages to the base station. However, LEACH has limitations: its clustering strategy and lack of multi-hop routing do not suit wide areas. Moreover, existing routing protocols tend to focus on a single factor; combining the clustering strategy with a multi-hop routing mechanism has not been considered for improving network performance.

    QoS is supported by the MAC and routing protocols. Sensor MAC (S-MAC) uses a periodic listen/sleep mechanism, as well as collision and crosstalk avoidance mechanisms. This reduces energy costs while supporting good scalability and avoiding collisions. However, the protocol does not provide differentiated services. To support QoS, a new routing protocol needs to be designed and realized on an embedded platform with Wi-Fi and a Linux operating system, so that it can be applied in the actual systems.

    This research was conducted in the following steps. A new protocol called RBLEACH is proposed to solve clustering over a wide area based on LEACH: the area is divided into several regions, within each of which LEACH's cluster-head selection function is improved, and RBLEACH then selects routes using a new algorithm to optimize network performance. A new clustering method, PS-ACO-LEACH, was developed to use several factors, including the residual energy of the cluster head and the Euclidean distances between cluster members and the cluster head; it optimizes a fitness function and maintains load balance among the cluster-head nodes and between the cluster heads and the base station. Based on the ant colony algorithm and transition probabilities, a new routing protocol uses pheromone values to find the optimal path from cluster heads to the base station. This protocol reduces the energy consumption of cluster heads and the imbalance in energy consumption. Simulations show that the improved protocols enhance network performance, including lifetime and energy conservation. Additionally, a Multi Index Adaptive Routing algorithm (MIA-QR) was designed for QoS based on network delay, packet loss rate and signal strength. The protocol is implemented with VC on an embedded Linux system, and experiments verify that MIA-QR supports QoS.
    Finally, an improved protocol (SMAC-SD) for wireless sensor networks is proposed, in order to solve the problem that the S-MAC protocol considers neither service differentiation nor quality of service. For service differentiation, SMAC-SD adopts an access mechanism based on different priorities, including priority-based adjustment of channel access probability, a channel multi-request mechanism, and waiting queues with different priorities and RTS backoff for different services. This gives important services a higher channel-access probability, ensuring their transmission quality. The simulation results show that the improved protocol delivers more important-service traffic and shortens its delay at the same time, while improving the performance of the network effectively.
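    For reference, the baseline LEACH cluster-head self-election rule that RBLEACH and PS-ACO-LEACH build on can be sketched as follows; the node bookkeeping is simplified and the improved selection functions from the thesis are not reproduced here.

        import random

        # Standard LEACH self-election: P is the desired fraction of cluster
        # heads and r the current round.  A node that has not served as cluster
        # head in the last 1/P rounds elects itself with probability
        # T(n) = P / (1 - P * (r mod 1/P)).
        def leach_threshold(P, r):
            return P / (1 - P * (r % round(1 / P)))

        def elect_cluster_heads(nodes, P, r):
            heads = []
            for node in nodes:
                last = node["last_ch_round"]
                if last is not None and r - last < 1 / P:
                    continue                      # not eligible again this epoch
                if random.random() < leach_threshold(P, r):
                    node["last_ch_round"] = r
                    heads.append(node["id"])
            return heads

        nodes = [{"id": i, "last_ch_round": None} for i in range(100)]
        print(elect_cluster_heads(nodes, P=0.05, r=0))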

    Virtual sensing directional hub MAC (VSDH-MAC) protocol with power control

    Medium access control (MAC) protocols play a vital role in making effective use of a multiple-access channel, as they govern achievable performance such as channel utilization and the corresponding quality of service of wireless sensor networks (WSNs). In this paper, a virtual carrier sensing directional hub (VSDH) MAC protocol is proposed for directional, single-hub, centralized WSNs. While MAC protocols in most instances assume idealized directional antenna patterns, the proposed VSDH-MAC protocol incorporates realistic directional antenna patterns to deliver enhanced link performance. We demonstrate that the use of directional antennas with a suitable MAC protocol can provide enhanced communication range and increased throughput with reduced energy consumption at each node, compared to the case when only omnidirectional antennas are used. For the scenarios considered in this study, results show that the average transmit power of the sensor nodes can be reduced by a factor of two, while at the same time offering a significantly extended lifetime.
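    A back-of-the-envelope link-budget calculation shows why directional antennas allow lower transmit power: for a fixed required received power, every dB of antenna gain translates directly into a dB of transmit-power reduction. The gain and distance values below are illustrative, not the paper's scenario parameters.

        import math

        # Free-space link budget: Pr[dBm] = Pt[dBm] + Gt[dBi] + Gr[dBi] - FSPL[dB].
        def fspl_db(distance_m, freq_hz):
            c = 3e8
            return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

        def required_tx_dbm(pr_req_dbm, distance_m, freq_hz, gt_dbi=0.0, gr_dbi=0.0):
            return pr_req_dbm + fspl_db(distance_m, freq_hz) - gt_dbi - gr_dbi

        omni = required_tx_dbm(-90, distance_m=100, freq_hz=2.4e9)
        directional = required_tx_dbm(-90, distance_m=100, freq_hz=2.4e9, gt_dbi=3.0)
        print(f"omni: {omni:.1f} dBm, directional: {directional:.1f} dBm")
        # A 3 dB transmit-antenna gain halves the required transmit power,
        # consistent with the factor-of-two reduction reported above.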

    Effective bootstrapping of Peer-to-Peer networks over Mobile Ad-hoc networks

    Mobile Ad-hoc Networks (MANETs) and Peer-to-Peer (P2P) networks are vigorous, revolutionary communication technologies of the 21st century. They lead the trend of decentralization. Decentralization will ultimately win users over from the client/server model, because it gives ordinary network users more control and stimulates their active participation. It is a determining factor in shaping the future of networking. MANETs and P2P networks are very similar in nature: both are dynamic and distributed; both use multi-hop broadcast or multicast as the major pattern of traffic; both set up connections by self-organizing and maintain them by self-healing. Embodying the slogan “networking without networks”, both abandon the traditional client/server model and dispense with pre-existing infrastructure. However, their levels of real-world application diverge widely: P2P networks now account for roughly 50 to 70% of Internet traffic, while MANETs are still primarily in the laboratory. This interesting and confusing phenomenon has sparked considerable research effort to transplant successful approaches from P2P networks into MANETs. While most research on the synergy of P2P networks and MANETs focuses on routing, the network bootstrapping problem remains indispensable for any such transplantation to be realized. The most pivotal problems in bootstrapping are: (1) automatic configuration of node addresses and IDs, and (2) topology discovery and transformation in different layers and name spaces. In this dissertation research, we have found novel solutions to these problems. The contributions of this dissertation are: (1) a non-IP, flat-address automatic configuration scheme, which integrates lower-layer addresses and P2P IDs in the application layer and makes simple cryptographic assignment possible. A related paper entitled “Pastry over Ad-Hoc Networks with Automatic Flat Address Configuration” was submitted to the Elsevier Journal of Ad Hoc Networks in May. (2) An effective ring-topology construction algorithm which builds a perfect ring in the P2P ID space using only the simplest multi-hop unicast or multicast. Upon this ring, popular structured P2P networks such as Chord and Pastry can be built with great ease. A related paper entitled “Chord Bootstrapping on MANETs - All Roads Lead to Rome” will be ready for submission after the defense of the dissertation.
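    The target structure of the ring-construction step can be pictured with a small sketch: each node is assigned an ID in a circular identifier space and its successor is the node with the next ID clockwise, as in Chord. The dissertation builds this ring in a distributed way over the MANET using multi-hop unicast or multicast; the snippet below assumes all IDs are already known, which is a deliberate simplification.

        import hashlib

        # Build the successor ring in a circular ID space (as in Chord),
        # assuming all node IDs are known up front.  Names and the 16-bit
        # ID space are illustrative only.
        ID_BITS = 16

        def node_id(name):
            return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** ID_BITS)

        def build_ring(names):
            ids = sorted(node_id(n) for n in names)
            return {ids[i]: ids[(i + 1) % len(ids)] for i in range(len(ids))}

        ring = build_ring([f"node{i}" for i in range(8)])
        for nid, succ in sorted(ring.items()):
            print(f"{nid:5d} -> successor {succ:5d}")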