90 research outputs found

    Accurately modeling the on-chip and off-chip GPU memory subsystem

    Research on GPU architecture is becoming pervasive in both academia and industry because these architectures offer much more performance per watt than typical CPU architectures. This is the main reason why massive deployment of GPU multiprocessors is considered one of the most feasible ways to attain exascale computing capabilities. The GPU memory hierarchy is a critical research topic, since its design goals differ widely from those of conventional CPU memory hierarchies. Researchers typically use detailed microarchitectural simulators to explore novel designs that better support GPGPU computing, as well as to improve the performance of GPU and CPU-GPU systems. In this context, the memory hierarchy is a critical and continuously evolving subsystem. Unfortunately, the fast evolution of current memory subsystems deteriorates the accuracy of existing state-of-the-art simulators. This paper focuses on accurately modeling the entire (both on-chip and off-chip) GPU memory subsystem. For this purpose, we identify four main memory-related components that affect overall performance accuracy. Three of them belong to the on-chip memory hierarchy: (i) memory request coalescing mechanisms, (ii) miss status holding registers, and (iii) the cache coherence protocol; the fourth is the memory controller and GDDR memory working activity. To evaluate and quantify our claims, we accurately modeled these memory components in an extended version of the state-of-the-art Multi2Sim heterogeneous CPU-GPU processor simulator. Experimental results show important deviations, which can change the final system performance reported by the simulation framework by up to a factor of three. The proposed GPU model has been compared and validated against the original framework and against results from a real AMD Southern Islands 7870HD GPU. (C) 2017 Elsevier B.V. All rights reserved.

    This work was supported in part by Generalitat Valenciana under grant AICO/2016/059, by the Spanish Ministerio de Economía y Competitividad (MINECO) and Plan E funds under grant TIN2015-66972-C5-1-R, and by the Programa de Ayudas de Investigación y Desarrollo (PAID) de la Universitat Politècnica de València.

    Candel-Margaix, F.; Petit Martí, S. V.; Sahuquillo Borrás, J.; Duato Marín, J. F. (2018). Accurately modeling the on-chip and off-chip GPU memory subsystem. Future Generation Computer Systems, 82:510-519. https://doi.org/10.1016/j.future.2017.02.012
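    Of the four components above, memory request coalescing is the easiest to illustrate in isolation. The following is a minimal sketch of the idea, assuming a 64-byte cache line and a 16-lane wavefront; the names and sizes are illustrative, not taken from Multi2Sim or the paper.

    ```python
    # Illustrative sketch (not Multi2Sim code): coalescing the per-lane
    # addresses of one wavefront memory instruction into cache-line requests.
    LINE_SIZE = 64  # bytes per cache line (assumed)

    def coalesce(addresses):
        """Group per-lane byte addresses into unique cache-line transactions."""
        lines = {}
        for lane, addr in enumerate(addresses):
            line = addr // LINE_SIZE          # cache line the access falls in
            lines.setdefault(line, []).append(lane)
        # One memory transaction per distinct line touched by the wavefront.
        return [(line * LINE_SIZE, lanes) for line, lanes in sorted(lines.items())]

    # 16 lanes reading consecutive 4-byte words -> a single 64-byte transaction.
    print(coalesce([0x1000 + 4 * i for i in range(16)]))
    # Strided accesses defeat coalescing: 16 lanes -> 16 separate transactions.
    print(len(coalesce([0x1000 + 64 * i for i in range(16)])))
    ```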

    Modelling and Design of Resilient Networks under Challenges

    Communication networks, in particular the Internet, face a variety of challenges that can disrupt our daily lives, in the worst cases resulting in the loss of human life and significant financial costs. We define challenges as external events that trigger faults that eventually result in service failures. Understanding these challenges is essential for improving current networks and for designing Future Internet architectures. This dissertation presents a taxonomy of challenges that can help evaluate design choices for the current and Future Internet. Graph models for analysing critical infrastructures are examined, and a multilevel graph model is developed to study interdependencies between different networks. Furthermore, graph-theoretic heuristic optimisation algorithms are developed. These heuristic algorithms add links to increase the resilience of networks in the least costly manner, and they are computationally less expensive than an exhaustive search algorithm. The performance of networks under random failures, targeted attacks, and correlated area-based challenges is evaluated by the challenge simulation module that we developed. The GpENI Future Internet testbed is used to conduct experiments evaluating the performance of the developed heuristic algorithms.
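    To make the link-addition idea concrete, here is a hypothetical greedy sketch in the spirit described above: repeatedly add the absent link that most improves a resilience proxy, using algebraic connectivity (the second-smallest Laplacian eigenvalue) as that proxy. The actual heuristics and cost model in the dissertation may differ.

    ```python
    # Hypothetical greedy sketch (not the dissertation's algorithm): repeatedly
    # add the non-existing link that most improves a resilience proxy.
    import itertools
    import networkx as nx  # also requires scipy for algebraic_connectivity

    def greedy_link_addition(G, budget):
        """Add `budget` links, each chosen to maximise algebraic connectivity."""
        G = G.copy()
        for _ in range(budget):
            candidates = itertools.combinations(G.nodes, 2)
            non_edges = [e for e in candidates if not G.has_edge(*e)]
            if not non_edges:
                break
            def score(edge):
                G.add_edge(*edge)                       # tentatively add link
                value = nx.algebraic_connectivity(G)    # resilience proxy
                G.remove_edge(*edge)
                return value
            G.add_edge(*max(non_edges, key=score))      # keep the best link
        return G

    ring = nx.cycle_graph(8)                 # a sparse, fragile topology
    hardened = greedy_link_addition(ring, 2)
    print(sorted(set(hardened.edges) - set(ring.edges)))  # the two added chords
    ```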

    An Intelligent Mobility Prediction Scheme for Location-Based Service over Cellular Communications Network

    One of the trickiest challenges introduced by cellular communications networks is mobility prediction for Location-Based Services (LBSs). Hence, an accurate and efficient mobility prediction technique is particularly needed for these networks. Mobility prediction incurs overheads on the transmission process, and these overheads affect properties of the cellular communications network such as delay, denial of service, manual filtering, and bandwidth. The main goal of this research is to enhance mobility prediction in cellular communications networks through three phases. Firstly, current mobility prediction techniques are investigated. Secondly, new mobility prediction techniques are devised and examined, based on three hypotheses, that suit cellular communications network and mobile user (MU) resources, offering low computation cost and a high prediction success rate without using MU resources in the prediction process. Thirdly, a new mobility prediction scheme is generated that is based on different levels of mobility prediction. In this thesis, a new mobility prediction scheme for LBSs is proposed. It can be considered a combination of the cell and routing area (RA) prediction levels. For cell-level prediction, most current location prediction research focuses on generalized location models in which the geographic extent is divided into regular-shaped cells. These models are not suitable for certain LBSs whose objectives are to compute and present on-road services. Such techniques are the New Markov-Based Mobility Prediction (NMMP) and the Prediction Location Model (PLM), which deal with inner cell structure and different levels of prediction, respectively. The NMMP and PLM techniques suffer from complex computation, accuracy-rate regression, and insufficient accuracy. In this thesis, Location Prediction based on a Sector Snapshot (LPSS) is introduced, based on a Novel Cell Splitting Algorithm (NCPA) implemented in a micro cell in parallel with the new prediction technique. Compared with two classic prediction techniques, the experimental results for LPSS show the effectiveness and robustness of the new splitting algorithm and prediction technique. On the cell side, the proposed approach reduces the complexity cost and prevents the cell-level prediction technique from running in time slots that are too close together; for these reasons, the RA level avoids cell-side problems. This research also discusses a New Routing Area Displacement Prediction for Location-Based Services (NRADP), which is based on a developed Ant Colony Optimization (ACO). Compared with Mobility Prediction based on an Ant System (MPAS), the experimental results show the effectiveness, higher prediction rate, reduced search stagnation ratio, and reduced computation cost of the new prediction technique.
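    For cell-level prediction, Markov-based techniques such as NMMP build on the idea of learning handover transition probabilities from a user's movement history. The following is a minimal first-order Markov next-cell predictor, included only as an illustrative baseline; it is not the NMMP, PLM, or LPSS technique described above.

    ```python
    # Minimal first-order Markov next-cell predictor (an illustrative baseline,
    # not the thesis's techniques). Transition counts learned from a user's
    # cell-handover history give the most likely next cell.
    from collections import Counter, defaultdict

    class MarkovCellPredictor:
        def __init__(self):
            self.transitions = defaultdict(Counter)  # cell -> Counter of next cells

        def train(self, trajectory):
            for current, nxt in zip(trajectory, trajectory[1:]):
                self.transitions[current][nxt] += 1

        def predict(self, current):
            """Return the most frequently observed successor of `current`."""
            followers = self.transitions[current]
            return followers.most_common(1)[0][0] if followers else None

    predictor = MarkovCellPredictor()
    predictor.train(["A", "B", "C", "A", "B", "D", "A", "B", "C"])
    print(predictor.predict("B"))  # 'C' (seen twice) beats 'D' (seen once)
    ```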

    Topics in Power Usage in Network Services

    The rapid advance of computing technology has created a world powered by millions of computers. Often these computers idly consume energy unnecessarily, in spite of all the efforts of hardware manufacturers. This thesis examines proposals to determine when to power down computers without negatively impacting the service they deliver, compares and contrasts the energy efficiency of virtualisation with that of containerisation, and investigates the energy efficiency of the popular cryptocurrency Bitcoin. We begin by examining the current corpus of literature and defining the key terms we need to proceed. Then we propose a technique for reducing the energy consumption of servers by moving them into a sleep state and employing a low-powered device to act as a proxy in their place. After this we investigate the energy efficiency of virtualisation and compare the energy efficiency of two of the most common means used to achieve it. We then look at the cryptocurrency Bitcoin, considering the energy consumption of bitcoin mining and whether, given the value of bitcoin, mining is profitable. Finally we conclude by summarising the results and findings of this thesis. This work increases our understanding of some of the challenges of energy-efficient computation, as well as proposing novel mechanisms to save energy.
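    The sleep-and-proxy technique hinges on being able to wake the sleeping server on demand. A standard mechanism for this is Wake-on-LAN, whose "magic packet" is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent as a UDP broadcast. The sketch below shows only this wake-up step; the proxy logic around it (deciding which traffic warrants a wake-up) is the thesis's subject and is not reproduced here.

    ```python
    # Send a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times,
    # broadcast over UDP (port 9 is the conventional discard port for WoL).
    import socket

    def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9):
        payload = bytes.fromhex(mac.replace(":", ""))
        packet = b"\xff" * 6 + payload * 16
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            sock.sendto(packet, (broadcast, port))

    wake("00:11:22:33:44:55")  # example MAC: the sleeping server's NIC address
    ```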

    Reliability models for HPC applications and a Cloud economic model

    With the enormous number of computing resources in HPC and Cloud systems, failures become a major concern. Therefore, failure behaviors such as reliability, failure rate, and mean time to failure need to be understood in order to manage such large systems efficiently. This dissertation makes three major contributions to HPC and Cloud studies. First, a reliability model with correlated failures in a k-node system for HPC applications is studied. The model is extended to improve accuracy by accounting for failure correlation: the Marshall-Olkin multivariate Weibull distribution is refined with the excess-life (conditional Weibull) formulation to better estimate system reliability, and a univariate method is proposed for estimating the Marshall-Olkin multivariate Weibull parameters of a system composed of a large number of nodes. The failure rate and mean time to failure are then derived. The model is validated using log data from the Blue Gene/L system at LLNL. Results show that when node failures are correlated, the system becomes less reliable. Secondly, a reliability model of Cloud computing is proposed. The reliability model, mean time to failure, and failure rate are estimated for a system of k nodes and s virtual machines under four scenarios: (1) hardware and software components each fail independently; (2) software components fail independently while hardware failures are correlated; (3) software failures are correlated while hardware components fail independently; and (4) software and hardware failures are both dependent. Results show that if the failures of the nodes and/or software possess a degree of dependency, the system becomes less reliable; moreover, increasing the number of computing components decreases the reliability of the system. Finally, an economic model for a Cloud service provider is proposed. This economic model aims to maximize profit through the right pricing and rightsizing of the Cloud data center. Total cost is a key element of the model, and it is analyzed by considering the Total Cost of Ownership (TCO) of the Cloud.
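    The Marshall-Olkin construction can be made concrete in the two-node case (the dissertation works with the general k-node multivariate form, so this is only a minimal sketch under a common shape parameter α). Nodes fail either from node-specific shocks or from a shared shock, and the shared shock is what induces the failure correlation:

    ```latex
    % Two-node Marshall-Olkin Weibull sketch (the dissertation treats the
    % general k-node multivariate case). Node-specific shocks arrive with
    % rates \lambda_1, \lambda_2; a common shock with rate \lambda_{12}.
    \[
    \bar{F}(t_1, t_2) = \exp\!\left(-\lambda_1 t_1^{\alpha}
                                    -\lambda_2 t_2^{\alpha}
                                    -\lambda_{12}\,\max(t_1, t_2)^{\alpha}\right)
    \]
    % If the system needs both nodes alive, reliability and MTTF follow as
    \[
    R(t) = \bar{F}(t, t) = e^{-\Lambda t^{\alpha}}, \qquad
    \Lambda = \lambda_1 + \lambda_2 + \lambda_{12}, \qquad
    \mathrm{MTTF} = \int_0^{\infty} R(t)\,dt
                  = \Lambda^{-1/\alpha}\,\Gamma\!\left(1 + \tfrac{1}{\alpha}\right)
    \]
    ```

    A positive shared-shock rate \(\lambda_{12}\) adds directly to the total rate \(\Lambda\), lowering both \(R(t)\) and the MTTF, which is consistent with the dissertation's finding that correlated failures make the system less reliable.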

    EXPLORING BEHAVIORAL PATTERNS IN COMPLEX ADAPTIVE SYSTEMS

    Many phenomena in the real world can be characterized as complex adaptive systems (CAS). We are surrounded by a huge number of communicating and interacting agents, some of which may be capable of learning and adapting to new situations in pursuit of their goals. E-commerce, social media, cloud computing, transportation networks and real-time ride sharing, and supply chains are a few examples of CAS. These are the systems that surround us in everyday life, and naturally we want to make sense of them and optimize their behavior, or optimize our own behavior around them. Given the complexity of these systems, we want to find a set of simplified patterns amid the seeming chaos of interactions in a CAS and provide more manageable means of analysis for such systems. In my thesis I consider a few example problems from different domains: modeling human behavior during fire evacuation, detecting notable transitions in data streams, modeling finite resource sharing on a computational cluster with many clients, and predicting buyer behavior in the marketplace. These (and other) seemingly different problems share one important similarity: complex semi-repetitive or semi-similar behavior. This semi-repetitive behavior makes such processes challenging to model, for two major reasons: (1) state-space explosion and sparsity of data, and (2) critical transitions and the precision of process modeling. I show that the analysis of seemingly different CAS from different domains can be performed by following the same recipe.
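    One of the listed problems, detecting notable transitions in data streams, has a classical baseline in change-point detection. Below is a generic one-sided CUSUM sketch; it is an illustrative stand-in under assumed parameter names (baseline, slack, threshold), not the detection method developed in the thesis.

    ```python
    # Generic one-sided CUSUM detector for upward shifts in a stream's mean.
    # Flags a "notable transition" once the cumulative drift of samples above
    # the baseline exceeds a threshold.
    def cusum(stream, baseline, slack=0.5, threshold=5.0):
        s = 0.0
        for i, x in enumerate(stream):
            s = max(0.0, s + (x - baseline - slack))  # accumulate excess drift
            if s > threshold:
                return i  # index at which the transition is declared
        return None

    quiet = [0.1, -0.2, 0.3, 0.0, -0.1]
    shifted = quiet + [2.1, 1.8, 2.4, 2.0]  # the mean jumps at index 5
    print(cusum(shifted, baseline=0.0))     # fires a few samples after the jump
    ```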

    Flash Memory Devices

    Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. The peculiarity of this technology is an inherent flexibility in terms of performance and integration density, depending on the architecture devised for integration. NOR Flash technology is still the workhorse of many code-storage applications in the embedded world, ranging from microcontrollers for the automotive environment to IoT smart devices, and its use is also forecast to be fundamental in emerging AI edge scenarios. By contrast, when massive data storage is required, NAND Flash memories are necessary. You can find NAND Flash in USB sticks and memory cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they have fueled a new wave of innovation, namely the 3D architecture. Today "3D" means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit of capacity. So far, Flash architectures have always been based on "floating gate" cells, where the information is stored by injecting electrons into a piece of polysilicon surrounded by oxide; emerging concepts are instead based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Codes and, finally, system co-design for new applications such as AI and security enforcement.
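    Among the "flash management algorithms" mentioned above, the flash translation layer (FTL) is a representative example: because NAND pages cannot be overwritten in place, updates are written out of place and stale pages are reclaimed only when a whole block is erased. The toy page-mapped FTL below is a hypothetical sketch of just that mechanism, not a description of any specific device's firmware.

    ```python
    # Toy page-mapped Flash Translation Layer (illustrative only). Captures the
    # constraint driving most flash management algorithms: NAND pages cannot be
    # rewritten in place, so every update goes to a fresh page and the old page
    # stays invalid until its whole block is erased.
    class ToyFTL:
        def __init__(self, num_pages):
            self.mapping = {}                   # logical page -> physical page
            self.free = list(range(num_pages))  # never-written physical pages
            self.invalid = set()                # stale pages awaiting erase

        def write(self, logical_page, data):
            if logical_page in self.mapping:
                self.invalid.add(self.mapping[logical_page])  # out-of-place update
            physical = self.free.pop(0)
            self.mapping[logical_page] = physical
            # (a real FTL would program `data` into NAND here and trigger
            # garbage collection when free pages run low)
            return physical

    ftl = ToyFTL(num_pages=8)
    ftl.write(0, b"v1")
    ftl.write(0, b"v2")                 # rewrite lands on a new physical page
    print(ftl.mapping[0], ftl.invalid)  # 1 {0}
    ```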

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all the different layers of the TM software stack. To do so, the thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge, in order to develop new methods and strategies for the optimization of TM applications.
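    As background for the optimization tools described above, the sketch below shows the core mechanics a software TM system exposes to applications: a transaction buffers its writes, records the versions of everything it reads, and at commit validates those versions and retries on conflict. It is a deliberately minimal, hypothetical STM (names such as TVar and atomically are illustrative), not one of the thesis's tools or any specific TM system's API.

    ```python
    # Minimal software-TM-style sketch: buffered writes, versioned reads,
    # validate-and-retry at commit. Illustrative only.
    import threading

    _lock = threading.Lock()

    class TVar:
        def __init__(self, value):
            self.value, self.version = value, 0

    def atomically(fn):
        while True:                                  # retry loop on conflict
            reads, writes = {}, {}
            def read(tv):
                if tv in writes:
                    return writes[tv]                # read-your-own-writes
                reads.setdefault(tv, tv.version)     # record version first seen
                return tv.value
            def write(tv, value):
                writes[tv] = value
            result = fn(read, write)
            with _lock:                              # validate, then publish
                if all(tv.version == v for tv, v in reads.items()):
                    for tv, value in writes.items():
                        tv.value, tv.version = value, tv.version + 1
                    return result
            # some TVar changed since we read it -> rerun the transaction

    account_a, account_b = TVar(100), TVar(0)

    def transfer(read, write, amount=30):
        write(account_a, read(account_a) - amount)
        write(account_b, read(account_b) + amount)

    atomically(transfer)
    print(account_a.value, account_b.value)  # 70 30
    ```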