
    TABARNAC: Visualizing and Resolving Memory Access Issues on NUMA Architectures

    In modern parallel architectures, memory accesses are a common bottleneck, so optimizing the way applications access memory is an important way to improve performance and reduce energy consumption. Memory accesses matter even more on NUMA machines, where the access time to data depends on its location in memory. Many efforts have been made to develop adaptive tools that improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools cannot change the memory access pattern of the original application, so code written without memory performance in mind might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis. In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by structure. This visualization allows the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping, and we propose several code modifications that improve the memory access behavior of several parallel applications.
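
    As a rough illustration of the kind of per-thread, per-structure summary described above, the sketch below aggregates a hypothetical access trace in Python. The trace format, structure names, and counters are assumptions chosen for illustration; they are not TABARNAC's actual implementation or output format.

```python
# Hypothetical illustration of a per-thread, per-structure access summary.
# Each trace record is (thread_id, structure_name, page_number) -- an assumed format.
from collections import defaultdict

trace = [
    (0, "A", 0), (0, "A", 1), (1, "A", 512), (1, "A", 513),
    (0, "B", 7), (1, "B", 7), (2, "B", 7), (3, "B", 7),
]

def access_distribution(records):
    """Count accesses per (structure, thread) to reveal sharing patterns."""
    counts = defaultdict(lambda: defaultdict(int))
    for tid, struct, _page in records:
        counts[struct][tid] += 1
    return counts

for struct, per_thread in access_distribution(trace).items():
    total = sum(per_thread.values())
    shares = {t: round(c / total, 2) for t, c in sorted(per_thread.items())}
    # A structure accessed mostly by one thread can benefit from thread/data
    # mapping; one accessed uniformly by all threads usually cannot.
    print(f"{struct}: {shares}")
```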

    Mining and Verification of Temporal Events with Applications in Computer Micro-Architecture Research

    Computer simulation programs are essential tools for scientists and engineers to understand a particular system of interest. As expected, the complexity of the software increases with the depth of the model used. In addition to the exigent demands of software engineering, verification of simulation programs is especially challenging because the models represented are complex and ridden with unknowns that will be discovered by developers in an iterative process. To manage such complexity, advanced verification techniques for continually matching the intended model to the implemented model are necessary. Therefore, the main goal of this research work is to design a useful verification and validation framework that is able to identify model representation errors and is applicable to generic simulators. The framework that was developed and implemented consists of two parts. The first part is the First-Order Logic Constraint Specification Language (FOLCSL), which enables users to specify the invariants of a model under consideration. From the first-order logic specification, the FOLCSL translator automatically synthesizes a verification program that reads the event trace generated by a simulator and signals whether all invariants are respected. The second part consists of mining the temporal flow of events using a newly developed representation called the State Flow Temporal Analysis Graph (SFTAG). While the first part seeks an assurance of implementation correctness by checking that the model invariants hold, the second part derives an extended model of the implementation and hence enables a deeper understanding of what was implemented. The main application studied in this work is the validation of the timing behavior of micro-architecture simulators. The study includes SFTAGs generated for a wide set of benchmark programs and their analysis using several artificial intelligence algorithms. This work improves computer architecture research and verification processes, as shown by the case studies and experiments that have been conducted.
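
    The sketch below illustrates the general idea of checking a trace-level invariant of the kind such a framework might express. The event format and the fetch/commit invariant are illustrative assumptions, not FOLCSL's actual specification language or its generated checker.

```python
# A minimal sketch of invariant checking over a simulator event trace.
# The invariant and the (event, instruction_id) trace format are assumptions.
def check_fetch_commit(trace):
    """Invariant: every fetched instruction id must eventually commit."""
    pending = set()
    for event, insn_id in trace:
        if event == "fetch":
            pending.add(insn_id)
        elif event == "commit":
            pending.discard(insn_id)
    return not pending  # True iff all fetched instructions committed

trace = [("fetch", 1), ("fetch", 2), ("commit", 1), ("commit", 2)]
print(check_fetch_commit(trace))  # True: the invariant holds on this trace
```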

    Computing resource allocation in three-tier IoT fog networks: a joint optimization approach combining Stackelberg game and matching

    Fog computing is a promising architecture for providing economical and low-latency data services to future Internet of Things (IoT)-based network systems. It relies on a set of low-power fog nodes (FNs) located close to the end users to offload services originally targeted at cloud data centers. In this paper, we consider a fog computing network consisting of a set of data service operators (DSOs), each of which controls a set of FNs that provide the required data service to a set of data service subscribers (DSSs). Allocating the limited computing resources of the FNs to all DSSs so as to achieve optimal and stable performance is an important problem. We therefore propose a joint optimization framework for all FNs, DSOs, and DSSs to obtain the optimal resource allocation in a distributed fashion. In the framework, we first formulate a Stackelberg game to analyze the pricing problem for the DSOs and the resource allocation problem for the DSSs. Under the assumption that the DSOs know the expected amount of resources purchased by the DSSs, a many-to-many matching game is applied to investigate the pairing problem between DSOs and FNs. Finally, within the same DSO, we apply another layer of many-to-many matching between each of the paired FNs and the DSSs it serves to solve the FN-DSS pairing problem. Simulation results show that the proposed framework significantly improves the performance of IoT-based network systems.
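
    A toy version of the first stage (the leader posts a price, the followers respond) is sketched below. The utility form, the provisioning cost, and all numbers are assumptions chosen for illustration rather than the paper's actual game formulation.

```python
# Toy Stackelberg pricing: one DSO (leader) posts a unit price for FN resource,
# each DSS (follower) buys the amount that maximizes its own utility.
# Assumed follower utility: u_i(r) = a_i * ln(1 + r) - p * r.
dss_valuations = [2.0, 3.0, 5.0]   # assumed per-DSS valuations a_i
provisioning_cost = 0.5            # assumed DSO cost per unit of resource served

def best_response(a, price):
    """Follower's optimum of a*ln(1+r) - p*r over r >= 0, i.e. r* = a/p - 1."""
    return max(0.0, a / price - 1.0)

def dso_profit(price):
    demand = sum(best_response(a, price) for a in dss_valuations)
    return (price - provisioning_cost) * demand

# Leader anticipates the followers' best responses and searches a price grid.
prices = [p / 100 for p in range(60, 300)]
p_star = max(prices, key=dso_profit)
print(f"price={p_star:.2f}, profit={dso_profit(p_star):.2f}",
      [round(best_response(a, p_star), 2) for a in dss_valuations])
```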

    A Churn for the Better: Localizing Censorship using Network-level Path Churn and Network Tomography

    Recent years have seen the Internet become a key vehicle for citizens around the globe to express political opinions and organize protests. This fact has not gone unnoticed, with countries around the world repurposing network management tools (e.g., URL filtering products) and protocols (e.g., BGP, DNS) for censorship. However, repurposing these products can have unintended international impact, which we refer to as "censorship leakage". While there have been anecdotal reports of censorship leakage, there has yet to be a systematic study of it at a global scale. In this paper, we combine a global censorship measurement platform (ICLab) with a general-purpose technique, boolean network tomography, to identify which AS on a network path is performing censorship. At a high level, our approach exploits BGP churn to narrow down the set of potential censoring ASes by over 95%. We exactly identify 65 censoring ASes and find that the anomalies introduced by 24 of them affect users located outside the jurisdiction of the censoring AS, resulting in the leaking of regional censorship policies.
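
    The intuition behind the boolean-tomography step can be sketched in a few lines: an AS remains a candidate censor only if it appears on every path that exhibited the censorship anomaly and on no path that did not, and path churn supplies the varied paths that shrink this candidate set. The AS numbers and paths below are made up for illustration.

```python
# Boolean-tomography intuition: intersect the ASes of censored paths and
# subtract the ASes seen on clean paths. Paths here are illustrative only.
censored_paths = [
    {"AS1", "AS7", "AS9", "AS4"},
    {"AS2", "AS7", "AS9", "AS5"},   # a churned path to the same destination
]
clean_paths = [
    {"AS1", "AS7", "AS3", "AS4"},   # same source, no anomaly observed
]

candidates = set.intersection(*censored_paths) - set.union(*clean_paths)
print(candidates)  # {'AS9'} -- the only AS consistent with all observations
```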

    ATP: a Datacenter Approximate Transmission Protocol

    Many datacenter applications, such as machine learning and streaming systems, do not need the complete set of data to perform their computation. Current approximate applications in datacenters run on a reliable network layer such as TCP. To improve performance, they either let the sender select a subset of the data and transmit it to the receiver, or transmit all the data and let the receiver drop some of it. These approaches are network oblivious and transmit more data than necessary, affecting both application runtime and network bandwidth usage. On the other hand, running approximate applications over a lossy network with UDP cannot guarantee the accuracy of the application's computation. We propose to run approximate applications on a lossy network and to allow packet loss in a controlled manner. Specifically, we design a new network protocol called the Approximate Transmission Protocol, or ATP, for datacenter approximate applications. ATP opportunistically exploits as much available network bandwidth as possible while performing a loss-based rate control algorithm to avoid bandwidth waste and retransmission. It also ensures fair bandwidth sharing across flows and improves accurate applications' performance by leaving more switch buffer space to accurate flows. We evaluated ATP with both simulation and a real implementation, using two macro-benchmarks and two real applications, Apache Kafka and Flink. Our evaluation shows that ATP reduces application runtime by 13.9% to 74.6% compared to a TCP-based solution that drops packets at the sender, and improves accuracy by up to 94.0% compared to UDP.
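
    A loss-driven rate-control loop in the spirit of the description above might look like the sketch below. The constants and the additive-increase/multiplicative-decrease shape are assumptions for illustration, not ATP's actual algorithm.

```python
# Sketch of loss-based rate control: probe for spare bandwidth while observed
# loss stays within the application's tolerated budget, back off otherwise.
def adjust_rate(rate_mbps, observed_loss, loss_budget=0.05,
                increase_step=10.0, decrease_factor=0.8):
    """Additive increase within the loss budget, multiplicative decrease beyond it."""
    if observed_loss <= loss_budget:
        return rate_mbps + increase_step          # grab spare bandwidth
    return max(1.0, rate_mbps * decrease_factor)  # avoid wasting bandwidth on drops

rate = 100.0
for loss in [0.0, 0.01, 0.12, 0.08, 0.02]:        # per-interval loss samples
    rate = adjust_rate(rate, loss)
    print(f"loss={loss:.2f} -> rate={rate:.1f} Mbps")
```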

    Workload Interleaving with Performance Guarantees in Data Centers

    In the era of global, large-scale data centers residing in clouds, many applications and users share the same pool of resources to reduce energy and operating costs and to improve availability and reliability. Along with these benefits, resource sharing also introduces performance challenges: when multiple workloads access the same resources concurrently, contention may occur and introduce delays in the performance of individual workloads. Providing performance isolation to individual workloads requires effective management methodologies. The challenge in deriving such methodologies lies in finding accurate, robust, and compact metrics and models to drive algorithms that can meet different performance objectives while utilizing resources efficiently. This dissertation proposes a set of methodologies for solving the performance isolation problem of workload interleaving in data centers, focusing on both storage and computing components. At the storage node level, we focus on methodologies for better interleaving user traffic with background workloads, such as tasks for improving reliability, availability, and power savings. More specifically, we develop a scheduling policy for background workloads based on the statistical characteristics of the system's busy periods, and a methodology that quantitatively estimates the performance impact of power savings. At the storage cluster level, we consider how to efficiently consolidate work and schedule asynchronous updates without violating user performance targets. More specifically, we develop a framework that estimates beforehand the benefits and overheads of each option in order to automate the process of reaching intelligent consolidation decisions while achieving faster eventual consistency. At the computing node level, we focus on improving workload interleaving on off-the-shelf servers, as they are the basic building blocks of large-scale data centers. We develop priority-scheduling middleware that employs different policies to schedule background tasks based on the instantaneous resource requirements of the high-priority applications running on the server node. Finally, at the computing cluster level, we investigate popular computing frameworks for large-scale data-intensive distributed processing, such as MapReduce and its Hadoop implementation. We develop a new Hadoop scheduler, called DyScale, that exploits the capabilities offered by heterogeneous cores in order to achieve a variety of performance objectives.
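
    The computing-node idea can be sketched as a simple admission loop: background tasks run only when the high-priority application leaves headroom. The threshold, task names, and utilization probe below are illustrative assumptions, not the middleware's actual policies.

```python
# Sketch: release background tasks only when foreground utilization is low.
import collections

background_queue = collections.deque(["scrub-disk", "compact-log", "rebuild-index"])

def foreground_cpu_utilization(sample):
    return sample  # stand-in for a real utilization probe on the server node

def schedule_background(samples, cpu_threshold=0.6):
    """Release at most one queued background task per interval with spare capacity."""
    started = []
    for s in samples:
        if background_queue and foreground_cpu_utilization(s) < cpu_threshold:
            started.append(background_queue.popleft())
        else:
            started.append(None)  # defer: the foreground keeps the node
    return started

print(schedule_background([0.9, 0.4, 0.8, 0.3, 0.2]))
# [None, 'scrub-disk', None, 'compact-log', 'rebuild-index']
```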

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client's Perspective

    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider's data centers; users access data via the network and pay fees based on their service usage. For such a new storage model, prior wisdom and optimization schemes for conventional storage may no longer be valid or applicable to the emerging cloud storage. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client's perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we obtain several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve the end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by exploiting the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we also reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which achieves two optimization goals: low access latency and low monetary cost. Our experimental results show that the proposed client-side solutions significantly outperform traditional methods. Our study aims to inspire the community to reconsider system optimization methods in the cloud environment, especially for integrating cloud storage into the current storage stack as a primary storage layer.
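
    To give a flavor of latency- and cost-aware client caching, the sketch below adapts a GreedyDual-Size-style priority that folds access latency and per-request monetary cost into a single caching value. The exact value function and queue structure of GDS-LC are not reproduced here; the combination used is an assumption for illustration.

```python
# GreedyDual-Size-style cache where the miss penalty mixes latency and dollar cost.
class LatencyCostCache:
    def __init__(self, capacity):
        self.capacity = capacity      # total bytes the client cache may hold
        self.used = 0
        self.inflation = 0.0          # aging value L of classic GreedyDual
        self.entries = {}             # key -> (priority, size)

    def _evict(self):
        victim = min(self.entries, key=lambda k: self.entries[k][0])
        self.inflation = self.entries[victim][0]   # age remaining entries lazily
        self.used -= self.entries.pop(victim)[1]

    def access(self, key, size, latency, monetary_cost):
        """Cache an object; priority grows with its assumed combined miss penalty."""
        penalty = latency + monetary_cost          # assumed combination of the two goals
        if key not in self.entries:
            while self.used + size > self.capacity and self.entries:
                self._evict()
            self.used += size
        self.entries[key] = (self.inflation + penalty / size, size)

cache = LatencyCostCache(capacity=100)
cache.access("obj-a", size=40, latency=0.08, monetary_cost=0.0004)
cache.access("obj-b", size=80, latency=0.30, monetary_cost=0.0004)
print(sorted(cache.entries))   # ['obj-b']: cheap-to-refetch 'obj-a' was evicted
```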

    Productive Efficiency of Energy-Aware Data Centers

    Information technologies must be made aware of the sustainability of cost reduction. Data centers may reach energy consumption levels comparable to those of many industrial facilities and small towns. Therefore, innovative and transparent energy policies should be applied to improve energy consumption while delivering the best performance. This paper compares, analyzes, and evaluates various energy efficiency policies, which shut down underutilized machines, on an extensive set of data-center environments. Data envelopment analysis (DEA) is then conducted to detect the best energy efficiency policy and data-center characterization for each case. This analysis evaluates energy consumption and performance indicators for natural DEA and constant returns to scale (CRS). We identify the best energy policies and scheduling strategies for high and low data-center demands and for medium-sized and large data centers; moreover, this work enables data-center managers to detect inefficiencies and to implement further corrective actions. (Universidad de Sevilla 2018/0000052)
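
    In the simplest DEA setting under constant returns to scale, with one input (energy consumed) and one output (work completed), a policy's efficiency reduces to its output/input ratio normalized by the best observed ratio. The sketch below uses made-up policy names and numbers purely for illustration.

```python
# Single-input, single-output DEA under CRS: efficiency = own productivity
# ratio divided by the best ratio among all policies (the efficient frontier).
policies = {
    # name: (energy_kwh, jobs_completed) -- illustrative figures only
    "always-on":        (1000.0, 50000),
    "shutdown-idle":    ( 620.0, 48000),
    "aggressive-sleep": ( 500.0, 36000),
}

ratios = {name: out / inp for name, (inp, out) in policies.items()}
best = max(ratios.values())
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} efficiency = {r / best:.2f}")
# A score of 1.00 marks the efficient frontier; lower scores quantify how much
# more output per unit of energy a frontier policy achieves.
```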