60 research outputs found

    Using Locality and Interleaving Information to Improve Shared Cache Performance

    Cache interference is found to play a critical role in optimizing cache allocation among concurrent threads for a shared cache. The conventional LRU policy usually works well for low-interference workloads, while high cache interference among threads demands explicit allocation regulation, such as cache partitioning. Cache interference is shown to be tied to inter-thread memory reference interleaving granularity: high interference is caused by fine-grain interleaving, while low interference is caused by coarse-grain interleaving. Profiling of real multi-program workloads shows that cache set mapping and temporal phase result in the variation of interleaving granularity. When memory references from different threads map to disjoint cache sets, or occur in distinct time windows, they tend to cause little interference due to coarse-grain interleaving. The interleaving granularity, measured by run length in workloads, is found to correlate with the preferred cache management policy: fine-grain interleaving workloads perform better with cache partitioning, and coarse-grain interleaving workloads perform better with LRU. Most existing shared cache management techniques are based on working set locality analysis. This dissertation studies shared cache performance by taking both locality and interleaving information into consideration. The Oracle algorithm, which provides the theoretical best performance, is investigated to provide insight into how to design a better practical policy. Profiling and analysis of the Oracle algorithm lead to the proposal of probabilistic replacement (PR), a novel cache allocation policy. With aggressor thread information learned online, PR probabilistically evicts the bad-locality blocks of aggressor threads while preserving the good-locality blocks of non-aggressor threads. PR is shown to be able to adapt to the different interleaving granularities in different sets over time. Its flexibility in tuning eviction probability also improves fairness among thread performance. Evaluation indicates that PR outperforms LRU, UCP, and ideal cache partitioning at moderate hardware cost. For single-program cache management, this dissertation also proposes a novel technique: the reuse distance last touch predictor (RD-LTP). RD-LTP is able to capture reuse distance information, which represents the intrinsic memory reference pattern. Based on this improved LT predictor, an MRU LT eviction policy is developed to select the right victim in the presence of incorrect LT predictions. In addition to the LT predictor, a second predictor, the reuse distance predictor (RDP), is proposed, which is able to predict actual reuse distance values. Compared to various existing cache management techniques, these two novel predictors deliver higher cache performance with higher prediction coverage and accuracy at moderate hardware cost.
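    The abstract above measures interleaving granularity by run length. As a minimal sketch of that idea (not the dissertation's own tooling), the snippet below computes the mean run length of a per-set access trace, assuming the trace is simply a list of thread IDs in access order. The function name `mean_run_length` and the two sample traces are hypothetical.

```python
from itertools import groupby


def mean_run_length(thread_ids):
    """Average length of maximal runs of consecutive accesses by one thread.

    `thread_ids` is an assumed trace format: the ID of the thread that
    issued each access to a given cache set, in access order.
    """
    # groupby yields one group per maximal run of equal consecutive IDs.
    runs = [sum(1 for _ in group) for _, group in groupby(thread_ids)]
    return sum(runs) / len(runs) if runs else 0.0


# Hypothetical traces: threads alternate on every access (fine-grain),
# or each thread issues a long burst before the other starts (coarse-grain).
fine_grain = [0, 1, 0, 1, 0, 1, 0, 1]
coarse_grain = [0, 0, 0, 0, 1, 1, 1, 1]

print(mean_run_length(fine_grain))
print(mean_run_length(coarse_grain))
```

    Under the correlation reported in the abstract, a short mean run length would suggest a set that favours partitioning, and a long one a set where LRU does well. Any concrete threshold between the two would have to come from the dissertation itself.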

    Calibration and Analysis of Enterprise and Edge Network Measurements

    With the growth of the Internet over the past several decades, the field of Internet and network measurements has attracted the attention of many researchers. Such measurements have allowed a better understanding of the inner workings of both the global Internet and its specific parts. But undertaking a measurement study in a sound fashion is no easy task. Given the complexity of modern networks, one has to take great care in anticipating, detecting and eliminating all measurement errors and biases. In this thesis we pave the way for a more systematic calibration of network traces. Such calibration ensures the soundness and robustness of the analysis results by revealing and fixing flaws in the data. We collect our measurement data in two environments: in a medium-sized enterprise and at the Internet edge. For the former we perform two rounds of data collection from the enterprise switches. We use the differences in the way we recorded the network traces during the first and second rounds to develop and assess the methodology for five calibration aspects: measurement gain, measurement loss, measurement reordering, timing, and topology. For the dataset gathered at the Internet edge, we perform calibration in the form of extensive checks of data consistency and sanity. After calibrating the data, we analyse its various aspects. For the enterprise dataset we look at TCP dynamics in the enterprise environment. Here we first give a high-level overview of TCP connection characteristics such as termination status, size, duration, and rate. Then we assess the parameters important for TCP performance, such as retransmissions, out-of-order deliveries and channel utilization. Finally, using the Internet edge dataset, we gauge the performance characteristics of the edge connectivity.

    Exploiting the Weak Generational Hypothesis for Write Reduction and Object Recycling

    Programming languages with automatic memory management are continuing to grow in popularity due to ease of programming. However, these languages tend to allocate objects excessively, leading to inefficient use of memory and large garbage collection and allocation overheads. The weak generational hypothesis notes that objects tend to die young in languages with automatic dynamic memory management. Much work has been done to optimize allocation and garbage collection algorithms based on this observation. Previous work has largely focused on developing efficient software algorithms for allocation and collection. However, much less work has studied architectural solutions. In this work, we propose and evaluate architectural support for assisting allocation and garbage collection. We first study the effects of languages with automatic memory management on the memory system. As objects often die young, it is likely that many objects die while in the processor's caches. Writes of dead data back to main memory are unnecessary, as the data will never be used again. To study this, we develop and present architectural support to identify dead objects while they remain resident in cache and eliminate any unnecessary writes. We show that many writes out of the caches are unnecessary and can be avoided using our hardware additions. Next, we study the effects of using dead data in cache to assist with allocation and garbage collection. Logic is developed and presented to allow reuse of cache space found dead to satisfy future allocation requests. We show that dead cache space can be recycled at a high rate, reducing pressure on the allocator and reducing cache miss rates. However, a full implementation of our initial approach is shown to be unscalable. We propose and study limitations to our approach, trading object coverage for scalability. Third, we present a new approach for identifying objects that die young based on a limitation of our previous approach. We show this approach has much lower storage and logic requirements and is scalable, while only slightly decreasing overall object coverage.

    Performance Analysis of Transactional Traffic in Mobile Ad-hoc Networks

    Mobile ad hoc networks (MANETs) present unique challenges to new protocol design, especially in scenarios where nodes are highly mobile. Routing protocol performance is essential to the performance of wireless networks, especially in mobile ad hoc scenarios. The development of new routing protocols requires comparing them against well-known protocols in various simulation environments. The protocols should be analysed under realistic conditions including, but not limited to, representative data transmission models, limited buffer space for data transmission, a sensible combination of simulation area and transmission range, and realistic movement patterns of the mobile nodes. Furthermore, application traffic such as transactional application traffic has not been investigated for domain-specific MANET scenarios. Overall, there has not been enough performance comparison work in the literature. This thesis presents an extensive performance comparison of MANETs under transactional traffic, covering both highly dynamic environments and low-mobility cases.

    Avoiding Bad Query Mixes to Minimize Unsuccessful Client Requests Under Heavy Loads

    In three-tiered web applications, some form of admission control is required to ensure that throughput and response times are not significantly harmed during periods of heavy load. We propose Q-Cop, a prototype system for improving admission control decisions that computes measures of load on the system based on the actual mix of queries being executed. This measure of load is used to estimate execution times for incoming queries, which allows Q-Cop to make control decisions with the goal of minimizing the number of requests that are not serviced before the client, or their browser, times out. Using TPC-W queries, we show that the response times of different types of queries can vary significantly, in excess of 50% in our experiments, depending not just on the number of queries being processed but on the mix of other queries that are running simultaneously. The variation implies that admission control can benefit from taking into account not just the number of queries being processed, but also the mix of queries. We develop a model of expected query execution times that accounts for the mix of queries being executed and integrate that model into a three-tiered system to make admission control decisions. Our results show that this approach makes more informed decisions about which queries to reject and as a result significantly reduces the number of unsuccessful client requests. For comparison, we develop several other models which represent related work in the field, including an MPL-based approach and an approach that considers the type of query but not the mix of queries. We show that Q-Cop does not need to re-compute any modelling information in order to perform well, a strong advantage over most other approaches. Across the range of workloads examined, an average of 47% fewer requests are denied than with the next best approach.
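    To make the idea of mix-aware admission control concrete, here is a minimal sketch. It is not Q-Cop's actual model, which the abstract does not specify. It assumes a simple additive model in which each running query of type `b` adds a fixed delay to an incoming query of type `a`. The query type names, `BASE_TIME`, `SLOWDOWN`, and all numbers are hypothetical.

```python
from collections import Counter

# Hypothetical per-type execution times (seconds) on an idle system.
BASE_TIME = {"search": 0.5, "best_sellers": 2.0}

# Hypothetical coefficients: SLOWDOWN[a][b] is the extra time (seconds)
# added to a query of type a by each running query of type b.
SLOWDOWN = {
    "search": {"search": 0.05, "best_sellers": 0.4},
    "best_sellers": {"search": 0.1, "best_sellers": 0.8},
}


def estimate_time(query_type, running_mix):
    """Estimate execution time from the mix of running queries (assumed additive)."""
    extra = sum(SLOWDOWN[query_type][t] * n for t, n in running_mix.items())
    return BASE_TIME[query_type] + extra


def admit(query_type, running_mix, timeout):
    """Admit the query only if its estimated time fits within the client timeout."""
    return estimate_time(query_type, running_mix) <= timeout


# Two loads with the same number of running queries but different mixes.
light_mix = Counter(search=10)
heavy_mix = Counter(best_sellers=10)

print(admit("search", light_mix, timeout=2.0))
print(admit("search", heavy_mix, timeout=2.0))
```

    Both mixes contain ten running queries, so a policy that only counts queries would treat them identically. The mix-aware estimate distinguishes them, which is the effect the abstract attributes to considering the query mix.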

    Predictive analysis and optimisation of pipelined wavefront applications using reusable analytic models

    Pipelined wavefront computations are a ubiquitous class of high performance parallel algorithms used for the solution of many scientific and engineering applications. In order to aid the design and optimisation of these applications, and to ensure that during procurement the platforms chosen are best suited to these codes, there has been considerable research in analysing and evaluating their operational performance. Wavefront codes exhibit complex computation, communication and synchronisation patterns, and as a result there exists a large variety of such codes and possible optimisations. The problem is compounded by each new generation of high performance computing system, which has often introduced a previously unexplored architectural trait, requiring previous performance models to be rewritten and reevaluated. In this thesis, we address the performance modelling and optimisation of this class of application as a whole. This differs from previous studies, in which bespoke models are applied to specific applications. The analytic performance models are generalised and reusable, and we demonstrate their application to the predictive analysis and optimisation of pipelined wavefront computations running on modern high performance computing systems. The performance model is based on the LogGP parameterisation, and uses a small number of input parameters to specify the particular behaviour of most wavefront codes. The new parameters and model equations capture the key structural and behavioural differences among different wavefront application codes, providing a succinct summary of the operations for each application and insights into alternative wavefront application design. The models are applied to three industry-strength wavefront codes and are validated on several systems including a Cray XT3/XT4 and an InfiniBand commodity cluster. Model predictions show high quantitative accuracy (less than 20% error) for all high performance configurations and excellent qualitative accuracy. The thesis presents applications, projections and insights for optimisations using the model, which show the utility of reusable analytic models for performance engineering of high performance computing codes. In particular, we demonstrate the use of the model for: (1) evaluating application configuration and resulting performance; (2) evaluating hardware platform issues including platform sizing and configuration; (3) exploring hardware platform design alternatives and system procurement; and (4) considering possible code and algorithmic optimisations.

    Fundamentals of Mobility Support in Infocommunication Systems. Modelling of Self-Similar Packet Traffic

    The training manual contains an in-depth description of methods and techniques for studying self-similar traffic, and of the development and use of new approaches to adequately modelling the traffic of modern packet telecommunication networks, especially in light of the rapid development and spread of high-speed networks, including wireless networks, which impose very strict requirements for ensuring high quality of service. The manual has two sections and two appendices, and is structured so that the complexity of the material increases with each subsequent section. It is intended for the in-depth study of self-similar traffic in infocommunication systems by master's degree students in specialty 172 "Telecommunications and Radio Engineering" in the discipline "Fundamentals of Mobility Support in Infocommunication Systems". It will also be useful for graduate students, researchers, and engineering and technical staff working in the field of information and telecommunication systems and technologies.

    Mean-Field-Type Games in Engineering

    A mean-field-type game is a game in which the instantaneous payoffs and/or the state dynamics functions involve not only the state and the action profile but also the joint distributions of state-action pairs. This article presents some engineering applications of mean-field-type games including road traffic networks, multi-level building evacuation, millimeter wave wireless communications, distributed power networks, virus spread over networks, virtual machine resource management in cloud networks, synchronization of oscillators, energy-efficient buildings, online meeting and mobile crowdsensing. Comment: 84 pages, 24 figures, 183 references. To appear in AIMS 201

    Energy Efficient Designs for Collaborative Signal and Information Processing in Wireless Sensor Networks

    Collaborative signal and information processing (CSIP) plays an important role in the deployment of wireless sensor networks. Since each sensor has limited computing capability, constrained power usage, and limited sensing range, collaboration among sensor nodes is important in order to compensate for each other's limitations as well as to improve the degree of fault tolerance. In order to support the execution of CSIP algorithms, a distributed computing paradigm and clustering protocols are needed, and these are the major concentrations of this dissertation. In order to facilitate collaboration among sensor nodes, we present a mobile-agent computing paradigm, where instead of each sensor node sending local information to a processing center, as is typical in client/server-based computing, the processing code is moved to the sensor nodes through mobile agents. We further conduct an extensive performance evaluation against traditional client/server-based computing. Experimental results show that the mobile agent paradigm performs much better when the number of nodes is large, while the client/server paradigm is advantageous when the number of nodes is small. Based on this result, we propose a hybrid computing paradigm that adopts different computing models within different clusters of sensor nodes. Either the client/server or the mobile agent paradigm can be employed within clusters or between clusters according to the different cluster configurations. This new computing paradigm can take full advantage of both the client/server and mobile agent computing paradigms. Simulations show that the hybrid computing paradigm performs better than either client/server or mobile agent computing alone. The mobile agent itinerary has a significant impact on the overall performance of the sensor network. We thus formulate both static mobile agent planning and dynamic mobile agent planning as optimization problems. Based on the models, we present three itinerary planning algorithms. We have shown, through simulation, that the predictive dynamic itinerary performs the best under a wide range of conditions, thus making it particularly suitable for CSIP in wireless sensor networks. In order to facilitate the deployment of the hybrid computing paradigm, we propose a decentralized reactive clustering (DRC) protocol to cluster the sensor network in an energy-efficient way. The clustering process is invoked only by events that occur in the sensor network. Nodes that do not detect the events are put into the sleep state to save energy. In addition, a power control technique is used to minimize the transmission power needed. The advantages of the DRC protocol are demonstrated through simulations.