66 research outputs found

    Analyzing the Performance of Lock-Free Data Structures: A Conflict-based Model

    Full text link
    This paper considers the modeling and the analysis of the performance of lock-free concurrent data structures. Lock-free designs employ an optimistic conflict control mechanism, allowing several processes to access the shared data object at the same time. They guarantee that at least one concurrent operation finishes in a finite number of its own steps regardless of the state of the operations. Our analysis considers such lock-free data structures that can be represented as linear combinations of fixed size retry loops. Our main contribution is a new way of modeling and analyzing a general class of lock-free algorithms, achieving predictions of throughput that are close to what we observe in practice. We emphasize two kinds of conflicts that shape the performance: (i) hardware conflicts, due to concurrent calls to atomic primitives; (ii) logical conflicts, caused by simultaneous operations on the shared data structure. We show how to deal with these hardware and logical conflicts separately, and how to combine them, so as to calculate the throughput of lock-free algorithms. We propose also a common framework that enables a fair comparison between lock-free implementations by covering the whole contention domain, together with a better understanding of the performance impacting factors. This part of our analysis comes with a method for calculating a good back-off strategy to finely tune the performance of a lock-free algorithm. Our experimental results, based on a set of widely used concurrent data structures and on abstract lock-free designs, show that our analysis follows closely the actual code behavior.Comment: Short version to appear in DISC'1

    Approximation Algorithms for Energy Minimization in Cloud Service Allocation under Reliability Constraints

    Get PDF
    We consider allocation problems that arise in the context of service allocation in Clouds. More specifically, we assume on the one part that each computing resource is associated to a capacity constraint, that can be chosen using Dynamic Voltage and Frequency Scaling (DVFS) method, and to a probability of failure. On the other hand, we assume that the service runs as a set of independent instances of identical Virtual Machines. Moreover, there exists a Service Level Agreement (SLA) between the Cloud provider and the client that can be expressed as follows: the client comes with a minimal number of service instances which must be alive at the end of the day, and the Cloud provider offers a list of pairs (price,compensation), this compensation being paid by the Cloud provider if it fails to keep alive the required number of services. On the Cloud provider side, each pair corresponds actually to a guaranteed success probability of fulfilling the constraint on the minimal number of instances. In this context, given a minimal number of instances and a probability of success, the question for the Cloud provider is to find the number of necessary resources, their clock frequency and an allocation of the instances (possibly using replication) onto machines. This solution should satisfy all types of constraints during a given time period while minimizing the energy consumption of used resources. We consider two energy consumption models based on DVFS techniques, where the clock frequency of physical resources can be changed. For each allocation problem and each energy model, we prove deterministic approximation ratios on the consumed energy for algorithms that provide guaranteed probability failures, as well as an efficient heuristic, whose energy ratio is not guaranteed

    Sharing resources for performance and energy optimization of concurrent streaming applications

    Get PDF
    We aim at finding optimal mappings for concurrent streaming applications. Each application consists of a linear chain with several stages, and processes successive data sets in pipeline mode. The objective is to minimize the energy consumption of the whole platform, while satisfying given performance-related bounds on the period and latency of each application. The problem is to decide which processors to enroll, at which speed (or mode) to use them, and which stages they should execute. Processors can be identical (with the same modes) or heterogeneous. We also distinguish two mapping categories, interval mappings, and general mappings. For interval mappings, a processor is assigned a set of consecutive stages of the same application, so there is no resource sharing across applications. On the contrary, the assignment is fully arbitrary for general mappings, hence a processor can be reused for several applications. On the theoretical side, we establish complexity results for this tri-criteria mapping problem (energy, period, latency), classifying polynomial versus NP-complete instances. Furthermore, we derive an integer linear program that provides the optimal solution in the most general case. On the experimental side, we design polynomial-time heuristics, and assess their absolute performance thanks to the linear program. One main goal is to assess the impact of processor sharing on the quality of the solution

    Assessing the performance of energy-aware mappings

    Get PDF
    International audienceWe aim at mapping streaming applications that can be modeled by a series-parallel graph onto a 2-dimensional tiled chip multiprocessor (CMP) architecture. The objective of the mapping is to minimize the energy consumption, using dynamic voltage and frequency scaling (DVFS) techniques, while maintaining a given level of performance, reflected by the rate of processing the data streams. This mapping problem turns out to be NP-hard, and several heuristics are proposed. We assess their performance through comprehensive simulations using the StreamIt workflow suite and randomly generated series-parallel graphs, and various CMP grid sizes

    Power-aware Manhattan routing on chip multiprocessors

    Get PDF
    Nous nous intéressons au routage des communications dans un processeur multi-cœur (CMP). Le but est de trouver un routage valide, c'est-à-dire un routage dans lequel la quantité de données routée entre deux cœurs voisins ne dépasse pas la bande passante maximale, et tel que la puissance dissipée dans les communications est minimale. Nous nous positionnons au niveau système : nous supposons que des applications, sous forme de graphes de tâches, s'exécutent sur le CMP, chaque tâche étant déjà assignée à un cœur. Nous avons donc un ensemble de communications à router entre les cœurs. Nous utilisons un modèle classique, dans lequel la puissance dissipée par un lien de communication est la somme d'une partie statique et d'une partie dynamique, cette dernière dépendant de la fréquence du lien. Cette fréquence est ajustable et proportionnelle à la bande passante. La politique la plus utilisée est le routage XY : chaque communication est en- voyée horizontalement, puis verticalement. Cependant si nous nous autorisons à utiliser les chemins de Manhattan entre la source et la destination, la puissance dissipée peut être considérablement réduite. De plus, il est parfois possible de trouver une solution, alors qu'il n'en existait pas avec un routage XY. Dans ce papier, nous comparons le routage XY et le routage via des chemins de Manhattan, aussi bien d'un point de vue théorique que d'un point de vue pratique. Nous considérons deux variantes du routage par chemins de Manhattan : dans un routage à chemin unique, un seul chemin peut être utilisé pour chaque communication, tandis que le routage à chemin multiples nous permet d'éclater une communication et de lui faire emprunter plusieurs routes. Nous établissons la NP-complétude du problème consistant à trouver un routage Manhattan qui minimise la puissance dissipée, exhibons la borne supérieure minimale du ratio entre la puissance dissipée par un routage XY et celle dissipée par un routage Manhattan, et pour terminer, nous effectuons des simulations pour étudier les performances de nos heuristiques de routage Manhattan

    Approximation Algorithms for Energy Minimization in Cloud Service Allocation under Reliability Constraints

    Get PDF
    International audienceWe consider allocation problems that arise in the context of service allocation in Clouds. More specifically, we assume on the one part that each computing resource is associated to a capacity constraint, that can be chosen using Dynamic Voltage and Frequency Scaling (DVFS) method, and to a probability of failure. On the other hand, we assume that the service runs as a set of independent instances of identical Virtual Machines. Moreover, there exists a Service Level Agreement (SLA) between the Cloud provider and the client that can be expressed as follows: the client comes with a minimal number of service instances which must be alive at the end of the day, and the Cloud provider offers a list of pairs (price,compensation), this compensation being paid by the Cloud provider if it fails to keep alive the required number of services. On the Cloud provider side, each pair corresponds actually to a guaranteed success probability of fulfilling the constraint on the minimal number of instances. In this context, given a minimal number of instances and a probability of success, the question for the Cloud provider is to find the number of necessary resources, their clock frequency and an allocation of the instances (possibly using replication) onto machines. This solution should satisfy all types of constraints during a given time period while minimizing the energy consumption of used resources. We consider two energy consumption models based on DVFS techniques, where the clock frequency of physical resources can be changed. For each allocation problem and each energy model, we prove deterministic approximation ratios on the consumed energy for algorithms that provide guaranteed probability failures, as well as an efficient heuristic, whose energy ratio is not guaranteed.Nous considérons un problème d'allocation de services dans des \textit{Clouds}. Les resources de calcul sont caractérisées par une probabilité de panne, et une contrainte de capacité, qui peut être ajustée grâce à la technique dite de Dynamic Voltage and Frequency Scaling (DVFS). Il existe un contrat entre le fournisseur et le client, le fournisseur assurant au client qu'un certain nombre d'instances du service du client sera toujours en train de s'exécuter à la fin de la journée, avec une certaine probabilité. La question est donc de savoir à quelle vitesse devront tourner les processeurs, et à quel point les services devront être répliqués sur les machines. Nous exhibons des algorithmes d'approximation, prouvons leurs facteurs d'approximation sur l'énergie consommée, et décrivons des heuristiques performantes

    Optimal algorithms and approximation algorithms for replica placement with distance constraints in tree networks

    Get PDF
    In this paper, we study the problem of replica placement in tree networks subject to server capacity and distance constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The Single policy enforces that all requests of a client are served by a single server in the tree, while in the Multiple policy, the requests of a given client can be processed by multiple servers, thus distributing the processing of requests over the platform. For the Single policy, we prove that all instances of the problem are NP-hard, and we propose approximation algorithms. The problem with the Multiple policy was known to be NP-hard with distance constraints, but we provide a polynomial time optimal algorithm to solve the problem in the particular case of binary trees when no request exceeds the server capacity

    Modeling and Analyzing the Performance of Lock-Free Data Structures

    Get PDF
    Abstract This paper considers the modeling and the analysis of the performance of lock-free concurrent data structures. Lock-free designs employ an optimistic conflict control approach, allowing several processes to access the shared data object at the same time. The operations on these data structures are typically designed as compositions of retry loops. Our main contribution is a new way of modeling and analyzing a general class of lock-free algorithms, achieving predictions of throughput that are close to what we observe in practice. In our model we introduce two key metrics that shape the performance of lock-free algorithms: (i) expansion in execution time of a retry due to memory congestion and (ii) number of wasted retries. We show how to compute these two metrics, and how to combine them, to calculate the throughput of an arguably large class of lock-free algorithms. Our analysis also captures the throughput performance of a lock-free algorithm when executed as part of a larger parallel application. This part of our analysis leads to an analytical method for calculating a good back-off strategy to finely tune the performance of a lock-free application. Our experimental results, based on a set of widely used concurrent data structures and on abstract lock-free designs, show that our analysis follows closely the actual code behavior. To the best of our knowledge, this is the first attempt to make ends meet between theoretical bounds on performance and actual measured throughput
    • …
    corecore