240 research outputs found

    Performance Optimization Strategies for Transactional Memory Applications

    Get PDF
    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications

    Measurement, Modeling, and Characterization for Power-Aware Computing

    Get PDF
    Society’s increasing dependence on information technology has resulted in the deployment of vast compute resources. The energy costs of operating these resources coupled with environmental concerns have made power-aware computingone of the primary challenges for the IT sector. Making energy-efficient computing a rule rather than an exception requires that researchers and system designers use the right set of techniques and tools. These involve measuring,modeling, and characterizing the energy consumption of computers at varying degrees of granularity.In this thesis, we present techniques to measure power consumption of computer systems at various levels. We compare them for accuracy and sensitivityand discuss their effectiveness. We test Intel’s hardware power model for estimation accuracy and show that it is fairly accurate for estimating energy consumption when sampled at the temporal granularity of more than tens ofmilliseconds.We present a methodology to estimate per-core processor power consumption using performance counter and temperature-based power modeling and validate it across multiple platforms. We show our model exhibits negligible computationoverhead, and the median estimation errors ranges from 0.3% to 10.1% for applications from SPEC2006, SPEC-OMP and NAS benchmarks. We test the usefulness of the model in a meta-scheduler to enforce power constraint on a system.Finally, we perform a detailed performance and energy characterization of Intel’s Restricted Transactional Memory (RTM). We use TinySTM software transactional memory (STM) system to benchmark RTM’s performance against competing STM alternatives. We use microbenchmarks and STAMP benchmarksuite to compare RTM versus STM performance and energy behavior. We quantify the RTM hardware limitations that affect its success rate. We show that RTM performs better than TinySTM when working-set fits inside the cache and that RTM is better at handling high contention workloads

    Improving Performance of Transactional Applications through Adaptive Transactional Memory

    Get PDF
    With the rise of chip multiprocessors (CMPs), it is necessary to use parallel programming to exploit computational power of CMPs. Traditionally, lock-based mechanisms have been used to synchronize shared variables in parallel programs. However, with the complexity associated with locks, writing a correct parallel program is a huge burden for programmers. As an alternative, Transactional Memory (TM) is gaining momentum as a parallel programming model for multi--‐core processors. TM provides programmers with an atomic construct (transaction), which can be used to guarantee atomicity of accesses to shared variables, as the synchronization is handled through the underlying system. Transactional memory comes in two variants: Software transaction memory (STM) and Hardware transaction memory (HTM). Both STM and HTM systems have advantages and disadvantages that either enhance or penalize performance in transactional applications. In this thesis, the focus is on implementing an adaptive system that exploits both STM and HTM at transaction granularity. The goal is to achieve performance gain by incorporating the benefits of both TM systems. A synchronization technique is developed to seamlessly switch between HTM and STM based on the characteristics of a transaction. We exploit decision tree to predict the optimum system for each transaction in a given application. The decision tree is a form of supervised machine learning to classify transactions based on parameters such as transaction size, transaction write ratio, etc. From the evaluations using STAMP, NAS, and DiscoPoP benchmark suites, the proposed adaptive system is able to improve speed of transactional applications by 20.82% on average

    Effective Communication of Scientific Results

    Full text link
    Communication is essential for the advancement of Science. Technology advances and the proliferation of personal devices have changed the ways in which people communicate in all aspects of life. Scientific communication has also been profoundly affected by such changes, and thus it is important to reflect on effective ways to communicate scientific results to scientists that are flooded with information. This article advocates for receiver-oriented communication in Science, discusses how effective oral presentations should be prepared and delivered, provides advice on the thought process that can lead to scientific papers that communicate effectively, discusses suitable methodology to produce experimental data that is relevant and offers advice on how to present such data in ways that lead to the formulation of correct claims that are supported by the data.Comment: 14 pages manuscrip

    Measurement, Modeling, and Characterization for Energy-Efficient Computing

    Get PDF
    The ever-increasing ecological footprint of Information Technology (IT) sector coupled with adverse effects of high power consumption on electronic circuits has increased the significance of energy-efficient computing in the last decade. Making energy-efficient computing a norm rather than an exception requires that system designers and programmers understand the energy implications of their design and implementation choices. This necessitates a detailed view of system’s energy expenditure and/or power consumption. We explore this aspect of energy-efficient computing in this thesis through power measurement, power modeling, and energy characterization.First, we present a quantitative comparison between power measurement data collected for computer systems using four techniques: a power meter at wall outlet, currenttransducers at ATX power rails, CPU voltage regulator’s current monitor, and Intel’s proprietary RAPL (Running Average Power Limit) interface. We compare them for accuracy, sensitivity and accessibility.Second, we present two different methodologies to model processor power consumption. The first model estimates power consumption at the granularity of individualcores using per-core performance events and temperature sensors. We validate the methodology on six different platforms and show that our model estimates power consumption with high accuracy across all platforms consistently. To understand the energy expenditure trends across different frequencies and different degrees of parallelism, we need to model power at a much finer granularity. The second power model addresses this issue by estimating static and dynamic power consumption for individual cores and the uncore. We validate this model on Intel’s Haswell platform for single-threaded and multi-threaded benchmarks. We use this power model to characterize energy efficiency of frequency scaling on Haswell microarchitecture and use the insights to implementa low overhead DVFS scheduler. We also characterize the energy efficiency of thread scaling using the power model and demonstrate how different communication parametersand microarchitectural traits affect application’s energy when it scales.Finally, we perform detailed performance and energy characterization of Intel’s RestrictedTransactional Memory (RTM).We use TinySTM software transactional memory(STM) system to benchmark RTM’s performance against competing STM alternatives.We use microbenchmarks and STAMP benchmark suite to compare RTM an STM performanceand energy behavior. We quantify the RTM hardware limitations and identifyconditions required for RTM to outperform STM

    Design and evaluation of a Thread-Level Speculation runtime library

    Get PDF
    En los próximos años es más que probable que máquinas con cientos o incluso miles de procesadores sean algo habitual. Para aprovechar estas máquinas, y debido a la dificultad de programar de forma paralela, sería deseable disponer de sistemas de compilación o ejecución que extraigan todo el paralelismo posible de las aplicaciones existentes. Así en los últimos tiempos se han propuesto multitud de técnicas paralelas. Sin embargo, la mayoría de ellas se centran en códigos simples, es decir, sin dependencias entre sus instrucciones. La paralelización especulativa surge como una solución para estos códigos complejos, posibilitando la ejecución de cualquier tipo de códigos, con o sin dependencias. Esta técnica asume de forma optimista que la ejecución paralela de cualquier tipo de código no de lugar a errores y, por lo tanto, necesitan de un mecanismo que detecte cualquier tipo de colisión. Para ello, constan de un monitor responsable que comprueba constantemente que la ejecución no sea errónea, asegurando que los resultados obtenidos de forma paralela sean similares a los de cualquier ejecución secuencial. En caso de que la ejecución fuese errónea los threads se detendrían y reiniciarían su ejecución para asegurar que la ejecución sigue la semántica secuencial. Nuestra contribución en este campo incluye (1) una nueva librería de ejecución especulativa fácil de utilizar; (2) nuevas propuestas que permiten reducir de forma significativa el número de accesos requeridos en las peraciones especulativas, así como consejos para reducir la memoria a utilizar; (3) propuestas para mejorar los métodos de scheduling centradas en la gestión dinámica de los bloques de iteraciones utilizados en las ejecuciones especulativas; (4) una solución híbrida que utiliza memoria transaccional para implementar las secciones críticas de una librería de paralelización especulativa; y (5) un análisis de las técnicas especulativas en uno de los dispositivos más vanguardistas del momento, los coprocesadores Intel Xeon Phi. Como hemos podido comprobar, la paralelización especulativa es un campo de investigación activo. Nuestros resultados demuestran que esta técnica permite obtener mejoras de rendimiento en un gran número de aplicaciones. Así, esperamos que este trabajo contribuya a facilitar el uso de soluciones especulativas en compiladores comerciales y/o modelos de programación paralela de memoria compartida.Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos

    Techniques for improving the performance of software transactional memory

    Get PDF
    Transactional Memory (TM) gives software developers the opportunity to write concurrent programs more easily compared to any previous programming paradigms and gives a performance comparable to lock-based synchronizations. Current Software TM (STM) implementations have performance overheads that can be reduced by introducing new abstractions in Transactional Memory programming model. In this thesis we present four new techniques for improving the performance of Software TM: (i) Abstract Nested Transactions (ANT), (ii) TagTM, (iii) profile-guided transaction coalescing, and (iv) dynamic transaction coalescing. ANT improves performance of transactional applications without breaking the semantics of the transactional paradigm, TagTM speeds up accesses to transactional meta-data, profile-guided transaction coalescing lowers transactional overheads at compile time, and dynamic transaction coalescing lowers transactional overheads at runtime. Our analysis shows that Abstract Nested Transactions, TagTM, profile-guided transaction coalescing, and dynamic transaction coalescing improve the performance of the original programs that use Software Transactional Memory

    Efficient runtime systems for speculative parallelization

    Get PDF
    Manuelle Parallelisierung ist zeitaufwändig und fehleranfällig. Automatische Parallelisierung andererseits findet häufig nur einen Bruchteil der verfügbaren Parallelität. Mithilfe von Spekulation kann jedoch auch für komplexere Programme ein Großteil der Parallelität ausgenutzt werden. Spekulativ parallelisierte Programme benötigen zur Ausführung immer ein Laufzeitsystem, um die spekulativen Annahmen abzusichern und für den Fall des Nichtzutreffens die korrekte Ausführungssemantik sicherzustellen. Solche Laufzeitsysteme sollen die Ausführungszeit des parallelen Programms so wenig wie möglich beeinflussen. In dieser Arbeit untersuchen wir, inwiefern aktuelle Systeme, die Speicherzugriffe explizit und in Software beobachten, diese Anforderung erfüllen, und stellen Änderungen vor, die die Laufzeit massiv verbessern. Außerdem entwerfen wir zwei neue Systeme, die mithilfe von virtueller Speicherverwaltung das Programm indirekt beobachten und dadurch eine deutlich geringere Auswirkung auf die Laufzeit haben. Eines der vorgestellten Systeme ist mittels eines Moduls direkt in den Linux-Betriebssystemkern integriert und bietet so die bestmögliche Effizienz. Darüber hinaus bietet es weitreichendere Sicherheitsgarantien als alle bisherigen Techniken, indem sogar Systemaufrufe zum Beispiel zur Datei Ein- und Ausgabe in der spekulativen Isolation mit eingeschlossen sind. Wir zeigen an einer Reihe von Benchmarks die Überlegenheit unserer Spekulationssyteme über den derzeitigen Stand der Technik. Sämtliche unserer Erweiterungen und Neuentwicklungen stehen als open source zur freien Verfügung. Diese Arbeit ist in englischer Sprache verfasst.Manual parallelization is time consuming and error-prone. Automatic parallelization on the other hand is often unable to extract substantial parallelism. Using speculation, however, most of the parallelism can be exploited even of complex programs. Speculatively parallelized programs always need a runtime system during execution in order to ensure the validity of the speculative assumptions, and to ensure the correct semantics even in the case of misspeculation. These runtime systems should influence the execution time of the parallel program as little as possible. In this thesis, we investigate to which extend state-of-the-art systems which track memory accesses explicitly in software fulfill this requirement. We describe and implement changes which improve their performance substantially. We also design two new systems utilizing virtual memory abstraction to track memory changed implicitly, thus causing less overhead during execution. One of the new systems is integrated into the Linux kernel as a kernel module, providing the best possible performance. Furthermore it provides stronger soundness guarantees than any state-of-the-art system by also capturing system calls, hence including for example file I/O into speculative isolation. In a number of benchmarks we show the performance improvements of our virtual memory based systems over the state of the art. All our extensions and newly developed speculation systems are made available as open source
    corecore