
    LIKWID: Lightweight Performance Tools

    Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. Moreover, it includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use hardware counter tools to study the performance of a stencil code, and finally show how to detect bandwidth problems on ccNUMA-based compute nodes. Comment: 12 pages
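For readers unfamiliar with the STREAM triad mentioned in this abstract, the kernel computes a[i] = b[i] + s * c[i] and is usually reported as memory bandwidth. The following is a minimal sketch in plain Python, not LIKWID's or STREAM's own code; the function names and the default problem size are hypothetical, and the byte count assumes 8-byte elements with two loads and one store per iteration, ignoring write-allocate traffic. An interpreted loop will not come close to hardware bandwidth, so the number it reports only illustrates how the metric is formed.

```python
import time


def stream_triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i], updated in place."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, elem_size=8):
    """Bytes moved by one triad sweep over n elements.

    Assumes two loads (b, c) and one store (a) per element and
    ignores any write-allocate transfer for the store.
    """
    return 3 * elem_size * n


def measure_triad(n=100_000, scalar=3.0):
    """Time one sweep and return (result array, bytes per second)."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    stream_triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer.
    bandwidth = triad_bytes(n) / elapsed if elapsed > 0 else float("inf")
    return a, bandwidth


if __name__ == "__main__":
    _, bw = measure_triad()
    print(f"apparent triad bandwidth: {bw / 1e6:.1f} MB/s")
```

In a real measurement the same kernel would be run by several threads, and where those threads are pinned relative to the memory holding the arrays determines the bandwidth observed, which is the affinity effect the abstract refers to.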

    Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters

    Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applied on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU. The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group for supporting us in the smooth integration and testing of our setup within Extrae and Paraver. Peer Reviewed. Postprint (published version)

    Rising Tide_Promotion & Tenure_Tenure

    Copies of Rising Tide Center tenure resources

    Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver

    Cutting-edge functionalities in embedded systems require the use of parallel architectures to meet their performance requirements. This imposes the introduction of a new layer in the software stacks of embedded systems: the parallel programming model. Unfortunately, the tools used to analyze embedded systems fall short of characterizing the performance of parallel applications at the parallel programming model level and of correlating it with information about non-functional requirements such as real-time behavior, energy, and memory usage. HPC tools, like Extrae, are designed with that level of abstraction in mind, but their main focus is on performance evaluation. Overall, providing insightful information about the performance of parallel embedded applications at the parallel programming model level, and relating it to the non-functional requirements, is of paramount importance to fully exploit the performance capabilities of parallel embedded architectures. This paper contributes to the state of the art of analysis tools for embedded systems by: (1) analyzing the particular constraints of embedded systems compared to HPC systems (e.g., static setting, restricted memory, limited drivers) to support HPC analysis tools; (2) porting Extrae, a powerful tracing tool from the HPC domain, to the GR740 platform, a SoC used in the space domain; and (3) augmenting Extrae with new features needed to correlate the parallel execution with the following non-functional requirements: energy, temperature and memory usage. Finally, the paper presents the usefulness of Extrae to characterize OpenMP applications and their non-functional requirements, evaluating different aspects of the applications running on the GR740. This work has been partially funded by the HP4S (High Performance Parallel Payload Processing for Space) project under the ESA-ESTEC ITI contract № 4000124124/18/NL/CRS. Peer Reviewed. Postprint (author's final draft)

    Wellness Protocol: An Integrated Framework for Ambient Assisted Living : A thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy In Electronics, Information and Communication Systems At School of Engineering and Advanced Technology, Massey University, Manawatu Campus, New Zealand

    Listed in 2016 Dean's List of Exceptional Theses. Smart and intelligent homes of today and tomorrow are committed to enhancing the security, safety and comfort of the occupants. At present, most smart home protocols are limited to controlled-activity environments for Ambient Assisted Living (AAL) of the elderly and the convalescent. The aim of this research is to develop a Wellness Protocol that forecasts the wellness of any individual living in the AAL environment. This is based on wireless sensors and networks, with data mining and machine learning applied to monitor the activities of daily living. Heterogeneous sensor and actuator nodes, based on WSNs, are deployed in the home environment. These nodes generate real-time data related to object usage and other movements inside the home, to forecast the wellness of an individual. The new Protocol has been designed and developed to be especially suitable for the smart home system. The Protocol is reliable, efficient, flexible, and economical for wireless-sensor-network-based AAL. In line with consumer demand, Wellness Protocol based smart home systems can be easily installed in existing households without any significant changes and with a user-friendly interface. Additionally, the Wellness Protocol has been extended to designing a smart building environment for an apartment. In the endeavour of smart home design and implementation, the Wellness Protocol deals with large data handling and interference mitigation. A Wellness based smart home monitoring system is the application of automation with integral systems of accommodation facilities to improve the everyday life of an occupant.

    The German Socio-Economic Panel (SOEP) in the Nineties: An Example of Incremental Innovations in an Ongoing Longitudinal Study

    The main aim of the present paper is to historically reappraise the development of the German Socio-Economic Panel Study (SOEP) in the 1990s after the first six waves had been collected. This development was closely connected to the opening of the Iron Curtain in Eastern Europe and the fall of the Wall separating the two German states. In addition to its relevance for the SOEP, this study is also of interest in relation to the contemporary history of science. Keywords: SOEP, German unification, immigration studies, research governance, survey methods

    X-MAP A Performance Prediction Tool for Porting Algorithms and Applications to Accelerators

    Most modern high-performance computing systems comprise one or more accelerators with varying architectures in addition to traditional multicore Central Processing Units (CPUs). Examples of these accelerators include Graphics Processing Units (GPUs) and Intel’s Many Integrated Core architecture called Xeon Phi (PHI). These architectures provide massively parallel computation capabilities, which yield substantial performance benefits over traditional CPUs for a variety of scientific applications. Not all accelerators are alike, because each has its own unique architecture. This difference in the underlying architecture plays a crucial role in determining whether a given accelerator will provide a significant speedup over its competition. In addition to the architecture itself, one more differentiating factor for these accelerators is the programming language used to program them. For example, Nvidia GPUs can be programmed using Compute Unified Device Architecture (CUDA) and OpenCL, while Intel Xeon PHIs can be programmed using OpenMP and OpenCL. The choice of programming language also plays a critical role in the speedup obtained, depending on how close the language is to the hardware and on the level of optimization. It is thus very difficult for an application developer to choose the ideal accelerator to achieve the best possible speedup. In light of this, we present an easy-to-use Graphical User Interface (GUI) tool called X-MAP, a performance prediction tool for porting algorithms and applications to accelerators. It encompasses a Machine Learning based inference model that predicts the performance of an application on a number of well-known accelerators and, at the same time, predicts the best architecture and programming language for the application. We do this by collecting hardware counters from a given application and predicting run time by providing this data as inputs to a Neural Network Regressor based inference model. We predict the architecture and associated programming language by providing the hardware counters as inputs to an inference model based on a Random Forest Classification Model. Finally, a mean absolute prediction error of 8.52, together with features such as syntax highlighting for multiple programming languages, a function-wise breakdown of the entire application to understand bottlenecks, and the ability for end users to submit their own prediction models to further improve the system, makes X-MAP a unique tool that has a significant edge over existing performance prediction solutions.