270 research outputs found

    Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

    This paper presents the observation that performing fewer of the optimizations available in a standard compiler optimization level such as -O2, while preserving their original ordering, can yield significant savings in both execution time and energy consumption. This observation has been validated on two embedded processors, the ARM Cortex-M0 and the ARM Cortex-M3, using two versions of the LLVM compilation framework: v3.8 and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks on both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. These savings are in the same range as those achieved by state-of-the-art compilation approaches that use iterative compilation or machine learning to select flags or to determine phase orderings that result in more efficient code. In contrast to these time-consuming and expensive techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, our approach can support multi-criteria optimization, as it targets execution time, energy consumption and code size at the same time. Comment: 15 pages, 3 figures, 71 benchmarks used for evaluation
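
    The idea of testing only a small number of order-preserving configurations can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that each configuration is a prefix of the ordered pass list. The pass names and the cost function below are hypothetical placeholders, not LLVM passes or measured data.

```python
from typing import Callable, Sequence


def prefix_configurations(passes: Sequence[str]) -> list[list[str]]:
    """Return every ordered prefix of `passes`, from the empty list to the full list.

    Assumption: a "configuration" is a prefix of the original pass sequence,
    which keeps the original ordering and yields only len(passes) + 1 candidates.
    """
    return [list(passes[:n]) for n in range(len(passes) + 1)]


def best_configuration(
    passes: Sequence[str], cost: Callable[[list[str]], float]
) -> list[str]:
    """Evaluate each prefix with `cost` and return the cheapest one."""
    return min(prefix_configurations(passes), key=cost)


# Hypothetical pass names standing in for an -O2-like ordered sequence.
PASSES = ["pass_a", "pass_b", "pass_c", "pass_d"]


def toy_cost(config: list[str]) -> float:
    """Made-up cost model: each pass helps a little, but "pass_c" hurts a lot."""
    penalty = 30 if "pass_c" in config else 0
    return 100 - 5 * len(config) + penalty


best = best_configuration(PASSES, toy_cost)
print(best)
```

    In practice the cost function would compile and measure a benchmark (execution time, energy, or code size); the toy model only stands in for that measurement.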

    Cache organization on loop blocking

    Available in the files attached to this document

    Energy-Delay Tradeoff Analysis of ILP-based Compilation Techniques on a VLIW Architecture

    Energy consumption is becoming an important issue on modern processors, especially in embedded systems. While many architectural solutions to reduce energy consumption exist, software solutions mostly rely on performance optimization techniques. The rationale behind the latter approach is the rule that energy consumption is roughly proportional to execution time. While some ILP techniques increase performance by eliminating redundant instructions, others increase the total instruction count, which reduces their benefit as far as energy consumption is concerned. This paper explores the energy-delay tradeoff of ILP-enhancing techniques at the compilation level. The goal is to develop a theoretical understanding of the main energy issues involved in forming ILP blocks. We present an analytical methodology that exploits the variations in program performance to identify conditions leading to an increase in energy consumption. Our results show that there exists a threshold above which ILP-enhancing optimizations yield diminishing energy reduction returns. The proposed tradeoff analysis reveals that this can be mainly attributed to the limited available instruction parallelism of applications, which causes wasted computation and, to some extent, machine overhead to start dominating the energy consumption in scenarios where the ILP is pushed above a given threshold.

    Implementing Wilson-Dirac Operator on the Cell Broadband Engine

    Computing the actions of Wilson-Dirac operators consumes most of the CPU time for the grand challenge problem of simulating Lattice Quantum Chromodynamics (Lattice QCD). This routine is challenging to implement in most computational environments because the same data are accessed in multiple patterns, which makes it difficult to align the data efficiently at compile time. Additionally, the low computation-to-memory-access ratio makes this computation both memory-bandwidth bound and memory-latency bound. In this work, we present an implementation of this routine on the Cell Broadband Engine. We propose runtime data fusion, an approach that aligns data at runtime when they cannot be aligned optimally at compile time, to improve SIMDized execution. We also present a DMA optimization technique that reduces the impact of bandwidth limits on performance. Our implementation of this routine achieves 31.2 GFlops for single-precision computations and 8.75 GFlops for double-precision computations.

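    Exploratory study of the occupational characteristics of a sample of hospitalized suicide attempters (Étude exploratoire des caractéristiques professionnelles d'un échantillon de suicidants hospitalisés)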

    This study aims to describe the occupational characteristics of a sample of suicide attempters. An interviewer questioned suicide attempters aged 18 to 65 who were consecutively hospitalized in a specialized unit of the CHU d’Angers (Angers University Hospital) over a period of six and a half months. In total, 87 employed suicide attempters were interviewed. They had often faced organizational constraints described in the literature as causes of work-related mental suffering. Overall, this applies to men and women alike. Compared with respondents in occupational health surveys (Sumer, Samotrace…), suicide attempters more often report, among other things, intense stress at work, a violated professional conscience, and job strain as defined by the Karasek model. This could support a link between suicide attempts and certain factors of mental hardship at work. The results of this study should be interpreted with caution because of data circularity and the small sample size.

    Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives

    This is a post-peer-review, pre-copyedit version of an article published in International Journal of Parallel Programming. The final authenticated version is available online at: https://doi.org/10.1007/s10766-015-0362-9 [Abstract] The use of GPUs for general-purpose computation has increased dramatically in recent years due to the rising demand for computing power and their tremendous computing capacity at low cost. Hence, new programming models have been developed to integrate these accelerators with high-level programming languages, giving rise to heterogeneous computing systems. Unfortunately, this heterogeneity is also exposed to the programmer, complicating its exploitation. This paper presents a new technique to automatically rewrite sequential programs into a parallel counterpart targeting GPU-based heterogeneous systems. The original source code is analyzed through domain-independent computational kernels, which hide the complexity of the implementation details by presenting a non-statement-based, high-level, hierarchical representation of the application. Next, a locality-aware technique based on standard compiler transformations is applied to the original code through OpenHMPP directives. Two representative case studies from scientific applications have been selected: the three-dimensional discrete convolution and the single-precision general matrix multiplication. The effectiveness of our technique is corroborated by a performance evaluation on NVIDIA GPUs. Funding: Ministerio de Economía y Competitividad; TIN2010-16735. Ministerio de Economía y Competitividad; TIN2013-42148-P. Galicia, Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013-055. Ministerio de Educación; AP2008-0101

    Ray tracing: parallel approaches (Lancer de rayon : approches parallèles)

    Available in the files attached to this document

    Thoracic Duct Fistula after Thyroid Cancer Surgery: Towards a New Treatment?

    The use of somatostatin analogs is a new conservative therapeutic approach for the treatment of chyle fistulas developing after thyroid cancer surgery. Combining it with total parenteral nutrition should avoid the high morbidity of a re-intervention with an uncertain outcome. This promising trend is supported by the present case report of a chyle leak occurring after total thyroidectomy with central and lateral neck dissection for a papillary carcinoma, which was treated successfully without immediate or distant sequelae.

    A strategy for array management in local memory

    Projet CHLOE. One major point in loop restructuring for data locality optimization is the choice and evaluation of a data locality criterion. We show in this paper how to compute approximations of the window sets defined by Gannon, Jalby and Gallivan. The window associated with an iteration i describes the "active" portion of an array: the elements which have already been referenced before iteration i and which will be referenced after iteration i. Such a notion is extremely useful for data localization, since it identifies the portions of arrays which are worth keeping in local memory because they are going to be referenced later. The computation of these window approximations can be performed symbolically at compile time and generates simple geometrical shapes that simplify the management of data transfers. This makes it possible to derive a global strategy of data management for local memories. Moreover, the effects of loop transformations fit naturally in the geometrical framework we use for the calculations. The determination of window approximations is studied from both a theoretical and a computational point of view, and examples of applications are given.
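
    The window definition quoted in this abstract can be made concrete with a brute-force sketch. It is not the symbolic compile-time method the paper describes: it enumerates an explicit reference trace, and it reads "before" and "after" as strictly before and strictly after the given iteration, which is an assumption. The function name and the three-point stencil trace are hypothetical.

```python
def window(trace: list[set[int]], k: int) -> set[int]:
    """Return the elements referenced both before and after iteration k.

    `trace[j]` is the set of array indices referenced at the j-th iteration.
    Assumption: "before" and "after" exclude iteration k itself.
    """
    before = set().union(*trace[:k])      # everything touched by earlier iterations
    after = set().union(*trace[k + 1:])   # everything touched by later iterations
    return before & after


# Hypothetical trace of a three-point stencil: iteration j reads A[j-1], A[j], A[j+1].
trace = [{j - 1, j, j + 1} for j in range(1, 5)]

for k in range(len(trace)):
    print(k, sorted(window(trace, k)))
```

    Elements in the window are the ones worth keeping in local memory at that point, since they have been loaded already and will be reused.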

    Work and suicide attempts (Travail et tentatives de suicide)

    Objective: The emergence of work-related suicides in the media has been a striking feature of recent years. This phenomenon must mobilize everyone in the company, from staff member to director. To consider preventing work-related suicides, it is important to define them in quantitative and qualitative terms. To address part of this issue, this study aims to specify the occupational characteristics of a sample of suicide attempters, some of whom report a link between their suicidal act and their work. Methods: This six-month study included employed suicide attempters aged 18 to 65 who were hospitalized at the CHU d’Angers (Angers University Hospital). In total, 87 consecutive suicide attempters answered questionnaires on the characteristics of their work. Results: Suicide attempters in the "work-related suicide attempts (SA)" group more often describe pathogenic features of work organization, including work overload, deadlines to meet, and frequent interruptions. They also report more conflicts within the company. This mostly concerns managers and intermediate professions. The proportion of men is higher among suicide attempters who link their SA to their work, but women are also affected. Discussion: The link between work and SAs seems to stem more from work organization, relations with management and colleagues, and poor recognition at work than from exposure to physical constraints, working hours, or type of employment contract. Conclusion: Preventing work-related suicide requires reflection on work organization, social relations within the company, and the promotion of recognition of the actual work performed.