
    A Review of Lightweight Thread Approaches for High Performance Computing

    High-level, directive-based solutions are becoming the programming models (PMs) of choice for multi/many-core architectures. Several solutions relying on operating system (OS) threads work well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads to exploit their massively parallel architectures, and conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. To examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns, and that those LWT libraries perform better than OS-thread-based solutions for the task and nested parallelism that are becoming more popular on new architectures.

    The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.

    Peer Reviewed. Postprint (author's final draft).
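For illustration, the following minimal C sketch (not the paper's benchmark suite) exercises the kind of fine-grained task pattern such microbenchmarks target, using POSIX threads; the task count and the trivial task body are arbitrary choices. The per-task create/join cost it reports is precisely the overhead that LWT libraries aim to shrink.

```c
/* Illustrative sketch, not the paper's benchmark suite: measures the
 * create/join cost of conventional OS threads for many tiny tasks,
 * which is the overhead that lightweight-thread libraries target.
 * Build with:  cc -O2 lwt_overhead.c -o lwt_overhead -lpthread  (file name is arbitrary) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_TASKS 10000   /* arbitrary task count for illustration */

static void *tiny_task(void *arg)
{
    /* Deliberately trivial body: the measurement is dominated by thread
     * creation, scheduling, and join, not by useful work. */
    long *counter = arg;
    (*counter)++;
    return NULL;
}

int main(void)
{
    long counter = 0;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < NUM_TASKS; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, tiny_task, &counter) != 0) {
            perror("pthread_create");
            return EXIT_FAILURE;
        }
        pthread_join(t, NULL);  /* join immediately to keep the sketch simple */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%d OS-thread tasks: %.3f s total, %.1f us per create+join\n",
           NUM_TASKS, secs, secs * 1e6 / NUM_TASKS);
    return 0;
}
```

An LWT library would replace the create/join pair with a task-spawn primitive that avoids a kernel-managed thread per task, which is where the reported performance gap comes from.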

    Implementation Methods for Estimating Haplotypes with Grid Computing Technology

    Introduction: PHASE is a software package for haplotype reconstruction and recombination rate estimation from population data, and it implements methods for estimating haplotypes from population genotype data. The software requires enormous computing power because of the complexity of the algorithms used. Recent advances in grid computing and multi-core processing technology have made it possible to create virtual supercomputers capable of massive parallel computation.

Method: PHASE version 2.0 was downloaded from PHASE’s homepage (http://stephenslab.uchicago.edu/software.html) and installed on 10 Apple iMac computers (20 CPUs) running Mac OS 10.5, with an Apple™ Xserve as the controller. A PHP web-based application was developed to enable easy job submission and result retrieval. Haplotype estimation jobs for PHASE were submitted from a web browser running on Windows®, Linux®, or Macintosh™ OS 10.5. Jobs consisting of 20 input files of genomic loci (SNPs) were submitted to the Xgrid via the controller, and the job completion time for each run was recorded and compared.

Results: Estimating haplotypes for the 20 input files and returning the results to the controller took 4,980 seconds using Xgrid (20 CPUs), compared with 58,800 seconds on a single computer (2 CPUs), a 91.5% reduction in computing time.
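For reference, the reported reduction follows directly from the quoted timings:

```latex
\frac{58{,}800 - 4{,}980}{58{,}800} \approx 0.915 \quad (91.5\% \text{ reduction}),
\qquad
\frac{58{,}800}{4{,}980} \approx 11.8\times \text{ speedup}.
```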

    CacheZoom: How SGX Amplifies The Power of Cache Attacks

    In modern computing environments, hardware resources are commonly shared, and parallel computation is widely used. Parallel tasks can cause privacy and security problems if proper isolation is not enforced. Intel proposed SGX to create a trusted execution environment within the processor. SGX relies on the hardware and claims runtime protection even if the OS and other software components are malicious. However, SGX disregards side-channel attacks. We introduce a powerful cache side-channel attack that provides system adversaries with a high-resolution channel. Our attack tool, named CacheZoom, can virtually track all memory accesses of SGX enclaves with high spatial and temporal precision. As proof of concept, we demonstrate AES key recovery attacks on commonly used implementations, including those that were believed to be resistant in previous scenarios. Our results show that SGX cannot protect critical, data-sensitive computations and that efficient AES key recovery is possible in a practical environment. In contrast to previous works, which require hundreds of measurements, this is the first cache side-channel attack on a real system that can recover AES keys with a minimal number of measurements. We can successfully recover AES keys from T-table-based implementations with as few as ten measurements.

    Comment: Accepted at the Conference on Cryptographic Hardware and Embedded Systems (CHES '17).
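For illustration, a minimal C sketch of the basic cache-timing primitive that attacks of this class build on; this is not CacheZoom itself, and a real attack additionally needs eviction-set construction, synchronization with the victim enclave, noise filtering, and key-recovery analysis. The hit/miss threshold below is an arbitrary placeholder that would have to be calibrated per machine.

```c
/* Illustrative sketch only: time one memory access and compare against a
 * threshold to distinguish a cache hit from a miss. x86-only; uses the
 * rdtscp and clflush compiler intrinsics. */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

/* Time a single load of *addr in TSC cycles. */
static inline uint64_t timed_access(volatile uint8_t *addr)
{
    unsigned int aux;
    uint64_t start, end;

    _mm_mfence();
    start = __rdtscp(&aux);
    (void)*addr;                  /* the access being timed */
    end = __rdtscp(&aux);
    _mm_mfence();
    return end - start;
}

int main(void)
{
    static volatile uint8_t probe[64];
    const uint64_t HIT_THRESHOLD = 120;   /* illustrative; calibrate per machine */

    probe[0] = 1;                             /* bring the line into the cache */
    uint64_t hot = timed_access(&probe[0]);   /* expect a hit */

    _mm_clflush((const void *)&probe[0]);     /* evict the line */
    _mm_mfence();
    uint64_t cold = timed_access(&probe[0]);  /* expect a miss */

    printf("cached: %llu cycles, flushed: %llu cycles\n",
           (unsigned long long)hot, (unsigned long long)cold);
    printf("cached access is %s the hit threshold (%llu cycles)\n",
           hot < HIT_THRESHOLD ? "below" : "above",
           (unsigned long long)HIT_THRESHOLD);
    return 0;
}
```

Repeating this measurement over the cache sets touched by a victim's table lookups is what turns the hit/miss signal into the memory-access trace that enables key recovery.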

    A runtime heuristic to selectively replicate tasks for application-specific reliability targets

    In this paper we propose a runtime-based selective task replication technique for task-parallel high-performance computing applications. The technique is automatic and does not require modification or recompilation of the OS, the compiler, or the application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. Our experimental evaluation shows that the App_FIT selective replication heuristic has low overhead and is highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets: with App_FIT, we can tolerate pessimistic exascale error rates while replicating only 53% of the tasks.

    This work was supported by an FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-Blanc 2 project (www.montblanc-project.eu), grant agreement no. 610402, and in part by the European Union (FEDER funds) under contract TIN2015-65316-P.

    Peer Reviewed. Postprint (author's final draft).
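For illustration only: the abstract does not describe how App_FIT chooses tasks, so the following C sketch shows one plausible greedy selection against a reliability budget; the per-task failure-contribution estimates, the target value, and all identifiers are assumptions, not the paper's heuristic.

```c
/* Illustrative sketch of selective task replication: greedily replicate
 * the tasks that contribute most to the estimated failure rate until the
 * unprotected remainder falls below the application's reliability target.
 * All estimates and names are assumptions for the example. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct task_info {
    const char *name;
    double fit_contribution;  /* assumed per-task failure contribution (FIT) */
    bool replicate;           /* decision output */
};

static size_t select_tasks(struct task_info *tasks, size_t n, double target_fit)
{
    double unprotected = 0.0;
    size_t replicated = 0;

    for (size_t i = 0; i < n; i++) {
        tasks[i].replicate = false;
        unprotected += tasks[i].fit_contribution;
    }
    while (unprotected > target_fit) {
        size_t best = n;  /* largest not-yet-replicated contributor */
        for (size_t i = 0; i < n; i++)
            if (!tasks[i].replicate &&
                (best == n || tasks[i].fit_contribution > tasks[best].fit_contribution))
                best = i;
        if (best == n)
            break;        /* everything already replicated */
        tasks[best].replicate = true;
        unprotected -= tasks[best].fit_contribution;
        replicated++;
    }
    return replicated;
}

int main(void)
{
    struct task_info tasks[] = {
        { "stencil",  40.0, false },
        { "reduce",   25.0, false },
        { "io_flush",  5.0, false },
        { "update",   30.0, false },
    };
    size_t n = sizeof(tasks) / sizeof(tasks[0]);
    size_t r = select_tasks(tasks, n, 30.0);  /* illustrative target */

    printf("replicating %zu of %zu tasks:\n", r, n);
    for (size_t i = 0; i < n; i++)
        if (tasks[i].replicate)
            printf("  %s\n", tasks[i].name);
    return 0;
}
```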

    Analysis of multi-threaded processing efficiency in Android applications

    This paper examines the use of a thread-based parallel computing model in the development of applications for the Android platform. It briefly analyzes the components of the platform that support multithreaded parallelization: the threading facilities of the underlying Linux operating system, the corresponding tools of the Java programming language, and the object-oriented facilities for developing multithreaded applications for the Android OS. The efficiency of this parallelization model is evaluated for Android applications that require lengthy computations and intensive display of the results on the screen of a mobile device running the Android OS.

    Approximate Computing Strategies for Low-Overhead Fault Tolerance in Safety-Critical Applications

    This work studies the reliability of embedded systems that use approximate computing in software and hardware designs. It presents approximate computing methods and proposes approximate fault tolerance techniques, applied to programmable hardware and embedded software, that provide reliability at low computational cost. The objective of this thesis is to develop fault tolerance techniques based on approximate computing and to show that approximate computing can be applied to most safety-critical systems. It starts with an experimental analysis of the reliability of embedded systems used in safety-critical projects. Results show that the reliability of single-core systems, and the types of errors they are sensitive to, differ from those of multicore processing systems. The use of an operating system and of two different parallel programming APIs is also evaluated. Fault injection experiments show that embedded Linux has a critical impact on the system’s reliability and on the types of errors to which it is most sensitive. Traditional fault tolerance techniques and parallel variants of them are evaluated for their fault-masking capability on multicore systems. The work shows that parallel fault tolerance can improve not only execution time but also fault masking. Lastly, an approximate parallel fault tolerance technique is proposed, in which the system abandons faulty execution tasks. This first approximate computing approach to fault tolerance in parallel processing systems improved the reliability and the fault-masking capability of the techniques, significantly reducing errors that would cause system crashes. Inspired by the conflict between the improvements provided by approximate computing and the requirements of safety-critical systems, this work presents an analysis of the applicability of approximate computing techniques in critical systems. The proposed techniques are tested under simulation, emulation, and laser fault injection experiments. Results show that approximate computing algorithms behave differently from traditional algorithms. The approximation techniques presented and proposed in this work are also used to develop fault tolerance techniques. Results show that those new approximate fault tolerance techniques are less costly than traditional ones and achieve almost the same level of error masking.
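For illustration, a minimal C sketch of the general idea of abandoning faulty tasks and producing an approximate result from the surviving ones; the fault-detection mechanism, the rescaling policy, and all identifiers are assumptions rather than the thesis's actual technique.

```c
/* Illustrative sketch only: combine the results of parallel subtasks,
 * dropping any flagged as faulty instead of crashing or re-executing.
 * Detection (signals, checksums, replica disagreement, ...) is abstracted
 * away; the rescaling policy is one simple choice among many. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_CHUNKS 8

struct chunk_result {
    double partial_sum;  /* contribution of this chunk */
    bool faulty;         /* set by whatever detection mechanism is in use */
};

static double approximate_reduce(const struct chunk_result *r, int n)
{
    double sum = 0.0;
    int used = 0;

    for (int i = 0; i < n; i++) {
        if (r[i].faulty)
            continue;        /* abandon the faulty task's output */
        sum += r[i].partial_sum;
        used++;
    }
    if (used == 0)
        return 0.0;          /* nothing survived; caller must decide what to do */
    /* Approximate the missing chunks by the mean of the surviving ones. */
    return sum * (double)n / (double)used;
}

int main(void)
{
    struct chunk_result results[NUM_CHUNKS];
    for (int i = 0; i < NUM_CHUNKS; i++) {
        results[i].partial_sum = 1.0;  /* pretend each chunk contributes 1.0 */
        results[i].faulty = (i == 3);  /* pretend chunk 3 was hit by a fault */
    }
    printf("approximate total: %.2f (exact would be %.2f)\n",
           approximate_reduce(results, NUM_CHUNKS), (double)NUM_CHUNKS);
    return 0;
}
```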

    Approximating Generalized Network Design under (Dis)economies of Scale with Applications to Energy Efficiency

    In a generalized network design (GND) problem, a set of resources is assigned to multiple communication requests. Each request contributes its weight to the resources it uses, and the total load on a resource is then translated into the cost it incurs via a resource-specific cost function. For example, a request may be to establish a virtual circuit, thus contributing to the load on each edge in the circuit. Motivated by energy-efficiency applications, there has recently been growing interest in GND with cost functions that exhibit (dis)economies of scale ((D)oS), namely, cost functions that appear subadditive for small loads and superadditive for larger loads. The current paper advances the existing literature on approximation algorithms for GND problems with (D)oS cost functions in several respects: (1) we present a generic approximation framework that yields approximation results for a much wider family of requests in both directed and undirected graphs; (2) our framework allows for unrelated weights, thus providing the first non-trivial approximation for the problem of scheduling unrelated parallel machines with (D)oS cost functions; (3) our framework is fully combinatorial and runs in strongly polynomial time; (4) the family of (D)oS cost functions considered in the current paper is more general than the one considered in the existing literature, providing a more accurate abstraction for practical energy conservation scenarios; and (5) we obtain the first approximation ratio for GND with (D)oS cost functions that depends only on the parameters of the resources' technology and does not grow with the number of resources, the number of requests, or their weights. The design of our framework relies heavily on Roughgarden's smoothness toolbox (JACM 2015), thus demonstrating the possible usefulness of this toolbox in the area of approximation algorithms.

    Comment: 39 pages, 1 figure. An extended abstract of this paper is to appear in the 50th Annual ACM Symposium on the Theory of Computing (STOC 2018).
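For orientation, one common formalization of the cost model sketched above; the specific functional form is an assumption based on the energy-efficiency motivation and may differ from the exact class treated in the paper.

```latex
% Load on resource e from the requests r that use it (S_r = resources used by r):
\ell_e \;=\; \sum_{r \,:\, e \in S_r} w_{r,e},
\qquad
\text{objective: } \min \sum_{e} f_e(\ell_e).

% A representative (D)oS cost shape from the energy-efficiency literature
% (an assumption here): a fixed startup cost plus a superlinear energy term,
f_e(x) \;=\;
\begin{cases}
  0, & x = 0,\\
  \sigma_e + c_e\, x^{\alpha}, & x > 0,
\end{cases}
\qquad \alpha > 1.
```

For small positive loads the fixed term \sigma_e dominates, so the cost behaves subadditively (economies of scale); for large loads the c_e x^\alpha term dominates and the cost behaves superadditively (diseconomies of scale).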