A Comparative Study of Scheduling Techniques for Multimedia Applications on SIMD Pipelines
Parallel architectures are essential in order to take advantage of the
parallelism inherent in streaming applications. One particular branch of these
employs hardware SIMD pipelines. In this paper, we analyse several scheduling
techniques, namely ad hoc overlapped execution, modulo scheduling and modulo
scheduling with unrolling, all of which aim to efficiently utilize the special
architecture design. Our investigation focuses on improving throughput while
analysing other metrics that are important for streaming applications, such as
register pressure, buffer sizes and code size. Through experiments conducted on
several media benchmarks, we present and discuss trade-offs involved when
selecting any one of these scheduling techniques.
Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and
Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241
Finding parallel patterns through static analysis in C++ applications
Since the 'free lunch' of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and techniques for parallel programming have been progressively developed. For instance, parallel frameworks, offering programming patterns, allow expressing concurrency in applications to better exploit parallel hardware. Nevertheless, a large portion of production software, from a broad range of scientific and industrial areas, is still developed sequentially. Considering that these software modules contain thousands, or even millions, of lines of code, an extremely large amount of effort is needed to identify parallel regions. To pave the way in this area, this paper presents Parallel Pattern Analyzer Tool, a software component that aids the discovery and annotation of parallel patterns in source code. This tool simplifies the transformation of sequential source code to parallel. Specifically, we provide support for identifying map, farm, and pipeline parallel patterns and evaluate the quality of the detection for a set of different C++ applications.
This work was partially supported by the EU Projects ICT 644235 “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications” and the FP7 609666 “Repara: Reengineering and Enabling Performance and Power of Application
A Simple MPI Library for Lightweight Manycore Processors
Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação.
In recent decades, improving the performance of individual cores and increasing the
number of high-power cores per chip were the main trends in the construction of
processors. However, this combination led not only to an increase in computing capacity, but
also to a considerable growth in energy consumption. There is a growing concern among
the scientific community about the energy efficiency of modern supercomputers. In
recent years, many research efforts have sought alternative solutions
capable of solving this problem of scalability and energy efficiency. The performance and
energy efficiency provided by lightweight manycores are undeniable. However, the lack of
rich and portable support for these processors, such as high-performance standard
interfaces that deliver portable source code, makes software development a challenging task.
Currently, two approaches are employed to improve programmability in lightweight
manycores: Operating Systems (OSes) and baremetal runtime systems. The former
provides portability but exposes complex OS-level programming interfaces to developers.
The latter focuses on providing rich and high-performance interfaces, which are
vendor-specific and yield non-portable software. Thus, the existing solutions force software
engineers to choose between software portability and a faster development process. To
address this dilemma, we propose a portable and lightweight MPI library (LWMPI)
designed from scratch to cope with the restrictions and intricacies of lightweight manycores. We
integrated LWMPI into a distributed OS that targets these processors, thus featuring
better programmability and implicit portability for lightweight manycores, without incurring
excessive performance overheads that could hinder its use. To deliver a comprehensive
evaluation of LWMPI, we relied on three applications from a representative benchmark
suite used to assess the performance of lightweight manycores, and a synthetic benchmark.
Our results obtained on the Kalray MPPA-256 processor showed that LWMPI presents
better performance and scalability than a solution built specifically for this evaluation that
uses the raw Nanvix Inter-Process Communication (IPC) abstractions, while exposing a
richer programming interface.