Search CORE

10 research outputs found

Towards co-designed optimizations in parallel frameworks: A MapReduce case study

Author: Barrett Colin
Kotselidis Christos
Luján Mikel
Publication date: 01/01/2016

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of the data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general-purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically without requiring modification to application code. The optimizer is able to speed up the execution of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks. Comment: 8 pages

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Research and Education in Computational Science and Engineering

Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments (including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers) is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade. Comment: Major revision, to appear in SIAM Review

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Author: Fernandez Ivan
Giannoula Christina
Goumas Georgios
Gómez-Luna Juan
Karakostas Vasileios
Koziris Nectarios
Mutlu Onur
Orosa Lois
Papadopoulou Nikela
Vijaykumar Nandita
Publication date: 13/02/2021

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficient synchronization among the NDP cores of a system is necessary. However, supporting synchronization in many NDP systems is challenging because they lack shared caches and hardware cache coherence support, which are commonly used for synchronization in multicore systems, and communication across different NDP units can be expensive. This paper comprehensively examines the synchronization problem in NDP systems, and proposes SynCron, an end-to-end synchronization solution for NDP systems. SynCron adds low-cost hardware support near memory for synchronization acceleration, and avoids the need for hardware cache coherence support. SynCron has three components: 1) a specialized cache memory structure to avoid memory accesses for synchronization and minimize latency overheads, 2) a hierarchical message-passing communication protocol to minimize expensive communication across NDP units of the system, and 3) a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded. We evaluate SynCron using a variety of parallel workloads, covering various contention scenarios. SynCron improves performance by 1.27× on average (up to 1.78×) under high-contention scenarios, and by 1.35× on average (up to 2.29×) under low-contention real applications, compared to state-of-the-art approaches. SynCron reduces system energy consumption by 2.08× on average (up to 4.25×). Comment: To appear in the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27)

arXiv.org e-Print Archive

Enlighten

Thermoelastic problem in the setting of dual-phase-lag heat conduction : existence and uniqueness of a weak solution

Author: Maes Frederick
Van Bockstal Karel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Ghent University Academic Bibliography

A Survey on Resiliency Techniques in Cloud Computing Infrastructures and Applications

Author: Biswanath Mukherjee
Carlos Colman-Meixner
Chris Develder
Massimo Tornatore
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Programação paralela baseada em skeletons para processamento de imagens 3D

Author: Lourenço Pedro Miguel Galvão Farelo
Publication date: 01/03/2016

The performance gains obtained through parallel computing have led to its growing use in solving computationally demanding problems in many areas of science and engineering. However, because creating parallel programs is complex, tools that simplify their development are needed. One class of problems solved with parallel programming is image processing in fields such as Materials Science and Medicine. As in other fields, in these fields and for this class of problems it is possible to find common parallelization solutions and strategies that capture knowledge accumulated over time. Knowing these patterns and making them available simplifies the development of such parallel programs, but tools that implement them with adequate performance are required. The patterns should also be easy to adapt and reuse in similar problems, improving productivity in developing programs across the various fields that need image processing. In parallel computing in general, tools already exist that provide parallelization patterns, allowing non-experts to develop their programs more simply. Algorithmic skeletons are one existing solution for capturing these patterns, and there are frameworks that implement them, freeing programmers from needing to know the details of the target architecture. Algorithmic skeletons can also be applied to image processing problems, capturing patterns in that domain either directly or by composition. However, existing algorithmic skeleton tools do not provide optimized patterns with adaptive properties that can take into account either the characteristics of the running system (e.g. system load versus energy consumption) or those of the image being processed (e.g. images with more or fewer objects). In this context, this work began by studying and comparing implementations of an image processing algorithm using two algorithmic skeleton frameworks that can generate code for GPGPUs, in order to identify the underlying patterns and the most suitable framework. Its contribution was then the extension of the FastFlow framework with an architecture for measuring the execution state of the farm skeleton, and the extension of that skeleton with adaptive properties. It is possible to change the number of workers of a farm, to control the distribution of tasks among the workers, and to choose whether the skeleton executes on the CPU or the GPU.

Repositório da Universidade Nova de Lisboa