63 research outputs found
A partition methodology to develop data flow dominated embedded systems
Communication presented at the International Workshop on Model-Based Methodologies for Pervasive and Embedded Software (MOMPES 2004), 1, Hamilton, Ontario, Canada, 15-18 June 2004. This paper proposes an automatic partitioning methodology oriented to the development of
data flow dominated embedded systems. The target architecture is
CPU-based with reconfigurable devices on attached board(s), which closely
matches the PSM meta-model applied to system modelling. A PSM flow
graph was developed to represent the system during the partitioning process.
The partitioning task applies known optimization algorithms - tabu search and cluster growth - which were enriched with new elements to reduce computation time and to achieve higher-quality partition solutions. These include the closeness function that guides the cluster growth algorithm and dynamically adapts to the type of object and partition under analysis.
The methodology was applied to two case studies, and some evaluation results are presented.
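The closeness function itself is not given in the abstract; the sketch below is only a minimal illustration, under an assumed cost model (communication weights plus a hardware-affinity bias), of how a cluster growth pass can be guided by a closeness score that depends on the object and on the partition being grown. All names and weights are assumptions, not the paper's actual function.

```java
import java.util.*;

/**
 * Minimal sketch of cluster growth partitioning guided by a closeness function.
 * The cost model (communication weights, hardware affinity) is assumed for
 * illustration and is not the methodology's actual function.
 */
public class ClusterGrowthSketch {

    /** A system object with a hypothetical hardware affinity in [0,1] and
     *  communication weights towards every other object (indexed by id). */
    record Obj(int id, double hwAffinity, double[] comm) {}

    /** Closeness of an object to one partition: communication with objects
     *  already placed there, biased by how well the object fits that side. */
    static double closeness(Obj o, Set<Integer> partition, boolean isHardware) {
        double traffic = partition.stream().mapToDouble(i -> o.comm()[i]).sum();
        return traffic + (isHardware ? o.hwAffinity() : 1.0 - o.hwAffinity());
    }

    /** Grows a hardware and a software cluster from two seed objects,
     *  repeatedly placing the unassigned object with the highest closeness. */
    static Map<String, Set<Integer>> partition(List<Obj> objs, int hwSeed, int swSeed) {
        Set<Integer> hw = new HashSet<>(Set.of(hwSeed));
        Set<Integer> sw = new HashSet<>(Set.of(swSeed));
        Deque<Obj> free = new ArrayDeque<>();
        for (Obj o : objs) if (o.id() != hwSeed && o.id() != swSeed) free.add(o);

        while (!free.isEmpty()) {
            Obj best = null;
            boolean bestToHw = true;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Obj o : free) {
                double ch = closeness(o, hw, true), cs = closeness(o, sw, false);
                double s = Math.max(ch, cs);
                if (s > bestScore) { bestScore = s; best = o; bestToHw = ch >= cs; }
            }
            free.remove(best);
            (bestToHw ? hw : sw).add(best.id());
        }
        return Map.of("hw", hw, "sw", sw);
    }
}
```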
HEP-Frame: A software engineered framework to aid the development and efficient multicore execution of scientific code
This communication presents an evolutionary software prototype of a user-centered Highly Efficient Pipelined Framework, HEP-Frame, to aid the development of sustainable parallel scientific code with a flexible pipeline structure. HEP-Frame is the result of a tight collaboration between computational scientists and software engineers: it aims to improve scientists' coding productivity, ensuring an efficient parallel execution on a wide set of multicore systems, with both HPC and HTC techniques. The current prototype complies with the requirements of an actual scientific code, includes desirable sustainability features and supports, at compile time, additional plugin interfaces for other scientific fields. The porting and development productivity was assessed and preliminary efficiency results are promising. This work was supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope UID/CEC/00319/2013, by LIP (Laboratório de Instrumentação e Física Experimental de Partículas) and by Project Search-ON2 (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework, through the European Regional Development Fund.
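HEP-Frame's actual interfaces are not shown in this abstract; the sketch below is a hypothetical, much-simplified illustration of the pipelined-plugin idea it describes (a task interface, per-element filtering, parallel execution over independent elements), with all names assumed for illustration rather than taken from HEP-Frame.

```java
import java.util.List;

/** Hypothetical, simplified pipeline stage: returning false filters the
 *  element out of the remaining stages. Not HEP-Frame's actual API. */
interface PipelineTask<E> {
    boolean process(E event);
}

/** Runs each dataset element through the pipeline until a stage rejects it,
 *  and returns how many elements survive the whole pipeline. */
class Pipeline<E> {
    private final List<PipelineTask<E>> tasks;

    Pipeline(List<PipelineTask<E>> tasks) { this.tasks = tasks; }

    long run(List<E> dataset) {
        // Independent elements can be processed concurrently; a parallel
        // stream is the simplest stand-in for a pool of worker threads here.
        return dataset.parallelStream()
                      .filter(e -> tasks.stream().allMatch(t -> t.process(e)))
                      .count();
    }
}
```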
Tuning pipelined scientific data analyses for efficient multicore execution
Scientific data analyses often apply a pipelined sequence of computational tasks to independent datasets. Each task in the pipeline captures and processes a dataset element, may be dependent on other tasks in the pipeline, may have a different computational complexity and may be filtered out from progressing in the pipeline. The goal of this work is to develop an efficient scheduler that automatically (i) manages parallel data reading and the creation of an adequate data structure, (ii) adaptively defines the most efficient order of pipeline execution of the tasks, considering their inter-dependence and both the filtering-out rate and the computational weight, and (iii) manages the parallel execution of the computational tasks in a multicore system, applied to the same or to different dataset elements. A real case study data analysis application from High Energy Physics (HEP) was used to validate the efficiency of this scheduler. Preliminary results show a substantial performance improvement from the pipeline tuning when compared to the original sequential HEP code (up to a 35x speedup in a dual 12-core system), and also show significant performance speedups over conventional parallelization approaches of this case study application (up to 10x faster in the same system). Project Search-ON2 (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework, through the European Regional Development Fund.
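The abstract only states that the scheduler adaptively orders the tasks from their inter-dependence, filtering-out rate and computational weight; the sketch below illustrates one plausible ranking heuristic of that kind (cheap, highly selective tasks first, dependences respected). The class names, profiling fields and the rank formula are assumptions, not the scheduler's actual design.

```java
import java.util.*;

/** Assumed per-task profile gathered at run time. */
class TaskProfile {
    final String name;
    final Set<String> dependsOn;   // tasks that must already have run
    double avgCost;                // measured cost per element
    double passRate;               // fraction of elements that survive this task

    TaskProfile(String name, Set<String> dependsOn, double avgCost, double passRate) {
        this.name = name; this.dependsOn = dependsOn;
        this.avgCost = avgCost; this.passRate = passRate;
    }

    /** Lower rank = run earlier: cheap tasks that reject many elements save
     *  the most downstream work. */
    double rank() { return avgCost / (1.0 - passRate + 1e-9); }
}

class PipelineOrdering {
    /** Greedily emits, among the tasks whose dependences are satisfied, the
     *  one with the lowest rank, until all tasks are ordered. */
    static List<TaskProfile> order(Collection<TaskProfile> tasks) {
        List<TaskProfile> pending = new ArrayList<>(tasks);
        List<TaskProfile> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (!pending.isEmpty()) {
            Optional<TaskProfile> next = pending.stream()
                    .filter(t -> done.containsAll(t.dependsOn))
                    .min(Comparator.comparingDouble(TaskProfile::rank));
            if (next.isEmpty()) throw new IllegalStateException("cyclic dependences");
            TaskProfile t = next.get();
            pending.remove(t); ordered.add(t); done.add(t.name);
        }
        return ordered;
    }
}
```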
Removing inefficiencies from scientific code: the study of the Higgs boson couplings to top quarks
Publicado em "Computational science and its applications – ICCSA 2014 : proceedings", Series : Lecture notes in computer science, vol. 8582This paper presents a set of methods and techniques to remove inefficiencies in a data analysis application used in searches by the ATLAS Experiment at the Large Hadron Collider. Profiling scientific code helped to pinpoint design and runtime inefficiencies, the former due to coding and data structure design. The data analysis code used by groups doing searches in the ATLAS Experiment contributed to clearly identify some of these inefficiencies and to give suggestions on how to prevent and overcome those common situations in scientific code to improve the efficient use of available computational resources in a parallel homogeneous platform.This work is funded by National Funds through the FCT - Fundaçãoo para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014, by LIP (Laborat ́orio de Instrumentação e Física Experimental de Partículas), and the SeARCH cluster (REEQ/443/EEI/2005)
EDgAR: a platform for hardware/software codesign
Codesign is a unified methodology to develop complex systems with hardware and software components. EDgAR, a platform for hardware/software codesign intended for prototyping complex digital systems, is described. It employs programmable logic devices (MACHs and FPGAs) and a transputer-based parallel architecture. This platform and its associated methodology reduce system production costs by decreasing the time needed to design and test the prototypes. The EDgAR supporting tools are introduced, which were conceived to specify systems at a high level of abstraction, with a standard language, and to allow a high degree of automation in the synthesis process. This platform was used to emulate an integrated circuit for image processing purposes.
A generic and highly efficient parallel variant of Borůvka's algorithm
This paper presents (i) a parallel, platform-independent variant of Borůvka's algorithm, an efficient Minimum Spanning Tree (MST) solver, and (ii) a comprehensive comparison of MST-solver implementations, both on multi-core CPU chips and on GPUs. The core of our variant is an effective and explicit contraction of the graph. Our multi-core CPU implementation scales linearly up to 8 threads, whereas the GPU implementation performs considerably better than the CPU running its optimal number of threads. We also show that our implementations outperform all other parallel MST-solver implementations in (ii), for a broad set of publicly available road-network graphs.
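The paper's variant relies on an explicit contraction of the graph after each Borůvka round; the sketch below is only a compact, sequential illustration of the per-round structure (cheapest outgoing edge per component, then merging), keeping components in a union-find structure instead of explicitly contracting the graph and showing none of the parallel or GPU machinery.

```java
import java.util.Arrays;

/** Compact sequential sketch of Borůvka's MST algorithm. Components are kept
 *  with union-find rather than the paper's explicit graph contraction, and
 *  no parallelism is shown. */
public class BoruvkaSketch {
    record Edge(int u, int v, double w) {}

    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];   // path halving
        return x;
    }

    /** Returns the total weight of a minimum spanning tree (or forest). */
    static double mstWeight(int vertices, Edge[] edges) {
        int[] parent = new int[vertices];
        for (int i = 0; i < vertices; i++) parent[i] = i;
        double total = 0;
        int components = vertices;
        while (components > 1) {
            // Cheapest outgoing edge of each current component (edge index or -1).
            int[] cheapest = new int[vertices];
            Arrays.fill(cheapest, -1);
            for (int i = 0; i < edges.length; i++) {
                int ru = find(parent, edges[i].u()), rv = find(parent, edges[i].v());
                if (ru == rv) continue;
                if (cheapest[ru] < 0 || edges[i].w() < edges[cheapest[ru]].w()) cheapest[ru] = i;
                if (cheapest[rv] < 0 || edges[i].w() < edges[cheapest[rv]].w()) cheapest[rv] = i;
            }
            boolean merged = false;
            // Add each component's cheapest edge and merge its two endpoints.
            for (int c = 0; c < vertices; c++) {
                if (cheapest[c] < 0) continue;
                Edge e = edges[cheapest[c]];
                int ru = find(parent, e.u()), rv = find(parent, e.v());
                if (ru == rv) continue;          // already merged earlier this round
                parent[ru] = rv;
                total += e.w();
                components--;
                merged = true;
            }
            if (!merged) break;                  // disconnected graph: stop with a forest
        }
        return total;
    }
}
```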
Paralelização de algoritmos de enumeração para o problema do vector mais curto em sistemas de memória partilhada e distribuída
Lattice-based cryptography has become a central topic over the last decade, since this type of cryptography is believed to be resistant to attacks mounted with quantum computers. The security of this cryptography is measured by the effectiveness and practicality of the algorithms that solve central lattice problems, such as the shortest vector problem (SVP), and it is therefore important to determine the maximum performance of these algorithms on high-performance computing architectures.
To this end, this article presents, for the first time, a detailed study of the performance of the two most promising SVP solvers, ENUM and an efficient variant of Schnorr-Euchner enumeration, with and without extreme pruning. In particular, parallel versions of these algorithms are proposed, developed for optimal load balancing and, consequently, better performance.
An extensive series of tests was conducted, on shared memory for the variants without pruning and on distributed memory for the variants with pruning. The results show that the shared-memory implementations achieve, in some cases, linear speedups up to 16 threads. The distributed-memory implementations, in turn, are accelerated by roughly 13 times with 16 processes, allowing the SVP to be solved for lattices of dimension 80 in under 250 seconds. Fundação para a Ciência e a Tecnologia (FCT)
Redes de Petri e VHDL na prototipagem rápida de sistemas digitais
The main goal of this article is to illustrate the use of a methodology for the specification of digital systems, based on object-oriented Petri nets, to obtain a VHDL prototype of the intended system in a quick and simplified way. A digital system is used as an example, for which the specification in the RdP-shobi model and the automatic generation of VHDL code are carried out. This example supports conclusions about the usefulness of this methodology in digital systems design, supported by object-orientation principles and by an EDA tool developed specifically for this purpose.
JaSkel: a Java skeleton-based framework for structured cluster and grid computing
This paper presents JaSkel, a skeleton-based framework to develop parallel and grid applications. The framework provides a set of Java abstract classes as a skeleton catalogue, which implements recurring parallel interaction paradigms. This approach aims to improve code efficiency and portability. It also helps to structure scalable applications through the refinement and composition of skeletons. Evaluation results show that using the provided skeletons does contribute to improving both application development time and execution performance. Fundação para a Ciência e a Tecnologia (FCT) - PPC-VM Project (POSI/CHS/47158/2002); Project SeARCH (contract REEQ/443/2001).
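JaSkel's catalogue consists of Java abstract classes that users refine with domain code while the skeleton keeps the parallel structure; the sketch below is a hypothetical, minimal farm-style skeleton in that spirit, with class and method names assumed for illustration rather than taken from JaSkel's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

/** Hypothetical, minimal farm-style skeleton: the abstract class owns the
 *  parallel structure, users refine it by subclassing the compute step.
 *  Names are assumptions, not JaSkel's actual API. */
abstract class FarmSkeleton<I, O> {

    /** The sequential step supplied by refining (subclassing) the skeleton. */
    protected abstract O compute(I input);

    /** Farms the inputs out to a fixed pool of workers, collecting results in order. */
    public List<O> run(List<I> inputs, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I in : inputs) futures.add(pool.submit(() -> compute(in)));
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }
}

/** Refinement example: a farm whose workers square integers. */
class SquareFarm extends FarmSkeleton<Integer, Integer> {
    @Override protected Integer compute(Integer x) { return x * x; }
}
```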
High fidelity walkthroughs in archaeology sites
Communication presented at the 6th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST 2005), Pisa, Italy, 8-11 November 2005. Fast and affordable computing systems currently support walkthroughs of virtually reconstructed sites, with fast frame
rate generation of synthetic images. However, archaeologists still complain about the lack of realism in these interactive tours,
mainly due to the false ambient illumination. Accurate visualizations require physically based global illumination models
to render the scenes, which are computationally too demanding.
Faster systems and novel rendering techniques are required: current clusters provide a feasible and
affordable path towards these goals, and we developed a framework to support smooth virtual walkthroughs,
using progressive rendering to converge to high fidelity images whenever computing power surplus is available.
This framework exploits spatial and temporal coherence among
successive frames, serving multiple clients that share and
interact with the same virtual model, while each maintains its own view of the model. It is based on a three-tier architecture:
the outer layer embodies light-weight visualization clients, which
perform all the user interactions and display the final images
using the available graphics hardware; the inner layer is a
parallel version of a physically based ray tracer running on a
cluster of off-the-shelf PCs; in the middle layer lies the shading
management agent (SMA), which monitors the clients' states,
supplies each with properly shaded 3D points, maintains a cache of
previously rendered geometry and requests relevant shading samples from the parallel renderer whenever required.
A prototype of a high fidelity walkthrough in the archaeological virtual model of the Roman town of Bracara Augusta was developed,
and the current evaluation tests aimed to measure the performance
improvements due to the use of SMA caches and associated parallel
rendering capabilities. Preliminary results show that interactive
frame rates are sustainable and the system is highly responsive. Fundação para a Ciência e Tecnologia (FCT) - POSI/CHS/42041/2001.
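The shading management agent described above serves clients from a cache of previously shaded geometry and only asks the parallel renderer for samples it cannot provide; the sketch below is a minimal illustration of that lookup-or-request logic, with all types and names assumed for illustration.

```java
import java.util.*;
import java.util.function.Function;

/** Minimal illustration of the shading management agent's cache idea: serve a
 *  client's visible points from previously shaded samples when possible,
 *  otherwise ask the parallel renderer. All names and types are assumptions. */
class ShadingCacheSketch {
    /** A shading sample keyed by a quantised position (stand-in for real geometry). */
    record PointKey(long qx, long qy, long qz) {}
    record ShadedPoint(float r, float g, float b) {}

    private final Map<PointKey, ShadedPoint> cache = new HashMap<>();
    private final Function<PointKey, ShadedPoint> renderer;   // stand-in for the parallel ray tracer

    ShadingCacheSketch(Function<PointKey, ShadedPoint> renderer) { this.renderer = renderer; }

    /** Returns shaded points for a client's view: cache hits are served
     *  immediately, misses are requested from the renderer and then cached. */
    List<ShadedPoint> shade(List<PointKey> visible) {
        List<ShadedPoint> out = new ArrayList<>(visible.size());
        for (PointKey k : visible) {
            out.add(cache.computeIfAbsent(k, renderer));
        }
        return out;
    }
}
```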
- …