73,102 research outputs found
A Modeling Approach based on UML/MARTE for GPU Architecture
Nowadays, the High Performance Computing is part of the context of embedded
systems. Graphics Processing Units (GPUs) are more and more used in
acceleration of the most part of algorithms and applications. Over the past
years, not many efforts have been done to describe abstractions of applications
in relation to their target architectures. Thus, when developers need to
associate applications and GPUs, for example, they find difficulty and prefer
using API for these architectures. This paper presents a metamodel extension
for MARTE profile and a model for GPU architectures. The main goal is to
specify the task and data allocation in the memory hierarchy of these
architectures. The results show that this approach will help to generate code
for GPUs based on model transformations using Model Driven Engineering (MDE).Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011
Loo.py: transformation-based code generation for GPUs and CPUs
Today's highly heterogeneous computing landscape places a burden on
programmers wanting to achieve high performance on a reasonably broad
cross-section of machines. To do so, computations need to be expressed in many
different but mathematically equivalent ways, with, in the worst case, one
variant per target machine.
Loo.py, a programming system embedded in Python, meets this challenge by
defining a data model for array-style computations and a library of
transformations that operate on this model. Offering transformations such as
loop tiling, vectorization, storage management, unrolling, instruction-level
parallelism, change of data layout, and many more, it provides a convenient way
to capture, parametrize, and re-unify the growth among code variants. Optional,
deep integration with numpy and PyOpenCL provides a convenient computing
environment where the transition from prototype to high-performance
implementation can occur in a gradual, machine-assisted form
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
In recent years, architectures combining a reconfigurable fabric and a
general purpose processor on a single chip became increasingly popular. Such
hybrid architectures allow extending embedded software with application
specific hardware accelerators to improve performance and/or energy efficiency.
Aiding system designers and programmers at handling the complexity of the
required process of hardware/software (HW/SW) partitioning is an important
issue. Current methods are often restricted, either to bare-metal systems, to
subsets of mainstream programming languages, or require special coding
guidelines, e.g., via annotations. These restrictions still represent a high
entry barrier for the wider community of programmers that new hybrid
architectures are intended for. In this paper we revisit HW/SW partitioning and
present a seamless programming flow for unrestricted, legacy C code. It
consists of a retargetable GCC plugin that automatically identifies code
sections for hardware acceleration and generates code accordingly. The proposed
workflow was evaluated on the Xilinx Zynq platform using unmodified code from
an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320
- âŠ