4,028 research outputs found
Just In Time Assembly (JITA) - A Run Time Interpretation Approach for Achieving Productivity of Creating Custom Accelerators in FPGAs
The reconfigurable computing community has yet to be successful in allowing programmers to access FPGAs through traditional software development flows. Existing barriers that prevent programmers from using FPGAs include: 1) knowledge of hardware programming models, 2) the need to work within the vendor specific CAD tools and hardware synthesis. This thesis presents a series of published papers that explore different aspects of a new approach being developed to remove the barriers and enable programmers to compile accelerators on next generation reconfigurable manycore architectures. The approach is entitled Just In Time Assembly (JITA) of hardware accelerators. The approach has been defined to allow hardware accelerators to be built and run through software compilation and run time interpretation outside of CAD tools and without requiring each new accelerator to be synthesized. The approach advocates the use of libraries of pre-synthesized components that can be referenced through symbolic links in a similar fashion to dynamically linked software libraries. Synthesis still must occur but is moved out of the application programmers software flow and into the initial coding process that occurs when programming patterns that define a Domain Specific Language (DSL) are first coded. Programmers see no difference between creating software or hardware functionality when using the DSL. A new run time interpreter is introduced to assemble the individual pre-synthesized hardware accelerators that comprise the accelerator functionality within a configurable tile array of partially reconfigurable slots at run time. Quantitative results are presented that compares utilization, performance, and productivity of the approach to what would be achieved by full custom accelerators created through traditional CAD flows using hardware programming models and passing through synthesis
A Hybrid Partially Reconfigurable Overlay Supporting Just-In-Time Assembly of Custom Accelerators on FPGAs
The state of the art in design and development flows for FPGAs are not sufficiently mature to allow programmers to implement their applications through traditional software development flows. The stipulation of synthesis as well as the requirement of background knowledge on the FPGAs\u27 low-level physical hardware structure are major challenges that prevent programmers from using FPGAs. The reconfigurable computing community is seeking solutions to raise the level of design abstraction at which programmers must operate, and move the synthesis process out of the programmers\u27 path through the use of overlays. A recent approach, Just-In-Time Assembly (JITA), was proposed that enables hardware accelerators to be assembled at runtime, all from within a traditional software compilation flow. The JITA approach presents a promising path to constructing hardware designs on FPGAs using pre-synthesized parallel programming patterns, but suffers from two major limitations. First, all variant programming patterns must be pre-synthesized. Second, conditional operations are not supported.
In this thesis, I present a new reconfigurable overlay, URUK, that overcomes the two limitations imposed by the JITA approach. Similar to the original JITA approach, the proposed URUK overlay allows hardware accelerators to be constructed on FPGAs through software compilation flows. To this basic capability, URUK adds additional support to enable the assembly of presynthesized fine-grained computational operators to be assembled within the FPGA.
This thesis provides analysis of URUK from three different perspectives; utilization, performance, and productivity. The analysis includes comparisons against High-Level Synthesis (HLS) and the state of the art approach to creating static overlays. The tradeoffs conclude that URUK can achieve approximately equivalent performance for algebra operations compared to HLS custom accelerators, which are designed with simple experience on FPGAs. Further, URUK shows a high degree of flexibility for runtime placement and routing of the primitive operations. The analysis shows how this flexibility can be leveraged to reduce communication overhead among tiles, compared to traditional static overlays. The results also show URUK can enable software programmers without any hardware skills to create hardware accelerators at productivity levels consistent with software development and compilation
Actors: The Ideal Abstraction for Programming Kernel-Based Concurrency
GPU and multicore hardware architectures are commonly
used in many different application areas to accelerate problem solutions
relative to single CPU architectures. The typical approach to accessing
these hardware architectures requires embedding logic into the programming
language used to construct the application; the two primary forms
of embedding are: calls to API routines to access the concurrent functionality,
or pragmas providing concurrency hints to a language compiler
such that particular blocks of code are targeted to the concurrent functionality.
The former approach is verbose and semantically bankrupt,
while the success of the latter approach is restricted to simple, static
uses of the functionality.
Actor-based applications are constructed from independent, encapsulated
actors that interact through strongly-typed channels. This paper
presents a first attempt at using actors to program kernels targeted at
such concurrent hardware. Besides the glove-like fit of a kernel to the actor
abstraction, quantitative code analysis shows that actor-based kernels
are always significantly simpler than API-based coding, and generally
simpler than pragma-based coding. Additionally, performance measurements
show that the overheads of actor-based kernels are commensurate
to API-based kernels, and range from equivalent to vastly improved for
pragma-based annotations, both for sample and real-world applications
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Managing Research Data in Big Science
The project which led to this report was funded by JISC in 2010--2011 as part of its 'Managing Research Data' programme, to examine the way in which Big Science data is managed, and produce any recommendations which may be appropriate. Big science data is different: it comes in large volumes, and it is shared and exploited in ways which may differ from other disciplines. This project has explored these differences using as a case-study Gravitational Wave data generated by the LSC, and has produced recommendations intended to be useful variously to JISC, the funding council (STFC) and the LSC community. In Sect. 1 we define what we mean by 'big science', describe the overall data culture there, laying stress on how it necessarily or contingently differs from other disciplines. In Sect. 2 we discuss the benefits of a formal data-preservation strategy, and the cases for open data and for well-preserved data that follow from that. This leads to our recommendations that, in essence, funders should adopt rather light-touch prescriptions regarding data preservation planning: normal data management practice, in the areas under study, corresponds to notably good practice in most other areas, so that the only change we suggest is to make this planning more formal, which makes it more easily auditable, and more amenable to constructive criticism. In Sect. 3 we briefly discuss the LIGO data management plan, and pull together whatever information is available on the estimation of digital preservation costs. The report is informed, throughout, by the OAIS reference model for an open archive
Managing Research Data: Gravitational Waves
The project which led to this report was funded by JISC in 2010–2011 as part of its
‘Managing Research Data’ programme, to examine the way in which Big Science data
is managed, and produce any recommendations which may be appropriate.
Big science data is different: it comes in large volumes, and it is shared and
exploited in ways which may differ from other disciplines. This project has explored
these differences using as a case-study Gravitational Wave data generated by the LSC,
and has produced recommendations intended to be useful variously to JISC, the funding
council (STFC) and the LSC community.
In Sect. 1 we define what we mean by ‘big science’, describe the overall data
culture there, laying stress on how it necessarily or contingently differs from other
disciplines.
In Sect. 2 we discuss the benefits of a formal data-preservation strategy, and the
cases for open data and for well-preserved data that follow from that. This leads to our
recommendations that, in essence, funders should adopt rather light-touch prescriptions
regarding data preservation planning: normal data management practice, in the areas
under study, corresponds to notably good practice in most other areas, so that the only
change we suggest is to make this planning more formal, which makes it more easily
auditable, and more amenable to constructive criticism.
In Sect. 3 we briefly discuss the LIGO data management plan, and pull together
whatever information is available on the estimation of digital preservation costs.
The report is informed, throughout, by the OAIS reference model for an open
archive. Some of the report’s findings and conclusions were summarised in [1].
See the document history on page 37
- …