626,603 research outputs found
Recommended from our members
ELGDF : design language for parallel programming
ELGDF (Extended Large Grain Data Flow) is a design language that allows representation of a wide variety of parallel programs. The syntax is graphical and hierarchical to allow construction and viewing of realistically sized programs. ELGDF language facilitates describing parallel programs in a natural way for both shared memory model as well as message passing model. The syntax poses high level structures such as replicators, loops , pipes, branches, and fans, as well as constructs for shared resources. ELGDF resolves arc overloading in current graphical languages by using different symbols and different attributes for different types of arcs.
The ELGDF serves as the foundation of a parallel programming environment under development at Oregon State University. The complete syntax of ELGDF helps the program designers to deal with parallelism in the manner most natural to the problem at hand. It also helps as a way to capture parallel program designs for the purpose of analysis such as scheduling and performance estimating. Thus, the goal of ELGDF is two-fold: 1) a program design notation and computer-aided software engineering tool, and 2) a software description notation for use by automated schedulers and performance analyzers
Recommended from our members
A C* development environment for the Meiko CS-2 multicomputer
With sequential computing technology reaching its speed limits, parallel processing is emerging as the key to very-high-speed computation. However, developing a parallel program is by no means a simple task; neither is analyzing the performance of parallel programs.
C* is a high-level data-parallel language that hides explicit message passing and provides an easy-to-understand virtual view of parallel computation. C* is a portable language for which a retargetable compiler has been implemented for mesh-connected MIMD multicomputers. Together with the compiler, a C* run.time library has been implemented using Intel NXlib. This project is aimed at developing a Cstar Development Environment-CSD for the Meiko CS-2 multicomputer. It includes three tasks. One is building a Meiko specific version of the C* run-time library, another is comparing the performance of the Meiko version with the performance of the original one, and the last is developing a graphical tool for C* programming and performance analysis. This paper introduces the instruments in CSDE and discusses in some detail the major implementation issues
FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach
High Performance Computing via High Level Synthesis
As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs).
These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance.
An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry.
FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers.
High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs.
Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain.
Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion.
Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices
Parallel runway requirement analysis study. Volume 1: The analysis
The correlation of increased flight delays with the level of aviation activity is well recognized. A main contributor to these flight delays has been the capacity of airports. Though new airport and runway construction would significantly increase airport capacity, few programs of this type are currently underway, let alone planned, because of the high cost associated with such endeavors. Therefore, it is necessary to achieve the most efficient and cost effective use of existing fixed airport resources through better planning and control of traffic flows. In fact, during the past few years the FAA has initiated such an airport capacity program designed to provide additional capacity at existing airports. Some of the improvements that that program has generated thus far have been based on new Air Traffic Control procedures, terminal automation, additional Instrument Landing Systems, improved controller display aids, and improved utilization of multiple runways/Instrument Meteorological Conditions (IMC) approach procedures. A useful element to understanding potential operational capacity enhancements at high demand airports has been the development and use of an analysis tool called The PLAND_BLUNDER (PLB) Simulation Model. The objective for building this simulation was to develop a parametric model that could be used for analysis in determining the minimum safety level of parallel runway operations for various parameters representing the airplane, navigation, surveillance, and ATC system performance. This simulation is useful as: a quick and economical evaluation of existing environments that are experiencing IMC delays, an efficient way to study and validate proposed procedure modifications, an aid in evaluating requirements for new airports or new runways in old airports, a simple, parametric investigation of a wide range of issues and approaches, an ability to tradeoff air and ground technology and procedures contributions, and a way of considering probable blunder mechanisms and range of blunder scenarios. This study describes the steps of building the simulation and considers the input parameters, assumptions and limitations, and available outputs. Validation results and sensitivity analysis are addressed as well as outlining some IMC and Visual Meteorological Conditions (VMC) approaches to parallel runways. Also, present and future applicable technologies (e.g., Digital Autoland Systems, Traffic Collision and Avoidance System II, Enhanced Situational Awareness System, Global Positioning Systems for Landing, etc.) are assessed and recommendations made
- …