1,030 research outputs found
Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP
Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and Flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular compared to a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and Flexibility of the implementation
From FPGA to ASIC: A RISC-V processor experience
This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them,and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
Using Physical Compilation to Implement a System on Chip Platform
The goal of this thesis was to setup a complete design flow involving physical synthesis. The design chosen for this purpose was a system-on-chip (SoC) platform developed at the University of Tennessee. It involves a Leon Processor with a minimal cache configuration, an AMBA on-chip bus and an Advanced Encryption Standard module which performs decryption.
As transistor size has entered the deep submicron level, iterations involved in the design cycle have increased due to the domination of interconnect delays over cell delays. Traditionally, interconnect delay has been estimated through the use of wire-load models. However, since there is no physical placement information, the delay estimation may be ineffective and result in increased iterations. Hence, placement-based synthesis has recently been introduced to provide better interconnect delay estimation. The tool used in this thesis to implement the system-on-chip design using physical synthesis is Synopsys Physical Compiler. The flow has been setup through the use of the Galaxy Reference Flow scripts obtained from Synopsys.
As part of the thesis, an analysis of the differences between a physically synthesized design and a logically synthesized one in terms of area and delay is presented
Studies on Core-Based Testing of System-on-Chips Using Functional Bus and Network-on-Chip Interconnects
The tests of a complex system such as a microprocessor-based system-onchip
(SoC) or a network-on-chip (NoC) are difficult and expensive. In this thesis,
we propose three core-based test methods that reuse the existing functional
interconnects-a flat bus, hierarchical buses of multiprocessor SoC's (MPSoC),
and a N oC-in order to avoid the silicon area cost of a dedicated test access mechanism
(TAM). However, the use of functional interconnects as functional TAM's
introduces several new problems.
During tests, the interconnects-including the bus arbitrator, the bus bridges,
and the NoC routers-operate in the functional mode to transport the test stimuli
and responses, while the core under tests (CUT) operate in the test mode. Second,
the test data is transported to the CUT through the functional bus, and not
directly to the test port. Therefore, special core test wrappers that can provide
the necessary control signals required by the different functional interconnect are
proposed. We developed two types of wrappers, one buffer-based wrapper for the
bus-based systems and another pair of complementary wrappers for the NoCbased
systems.
Using the core test wrappers, we propose test scheduling schemes for the three
functionally different types of interconnects. The test scheduling scheme for a flat
bus is developed based on an efficient packet scheduling scheme that minimizes
both the buffer sizes and the test time under a power constraint. The schedulingscheme is then extended to take advantage of the hierarchical bus architecture of
the MPSoC systems. The third test scheduling scheme based on the bandwidth
sharing is developed specifically for the NoC-based systems. The test scheduling
is performed under the objective of co-optimizing the wrapper area cost and the
resulting test application time using the two complementary NoC wrappers.
For each of the proposed methodology for the three types of SoC architec ..
ture, we conducted a thorough experimental evaluation in order to verify their
effectiveness compared to other methods
Case Study: First-Time Success ASIC Design Methodology Applied to a Multi-Processor System-on-Chip
Achieving first-time success is crucial in the ASIC design league considering the soaring cost, tight time-to-market window, and competitive business environment. One key factor in ensuring first-time success is a well-defined ASIC design methodology. Here we propose a novel ASIC design methodology that has been proven for the RUMPS401 (Rahman University Multi-Processor System 401) Multiprocessor System-on-Chip (MPSoC) project. The MPSoC project is initiated by Universiti Tunku Abdul Rahman (UTAR) VLSI design center. The proposed methodology includes the use of Universal Verification Methodology (UVM). The use of electronic design automation (EDA) software during each step of the design methodology is also presented. The first-time success RUMPS401 demonstrates the use of the proposed ASIC design methodology and the good of using one. Especially this project is carried on in educational environment that is even more limited in budget, resources and know-how, compared to the business and industrial counterparts. Here a novel ASIC design methodology that is tailored to first-time success MPSoC is presented
Toward Fault-Tolerant Applications on Reconfigurable Systems-on-Chip
L'abstract è presente nell'allegato / the abstract is in the attachmen
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Modern OpenMP threading techniques are used to convert the MPI-only
Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two
separate implementations that differ by the sharing or replication of key data
structures among threads are considered, density and Fock matrices. All
implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi
processors. With 64 cores per processor, scaling numbers are reported on up to
192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory
footprint by approximately 200 times compared to the legacy code. The
MPI/OpenMP code was shown to run up to six times faster than the original for a
range of molecular system sizes.Comment: SC17 conference paper, 12 pages, 7 figure
- …