3,321 research outputs found
On the Verification of a WiMax Design Using Symbolic Simulation
In top-down multi-level design methodologies, design descriptions at higher
levels of abstraction are incrementally refined to the final realizations.
Simulation based techniques have traditionally been used to verify that such
model refinements do not change the design functionality. Unfortunately, with
computer simulations it is not possible to completely check that a design
transformation is correct in a reasonable amount of time, as the number of test
patterns required to do so increases exponentially with the number of system
state variables. In this paper, we propose a methodology for the verification
of conformance of models generated at higher levels of abstraction in the
design process to the design specifications. We model the system behavior using
a sequence of recurrence equations. We then use symbolic simulation together with
equivalence checking and property checking techniques for design verification.
Using our proposed method, we have verified the equivalence of three WiMax
system models at different levels of design abstraction, and the correctness of
various system properties on those models. Our symbolic modeling and
verification experiments show that the proposed verification methodology
provides a performance advantage over its numerical counterpart.
Comment: In Proceedings SCSS 2012, arXiv:1307.802
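The idea of checking equivalence by symbolic simulation of recurrence equations, as described in the abstract above, can be illustrated with a small sketch. This is not the authors' tool or their WiMax models: it assumes a toy first-order filter, restricts expressions to linear forms over symbolic inputs, and only checks a bounded number of steps. All function names are hypothetical.

```python
from fractions import Fraction

# A linear form is a dict {symbol_name: coefficient}; {} is the zero form.

def sym(name):
    """A linear form consisting of one symbolic input."""
    return {name: Fraction(1)}

def scale(c, form):
    """Multiply a linear form by a numeric constant, dropping zero terms."""
    c = Fraction(c)
    return {k: c * v for k, v in form.items() if c * v != 0}

def add(f, g):
    """Add two linear forms, dropping terms that cancel."""
    out = dict(f)
    for k, v in g.items():
        total = out.get(k, Fraction(0)) + v
        if total == 0:
            out.pop(k, None)
        else:
            out[k] = total
    return out

def direct_form(a, b, steps):
    """Recurrence y[n] = a*x[n] + b*x[n-1], with x[-1] = 0."""
    outputs, prev_x = [], {}
    for n in range(steps):
        x = sym(f"x{n}")
        outputs.append(add(scale(a, x), scale(b, prev_x)))
        prev_x = x
    return outputs

def transposed_form(a, b, steps):
    """Recurrences s[n] = b*x[n] and y[n] = a*x[n] + s[n-1], with s[-1] = 0."""
    outputs, state = [], {}
    for n in range(steps):
        x = sym(f"x{n}")
        outputs.append(add(scale(a, x), state))
        state = scale(b, x)
    return outputs

def equivalent(model_a, model_b, a, b, steps):
    """True if both models yield identical symbolic outputs at every step."""
    return model_a(a, b, steps) == model_b(a, b, steps)

if __name__ == "__main__":
    print(equivalent(direct_form, transposed_form, 3, Fraction(1, 2), 8))
```

Because the outputs are compared as expressions over the symbols x0, x1, ..., one symbolic run covers every numeric input sequence of that length, which is where the advantage over enumerating test patterns comes from.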
Application-tailored Linear Algebra Algorithms: A search-based Approach
In this paper, we tackle the problem of automatically generating algorithms
for linear algebra operations by taking advantage of problem-specific
knowledge. In most situations, users possess much more information about the
problem at hand than what current libraries and computing environments accept;
evidence shows that if properly exploited, such information leads to
uncommon/unexpected speedups. We introduce a knowledge-aware linear algebra
compiler that allows users to input matrix equations together with properties
about the operands and the problem itself; for instance, they can specify that
the equation is part of a sequence, and how successive instances are related to
one another. The compiler exploits all this information to guide the generation
of algorithms, to limit the size of the search space, and to avoid redundant
computations. We applied the compiler to equations arising as part of
sensitivity and genome studies; the algorithms produced exhibit, respectively,
100- and 1000-fold speedups
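As a rough illustration of the kind of knowledge the abstract above refers to, consider a sequence of linear systems that share one symmetric positive definite matrix. The sketch below is not the compiler described in the paper; it is a hand-written pure-Python example, with hypothetical function names, showing how knowing both properties lets the expensive factorization be done once and reused.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_with_factor(low, b):
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def solve_sequence(a, rhs_list):
    """Solve a x_i = b_i for every b_i, factoring the shared matrix only once."""
    low = cholesky(a)  # cubic cost, paid a single time
    return [solve_with_factor(low, b) for b in rhs_list]  # quadratic cost each

if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    print(solve_sequence(a, [[2.0, 1.0], [6.0, 5.0]]))
```

A general-purpose solver called once per system would repeat the factorization every time; the saving grows with the length of the sequence.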
Mapping for maximum performance on FPGA DSP blocks
The digital signal processing (DSP) blocks on modern field programmable gate arrays (FPGAs) are highly capable and support a variety of different datapath configurations. Unfortunately, inference in synthesis tools can fail to produce circuits that reach maximum DSP block throughput. We have developed a tool that maps graphs of add/sub/mult nodes to DSP blocks on Xilinx FPGAs, ensuring maximum throughput. This is done by delaying scheduling until after the graph has been partitioned onto DSP blocks, and then scheduling based on their pipeline structure, resulting in a throughput-optimized implementation. Our tool also prepares equivalent implementations using a variety of other methods, including high-level synthesis (HLS), for comparison. We show that the proposed approach offers an improvement in frequency of 100% over standard pipelined code, and 23% over a Vivado HLS synthesis implementation, while retaining code portability, at the cost of a modest increase in logic resource usage
The hArtes Tool Chain
This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform
Workshop on Verification and Theorem Proving for Continuous Systems (NetCA Workshop 2005)
Oxford, UK, 26 August 200
On quantifying fault patterns of the mesh interconnect networks
One of the key issues in the design of Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, and peer-to-peer networks is the development of an efficient communication network that provides high throughput and low latency and can survive the failure of individual components. Generally, the faulty components may be coalesced into fault regions, which are classified into convex and concave shapes. In this paper, we propose a mathematical solution for counting the number of common fault patterns in a 2-D mesh interconnect network, including both convex (|-shape, ||-shape, □-shape) and concave (L-shape, U-shape, T-shape, +-shape, H-shape) regions. The results presented in this paper, which have been validated through simulation experiments, can play a key role in the performance analysis of fault-tolerant routing algorithms and in measuring a network's fault tolerance, expressed as the probability of a disconnection
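To give a feel for what counting fault patterns in a 2-D mesh involves, the sketch below handles only the simplest convex case, an axis-aligned rectangular fault region, and cross-checks a brute-force enumeration against the standard closed-form count. These are not the formulas derived in the paper, and the function names are hypothetical.

```python
def count_rectangles_brute(rows, cols):
    """Enumerate every axis-aligned rectangular region in a rows x cols mesh."""
    count = 0
    for top in range(rows):
        for bottom in range(top, rows):
            for left in range(cols):
                for right in range(left, cols):
                    count += 1
    return count

def count_rectangles_closed(rows, cols):
    """Closed form: choose a row interval and a column interval independently."""
    return (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)

def count_fixed_block(rows, cols, height, width):
    """Number of positions for a height x width faulty block in the mesh."""
    if height > rows or width > cols:
        return 0
    return (rows - height + 1) * (cols - width + 1)

if __name__ == "__main__":
    print(count_rectangles_brute(3, 4), count_rectangles_closed(3, 4))
    print(count_fixed_block(4, 4, 2, 2))
```

Concave shapes such as L or U regions need additional case analysis for orientation and arm lengths, which is the part the paper addresses analytically.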
Minimizing DSP block usage through multi-pumping
Resource sharing in the mapping of an algorithm to an architecture allows the same resource to be scheduled for different uses in different cycles, generally at the cost of increased schedule length. Multi-pumping is a method whereby a resource is clocked at a frequency that is a multiple of that of the surrounding circuit, thereby offering multiple executions per global clock cycle and allowing sharing within the same clock cycle. This concept maps well to FPGA architectures, where hard macro blocks are typically capable of running at higher frequencies than standard logic. While this technique has been demonstrated for multipliers, modern DSP blocks are more complex, with multiple computational nodes. In this paper, we apply multi-pumping to minimise DSP block usage, while taking advantage of the multiple nodes they support. The proposed approach uses, on average, 39% fewer DSP blocks, at a cost of 19% more LUTs and 7% more registers
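The basic arithmetic behind multi-pumping can be sketched as follows. This is a deliberately simplified model, not the paper's mapping algorithm: it assumes independent operations that each occupy one block for one fast-clock cycle, and it ignores data dependencies, the internal nodes of a DSP block, and timing closure. The function names are hypothetical.

```python
def blocks_needed(num_ops, pump_factor):
    """Blocks required when each block executes pump_factor ops per global cycle."""
    if pump_factor < 1:
        raise ValueError("pump_factor must be at least 1")
    return -(-num_ops // pump_factor)  # ceiling division

def multipump_schedule(num_ops, pump_factor):
    """Assign each operation to a (block index, fast-clock slot) pair."""
    if pump_factor < 1:
        raise ValueError("pump_factor must be at least 1")
    return [(op // pump_factor, op % pump_factor) for op in range(num_ops)]

if __name__ == "__main__":
    # Five multiplications per global cycle, blocks clocked at twice the global rate.
    print(blocks_needed(5, 2))
    print(multipump_schedule(5, 2))
```

With a pump factor of 1 the model reduces to one block per operation, which is the unshared baseline.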
Neuroverkon inferenssi digitaalisessa signaalikäsittelyssä kovien reaaliaikavaatimusten alaisuudessa (Neural network inference in digital signal processing under hard real-time constraints)
The main objective of this thesis is to investigate how neural network inference can be efficiently implemented on a digital signal processor under hard real-time constraints from the execution speed point of view. Theories on digital signal processors and software optimization as well as neural networks are discussed. A neural network model for the specific use case is designed and a digital signal processor implementation is created based on the neural network model.
A neural network model for the use case is created based on the data from the Matlab simulation model. The neural network model is trained and validated using the Python programming language with the Keras package. The neural network model is implemented on the CEVA-XC4500 digital signal processor. The digital signal processor implementation is written in C++ language with the processor specific vector-processing intrinsics. The neural network model is evaluated based on the model accuracy, precision, recall and f1-score. The model performance is compared to the conventional use case implementation by calculating 3GPP specified metrics of misdetection probability, false alarm rate and bit error rate. The execution speed of the digital signal processor implementation is evaluated with the CEVA integrated development environment profiling tool and also with the Lauterbach PowerTrace profiling module attached to the real base station product.
Through this thesis, an optimized CEVA-XC4500 digital signal processor implementation was created for the specific neural network architecture. The optimized implementation consumed 88 percent fewer cycles than the conventional implementation. The neural network model's performance also fulfills the 3GPP specification requirements
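The evaluation metrics named in the thesis abstract above (accuracy, precision, recall and F1-score) follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the counts in the usage example are made up for illustration and are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    # Hypothetical counts: true positives, false positives, false negatives, true negatives.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

In a detection setting, a false negative corresponds to a missed detection and a false positive to a false alarm, which is how these counts relate to the misdetection and false alarm metrics mentioned in the abstract.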
- …