21 research outputs found

The Impact of Single Event Effect Reliability of Convolution Neural Network Architectures and Hardening Approaches Implemented on SRAM FPGA

Author: Wang YiXiu
Publication venue: University of Saskatchewan Library
Publication date: 06/08/2021

Convolutional neural networks (CNNs) have powerful data-processing and learning capabilities and have been widely applied to image-processing applications, especially in autonomous driving, medical image classification, space exploration and military applications. Because modern field-programmable gate arrays (FPGAs) offer low power consumption, high flexibility and parallelism, they are frequently used as a hardware acceleration platform for CNNs. Two architectures are mainly used to implement CNNs on FPGAs: the streaming architecture and the single computation engines (SCEs) architecture. In the streaming architecture, each CNN layer is implemented with a distinct hardware block, and each block can be optimized separately. The SCEs architecture, on the other hand, uses a systolic array of processing elements or a matrix multiplication unit as a computation engine that executes the CNN layers sequentially. The hardware control and the scheduling of operations are performed by a control unit and associated software. The advantage of this design paradigm is that it consists of a fixed architectural template that can be scaled according to the CNN input and the available FPGA resources; it is therefore suitable for implementing modern complex CNNs that may not fit into a streaming architecture. SRAM-based FPGAs are sensitive to radiation, which can cause single event effects (SEEs) in the system, so many applications require designs that reduce radiation effects in FPGA-based CNNs. Previous radiation-effects studies mainly focused on the streaming architecture and explored triple modular redundancy (TMR) or selective hardening techniques. As far as the authors know, there are very few radiation-effects studies of CNNs implemented with the SCEs architecture on FPGAs, and no proton-irradiation evaluation comparing the two architectures.
In this thesis, we implement a Modified National Institute of Standards and Technology (MNIST) CNN with the two mainstream architectures, the streaming architecture and the SCEs architecture, on a Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC) ZCU-102 evaluation kit. We then evaluate their error, hang and total failure rates in a proton irradiation test at the Tri-University Meson Facility (TRIUMF). The cross-section results show that the SCEs design has higher error and total failure cross-sections than the streaming architecture, even though the SCEs architecture uses far fewer FPGA hardware resources. In addition, two resilience techniques, spatial TMR and temporal TMR, are designed for the SCEs architecture with the same hardware structure and utilization, by using multiple processing elements (PEs) or reusing PEs, respectively, to carry out each calculation. As a result, the cross-sections of the spatial TMR and temporal TMR SCEs designs are reduced by 34.9% and 59.2%, respectively, with execution-time overheads of 14.2% and 21.4% compared with the non-hardened design. The study thus shows that the SCEs architecture for FPGA acceleration has excellent potential for applications in radiation environments with minimal overhead, owing to its scalability and flexibility, and that spatial and temporal TMR can effectively reduce the error rate and total failure rate with no extra hardware resources. This suggests that the spatial and temporal TMR proposed in this project are generic to the SCEs architecture and could be a better redundancy choice for complex CNNs implemented with insufficient hardware resources.

University of Saskatchewan Research Archive
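The voting step behind temporal TMR (evaluating each calculation three times on the same PE and keeping the majority result) can be sketched as below. This is a minimal illustration, assuming a bitwise 2-of-3 majority voter over integer (fixed-point) results; the names `majority3`, `temporal_tmr`, `multiply_pe` and the fault-injection hook are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Optional

def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each result bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)

def temporal_tmr(pe: Callable[[int, int], int], x: int, w: int,
                 upset: Optional[Callable[[int, int], int]] = None) -> int:
    """Evaluate the same processing element three times in sequence and vote.

    `upset(run, value)` is a hypothetical fault-injection hook that may corrupt
    the value produced in a given run, standing in for a single event upset.
    """
    results = []
    for run in range(3):
        value = pe(x, w)
        if upset is not None:
            value = upset(run, value)
        results.append(value)
    return majority3(*results)

def multiply_pe(x: int, w: int) -> int:
    """Toy processing element: one fixed-point product."""
    return x * w

def flip_bit3_in_second_run(run: int, value: int) -> int:
    """Simulated upset: flip bit 3 of the result produced in run 1 only."""
    return value ^ 0b1000 if run == 1 else value

print(temporal_tmr(multiply_pe, 7, 5, flip_bit3_in_second_run))  # 35
```

A spatial variant would instead dispatch the three evaluations to three different PEs of the array in the same pass; the voter stays the same, and the trade-off shifts from extra execution time to extra PE occupancy per calculation.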

Optimal program variant generation for hybrid manycore systems

Author: Urlea Cristian
Publication date: 01/01/2021

Field-programmable gate arrays (FPGAs) promise superior energy efficiency in heterogeneous high-performance computing compared with multicore CPUs and GPUs. Their adoption, however, is hampered by the relative difficulty of programming them. High-level synthesis (HLS) tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a hardware description language (HDL) representation from a high-level specification of the application, written in languages such as OpenCL C that are typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they do not lighten the burden of optimization. Application developers must rely on expert knowledge to optimize their applications manually for each target device, so traditional HLS solutions do not solve the problem of performance portability. This state of affairs prompted the development of compiler frameworks such as TyTra, which operate at an even higher level of abstraction that is amenable to design space exploration (DSE). With DSE, the initial program specification can be seen as the starting point in a search space of correct-by-construction program transformations. In TyTra, the search space is generated from the transitive closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the performance-portability problem by providing a way to generate alternative correct program variants automatically. In practice, however, the generated space is often too large to explore fully, so the globally optimal solution may be overlooked. In this work we provide a novel solution to the performance-portability problem by deriving an efficient yet effective DSE strategy for the TyTra compiler framework.
We use categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run time of our DSE strategy has practically negligible coefficients, leading to sub-second exploration times for realistic applications.

Glasgow Theses Service
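To make the search-space problem concrete, the sketch below shows the naive approach the abstract contrasts with: generating the full transitive closure of a set of program transformations from an initial variant and picking the cheapest one under a cost model. It is not the thesis's polynomial-time strategy. The variant encoding `(lanes, vector_width)`, the transformations `split_lanes`, `merge_lanes`, `widen`, the `toy_cost` model and its constants are all hypothetical stand-ins for TyTra's type-level transformations and cost-performance estimates.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Tuple

Variant = Hashable  # any hashable description of a program variant

def explore(start: Variant,
            transforms: List[Callable[[Variant], Iterable[Variant]]],
            cost: Callable[[Variant], float]) -> Tuple[Variant, float, int]:
    """Naive exhaustive DSE: build the transitive closure of `transforms`
    from `start` breadth-first, then return the cheapest variant, its cost,
    and how many variants had to be generated."""
    seen = {start}
    queue = deque([start])
    while queue:
        variant = queue.popleft()
        for transform in transforms:
            for successor in transform(variant):
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    best = min(seen, key=cost)
    return best, cost(best), len(seen)

# --- Hypothetical toy design space: variants are (lanes, vector_width) pairs ---
WORK = 1024           # assumed work items per kernel invocation
RESOURCE_BUDGET = 32  # assumed device limit on lanes * vector_width

def split_lanes(v):   # double the number of parallel lanes
    lanes, vec = v
    return [(lanes * 2, vec)] if lanes < 32 else []

def merge_lanes(v):   # halve the number of parallel lanes
    lanes, vec = v
    return [(lanes // 2, vec)] if lanes > 1 else []

def widen(v):         # double the vector width of each lane
    lanes, vec = v
    return [(lanes, vec * 2)] if vec < 8 else []

def toy_cost(v) -> float:
    lanes, vec = v
    if lanes * vec > RESOURCE_BUDGET:
        return float("inf")              # variant does not fit on the device
    return WORK / (lanes * vec) + lanes  # cycles plus an assumed per-lane overhead

print(explore((1, 1), [split_lanes, merge_lanes, widen], toy_cost))  # ((4, 8), 36.0, 24)
```

Even this tiny space requires generating every reachable variant before the minimum is known; with realistic transformation sets the closure grows far faster, which is the practical issue the thesis's pruned DSE strategy addresses.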

Aeronautical Engineering: a Continuing Bibliography with Indexes (Supplement 243)

This bibliography lists 423 reports, articles and other documents introduced into the NASA scientific and technical information system in August 1989. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics.

NASA Technical Reports Server

Control system response for seed placement accuracy on row crop planters

Author: Badua Sylvester Alfredo

Doctor of Philosophy, Department of Biological & Agricultural Engineering, Ajay Sharda

Planting is one of the most critical field operations and can strongly influence early-season vigor, final plant density and, ultimately, potential crop yield. It is the opportunity to place seeds at a uniform depth and spacing, providing them the ideal environment for proper growth and development. However, inherent field spatial variability can influence seed placement and requires proper planter settings to prevent shallow seeding depth, sidewall compaction and uneven spacing. The overall goal of this research is to evaluate the response of the planter and crop to downforce control system implementation across a wide range of machine and field operating conditions. Planting operations were performed in corn production fields using a Horsch row-crop planter with 12 row units equipped with a hydraulic downforce system capable of fixed and active downforce settings. A custom data acquisition system was developed to record sensor data at a sampling frequency of 10 Hz. The following conclusions were drawn. First, soil texture and soil compaction from tractor tires influenced real-time gauge wheel load (GWL). With a fixed downforce setting and a target GWL of 35 kg, GWL was below 0 kg for 25% of the total planting time, suggesting areas planted with uncertain seeding depth due to potential loss of gauge-wheel ground contact. Likewise, fewer row units per section could yield lower GWL variability, indicating the need for automatic section control to keep GWL within an acceptable range for all row units. Second, with an active downforce setting, there was no significant difference in plant spacing between downforce A (63 kg) and downforce B (100 kg), although setting B resulted in higher plant-spacing accuracy.
Higher variability in spacing was observed when ground speed exceeded 12 kph. To achieve the desired seeding depth, downforce greater than 100 kg is needed when ground speed exceeds 7.2 kph in a no-till field and when it exceeds 12 kph in a strip-tilled field. Third, the response of row units grouped into sections revealed that row-unit acceleration on wing, track and non-track sections increases with speed. Strip-tilled soil exhibited 18% lower row-unit acceleration than no-till soil. Finally, a proof-of-concept sensing and measurement (SAM) system was developed to calculate the seed spacing, depth and geolocation of corn. This system could provide real-time feedback on seed spacing and depth, enabling appropriate downforce control management for more consistent seed placement during planting. In summary, advances in planter technology have paved the way for adding more row units across the planter to increase planting productivity. As planter toolbar width increases, each row unit may need different downforce control in response to varying field and machine operating conditions. Appropriate downforce control management should be implemented to compensate for the increased dynamics of planter row units across highly variable field conditions and achieve the desired seed placement accuracy.

K-State Research Exchange
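The "25% of total planting time below 0 kg" figure is a fraction of logged samples: at a fixed 10 Hz sampling rate, each sample stands for 0.1 s. The sketch below shows that bookkeeping under the assumption that GWL is logged as one value per sample in kg; the function names and the example log are hypothetical and are not the study's data or software.

```python
from typing import Sequence

SAMPLE_RATE_HZ = 10.0  # logging frequency reported in the study

def fraction_below(gwl_kg: Sequence[float], threshold_kg: float = 0.0) -> float:
    """Share of samples, and hence of planting time at a fixed sample rate,
    in which gauge wheel load (GWL) is below `threshold_kg`."""
    if len(gwl_kg) == 0:
        raise ValueError("no GWL samples")
    below = sum(1 for load in gwl_kg if load < threshold_kg)
    return below / len(gwl_kg)

def seconds_below(gwl_kg: Sequence[float], threshold_kg: float = 0.0,
                  rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Time in seconds spent below the threshold: sample count / sampling rate."""
    return sum(1 for load in gwl_kg if load < threshold_kg) / rate_hz

# Hypothetical 0.8 s excerpt of one row unit's log (8 samples at 10 Hz), not study data.
gwl_log = [40.2, 35.0, -3.1, 12.7, -0.4, 38.9, 30.0, 41.5]
print(fraction_below(gwl_log))  # 0.25
print(seconds_below(gwl_log))   # 0.2
```

Applying the same functions per row unit or per planter section would give the kind of GWL comparison across sections that motivates automatic section control.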

Active Materials

Publication venue: Walter de Gruyter GmbH
Publication date: 11/01/2022

What is an active material? This book aims to redefine perceptions of materials that respond to their environment. A scientific approach to active materials is first identified through the theory of the structure and functionality of materials found in nature. Interviews with experts from the natural sciences and humanities then question and redefine this view of materials to create a new definition of active materials.

Directory of Open Access Books (DOAB)

Active Materials

Publication venue: Walter de Gruyter GmbH

OAPEN Library

Project and development of hardware accelerators for fast computing in multimedia processing

Author: Cappetta Carmine
Publication venue: Università degli Studi di Salerno
Publication date: 21/02/2019

2017 - 2018

The main aim of the present research work is to design and develop very-large-scale integrated (VLSI) electronic circuits, with particular attention to those devoted to image processing applications and related topics. In particular, the candidate has mainly investigated four topics, detailed in the following. First, the candidate has developed a novel multiplier circuit capable of producing floating-point (FP32) results, given as inputs an integer value from a fixed integer range and a set of fixed-point (FI) values. This was accomplished by exploiting a series of theorems and results on a number-theory problem known as Bachet's problem, which allows the development of a new distributed arithmetic (DA) based on 3's partitions. This approach is well suited to filtering applications working on a fixed integer input range, such as image processing, where pixels are coded on 8 bits per channel. In these applications, the main problem is the high area and power consumption caused by the many multiply-and-accumulate (MAC) units, and the complexity of FP32 operations also compromises real-time requirements. For these reasons, FI implementations are usually preferred, at the cost of lower accuracy. The single multiplier and a 3x3 filter show delays of 2.456 ns and 4.7 ns, respectively, on an FPGA platform, and 2.18 ns and 4.426 ns in a TSMC 90 nm standard-cell implementation. Comparisons with state-of-the-art FP32 multipliers show a speed increase of up to 94.7% and an area reduction of 69.3% on the FPGA platform. ... [edited by Author] XXXI cycle

EleA@UniSA - Università degli Studi di Salerno
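The link between Bachet's problem and a multiplier is that weights 1, 3, 9, 27, ... placed on either pan of a balance can represent any integer, i.e. every integer has a balanced-ternary form with digits in {-1, 0, 1}. The sketch below uses that idea to multiply an integer input by a fixed coefficient using only additions and subtractions of precomputed multiples coeff·3^k. It is an assumption-laden software illustration of the principle, not the thesis's DA circuit: the function names are hypothetical, and Python floats (double precision) stand in for the hardware's FP32 datapath.

```python
from typing import List

def balanced_ternary(n: int) -> List[int]:
    """Digits d_k in {-1, 0, 1}, least significant first, such that n == sum(d_k * 3**k)."""
    digits = []
    while n != 0:
        r = n % 3            # 0, 1 or 2 (Python's % is non-negative here)
        if r == 2:           # write 2 as 3 - 1: digit -1 and a carry into the next trit
            r = -1
        digits.append(r)
        n = (n - r) // 3
    return digits

def precompute(coeff: float, n_trits: int) -> List[float]:
    """Table of coeff * 3**k: computed once per fixed coefficient."""
    return [coeff * 3 ** k for k in range(n_trits)]

def da_multiply(x: int, table: List[float]) -> float:
    """x * coeff using only additions and subtractions of table entries."""
    digits = balanced_ternary(x)
    if len(digits) > len(table):
        raise ValueError("x is outside the range covered by the table")
    acc = 0.0
    for d, term in zip(digits, table):
        if d == 1:
            acc += term
        elif d == -1:
            acc -= term
    return acc

# Six trits cover |x| <= (3**6 - 1) // 2 = 364, enough for 8-bit pixels (0..255).
table = precompute(0.125, 6)
print(balanced_ternary(200))    # [-1, 1, 1, 1, -1, 1]
print(da_multiply(200, table))  # 25.0
```

Because the input range is fixed (for example 0..255 for 8-bit pixels), the table size is fixed too, which is what makes this family of techniques attractive for the image-filtering case described above.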

Characterization, modeling and simulation of 4H-SiC power diodes

Author: Freda Albanese Loredana
Publication venue: Università degli Studi di Salerno
Publication date: 18/05/2011

2009 - 2010

Exploring the attractive electrical properties of silicon carbide (SiC) for power devices, this Ph.D. thesis focuses on the characterization and analysis of 4H-SiC pin diodes. In particular, it concerns the development of a self-consistent, analytical, physics-based model that accurately replicates power-diode behavior in both on-state and transient conditions. At present, fabricating SiC devices with given performance is not straightforward, because knowledge of the material's physical properties is still lacking, especially regarding carrier transport and its dependence on process parameters. These include the degree of doping activation, the carrier lifetime in the epitaxial layers to be employed, and the sensitivity of some physical parameters to temperature changes. Therefore, a set of investigative tools designed specifically for SiC devices cannot be regarded as a secondary objective. Such tools are useful both for process monitoring, which is essential for tuning the technological processes used to fabricate the final devices, and for proper diagnostics of the fabricated devices. To meet this need, our research activity first developed a predictive, static analytical model that includes temperature dependence. It explains carrier transport in diffused regions as a function of the injection level and is also useful for understanding how physical parameters, which depend significantly on the processed material, affect device performance. The model solves the continuity equation under double-carrier conditions, taking into account the effects of the varying doping profile of the junction, the spatial dependence of physical parameters on both doping and injection level, and the modification of the electric field in the region with the injection regime.
The model also covers device characterization at high temperatures, analyzing the influence of thermal issues on overall behavior up to 250 °C. The accuracy of the static model has been extensively demonstrated by numerous comparisons with numerical results obtained with the SILVACO commercial simulator. Second, to properly account for the dynamic electrical behavior of a diode with a generic structure, the static model has been incorporated into a more general, self-consistent model that allows analysis of device behavior when it is switched from an arbitrary forward-bias condition. In particular, the focus is on an abrupt variation of diode voltage caused by an instantaneous interruption of the conduction current: although this situation is notably interesting for studying diode switching behavior, the voltage transient is also traditionally used in various investigative techniques to extract information about the mean carrier lifetime. This occurs, for example, in the conventional open-circuit voltage decay (OCVD) technique, where the voltage decay following current interruption provides an indirect measure of the minority carrier lifetime in the epitaxial layer. Because it depends heavily on processing, the carrier lifetime is an important parameter to monitor, especially in bipolar devices, and it cannot be neglected. Given the existing uncertainty about this parameter in SiC epilayers, the OCVD method proves to be a practical way to overcome this limitation. In detail, our self-consistent model, which exploits an improved version of the traditional OCVD technique, makes it possible to characterize the carrier lifetime in the 4H-SiC epitaxial layer of a generic diode under test, obtaining the spatial distributions of the minority carrier concentration and carrier lifetime at any injection regime.
The overall model performance is compared with both device simulations and experimental results on Si and 4H-SiC rectifier structures with various physical and electrical characteristics. The comparisons show that the model has good predictive capability for describing the spatio-temporal variation of carriers and currents along the whole epilayer, simultaneously validating the approximations used and resolving some ambiguities reported in the literature, such as the stated inapplicability of the OCVD method to thick epitaxial layers, the reasons for the observed nonlinear decay of the voltage with time, and the effects of junction properties on the voltage transient. Finally, by imposing appropriate boundary conditions, the versatility of the developed model can be used to extend the analysis and gain physical insight into any arbitrary switching condition of 4H-SiC power diodes. [edited by author] IX n.s.

EleA@UniSA - Università degli Studi di Salerno
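For reference, the conventional OCVD estimate that the thesis improves upon reads the lifetime from the slope of the linear part of the voltage decay: tau = eta · (kT/q) / |dV/dt|, with eta = 2 under high-level injection and eta = 1 under low-level injection. The sketch below fits that slope by least squares and applies the formula. It illustrates only the textbook method, not the thesis's self-consistent model; the function names are hypothetical and the decay is synthetic, not measured data.

```python
from typing import Sequence

K_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge, in V/K

def least_squares_slope(t: Sequence[float], v: Sequence[float]) -> float:
    """Slope dV/dt of the best-fit straight line through (t, v)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_v = sum(v) / n
    num = sum((ti - mean_t) * (vi - mean_v) for ti, vi in zip(t, v))
    den = sum((ti - mean_t) ** 2 for ti in t)
    return num / den

def ocvd_lifetime(t: Sequence[float], v: Sequence[float],
                  temperature_k: float = 300.0, injection_factor: float = 2.0) -> float:
    """Conventional OCVD estimate: tau = injection_factor * (kT/q) / |dV/dt|.

    injection_factor is 2 for high-level injection and 1 for low-level injection;
    (t, v) should cover only the linear part of the decay.
    """
    dv_dt = least_squares_slope(t, v)
    if dv_dt >= 0:
        raise ValueError("expected a decaying voltage")
    return injection_factor * K_OVER_Q * temperature_k / -dv_dt

# Synthetic, noise-free decay generated for tau = 1 us at 300 K (not measured data).
tau_true = 1e-6
slope_true = -2.0 * K_OVER_Q * 300.0 / tau_true
t = [i * 1e-8 for i in range(50)]
v = [2.8 + slope_true * ti for ti in t]
print(ocvd_lifetime(t, v))  # approximately 1e-06
```

A single slope yields only one effective lifetime for the whole epilayer; the non-linear decays and thick-layer cases discussed in the abstract are exactly where this simple reading becomes ambiguous.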

Studies of inspection algorithms and associated microprogrammable hardware implementations

Author: Edmonds John Mark
Publication date: 01/01/1988

This work is concerned with the design and development of real-time algorithms for industrial inspection applications. Rather than implementing algorithms in dedicated hardware, microprogrammable machines were considered essential to maintain flexibility. After a survey of image pattern recognition that cites algorithms applicable to real-time use, this thesis presents industrial inspection algorithms that locate and scrutinise actual manufactured products. These are fast and robust, a necessary requirement in industrial environments. The National Physical Laboratory (NPL) has developed a Linear Array Processor (LAP) specifically designed for industrial recognition work. As with most array processors, the LAP outperforms conventional processors but is strictly limited to parallel algorithms for optimum performance. It was therefore necessary to incorporate sequential processing into the design of a multiprocessor system. A microcoded bit-slice Sequential Image Processor (SIP) has been designed and built at RHBNC in conjunction with the NPL. It was primarily intended as a VMEbus-based post-processor for the LAP but has in fact proved useful as a stand-alone processor. The SIP is described along with an assembler written for it, which translates assembly-language mnemonics into microcode. This work, which includes a review of current architectures, leads to the specification of a hybrid (SIMD/MIMD) architecture consisting of multiple autonomous sequential processors. This involves an analysis of various configurations and an investigation of the sources of bottlenecks within each design. Such systems require a significant amount of interprocessor communication; methods for achieving this are discussed, some of which have become practical only with the decreasing cost of electronic components. This eventually leads to a system whose algorithm execution speed increases approximately linearly with the number of processors.
The algorithms described in earlier chapters are examined on the system, and the practicalities of such a design are analysed in detail. Overall, this thesis arrives at designs of programmable real-time inspection systems and obtains guidelines that will help with the implementation of future inspection systems.

Royal Holloway Research Online