475 research outputs found

    Design of the goat/sheep holding cage slaughtering system (cage for animal slaughter): innovations and prospect

    Get PDF
    The main objective of inventing the goat/sheep holding cage-slaughtering mechanism or cage for animal slaughter was to seek solutions for the slaughtering mechanism from the traditional operation with four to five persons manning it to a one-person operation. The development of this innovation is for Chak Chee Bor Enterprise. This mechanism consists of a goat/sheep holding cage of 1.23m (height) X 1.60m (length) X 0.97m (width). The overall purpose of using this goat/sheep holding cage is to keep the goat/sheep calm, whilst minimizing the danger of unnecessary injury to both the animal and worker. The goat/sheep holding cage-slaughtering mechanism consists of a head latch (neck yoke or head gate) to hold the animal‘s neck and head, and two wooden boards to hold or gently clamp the body of the animal, with the purpose to calm the animal and ensure that it does not move. The round-shaped iron pieces at the end of both sides of the holding cage enable the mechanism to be swung aside or tilted at a 45o angle before the final stage of the ritual. This holding cage-slaughtering mechanism that comes with an adjustable head latch is able to accommodate different sizes of animals

    A model-based design flow for embedded vision applications on heterogeneous architectures

    Get PDF
    The ability to gather information from images is straightforward to human, and one of the principal input to understand external world. Computer vision (CV) is the process to extract such knowledge from the visual domain in an algorithmic fashion. The requested computational power to process these information is very high. Until recently, the only feasible way to meet non-functional requirements like performance was to develop custom hardware, which is costly, time-consuming and can not be reused in a general purpose. The recent introduction of low-power and low-cost heterogeneous embedded boards, in which CPUs are combine with heterogeneous accelerators like GPUs, DSPs and FPGAs, can combine the hardware efficiency needed for non-functional requirements with the flexibility of software development. Embedded vision is the term used to identify the application of the aforementioned CV algorithms applied in the embedded field, which usually requires to satisfy, other than functional requirements, also non-functional requirements such as real-time performance, power, and energy efficiency. Rapid prototyping, early algorithm parametrization, testing, and validation of complex embedded video applications for such heterogeneous architectures is a very challenging task. This thesis presents a comprehensive framework that: 1) Is based on a model-based paradigm. Differently from the standard approaches at the state of the art that require designers to manually model the algorithm in any programming language, the proposed approach allows for a rapid prototyping, algorithm validation and parametrization in a model-based design environment (i.e., Matlab/Simulink). The framework relies on a multi-level design and verification flow by which the high-level model is then semi-automatically refined towards the final automatic synthesis into the target hardware device. 2) Relies on a polyglot parallel programming model. The proposed model combines different programming languages and environments such as C/C++, OpenMP, PThreads, OpenVX, OpenCV, and CUDA to best exploit different levels of parallelism while guaranteeing a semi-automatic customization. 3) Optimizes the application performance and energy efficiency through a novel algorithm for the mapping and scheduling of the application 3 tasks on the heterogeneous computing elements of the device. Such an algorithm, called exclusive earliest finish time (XEFT), takes into consideration the possible multiple implementation of tasks for different computing elements (e.g., a task primitive for CPU and an equivalent parallel implementation for GPU). It introduces and takes advantage of the notion of exclusive overlap between primitives to improve the load balancing. This thesis is the result of three years of research activity, during which all the incremental steps made to compose the framework have been tested on real case studie

    Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics

    Get PDF
    The continual improvement of semiconductor technology has provided rapid advancements in device frequency and density. Designers of electronics systems for high-energy physics (HEP) have benefited from these advancements, transitioning many designs from fixed-function ASICs to more flexible FPGA-based platforms. Today’s FPGA devices provide a significantly higher amount of resources than those available during the initial Large Hadron Collider design phase. To take advantage of the capabilities of future FPGAs in the next generation of HEP experiments, designers must not only anticipate further improvements in FPGA hardware, but must also adopt design tools and methodologies that can scale along with that hardware. In this paper, we outline the major trends in FPGA hardware, describe the design challenges these trends will present to developers of HEP electronics, and discuss a range of techniques that can be adopted to overcome these challenges

    Rapid prototyping from algorithm to FPGA prototype

    Get PDF
    Abstract. Wireless data usage continuously increases in today’s world setting higher requirements for wireless networks. Ever increasing requirements result in more complex hardware (HW) implementation, especially telecommunication System-on-Chips (SoC) performance is playing a key-role in this development. Complexity increases design workload, therefore, it makes design flow times longer. High-Level Synthesis (HLS) tools have been designed to automate and accelerate design by moving manual work on a higher level. This Master’s Thesis studies MathWorks HLS workflow usage for rapid prototyping of Wireless Communication SoC Intellectual Property (IP). This thesis introduces design and FPGA prototyping flow of Application-Specific Integrated Circuit (ASIC). It presents good design practices targeted for HLS. It also studies MathWorks Hardware Description Language (HDL) generation flow with HDL Coder, possible problems during the flow and solutions to overcome the problems. The HLS flow is examined with an example design that scales and limits the power of IQ-data. This work verifies the design in a Field-Programmable Gate Array (FPGA) environment. It concentrates on evaluating the usage and benefits of MathWorks HLS workflow targeted for rapid prototyping of SoCs. The Example IP is a Simulink model containing MATLAB algorithms and System Objects. The design is optimized on algorithm level and synthesized into VHDL. The generated Register-Transfer Level (RTL) is verified in co-simulation against the algorithm model. Optimization and verification methods are evaluated. The HDL model is further processed through logic-synthesis using the 3rd party synthesis tool run automatically with a script created by MathWorks workflow. The generated design is tested on FPGA with FPGA-in-the-loop simulation configuration. FPGA prototyping flow benefits for rapid prototyping are evaluated. Coding styles to generate synthesizable HDL code and simulation methods to improve simulation speed of hardware-like algorithm were discussed. MathWorks HLS workflow was evaluated for rapid prototype purposes from algorithm to FPGA. Optimization methods and capability for production quality RTL for ASIC target were also discussed. MathWorks’ tool flow provided promising results for rapid prototyping. It generated human-readable HDL that was successfully synthesized on FPGA. The FPGA model was simulated in FPGA-in-the-loop configuration successfully. It also provided good area and speed results for the ASIC target when the algorithm was written strictly from the hardware perspective. The process was found to be distinct and efficient.Nopea prototypointi algoritmista FPGA-prototyypiksi. Tiivistelmä. Langattoman datan käyttö kasvaa jatkuvasti nykymaailmassa ja asettaa korkeammat vaatimukset langattomille verkoille. Kasvavat vaatimukset tekevät laitteistototeutuksesta kompleksisempaa, erityisesti tietoliikenteessä käytettävien järjestelmäpiirien (SoC) tehokkuus on avainasemassa. Tämä kasvattaa suunnittelun työmäärää ja näin ollen suunnitteluvuohon kuluva aika pidentyy. Korkean tason synteesi (HLS) on kehitetty automatisoimaan ja nopeuttamaan digitaalisuunnittelua siirtämällä manuaalista työtä korkeammalle tasolle. Tämä diplomityö tutkii MathWorks:n HLS-vuon käyttöä langattomaan viestintään suunniteltavien SoC:ien tekijänoikeudenalaisten standardoitujen lohkojen (IP) nopeaan prototypointiin. Työ esittelee perinteisen asiakaspiirin (ASIC) suunnitteluvuon, FPGA-prototypointivuon ja suunnitteluperiaatteet HLS:ää varten. Työssä käydään läpi MathWorks:n laitteistokuvauskielen (HDL) generointivuo HDL Coder:lla, mahdollisia ongelmakohtia vuossa ja ratkaisuja ongelmiin. HLS-vuota tutkitaan esimerkkimallin avulla, joka skaalaa ja rajoittaa IQ-datan tehoa. Esimerkkimallin toiminta tarkistetaan ohjelmoitavan logiikkapiirin (FPGA) kanssa. Työ keskittyy arvioimaan MathWorks:n HLS-vuon käyttöä ja hyötyä nopeaan prototypointiin SoC:ien kehityksessä. Esimerkkinä käytetään Simulink-mallia, joka sisältää MATLAB-funktioita ja System Object-olioita. Algoritmitasolla optimoitu malli syntesoidaan VHDL:ksi ja rekisterinsiirtotason (RTL) mallin toiminta tarkistetaan yhteissimulaatiolla alkuperäistä algoritmimallia vasten. Optimointi- ja verifiointimenetelmien toimivuutta ja tehokkuutta arvioidaan. Generoitu HDL-malli syntesoidaan kolmannen osapuolen logiikkasynteesi-työkalulla, joka käynnistetään MathWorks:n työkaluvuon generoimalla komentosarjalla. Luotu malli ohjelmoidaan FPGA:lle ja sen toiminta tarkistetaan FPGA-simulaatiolla. Syntesoituvan HDL-koodin generointiin vaadittavia koodaustyylejä ja algoritmimallin simulointinopeutta parantavia menetelmiä tutkittiin. MathWorks:n HLS-vuon soveltuvuutta nopeaan prototypointiin algoritmista FPGA-prototyypiksi pohdittiin. Lisäksi optimointimenetelmiä ja vuon soveltuvuutta tuotantolaatuisen RTL:n generoimiseen arvioitiin. MathWorks:n työkaluvuo osoitti lupaavia tuloksia nopean prototypoinnin näkökulmasta. Se loi luettavaa HDL-koodia, joka syntesoitui FPGA:lle. Malli ajettiin onnistuneesti FPGA:lla. Vuon avulla saavutettiin hyviä tuloksia pinta-alan ja nopeuden suhteen, kun malli optimoitiin asiakaspiirille. Tämä vaati mallin kuvaamista tarkasti laitteiston näkökulmasta. Prosessi oli kokonaisuudessaan selkeä ja tehokas

    Integrating Simulink, OpenVX, and ROS for Model-Based Design of Embedded Vision Applications

    Get PDF
    OpenVX is increasingly gaining consensus as standard platform to develop portable, optimized and power-efficient embedded vision applications. Nevertheless, adopting OpenVX for rapid prototyping, early algorithm parametrization and validation of complex embedded applications is a very challenging task. This paper presents a comprehensive framework that integrates Simulink, OpenVX, and ROS for model-based design of embedded vision applications. The framework allows applying Matlab-Simulink for the model-based design, parametrization, and validation of computer vision applications. Then, it allows for the automatic synthesis of the application model into an OpenVX description for the hardware and constraints-aware application tuning. Finally, the methodology allows integrating the OpenVX application with Robot Operating System (ROS), which is the de-facto reference standard for developing robotic software applications. The OpenVX-ROS interface allows co-simulating and parametrizing the application by considering the actual robotic environment and the application reuse in any ROS-compliant system. Experimental results have been conducted with two real case studies: An application for digital image stabilization and the ORB descriptor for simultaneous localization and mapping (SLAM), which have been developed through Simulink and, then, automatically synthesized into OpenVX-VisionWorks code for an NVIDIA Jetson TX2 boar

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    High Performance Computing via High Level Synthesis

    Get PDF
    As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry. FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain. Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion. Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices
    corecore