281 research outputs found

    A Flexible Crypto-system Based upon the REDEFINE Polymorphic ASIC Architecture

    The highest levels of security can be achieved through the use of more than one type of cryptographic algorithm for each security function. In this paper, the REDEFINE polymorphic architecture is presented as an architecture framework that can optimally support a varied set of crypto algorithms without losing high performance. The presented solution is capable of accelerating the advanced encryption standard (AES) and elliptic curve cryptography (ECC) cryptographic protocols, while still supporting different flavors of these algorithms as well as different underlying finite field sizes. The compelling feature of this cryptosystem is the ability to provide acceleration support for new field sizes as well as new (possibly proprietary) cryptographic algorithms decided upon after the cryptosystem is deployed. Defence Science Journal, 2012, 62(1), pp. 25-31, DOI: http://dx.doi.org/10.14429/dsj.62.143

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. Given mature digital technologies and the availability of authentic data and data handling infrastructure, DNNs have become a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN are far better than human intelligence in certain situations. However, DNNs are computationally demanding in terms of the resources and time needed to handle these computations. Furthermore, general-purpose architectures like CPUs have difficulty handling such computationally intensive algorithms. Therefore, the research community has invested considerable interest and effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper surveys the research carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs. A comparative study of the accelerators discussed is also made, based on factors like power, area, and throughput. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.

    Design of Multi-Gigabit Network Interconnect Elements and Protocols for a Data Acquisition System in Radiation Environments

    Modern High Energy Physics (HEP) experiments explore the fundamental nature of matter in more depth than ever before and thereby benefit greatly from advances in the field of communication technology. The huge data volumes generated by the increasingly precise detector setups pose severe problems for the Data Acquisition Systems (DAQ), which are used to process and store this information. In addition, detector setups and their read-out electronics need to be synchronized precisely to allow accurate later correlation of experiment events in time. Moreover, the substantial presence of charged particles from accelerator-generated beams results in strong ionizing radiation levels, which have a severe impact on the electronic systems. This thesis recommends an architecture for unified network protocol IP cores with custom-developed physical interfaces for use in reliable data acquisition systems in strong radiation environments. Specially configured serial bidirectional point-to-point interconnects are proposed to realize high-speed data transmission, slow control access, synchronization and global clock distribution on unified links, in order to reduce costs and to obtain compact and efficient read-out setups. Special features are the functional units hardened against radiation-induced single and multiple bit upsets, and the common interface for statistical error and diagnosis information, which integrates well into the protocol capabilities and eases error handling in large experiment setups. Many innovative designs for several custom FPGA and ASIC platforms have been implemented and are described in detail. Special focus is placed on the physical layers and network interface elements, from high-speed serial LVDS interconnects up to 20 Gb/s SSTL links in state-of-the-art process technology. The developed IP cores are fully tested by an adapted verification environment for electronic design automation tools and also by live application. They are available in a global repository, allowing broad usage within further HEP experiments.

    Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

    Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications, such as video processing, exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy efficiency of many applications by trading their intrinsic error tolerance for algorithm and circuit efficiency. Exact computing also imposes worst-case timing on the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing the pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: how do these timing errors propagate from the underlying hardware all the way up to the behavior of the entire algorithm, where they may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of investigating both AxC (i.e., approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate level by crossing them dynamically, linking the hardware result with the algorithm level, and vice versa, during the application's runtime.
    Existing frameworks investigate AxC and TS techniques at the circuit level (i.e., at the output of the accelerator), agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike the state of the art, the proposed framework offers a holistic approach to assessing the tradeoffs of AxC and TS techniques at the application level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level that still fulfill the required good-enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating in an HEVC encoder as a case study. Application-level results showed that the SAD based on approximate adders achieves savings of up to 45% in energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with an increase of around 6% in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10-year aging) in the maximum clock frequency achieved with TS hardware design is totally lost to a processing overhead of 8.06% to 46.96% when an unreliable block matching algorithm (BMA) is chosen. We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also presents approximate DTT (Discrete Tchebichef Transform) hardware proposals that explore transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency by up to 64%, reduces the circuit area by up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped to FPGA shows an increase of up to 58.9% in the maximum frequency and savings of about 28.7% and 32.2% in slices and dynamic power, respectively compared with stat

    Developing a Silicon Pixel Detector for the Next Generation LHCb Experiment

    The second long shutdown of the LHC presents an opportunity for the LHCb experiment to upgrade its detector systems and switch to a fully software triggered readout. Its first tracking layer, the VELO detector, is no exception to this and is undergoing an upgrade increasing the number of sensitive channels from 180 thousand silicon microstrips to about 41 million pixels. The new system will operate with zero-suppressed readout at 40 MHz, while cooled down using evaporative liquid CO$_2$ in silicon microchannel plates. The VELO Upgrade will consist of 52 modules, placed around the beam-pipe, built at the University of Manchester and Nikhef. The construction of the modules is a complex process that consists of a number of tight tolerance steps, their results verified both in metrology and in the electrical and thermal performance testing. In order to store data and track the performance a database has been developed, used to automatically analyse the uploaded values as well as compute the grades and quality of the individual steps and final modules. By the end of August 2021, 42 modules have been produced in Manchester, 37 of them with high quality and no issues present. Due to the nature of the harsh radiation environment, the sensors have to withstand a fluence up to 1e16 1 MeV $n_\mathrm{eq}\,\mathrm{cm^{-2}}$ and still provide a good signal to noise ratio. A new method of a charge collection scan has been proposed, linking the commonly used voltage scan with a threshold scan and using the extrapolated tracking information to estimate the amount of collected charge. The simulation indicates that the scan of a subset of modules will take about 8 min, a feasible duration despite the impact on the physics data taking. A further upgrade of the LHCb is planned for Long Shutdown four of the LHC. This will operate at higher luminosities leading to a significant increase in the pile-up of the collisions from a single proton-proton bunch crossing. For this reason a precise time stamping $\mathcal{O}(50\,\mathrm{ps})$ is to be added. This could be achieved in silicon detectors by using $\mathcal{O}(10)$ internal gain in the sensor. Simulations of the expected performance of a recently produced batch of sensors are presented. These characterise the anticipated performance of these $\mathcal{O}(50\,\mu\mathrm{m})$ segmented devices in a test beam, providing the impact of charge sharing and device response to an angular scan.

    Research Naval Postgraduate School, v.13, no.1, February 2003

    NPS Research is published by the Research and Sponsored Programs, Office of the Vice President and Dean of Research, in accordance with NAVSOP-35. Views and opinions expressed are not necessarily those of the Department of the Navy. Approved for public release; distribution is unlimited.

    Atténuation des interactions électromagnétiques entre le module de détection LabPET II et l’IRM (Mitigation of electromagnetic interactions between the LabPET II detection module and MRI)

    Simultaneous PET/MRI scanners provide a unique opportunity to investigate anatomical and functional properties of malignant tissues at the same time while avoiding the uncertainty of sequential PET/MRI systems. However, electromagnetic coupling between the two modalities is a significant challenge that needs to be addressed. These electromagnetic interferences (EMI) hinder the performance of both scanners and distort the image quality of each modality. Although metals have excellent radio-frequency shielding properties, they are not necessarily an appropriate shielding option for altering magnetic fields that induce eddy currents in any metallic layer.
Thus, there is a considerable demand for a new shielding material and an original approach to remove metallic parts from the MRI field of view. The objective of this project was to initiate the realization of a simultaneous PET/MRI scanner based on highly pixelated LabPET II detection modules to achieve millimeter spatial resolution for the human brain and dogs. The LabPET II electronics include application specific integrated circuits where the signal is digitized near the avalanche photodiode and offers an environment less susceptible to EMI. To fulfill the main aim, for the first time, the effect of the metallic material of LabPET II on PET and MRI performance was theoretically examined. Results confirm that metallic components of the LabPET II detection modules distort the magnetic field, generate eddy currents, and increase temperature. Then, the LabPET II electronics performance under the influence of custom-made MRI coils was investigated. Its energy and timing resolutions deteriorate in the presence of both RF and gradient signals because of EMIs. Thus, a LabPET II detection module shielded by a thin layer of the copper-silver composite was investigated, proving that shielding EMIs with the composite restores the PET performance, with less eddy current induction. Besides, a new shielding configuration based on a flexible layer of carbon nanotube (CNT) composite was fabricated to limit the EMIs. The CNT composite creates a highly conductive layer with minimal conductive paths that allows eddy currents to be decreased. The primary scientific outcome of this project is that the novel composite shielding rejects both low and high-frequency interferences and reduces eddy current induction, offering the flexibility to acquire a fast gradient switching sequence. 
From a technical point of view, the shielded LabPET II detection module demonstrates an excellent performance in an MRI-like environment, supporting the feasibility of designing a PET-insert based on LabPET II technology.

    CBM Progress Report 2009
