157 research outputs found

    Self-Partial and Dynamic Reconfiguration Implementation for AES using FPGA

    This paper addresses efficient hardware/software implementation approaches for the AES (Advanced Encryption Standard) algorithm and describes the design and performance testing of the algorithm for embedded systems. With the spread of reconfigurable hardware such as FPGAs (Field Programmable Gate Arrays), embedded cryptographic hardware has become cost-effective. Nevertheless, it is worth noting that even hardwired cryptographic algorithms are no longer entirely safe. In addition, a self-reconfiguring platform is reported that enables an FPGA to dynamically reconfigure itself under the control of an embedded microprocessor. Hardware acceleration significantly increases the performance of embedded systems built on programmable logic. Allowing an FPGA-based MicroBlaze processor to select the coprocessors it uses can help reduce area requirements and increase a system's versatility. The architecture proposed in this paper is an optimal hardware implementation of the algorithm that uses dynamic partial reconfiguration of the FPGA. This implementation is a good solution for preserving the confidentiality and accessibility of information in digital communication.

    Quantum-dot Cellular Automata: Review Paper

    Quantum-dot Cellular Automata (QCA) is one of the most important discoveries and is expected to be a successful alternative to CMOS technology in the near future. An important feature of this technique, which has attracted the attention of many researchers, is its low energy consumption, high speed and small size compared with CMOS. The inverter and the majority gate are the basic building blocks of QCA circuits; most logic circuits can be designed using these gates with the help of QCA wires. Because few review papers are available, this paper is intended as a starting point for readers who are interested in the QCA field and want to know how it works and why it has attracted so much attention recently.
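
To illustrate the abstract's point that the inverter and the majority gate suffice as building blocks, here is a minimal logic-level sketch. It models only the Boolean behaviour, not QCA cell physics or layout, and the function names are illustrative rather than taken from the paper. Fixing one majority-gate input to 0 gives AND, and fixing it to 1 gives OR.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def inverter(a: int) -> int:
    """Logical NOT on a single bit."""
    return 1 - a


def and_gate(a: int, b: int) -> int:
    """AND obtained by tying the third majority input to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """OR obtained by tying the third majority input to 1."""
    return majority(a, b, 1)


def truth_table(gate):
    """Outputs of a two-input gate for inputs 00, 01, 10, 11."""
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]
```

Together with the inverter, AND and OR form a functionally complete set, which is why arbitrary logic can be composed from these two primitives.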

    Efficient Traffic State Forecasting using Spatio-Temporal Network Dependencies: A Sparse Graph Neural Network Approach

    Traffic state prediction in a transportation network is paramount for effective traffic operations and management, as well as informed user and system-level decision-making. However, long-term traffic prediction (beyond 30 minutes into the future) remains challenging in current research. In this work, we integrate the spatio-temporal dependencies in the transportation network from network modeling, together with the graph convolutional network (GCN) and graph attention network (GAT). To further tackle the dramatic computation and memory cost caused by the giant model size (i.e., number of weights) resulting from multiple cascaded layers, we propose sparse training to mitigate the training cost while preserving the prediction accuracy. It is a process of training using a fixed number of nonzero weights in each layer in each iteration. We consider the problem of long-term traffic speed forecasting on real large-scale transportation network data from the California Department of Transportation (Caltrans) Performance Measurement System (PeMS). Experimental results show that the proposed GCN-STGT and GAT-STGT models achieve low prediction errors on short-, mid- and long-term prediction horizons of 15, 30 and 45 minutes, respectively. Using our sparse training, we could train from scratch with high sparsity (e.g., up to 90%), equivalent to a 10-fold reduction in floating point operations (FLOPs) using the same number of epochs as dense training, and arrive at a model with very small accuracy loss compared with the original dense training.
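
The idea of training with a fixed number of nonzero weights per layer can be sketched as follows. The abstract does not say how the surviving weights are chosen, so this sketch assumes magnitude-based selection; `topk_mask`, `sparse_step` and `flop_reduction` are hypothetical helper names, and a flat list stands in for one layer's weights.

```python
import random


def topk_mask(weights, nonzero_budget):
    """Return a 0/1 mask keeping the nonzero_budget largest-magnitude weights.

    Magnitude-based selection is an assumption of this sketch.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:nonzero_budget])
    return [1 if i in keep else 0 for i in range(len(weights))]


def sparse_step(weights, grads, lr, nonzero_budget):
    """One gradient step, then re-projection onto the fixed nonzero budget."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    mask = topk_mask(updated, nonzero_budget)
    return [w * m for w, m in zip(updated, mask)]


def flop_reduction(sparsity):
    """Dense-to-sparse ratio of multiply-accumulates, assuming zeros are skipped."""
    return 1.0 / (1.0 - sparsity)


# Usage: a 20-weight layer at 90% sparsity keeps 2 nonzero weights per iteration.
random.seed(0)
layer = [random.uniform(-1, 1) for _ in range(20)]
grads = [random.uniform(-1, 1) for _ in range(20)]
layer = sparse_step(layer, grads, lr=0.1, nonzero_budget=2)
```

Under the assumption that zero weights cost no arithmetic, 90% sparsity leaves one tenth of the multiply-accumulates, which matches the roughly tenfold reduction quoted in the abstract.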

    Battery-aware design exploration of scheduling policies for multi-sensor devices

    Lifetime maximization is a key challenge in battery-powered multi-sensor devices. Battery-aware power management strategies combine task scheduling with dynamic voltage scaling (DVS), accounting for the fact that the power drawn by the device is different from that provided by the battery due to its many non-idealities. However, state-of-the-art techniques in this field do not take into account several important aspects, such as the impact of sensing tasks on the overall power demand, the (operating point dependent) losses due to multiple DC-DC conversions, and the dynamic modifications in battery efficiency caused by different distributions of the currents in the temporal and frequency domains. In this work, we propose a novel approach to identifying optimal power management solutions that addresses all these limitations. Specifically, using advanced battery and DC-DC converter models, we propose methods to explore the scheduling space both statically (at design time) and dynamically (at run-time), accounting not only for computation tasks, but also for communication and sensing. With this method, we show that the battery lifetime can be increased by as much as 23.36% if an optimal power management strategy is adopted.

    Buffered Steiner Trees for difficult instances

    Buffer insertion has become an increasingly critical optimization in high-performance design. The problem of finding a delay-optimal buffered Steiner tree has been an active area of research, and excellent solutions exist for most instances. However, current approaches fail to adequately solve a particular class of real-world "difficult" instances, which are characterized by a large number of sinks, variations in sink criticalities, and varying polarity requirements. We propose a new Steiner tree construction called C-Tree for these instance types. When combined with van Ginneken style buffer insertion, C-Tree achieves higher quality solutions with fewer resources compared to traditional approaches.

    Variability-Aware Design of Subthreshold Devices

    Over the last 10 years, digital subthreshold logic circuits have been developed for applications in the ultra-low power design domain, where performance is not the priority. Recently, devices optimized for subthreshold operation have been introduced as potential building blocks. However, for these devices, a strong sensitivity to process variations is expected due to the exponential relationship between the subthreshold drive current and the threshold voltage. In this thesis, a yield optimization technique is proposed to suppress the variability of a device optimized for subthreshold operation. The goal of this technique is to construct and inscribe a maximum yield cube in the 3-D feasible region composed of oxide thickness, gate length, and channel doping concentration. The center of this cube is chosen as the maximum yield design point with the highest immunity against variations. By using the technique, a transistor is optimized for subthreshold operation in terms of the desired total leakage current and intrinsic delay bounds. To develop the concept of the technique, sample devices are designed for 90nm and 65nm technologies. Monte Carlo simulations verify the accuracy of the technique for meeting power and delay constraints under technology-specific variances of the design parameters of the device.

    Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

    The performance of a platform is evaluated based on its ability to process multiple applications of different natures. In this context, the platform under evaluation can have a homogeneous, heterogeneous or hybrid architecture. The selection of an architecture type is generally based on the set of target applications and performance parameters, where the applications can be serial or parallel in nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection is a wireless communication system in which computationally intensive signal-processing algorithms have strict execution-time constraints; in this case, a platform with special-purpose accelerators is more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to place many special-purpose accelerators on a chip because the cost per unit area was higher than it is today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling, which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned off or operated at very low frequencies, leaving most of the chip underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, thereby reducing power dissipation density and improving other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores.
    A powerful class of heterogeneous multicore platforms is the accelerator-rich platform, where many application-specific accelerators are loosely connected with each other to distribute the workload or to execute tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA, to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT) and real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty-two PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) was also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating an AVATAR-generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. The underutilized part of the chip, nowadays called Dark Silicon, poses many challenges for designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores.
    In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes in which a CGRA of application-specific size is integrated with every node except the central node, which is attached to a RISC processor. The RISC processor establishes synchronization between the nodes for data transfer and also performs supervisory control. With the NoC as the backbone of communication between the cores, all the cores can address each other and execute simultaneously and independently of each other. The performance of accelerators generated from the CREMA, AVATAR and SCREMA templates was evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity, but when they were integrated in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made for some advanced performance metrics, e.g., MOPS/mW and MOPS/MHz. The overall research work promotes the idea of the heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia.

    Air Force Institute of Technology Research Report 2010

    This report summarizes the research activities of the Air Force Institute of Technology’s Graduate School of Engineering and Management. It describes research interests and faculty expertise; lists student theses/dissertations; identifies research sponsors and contributions; and outlines the procedures for contacting the school. Included in the report are: faculty publications, conference presentations, consultations, and funded research projects. Research was conducted in the areas of Aeronautical and Astronautical Engineering, Electrical Engineering and Electro-Optics, Computer Engineering and Computer Science, Systems and Engineering Management, Operational Sciences, Mathematics, Statistics and Engineering Physics.