21 research outputs found

    Adaptive and hybrid schemes for efficient parallel squaring and cubing units

    Squaring (X^2) and cubing (X^3) are special cases of multiplication used in many applications, such as image compression, equalization, decoding and demodulation, 3D graphics, scientific computing, artificial neural networks, logarithmic number systems, and multimedia applications. They can also be an efficient way to compute other basic functions. Improving their performance is therefore a goal for many researchers. This dissertation discusses modifications to the algorithms used in parallel squaring and cubing units for both signed and unsigned representations. A truncation technique is then applied to improve their performance. Each unit is modeled, and its area and delay are estimated using a linear evaluation model. A C program was written to generate Hardware Description Language files for each unit. The units are verified in simulation. Moreover, area, delay, and power consumption are calculated for each unit and compared with those of previous approaches for both Xilinx Virtex 5 FPGA and IBM 65 nm ASIC technologies.
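
As a rough illustration of why a dedicated squarer can be smaller than a general multiplier, the sketch below uses the standard partial-product folding identity X^2 = sum_i x_i 2^(2i) + sum_(i<j) x_i x_j 2^(i+j+1) for an unsigned operand, and then drops the low-order columns to mimic truncation. It is not taken from the dissertation: the function names are hypothetical, only the unsigned case is covered, and the correction constant that practical truncated units usually add is omitted.

```python
def folded_partial_products(x, n_bits):
    """Yield (column, bit) pairs of the folded partial-product array of x*x.

    Assumes 0 <= x < 2**n_bits (unsigned operand).
    """
    bits = [(x >> i) & 1 for i in range(n_bits)]
    for i in range(n_bits):
        # Diagonal term: x_i AND x_i is just x_i, at weight 2^(2i).
        yield 2 * i, bits[i]
        for j in range(i + 1, n_bits):
            # x_i*x_j and x_j*x_i share weight 2^(i+j); their sum is a
            # single bit at weight 2^(i+j+1).
            yield i + j + 1, bits[i] & bits[j]


def square_folded(x, n_bits):
    """Exact square obtained by summing every folded partial product."""
    return sum(bit << col for col, bit in folded_partial_products(x, n_bits))


def square_truncated(x, n_bits, dropped_columns):
    """Approximate square that ignores partial products below a column.

    No correction constant is added, so the result never exceeds x*x.
    """
    return sum(
        bit << col
        for col, bit in folded_partial_products(x, n_bits)
        if col >= dropped_columns
    )
```

For an 8-bit operand the folded array holds 36 partial-product bits instead of the 64 of a general 8x8 multiplier, and dropping columns shrinks it further at the cost of a bounded underestimate.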

    Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations

    This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators that approximate any function to a user-specified accuracy, up to and beyond 53 bits of precision (e.g., the IEEE double-precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16-, 24- and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate that the proposed method uses smaller arithmetic units and smaller lookup tables than previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as the truncation and rounding effects of multiple arithmetic units. Peer reviewed. Electrical and Computer Engineering
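
The first step described above, obtaining initial coefficients from a Chebyshev approximation, can be sketched as follows. This is a minimal floating-point illustration only: it interpolates at Chebyshev nodes over a single interval, and it leaves out the paper's coefficient quantization, direct search, and truncated arithmetic units. The function names and the choice of the reciprocal on [1, 2] are assumptions made for the example.

```python
import math


def chebyshev_coefficients(f, a, b, degree):
    """Coefficients c_k of sum_k c_k*T_k(t) interpolating f on [a, b].

    Uses degree + 1 Chebyshev nodes, so the resulting polynomial matches
    f exactly at those nodes.
    """
    n = degree + 1
    angles = [math.pi * (j + 0.5) / n for j in range(n)]
    # Map each node t = cos(angle) from [-1, 1] onto [a, b].
    values = [f(0.5 * (b - a) * math.cos(th) + 0.5 * (b + a)) for th in angles]
    coeffs = []
    for k in range(n):
        s = sum(v * math.cos(k * th) for v, th in zip(values, angles))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs


def chebyshev_eval(coeffs, a, b, x):
    """Evaluate the Chebyshev series at x in [a, b]."""
    t = (2.0 * x - (a + b)) / (b - a)
    t = max(-1.0, min(1.0, t))  # guard against rounding just outside [-1, 1]
    theta = math.acos(t)
    return sum(c * math.cos(k * theta) for k, c in enumerate(coeffs))


def max_abs_error(f, a, b, degree, samples=1001):
    """Largest sampled |f(x) - p(x)| on an evenly spaced grid over [a, b]."""
    coeffs = chebyshev_coefficients(f, a, b, degree)
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - chebyshev_eval(coeffs, a, b, x)) for x in xs)
```

Calling `max_abs_error(lambda x: 1.0 / x, 1.0, 2.0, 2)` gives the sampled error of a quadratic fit to the reciprocal; a hardware interpolator would typically split the input range into subintervals indexed by a lookup table, which reduces this error considerably.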

    An FPGA-based programmable processor for bilinear pairings

    Bilinear pairings on elliptic curves are an active research field in cryptography. The first cryptographic protocols based on bilinear pairings were proposed around the year 2000, and they are promising solutions to security concerns in domains such as pervasive computing and cloud computing. Computing a bilinear pairing, which relies on arithmetic over finite fields, is the most time-consuming operation in pairing-based cryptosystems. This has motivated research on efficient hardware architectures that improve the performance of security protocols. Several works in the literature have focused on the design of custom hardware architectures for pairings; however, flexible designs are advantageous because there are several types of pairings and several algorithms to compute them. This work presents the design and implementation of a novel programmable cryptoprocessor for computing bilinear pairings over binary fields in FPGAs. It supports different pairing algorithms and parameters such as the elliptic curve, the tower field and the distortion map. The results show that the proposed cryptoprocessor achieves high flexibility with competitive timing and area usage when compared to custom designs for pairings defined over singular/supersingular elliptic curves at a 128-bit security level.

    SELF-ADAPTING PARALLEL FRAMEWORK FOR LONG-TERM OBJECT TRACKING

    Object tracking is a crucial field in computer vision with many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations exist in practice, and recent methods emphasize tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, so that the problem of object absence (in case of occlusion, cluttering or blurring) is resolved. Most of these algorithms place a high computational burden on the processing units and need powerful CPUs to attain real-time tracking and high-bitrate video processing. These units may handle no more than a single video source, making them unsuitable for large-scale deployments such as multiple sources or higher-resolution videos. In this thesis, we choose one popular algorithm, TLD (Tracking-Learning-Detection), study the core components that impede its performance, and implement these components in a parallel computational environment spanning multi-core CPUs, GPUs, etc., also known as heterogeneous computing. OpenCL is used as the development platform to produce parallel kernels for the algorithm. The goals are to create an acceptable heterogeneous computing environment using current computer technologies, to offer real-time applications an alternative implementation methodology, and to circumvent the upcoming limitations of hardware in terms of cost, power, and speedup. We are able to bring true parallel speedup to the existing implementations, which greatly improves the frame rate for long-term object tracking and, with some modification of algorithm parameters, provides more accurate object tracking. According to the experiments, the developed kernels achieve a range of performance improvements: reduction-based kernels reach a maximum speedup of 78X, window-based kernels range from a couple of hundred to 2000X, and the optical flow tracking kernel records a maximum of 5.7X. Global speedup is highly dependent on the hardware specifications, especially for memory transfers. With a medium-sized input, the self-adapting parallel framework obtained a fast learning curve and converged to an average speedup of 1.6X over the original implementation. Lastly, for future programming convenience, an OpenCL-based library was built to facilitate OpenCL programming on parallel hardware devices, hide the complexity of building and compiling OpenCL kernels, and provide a C-based latency measurement tool that is compatible with several operating systems.

    From profession to teaching: Manuel Trillo and his trips to England and the collective housing in "La Motilla"

    This article analyses the relationship between Manuel Trillo’s trips to England, the collective housing project in La Motilla and his academic research on housing and the city. The construction of the Sevilla 1 office building obliged Manuel Trillo to make a first trip to England in 1971, where he discovered a new, technological architecture. His interest in the theories of Archigram and in the architecture of Stirling and of Alison and Peter Smithson motivated a second trip soon after. What he learned on these trips had its first result in the La Motilla housing, which brought to Seville residential models by the Smithsons and by Stirling that he had visited in England. Creating the city through collective housing became Professor Manuel Trillo’s main field of reflection, research and teaching practice, from the 1970s until his last active years, at the School of Architecture of Seville and during his brief but intense stay at the School of Architecture of Valladolid. In 2003 he made a third trip to England; history seemed to repeat itself, as he again proposed other possible architectures for the city of Seville at a time when, with the passing of the years, neither technology nor industrialisation was any longer an obstacle to their realisation.

    Leveraging Signal Transfer Characteristics and Parasitics of Spintronic Circuits for Area and Energy-Optimized Hybrid Digital and Analog Arithmetic

    While Internet of Things (IoT) sensors offer numerous benefits in diverse applications, they are limited by stringent constraints in energy, processing area and memory. These constraints are especially challenging within applications such as Compressive Sensing (CS) and Machine Learning (ML) via Deep Neural Networks (DNNs), which require dot product computations on large data sets. A solution to these challenges has been offered by the development of crossbar array architectures, enabled by recent advances in spintronic devices such as Magnetic Tunnel Junctions (MTJs). Crossbar arrays offer a compact, low-energy and in-memory approach to dot product computation in the analog domain by leveraging intrinsic signal-transfer characteristics of the embedded MTJ devices. The first phase of this dissertation research seeks to build on these benefits by optimizing resource allocation within spintronic crossbar arrays. A hardware approach to non-uniform CS is developed, which dynamically configures sampling rates by deriving the necessary control signals from circuit parasitics. Next, an alternate approach to non-uniform CS based on adaptive quantization is developed, which reduces circuit area in addition to energy consumption. Adaptive quantization is then applied to DNNs by developing an architecture that allows layer-wise quantization based on relative robustness levels. The second phase of this research extends the analog computation paradigm by developing an operational amplifier-based arithmetic unit for generalized scalar operations. This approach allows for a 95% area reduction in scalar multiplications compared to the state-of-the-art digital alternative. Moreover, analog computation of enhanced activation functions allows for significant improvement in DNN accuracy, which can be harnessed through triple modular redundancy to yield an 81.2% reduction in power at the cost of only 4% accuracy loss, compared to a larger network. Together these results substantiate promising approaches to several challenges facing the design of future IoT sensors within the targeted applications of CS and ML.

    Quantum Compiling Methods for Fault-Tolerant Gate Sets of Dimension Greater than Two

    Fault-tolerant gate sets whose generators belong to the Clifford hierarchy form the basis of many protocols for scalable quantum computing architectures. At the beginning of the decade, number-theoretic techniques were employed to analyze circuits over these gate sets on single qubits, providing the basis for a number of state-of-the-art quantum compiling algorithms. In this dissertation, I further this program by employing number-theoretic techniques for higher-dimensional gate sets on both qudit and multi-qubit circuits. First, I introduce canonical forms for single-qutrit Clifford+T circuits and prove that every single-qutrit Clifford+T operator admits a unique such canonical form. I show that these canonical forms are T-optimal and describe an algorithm which takes as input a Clifford+T circuit and outputs the canonical form for that operator. The algorithm runs in time linear in the number of gates of the circuit. Our results provide a higher-dimensional generalization of prior work by Matsumoto and Amano, who introduced similar canonical forms for single-qubit Clifford+T circuits. We also show that a similar extension of these normal forms to higher dimensions exists, but do not establish uniqueness. Moving to multi-qubit circuits, I provide number-theoretic characterizations for certain restricted Clifford+T circuits by considering unitary matrices over subrings of Z[1/√2, i]. We focus on the subrings Z[1/2], Z[1/√2], Z[1/√−2], and Z[1/2, i], and we prove that unitary matrices with entries in these rings correspond to circuits over well-known universal gate sets. In each case, the desired gate set is obtained by extending the set of classical reversible gates {X, CX, CCX} with an analogue of the Hadamard gate and an optional phase gate. I then establish the existence and uniqueness of a normal form for one of these gate sets, the two-qubit gate set consisting of the Clifford gates plus the controlled-phase gate CS. This normal form is optimal in the number of CS gates, making it the first normal form that is non-Clifford optimal for a fault-tolerant universal multi-qubit gate set. We provide a synthesis algorithm that runs in time linear in the gate count and outputs the equivalent normal form. In proving the existence and uniqueness of the normal form, we likewise establish the generators and relations for the two-qubit Clifford+CS group. We also demonstrate a lower bound of 5 log2(1/ε) + O(1) on the number of CS gates required to ε-approximate any 4 × 4 unitary matrix. Lastly, using the characterization of circuits over the Clifford+CS gate set and the existence of an optimal normal form, I provide an ancilla-free inexact synthesis algorithm for two-qubit unitaries using the Clifford+CS gate set for Pauli rotations. These operators require 6 log2(1/ε) + O(1) CS gates to synthesize in the typical case and 8 log2(1/ε) + O(1) in the worst case.
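
To give a feel for the CS-gate counts quoted in this abstract, the short sketch below evaluates only the leading terms 5 log2(1/ε), 6 log2(1/ε) and 8 log2(1/ε) for a chosen precision ε. The additive O(1) constants are not stated in the abstract and are therefore left out, so the numbers are indicative rather than exact gate counts. The function name is hypothetical.

```python
import math


def cs_count_leading_term(coefficient, epsilon):
    """Leading term coefficient * log2(1/epsilon) of a CS-gate count.

    The O(1) additive constant from the abstract is unknown here and is
    deliberately omitted.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return coefficient * math.log2(1.0 / epsilon)


if __name__ == "__main__":
    eps = 2.0 ** -10
    for label, coeff in (("lower bound", 5), ("typical case", 6), ("worst case", 8)):
        print(label, cs_count_leading_term(coeff, eps))
```

Under these leading terms, the typical-case synthesis cost sits a factor of 6/5 above the lower bound, and the worst case a factor of 8/5 above it.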

    Modern Machine Learning for LHC Physicists

    Modern machine learning is transforming particle physics, faster than we can follow, and bullying its way into our numerical tool box. For young researchers it is crucial to stay on top of this development, which means applying cutting-edge methods and tools to the full range of LHC physics problems. These lecture notes are meant to lead students with basic knowledge of particle physics and significant enthusiasm for machine learning to relevant applications as fast as possible. They start with an LHC-specific motivation and a non-standard introduction to neural networks and then cover classification, unsupervised classification, generative networks, and inverse problems. Two themes defining much of the discussion are well-defined loss functions reflecting the problem at hand and uncertainty-aware networks. As part of the applications, the notes include some aspects of theoretical LHC physics. All examples are chosen from particle physics publications of the last few years. Given that these notes will be outdated already at the time of submission, the week of ML4Jets 2022, they will be updated frequently. Comment: First version, we very much appreciate feedback

    A Salad of Block Ciphers

    This book is a survey on the state of the art in block cipher design and analysis. It is a work in progress, and has been for the better part of the last three years; sadly, for various reasons, no significant change has been made during the last twelve months. However, it is also in a self-contained, usable, and relatively polished state, and for this reason I have decided to release this snapshot to the public as a service to the cryptographic community, both to obtain feedback and as a means to give something back to the community from which I have learned much. At some point I will produce a final version (whatever a "final version" means in the constantly evolving field of block cipher design) and I will publish it. In the meantime I hope the material contained here will be useful to other people.