    Developing a compiler for the XeonPhi (TR-2014-341)

    The XeonPhi is a highly parallel x86 architecture chip made by Intel. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler (VPC) to this architecture and assesses its performance by comparing the XeonPhi with three other machines running the same algorithms.

    Performance Evaluation of Optimal Ate Pairing on Low-Cost Single Microprocessor Platform

    The framework of low-cost interconnected devices forms a new kind of cryptographic environment with diverse requirements. Because such devices have minimal resources, lightweight cryptographic algorithms are favored. Many IoT applications work autonomously and process sensitive data, which heightens their security needs and may call for specific security measures. A bilinear pairing is a mapping based on groups formed by elliptic curves over extension fields. Pairings are the key enabler for versatile cryptosystems, such as certificateless signatures and searchable encryption. However, they carry a major computational overhead, which conflicts with the constraints of low-cost devices. Nonetheless, bilinear pairings are the only known approach for many cryptographic protocols, so their feasibility should certainly be studied, as they might turn out to be necessary for some future IoT solutions. Promising results already exist for high-frequency CPUs and for platforms with hardware extensions. In this work, we study the feasibility of computing the optimal ate pairing over the BN254 curve on a 64 MHz Cortex-M33 based platform, using an optimized open-source library. The project was carried out for the company Nordic Semiconductor. As a result, the pairing was computed in under 26 × 10^6 cycles, or about 410 ms. This enables limited use of pairing-based cryptography, with a capacity of at most a few cryptographic operations, such as ID-based key verifications, per second. Compared with other relevant work, a competent pairing application would require either a high-frequency, and thus power-hungry, microprocessor or a customized FPGA. Moreover, research in efficient pairing-based cryptography is constantly advancing on every front: efficient algorithms, protocols, and hardware solutions.
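
    The reported timing is consistent with the clock rate: 26 × 10^6 cycles / 64 × 10^6 Hz ≈ 0.41 s, matching the quoted 410 ms. As a minimal illustration of the bilinearity that makes pairings useful, the sketch below checks e(aP, bQ) = e(P, Q)^(ab) using the py_ecc library's bn128 module, an assumption for illustration only: it is a 254-bit Barreto-Naehrig curve, but its parameterization and the thesis's optimized library may differ, and py_ecc is orders of magnitude too slow for a Cortex-M33.

        # Assumed for illustration: py_ecc's bn128 module (not named in the thesis).
        from py_ecc.bn128 import G1, G2, multiply, pairing

        # Bilinearity: e(aP, bQ) == e(P, Q)^(a*b) in the target group.
        a, b = 6, 7
        lhs = pairing(multiply(G2, b), multiply(G1, a))
        rhs = pairing(G2, G1) ** (a * b)
        print("bilinearity holds:", lhs == rhs)  # True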

    Distributed associative memories for high-speed symbolic reasoning

    This paper briefly introduces a novel symbolic reasoning system based upon distributed associative memories constructed from correlation matrix memories (CMMs). The system is aimed at high-speed rule-based symbolic operations. It has the advantage of very fast rule matching without the long training times normally associated with neural-network-based symbolic manipulation systems. In particular, the network is able to perform partial matching on symbolic information at high speed. As such, the system targets the practical use of neural networks in high-speed reasoning systems. The paper describes the advantages and disadvantages of using CMMs, shows how the approach overcomes those disadvantages, and then briefly describes a system incorporating them.
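
    As a minimal sketch of the underlying mechanism, assuming a standard binary Willshaw-style CMM (the paper's actual architecture may differ): training superimposes outer products of binary symbol vectors into one weight matrix, and recall is a matrix-vector product followed by a threshold derived from the cue itself, which is what makes high-speed partial matching cheap.

        import numpy as np

        D = 32  # width of the binary symbol vectors (toy size)

        def store(W, x, y):
            """Superimpose the association x -> y as a binary outer product."""
            W |= np.outer(y, x).astype(np.uint8)

        def recall(W, cue):
            """Threshold row sums at the cue's own activity (Willshaw rule)."""
            return (W @ cue >= cue.sum()).astype(np.uint8)

        # Toy symbol vectors, not from the paper.
        x = np.zeros(D, dtype=np.uint8); x[[2, 7, 11, 19]] = 1
        y = np.zeros(D, dtype=np.uint8); y[[0, 5, 23]] = 1

        W = np.zeros((D, D), dtype=np.uint8)
        store(W, x, y)

        partial = x.copy(); partial[2] = 0  # drop one cue bit
        print(np.array_equal(recall(W, x), y))        # exact match: True
        print(np.array_equal(recall(W, partial), y))  # partial match: True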

    Low power digital signal processing

    Neural network inference in digital signal processing under hard real-time constraints

    The main objective of this thesis is to investigate how neural network inference can be implemented efficiently, in terms of execution speed, on a digital signal processor under hard real-time constraints. Theory on digital signal processors, software optimization, and neural networks is discussed. A neural network model for the specific use case is designed from data produced by a Matlab simulation model, and a digital signal processor implementation is created from it. The model is trained and validated in Python using the Keras package, and then implemented on the CEVA-XC4500 digital signal processor in C++ with processor-specific vector-processing intrinsics. The model is evaluated on accuracy, precision, recall, and F1 score, and its performance is compared with the conventional implementation of the use case using the 3GPP-specified metrics of misdetection probability, false alarm rate, and bit error rate. Execution speed is measured both with the profiling tool of the CEVA integrated development environment and with a Lauterbach PowerTrace profiling module attached to a real base station product. The result is an optimized CEVA-XC4500 implementation of the specific neural network architecture that consumes 88 percent fewer cycles than the conventional implementation, while the model's performance fulfills the 3GPP specification requirements.
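
    The abstract does not reproduce the kernels, but the core of such an implementation is a fixed-point, vectorized multiply-accumulate. Below is a minimal numpy sketch of the idea, assuming a hypothetical Q15 quantization scheme; the real CEVA-XC4500 code uses processor-specific intrinsics and wide hardware accumulators.

        import numpy as np

        def to_q15(a):
            """Quantize floats in [-1, 1) to Q15 int16 (hypothetical format)."""
            return np.clip(np.round(a * 32768), -32768, 32767).astype(np.int16)

        def dense_q15(x, W):
            """Q15 dense layer: int16 operands, wide accumulation
            (as DSP MAC units provide), rescale to Q15 and saturate."""
            acc = W.astype(np.int64) @ x.astype(np.int64)   # Q30 accumulator
            return np.clip(acc >> 15, -32768, 32767).astype(np.int16)

        rng = np.random.default_rng(1)
        x = rng.uniform(-0.5, 0.5, 8)
        W = rng.uniform(-0.5, 0.5, (4, 8))
        print(dense_q15(to_q15(x), to_q15(W)) / 32768.0)  # fixed-point result
        print(W @ x)                                      # float reference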

    On the design and implementation of a control system processor

    In general, digital control algorithms are multi-input multi-output (MIMO) recursive digital filters, but control system processing has particular numerical requirements, especially in systems with high sample rates, for which standard processor devices are not well suited. There is therefore a clear need to understand the numerical requirements properly, to identify optimised forms for implementing control laws, and to translate these into efficient processor architectures. By taking a considered view of the numerical and calculation requirements of control algorithms, it is possible to design special-purpose processors that provide well-targeted support for control laws. This thesis describes a compact, high-speed, special-purpose processor which offers a low-cost solution to implementing linear time-invariant controllers. [Continues.]
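
    Concretely, a linear time-invariant control law is evaluated once per sample period as a MIMO state-space recursion, which is exactly the recursive-filter structure such a processor must stream through at the sample rate. A minimal sketch with illustrative matrices (not taken from the thesis):

        import numpy as np

        # x[k+1] = A x[k] + B u[k],   y[k] = C x[k] + D u[k]
        # Illustrative 2-state, 2-input, 2-output controller.
        A = np.array([[0.9, 0.1], [0.0, 0.8]])
        B = 0.1 * np.eye(2)
        C = np.array([[1.0, 0.0], [0.5, 1.0]])
        D = np.zeros((2, 2))

        def step(x, u):
            """One sample period: emit the output, advance the state."""
            return C @ x + D @ u, A @ x + B @ u

        x = np.zeros(2)
        u = np.array([1.0, 0.0])  # measured plant error, held constant here
        for k in range(3):
            y, x = step(x, u)
            print(k, y)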

    Domain-Specific Computing Architectures and Paradigms

    We live in an exciting era where artificial intelligence (AI) is fundamentally shifting the dynamics of industries and businesses around the world. AI algorithms such as deep learning (DL) have drastically advanced state-of-the-art cognition and learning capabilities. However, the power of modern AI algorithms can only be realized if the underlying domain-specific computing hardware delivers orders of magnitude more performance and energy efficiency. This work focuses on that goal and explores three parts of the domain-specific computing acceleration problem, encompassing specialized hardware and software architectures and paradigms that support the ever-growing processing demand of modern AI applications from the edge to the cloud. The first part investigates the optimization of a sparse spatio-temporal (ST) cognitive system-on-a-chip (SoC). This design extracts ST features from videos and leverages sparse inference and kernel compression to efficiently perform action classification and motion tracking. The second part explores the significance of dataflows and reduction mechanisms for sparse deep neural network (DNN) acceleration. This design features a dynamic, look-ahead index matching unit in hardware to efficiently discover fine-grained parallelism, achieving high energy efficiency and low control complexity for a wide variety of DNN layers (a software sketch of index matching follows this entry). Lastly, the work expands the scope to real-time machine learning (RTML) acceleration. A new high-level architecture modeling framework is proposed, consisting of a set of high-performance RTML-specific architecture design templates and a Python-based high-level modeling and compiler toolchain for efficient cross-stack architecture design and exploration.
    PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/162870/1/lchingen_1.pd
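
    As a rough software analogue of the look-ahead index matching idea, assuming compressed sparse operands with sorted coordinate lists (the dissertation's hardware unit is more elaborate): multiply-accumulate only where the nonzero indices of the weight row and the activation vector intersect.

        import numpy as np

        def sparse_dot(idx_a, val_a, idx_b, val_b):
            """Two-pointer index matching over sorted nonzero indices:
            multiply only where both operands are nonzero."""
            i = j = 0
            acc = 0.0
            while i < len(idx_a) and j < len(idx_b):
                if idx_a[i] == idx_b[j]:
                    acc += val_a[i] * val_b[j]
                    i += 1
                    j += 1
                elif idx_a[i] < idx_b[j]:
                    i += 1
                else:
                    j += 1
            return acc

        # Toy sparse weight row and sparse activation vector, compressed form.
        w_idx, w_val = np.array([0, 3, 7]), np.array([0.5, -1.0, 2.0])
        a_idx, a_val = np.array([3, 5, 7]), np.array([4.0, 9.0, 1.0])
        print(sparse_dot(w_idx, w_val, a_idx, a_val))  # -4.0 + 2.0 = -2.0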