    Notes on implementation of sparsely distributed memory

    The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding, and Appendix B treats a systolic array for high-throughput read and write operations.
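
    As an illustration of the read and write operations the abstract refers to, the sketch below models a Kanerva-style sparse distributed memory in software. It is a minimal Python model under assumed parameters (the class name, dimensions, and access radius are illustrative choices), not the resistive decoder or systolic array described in the report.

        import numpy as np

        class SparseDistributedMemory:
            """Software model of an SDM: hard locations with random addresses,
            signed counters per location, and Hamming-distance activation."""

            def __init__(self, n_locations=1000, dim=256, radius=111, seed=0):
                rng = np.random.default_rng(seed)
                # Fixed random binary addresses of the hard locations.
                self.addresses = rng.integers(0, 2, size=(n_locations, dim), dtype=np.int8)
                # Each location stores a vector of signed counters.
                self.counters = np.zeros((n_locations, dim), dtype=np.int32)
                self.radius = radius

            def _activated(self, address):
                # A location participates when its Hamming distance to the
                # query address is within the access radius.
                distance = np.count_nonzero(self.addresses != address, axis=1)
                return distance <= self.radius

            def write(self, address, data):
                # Increment counters where the data bit is 1, decrement where it is 0.
                mask = self._activated(address)
                self.counters[mask] += np.where(np.asarray(data) == 1, 1, -1).astype(np.int32)

            def read(self, address):
                # Pool the counters of all activated locations and threshold at zero.
                mask = self._activated(address)
                return (self.counters[mask].sum(axis=0) > 0).astype(np.int8)

    Writing a pattern at its own address and reading it back recovers the pattern once enough hard locations fall inside the access radius; the hardware in the report accelerates exactly these decode and accumulate steps.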

    SIRENA: A CAD environment for behavioural modelling and simulation of VLSI cellular neural network chips

    This paper presents SIRENA, a CAD environment for the simulation and modelling of mixed-signal VLSI parallel processing chips based on cellular neural networks. SIRENA includes capabilities for: (a) describing nominal and non-ideal operation of CNN analogue circuitry at the behavioural level; (b) performing realistic simulations of the transient evolution of physical CNNs, including deviations due to second-order effects of the hardware; and (c) evaluating sensitivity figures and performing noise and Monte Carlo simulations in the time domain. These capabilities make SIRENA better suited for CNN chip development than algorithmic simulation packages (such as OpenSimulator or Sesame) or conventional neural network simulators (RCS, GENESIS, SFINX), which are not oriented to the evaluation of hardware non-idealities. Compared to conventional electrical simulators (such as HSPICE or ELDO-FAS), SIRENA provides easier modelling of hardware parasitics, a significant reduction in computation time, and similar accuracy. Consequently, iteration during the design procedure becomes possible, supporting decision making regarding design strategies and dimensioning. SIRENA has been developed using object-oriented programming techniques in C, and currently runs under the UNIX operating system and the X-Windows framework. It employs a dedicated high-level hardware description language, DECEL, fitted to the description of non-idealities arising in CNN hardware. This language has been developed aiming at generality, in the sense of making no restrictions on the network models that can be implemented. SIRENA is highly modular and composed of independent tools, which simplifies future expansions and improvements. Comisión Interministerial de Ciencia y Tecnología TIC96-1392-C02-0
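
    To make the kind of behavioural model concrete, the following sketch integrates the standard Chua-Yang CNN cell dynamics with an optional Gaussian perturbation standing in for hardware deviations. It is a generic, assumed formulation for illustration only; the function and parameter names are hypothetical and DECEL's actual modelling constructs are not shown.

        import numpy as np
        from scipy.signal import convolve2d

        def cnn_step(x, u, A, B, z, dt=0.01, sigma=0.0, rng=None):
            """One explicit Euler step of dx/dt = -x + A*y + B*u + z, where
            y = 0.5*(|x+1| - |x-1|) is the piecewise-linear cell output.
            x, u are 2-D state and input images; A, B are 3x3 templates;
            sigma adds Gaussian noise as a crude stand-in for analogue
            non-idealities."""
            rng = rng or np.random.default_rng()
            y = 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))
            feedback = convolve2d(y, A, mode="same", boundary="fill")     # feedback template
            feedforward = convolve2d(u, B, mode="same", boundary="fill")  # control template
            noise = sigma * rng.standard_normal(x.shape)
            dx = -x + feedback + feedforward + z + noise
            return x + dt * dx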

    A design methodology for portable software on parallel computers

    This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers; this inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently, so a developer trying to keep software on the current high-performance hardware almost continually faces yet another expensive software transportation. The goal of the proposed research is to create a design methodology that helps designers more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two; a more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks is specified in the proposal. The proposal, 'A Design Methodology for Portable Software on Parallel Computers', is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof of concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.

    Runtime-aware architectures

    In the last few years, the traditional ways to keep hardware performance increasing at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA). This simple interface allowed applications to be developed without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. Current multi-cores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multi-cores face. The runtime system of the parallel programming model has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have. In this paper, we introduce an approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective. This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Grant Agreement number 321253, by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, and by the HiPEAC Network of Excellence. M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047, and M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).

    Design Methodology for Secure RFID Tag Chips

    In this thesis, we focus on radio frequency identification (RFID) tags. We design, implement, and evaluate the hardware performance of a secure tag that runs an authentication protocol based on cryptographic algorithms. The cryptographic algorithm and the pseudorandom number generator are required to be implemented in the tag. To realize the secure tag, we tackle the following four steps: (A) deciding the hardware architecture for the authentication protocol, (B) selecting the cryptographic algorithm, (C) establishing a pseudorandom number generating method, and (D) implementing and evaluating a silicon chip on an RFID system. (A) The cryptographic algorithm and the pseudorandom number generator are repeatedly called for each authentication, so the time needed for the cryptographic processes can have a large impact on the hardware performance of the tag. While previous studies have mainly discussed low-area requirements, the hardware architecture for the authentication protocol also needs to be discussed from the viewpoint of operating time. In this thesis, in order to decide the hardware architecture, we evaluate hardware performance in terms of operating time. As a result, a parallel architecture proves suitable for the hash functions that are widely used in tag authentication protocols. (B) Many cryptographic algorithms have been developed, and their hardware performance has been evaluated under differing conditions. However, because the evaluation results depend on those conditions, it is hard to compare the previous results; in addition, little attention has been paid to the interface of the cryptographic circuits. In this thesis, in order to select a cryptographic algorithm, we design the interface of the cryptographic circuits to match the tag and evaluate the hardware performance of the circuits under the same conditions. As a result, the lightweight hash function SPONGENT-160 achieves well-balanced hardware performance. (C) Implementing a pseudorandom number generator based on the evaluation results of (B) would be one way to generate pseudorandom numbers on the tag. On the other hand, the cryptographic algorithm and the pseudorandom number generator are not used simultaneously in the authentication protocol; therefore, if the cryptographic circuit could also be used for pseudorandom number generation, the hardware resources on the tag could be exploited more efficiently. In this thesis, we propose a pseudorandom number generating method using a hash function that is a cryptographic component of the authentication protocol. Through the evaluation of the proposed method, we establish a lightweight pseudorandom number generating method for the tag. (D) Tag authentication protocols using cryptographic algorithms have been developed in previous studies, but hardware implementation and performance evaluation of a tag that runs the authentication processes have not been studied. In this thesis, we design and implement, on a single chip, an analog front-end block and a digital processing block that incorporates the results of (A), (B), and (C), and we evaluate the hardware performance of the tag. As a result, we show that a tag that runs an authentication protocol based on cryptographic algorithms is feasible. The University of Electro-Communications, 201
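
    A minimal sketch of the idea in step (C), reusing the authentication hash as a pseudorandom number generator, is shown below. SHA-256 stands in for SPONGENT-160, which is not available in Python's standard library, and the seed-plus-counter construction is an illustrative choice rather than the exact method of the thesis.

        import hashlib

        class HashPRNG:
            """Generates pseudorandom bytes by hashing an internal seed
            together with an incrementing counter."""

            def __init__(self, seed: bytes):
                # Absorb the seed once; the digest becomes the internal state.
                self.state = hashlib.sha256(seed).digest()
                self.counter = 0

            def next_bytes(self, n: int) -> bytes:
                out = b""
                while len(out) < n:
                    block = hashlib.sha256(self.state + self.counter.to_bytes(8, "big")).digest()
                    out += block
                    self.counter += 1
                return out[:n]

    Because the same hash circuit serves both authentication and pseudorandom number generation, no extra generator hardware is needed, which is the resource-sharing argument the abstract makes.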

    Impliance: A Next Generation Information Management Appliance

    Although the database management system has been remarkably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems? In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA

    PYDAC: A DISTRIBUTED RUNTIME SYSTEM AND PROGRAMMING MODEL FOR A HETEROGENEOUS MANY-CORE ARCHITECTURE

    Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded performance and multithreaded throughput. Such systems impose challenges on the design of the programming model and runtime system. Specifically, these challenges include (a) how to fully utilize the chip's performance, (b) how to manage heterogeneous, unreliable hardware resources, and (c) how to generate and manage a large amount of parallel tasks. This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks using the divide-and-conquer strategy. At the low level, tasks are written in an imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter-task communication with architecture support. PyDac has been implemented on both a field-programmable gate array (FPGA) emulation of an unconventional heterogeneous architecture and a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) PyDac abstracts away task communication and improves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, but (predictably) serial operation limits some micro-benchmarks, and (c) the degree of protection versus speed can be varied in redundant threading that is transparent to programmers.
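
    The two-level model can be pictured with a standard-library sketch: the high level divides a problem into many small tasks by divide-and-conquer, and the low level is plain imperative code run by a worker. This uses Python's concurrent.futures rather than the actual PyDac API, and all names below are hypothetical.

        from concurrent.futures import ThreadPoolExecutor

        def divide(lo, hi, threshold=100_000):
            """High level: recursively split a range until the pieces are small."""
            if hi - lo <= threshold:
                return [(lo, hi)]
            mid = (lo + hi) // 2
            return divide(lo, mid, threshold) + divide(mid, hi, threshold)

        def leaf_task(piece):
            """Low level: imperative code operating on one piece."""
            lo, hi = piece
            total = 0
            for i in range(lo, hi):
                total += i * i
            return total

        if __name__ == "__main__":
            pieces = divide(0, 1_000_000)
            with ThreadPoolExecutor() as pool:   # stand-in for the PyDac runtime
                print(sum(pool.map(leaf_task, pieces)))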

    PaPILO: A Parallel Presolving Library for Integer and Linear Programming with Multiprecision Support

    Presolving has become an essential component of modern MIP solvers, both in terms of computational performance and numerical robustness. In this paper, we present PaPILO, a new C++ header-only library that provides a large set of presolving routines for MIP and LP problems from the literature. The creation of PaPILO was motivated by the current lack of (a) solver-independent implementations that (b) exploit parallel hardware and (c) support multiprecision arithmetic. Traditionally, presolving is designed to be fast; whenever necessary, its low computational overhead is achieved by strict working limits. PaPILO's parallelization framework aims to keep the computational overhead low even when presolving is executed more aggressively or applied to large-scale problems. To rule out conflicts between parallel presolve reductions, PaPILO uses a transaction-based design. This helps to avoid both the memory-intensive allocation of multiple copies of the problem and special synchronization between presolvers. Additionally, the use of Intel's TBB library helps PaPILO efficiently exploit recursive parallelism within expensive presolving routines such as probing, dominated columns, or constraint sparsification. We provide an overview of PaPILO's capabilities and insights into important design choices.
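
    The transaction-based design can be sketched as follows: presolvers record proposed reductions against a read-only view of the problem, and a transaction is discarded at commit time if it touches columns already modified by an earlier one. This Python toy is only an assumed illustration of the idea; PaPILO itself is a C++ library and its actual data structures and conflict rules differ.

        from dataclasses import dataclass, field

        @dataclass
        class Transaction:
            presolver: str
            touched_cols: frozenset      # columns the reduction reads or writes
            new_bounds: dict             # e.g. {column_index: tightened_bound}

        @dataclass
        class Problem:
            bounds: dict
            modified_cols: set = field(default_factory=set)

            def commit(self, tx: Transaction) -> bool:
                # Conflict check: reject the transaction if an earlier one
                # already changed any column this reduction relied on.
                if tx.touched_cols & self.modified_cols:
                    return False
                self.bounds.update(tx.new_bounds)
                self.modified_cols |= set(tx.new_bounds)
                return True

    Applying transactions sequentially after the parallel scan avoids copying the problem for each presolver and removes the need for fine-grained locking, which is the trade-off the abstract describes.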

    Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers

    Preface: The aim of this master's thesis project was to expand the QPhiX library for twisted-mass fermions with and without clover term. To this end, I continued work initiated by Mario Schröck et al. [63]. In writing this thesis, I was following two main goals. Firstly, I wanted to stress the intricate interplay of the four pillars of High Performance Computing: Algorithms, Hardware, Software and Performance Evaluation. Surely, algorithmic development is utterly important in Scientific Computing, in particular in LQCD, where it even outweighed the improvements made in hardware architecture in the last decade (cf. the section about computational costs of LQCD). It is strongly influenced by the available hardware (think of the advent of parallel algorithms), but in turn also influenced the design of hardware itself; the IBM BlueGene series is only one of many examples in LQCD. Furthermore, there will be no benefit from the best algorithms if one cannot implement the ideas in correct, performant, user-friendly, readable and maintainable (sometimes over several decades) software code. But again, truly outstanding HPC software cannot be written without a profound knowledge of its target hardware. Lastly, an HPC software architect and computational scientist has to be able to evaluate and benchmark the performance of a software program in the often very heterogeneous environment of supercomputers with multiple software and hardware layers. My second goal in writing this thesis was to produce a self-contained introduction to the computational aspects of LQCD and, in particular, to the features of QPhiX, so that the reader would be able to compile, read and understand the code of one truly amazing pearl of HPC [40]. It is a pleasure to thank S. Cozzini, R. Frezzotti, E. Gregory, B. Joó, B. Kostrzewa, S. Krieg, T. Luu, G. Martinelli, R. Percacci, S. Simula, M. Ueding, C. Urbach, M. Werner, the Intel company for providing me with a copy of [55], and the Jülich Supercomputing Center for granting me access to their KNL test cluster DEE