406 research outputs found

    Light field super resolution through controlled micro-shifts of light field sensor

    Light field cameras enable new capabilities, such as post-capture refocusing and aperture control, by capturing the directional and spatial distribution of light rays in space. Micro-lens array based light field camera designs are often preferred for their light transmission efficiency, cost-effectiveness and compactness. One drawback of micro-lens array based light field cameras is low spatial resolution, because a single sensor is shared to capture both spatial and angular information. To address the low spatial resolution issue, we present a light field imaging approach in which multiple light fields are captured and fused to improve the spatial resolution. For each capture, the light field sensor is shifted by a pre-determined fraction of a micro-lens size using an XY translation stage for optimal performance.
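    The fusion step can be illustrated with a minimal numpy sketch: assuming four ideal captures shifted by half a micro-lens pitch in x and y, interleaving them yields a grid with twice the sampling density in each dimension. The function name, the shift set, and the omission of registration and deconvolution are simplifications for illustration, not the authors' pipeline.

```python
import numpy as np

def fuse_shifted_captures(captures):
    """Interleave four captures taken with half-microlens shifts of
    (0,0), (0,0.5), (0.5,0), (0.5,0.5) into a 2x denser grid.

    captures: dict mapping (dy, dx) shift in microlens units -> 2-D array.
    """
    h, w = captures[(0.0, 0.0)].shape
    hi = np.zeros((2 * h, 2 * w), dtype=float)
    hi[0::2, 0::2] = captures[(0.0, 0.0)]   # unshifted capture
    hi[0::2, 1::2] = captures[(0.0, 0.5)]   # shifted half a lens in x
    hi[1::2, 0::2] = captures[(0.5, 0.0)]   # shifted half a lens in y
    hi[1::2, 1::2] = captures[(0.5, 0.5)]   # diagonal shift
    return hi
```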

    Compiling Programs for Nonshared Memory Machines

    Nonshared-memory parallel computers promise scalable performance for scientific computing needs. Unfortunately, these machines are difficult to program because the message-passing languages available for them do not reflect the computational models used in designing algorithms. This introduces a semantic gap in the programming process which is difficult for the programmer to fill. The purpose of this research is to show how nonshared-memory machines can be programmed at a higher level than is currently possible. We do this by developing techniques for compiling shared-memory programs for execution on these architectures. The heart of the compilation process is translating references to shared memory into explicit messages between processors. To do this, we first define a formal model for distributing data structures across processor memories. Several abstract results describing the messages needed to execute a program are immediately derived from this formalism. We then develop two distinct forms of analysis to translate these formulas into actual programs. Compile-time analysis is used when enough information is available to the compiler to completely characterize the data sent in the messages. This allows excellent code to be generated for a program. Run-time analysis produces code to examine data references while the program is running. This allows dynamic generation of messages and a correct implementation of the program. While the overhead of the run-time approach is higher than that of the compile-time approach, run-time analysis is applicable to any program. Performance data from an initial implementation show that both approaches are practical and produce code with acceptable efficiency.
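    As a toy illustration of the compile-time analysis, the sketch below (hypothetical names, Python for brevity) computes which processor owns each element of a block-distributed array and derives the message set needed for the nonlocal reads of simple assignment statements; a real compiler would work from loop bounds and subscript functions rather than explicit index lists.

```python
def owner(i, n, p):
    """Processor that owns element i of an n-element block-distributed array."""
    block = (n + p - 1) // p
    return i // block

def messages_for(assignments, n, p):
    """For statements a[i] = f(a[j]), emit (sender, receiver, element)
    triples for every read that is not local to the writing processor."""
    msgs = []
    for i, j in assignments:          # each statement writes a[i], reads a[j]
        w, r = owner(i, n, p), owner(j, n, p)
        if w != r:                    # nonlocal read: owner of a[j] must send
            msgs.append((r, w, j))
    return msgs

# a[i] = f(a[i-1]) over i = 1..7, with 8 elements on 2 processors:
print(messages_for([(i, i - 1) for i in range(1, 8)], n=8, p=2))
# -> [(0, 1, 3)]: only element 3 crosses the distribution boundary
```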

    DCT Implementation on GPU

    There has been great progress in the field of graphics processors. Since the speed of conventional CPU processors is no longer rising, designers are turning to multi-core, parallel processors. Because of their strength in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand for GPU utilization, there is a great need to develop operating systems that drive the GPU to full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using the discrete cosine transform (DCT).
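    The reason the DCT maps well to GPUs is that it is separable: a block-wise 2-D DCT reduces to independent small matrix products, one per image tile. The following numpy sketch shows the computation that a GPU kernel would perform in parallel over blocks; it assumes image dimensions divisible by the block size and is illustrative rather than taken from the thesis.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; C @ x computes the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def dct2_blocks(img, b=8):
    """Apply the b x b 2-D DCT to each tile of img (JPEG-style tiling).
    Each tile's pair of matrix products is independent of the others,
    which is what makes the transform fit massively parallel GPUs."""
    c = dct_matrix(b)
    h, w = img.shape
    out = np.empty_like(img, dtype=float)
    for y in range(0, h, b):
        for x in range(0, w, b):
            out[y:y+b, x:x+b] = c @ img[y:y+b, x:x+b] @ c.T
    return out
```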

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design comprised 13 sessions, including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other featured presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data system performance. The presentations share insights into next-generation advances that will serve as a basis for future VLSI design.

    A Comprehensive Methodology for Algorithm Characterization, Regularization and Mapping Into Optimal VLSI Arrays.

    This dissertation provides a fairly comprehensive treatment of a broad class of algorithms as it pertains to systolic implementation. We describe some formal algorithmic transformations that can be utilized to map regular and some irregular compute-bound algorithms into the best-fit time-optimal systolic architectures. The resulting architectures can be one-dimensional, two-dimensional, three-dimensional or nonplanar. The methodology detailed in the dissertation employs, like other methods, the concept of dependence vectors to order, in space and time, the index points representing the algorithm. However, by differentiating between two types of dependence vectors, the ordering procedure is allowed to be flexible and time optimal. Furthermore, unlike other methodologies, the approach reported here does not put constraints on the topology or dimensionality of the target architecture. The ordered index points are represented by nodes in a diagram called the Systolic Precedence Diagram (SPD). The SPD is a form of precedence graph that takes into account the systolic operation requirements of strictly local communications and regular data flow. Therefore, any algorithm with variable dependence vectors has to be transformed into a regular indexed set of computations with local dependencies. This can be done by replacing variable dependence vectors with sets of fixed dependence vectors. The SPD is transformed into an acyclic, labeled, directed graph called the Systolic Directed Graph (SDG). The SDG models the data flow as well as the timing for the execution of the given algorithm on a time-optimal array. The target architectures are obtained by projecting the SDG along defined directions. If more than one valid projection direction exists, different designs are obtained. The resulting architectures are then evaluated to determine if an improvement in the performance can be achieved by increasing PE fan-out. If so, the methodology provides the corresponding systolic implementation. By employing a new graph transformation, the SDG is manipulated so that it can be mapped into fixed-size and fixed-depth multi-linear arrays. The latter is a new concept of systolic arrays that is adaptable to changes in the state of technology. It promises a bounded clock skew, higher throughput and better performance than the linear implementation.
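    The core space-time mapping step can be made concrete with a small example: for matrix multiplication, a linear schedule t = i + j + k together with a projection along the k axis assigns each index point a time step and a processing element. This is the textbook choice for this algorithm, shown here as a quick Python validity check; it stands in for, but is not, the dissertation's specific SPD/SDG construction.

```python
# Index points of C[i,j] += A[i,k] * B[k,j] for an n x n x n algorithm.
n = 3
points = [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]

schedule = lambda p: sum(p)          # linear schedule t = i + j + k
project  = lambda p: (p[0], p[1])    # projection along the k axis -> PE (i, j)

# A mapping is valid if no PE executes two index points at the same step.
slots = {(schedule(p), project(p)) for p in points}
assert len(slots) == len(points), "projection conflicts with schedule"
for p in sorted(points, key=schedule):
    print(f"t={schedule(p)}  PE={project(p)}  accumulates C[{p[0]},{p[1]}]")
```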

    Evaluation of Design Tools for Rapid Prototyping of Parallel Signal Processing Algorithms

    Digital signal processing (DSP) has become a popular method for handling not only signal processing but also communications and control system applications. A DSP application of interest to the Air Force is high-speed avionics processing. The real-time computing requirements of avionics processing exceed the capabilities of current single-chip DSP processors, and parallelization of multiple DSP processors is a solution to handle such requirements. Designing and implementing a parallel DSP algorithm has been a lengthy process, often requiring different design tools and extensive programming experience. Through the use of integrated software development tools, rapid prototyping becomes possible by simulating algorithms, generating code for workstations or DSP microprocessors, and generating hardware description language code for hardware synthesis. This research examines the use of one such tool, the Signal Processing WorkSystem (SPW) by the Alta Group of Cadence Design Systems, Inc., and how SPW supports the rapid prototyping process from an avionics algorithm design through simulation and hardware implementation. Throughout this process, SPW is evaluated as an aid to the avionics designer to meet design objectives and evaluate tradeoffs to find the best blend of efficiency and effectiveness. By designing a two-dimensional fast Fourier transform algorithm as a specific avionics algorithm and exploring implementation options, SPW is shown to be a viable rapid prototyping solution, allowing an avionics designer to focus on design trade-offs instead of implementation details while using parallelization to meet real-time application requirements.
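    The parallelization strategy behind the two-dimensional FFT is the row-column decomposition: a batch of independent 1-D FFTs over the rows, a corner turn, then a batch over the columns. The numpy sketch below verifies the decomposition on the host; SPW would instead generate the equivalent dataflow and DSP or HDL code.

```python
import numpy as np

def fft2_rowcol(x):
    """2-D FFT via the row-column method. Each batch of 1-D transforms
    is independent, so the rows (and then the columns) can be divided
    among parallel DSP processors with a corner turn in between."""
    stage1 = np.fft.fft(x, axis=1)      # all rows, in parallel
    return np.fft.fft(stage1, axis=0)   # all columns, in parallel

x = np.random.rand(8, 8)
assert np.allclose(fft2_rowcol(x), np.fft.fft2(x))
```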

    Multipurpose Programmable Integrated Photonics: Principles and Applications

    In recent years, programmable integrated photonics (PIP) has evolved from a promising new paradigm for deploying photonics at a larger scale into a solid, revolutionary reality, attracting the attention of numerous research and industry players. Based on the same theoretical foundations as field-programmable gate arrays (FPGAs), this technology relies on common, two-dimensional integrated optical hardware configurations based on the interconnection of programmable unit cells (PUCs), which, by suitable programming of their phase actuators, can implement a variety of functionalities that can be elaborated for basic or more complex operation in many application fields, such as artificial intelligence, deep learning, quantum information systems, 5/6-G telecommunications, switching, data center interconnections, hardware acceleration and sensing, amongst others. In this work, we explore several software capabilities of these processors across different chip designs, applying cutting-edge approaches based on computational optimization and graph theory to precisely control and configure these devices. One of these, self-configuration, deals with the automated synthesis of optical circuit configurations, even in the presence of parasitic effects such as nonuniform losses and optical and electrical crosstalk, without any prior knowledge of the hardware state. There are occasions, though, in which access to this information may be of use. Self-calibration and self-characterization tools allow us to perform a quick check of our photonic processor's status, retrieving useful information such as the electrical current to supply to each phase actuator to arbitrarily change the state of its corresponding PUC, or the insertion loss of every unit cell and optical interconnection surrounding the structure. These mechanisms not only allow us to quickly identify any malfunctioning PUCs or chip areas in our design, but also reveal another way to program photonic circuits from predefined current settings. These strategies constitute a significant step toward unleashing the full potential of these devices. They provide solutions to handle hundreds of variables and simultaneously manage multiple configuration actions, one of the main limitations preventing this technology from scaling up and becoming disruptive in the years to come.
    López Hernández, A. (2023). Multipurpose Programmable Integrated Photonics: Principles and Applications [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19686
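    The graph-theoretic side of configuration can be sketched in a few lines: model the PUC mesh as a graph whose edges are programmable unit cells, then route light between two ports with a shortest-path search and program the cells along the resulting path. The mesh below and its node names are invented for the example; real hexagonal meshes and loss-aware routing are considerably richer.

```python
from collections import deque

# Toy PUC mesh as an undirected graph: nodes are optical ports/junctions,
# and traversing an edge means programming the PUC between them.
mesh = {
    "in": ["A"], "A": ["in", "B", "C"], "B": ["A", "D"],
    "C": ["A", "D"], "D": ["B", "C", "out"], "out": ["D"],
}

def route(graph, src, dst):
    """Breadth-first search: shortest chain of PUCs to program."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:                 # reconstruct the path back to src
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

print(route(mesh, "in", "out"))  # ['in', 'A', 'B', 'D', 'out']
```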

    Controlling Ionic Transport in RRAM for Memory and Neuromorphic Computing Applications

    Resistive random-access memory (RRAM), based on a simple two-terminal device structure, has recently attracted tremendous interest for applications ranging from non-volatile data storage to neuromorphic computing. Resistive switching (RS) effects in RRAM devices originate from internal, microscopic ionic migration and the associated electrochemical processes, which modify the materials' chemical composition and subsequently their electrical and other physical properties. Therefore, controlling the internal ionic transport and redox reaction processes, ideally at the atomic scale, is necessary to optimize device performance for practical applications with large-size arrays. In this thesis we present our efforts in understanding and controlling the ionic processes in RRAM devices: a comprehensive study of the physical mechanisms of the ionic processes, and the optimization of materials and device structures to achieve desirable device performance, based on theoretical calculations and experimental engineering. First, I investigate the electronic structure of Ta2O5 polymorphs, a resistive switching material, and the formation and interaction of oxygen vacancies in amorphous Ta2O5, an important mobile defect responsible for the resistive switching process, using first-principles calculations. Based on this understanding of the fundamental properties of the switching material and the defect, we perform detailed theoretical and experimental analyses that reveal the dynamic vacancy charge transition processes, further helping the design and optimization of oxide-based RRAM devices. Next, we develop a novel structure incorporating engineered nanoporous graphene to control the internal ionic transport and redox reaction processes at the atomic level, leading to improved device performance. We demonstrate that the RS characteristics can be systematically tuned by inserting a graphene layer with engineered nanopores at a vacancy-exchange interface. The number of vacancies injected into the switching layer and the size of the conducting filaments can be effectively controlled by the graphene layer, which works as an atomically thin ion-blocking material in which ionic transport and reactions are allowed only through the engineered nanosized openings. Lastly, better incremental switching characteristics with improved linearity are obtained through optimization of the switching material density. These improvements allow us to build RRAM crossbar networks for data clustering analysis through unsupervised, online learning in both neuromorphic applications and arithmetic applications in which accurate vector-matrix multiplications are required. We expect the optimization approaches and the optimized devices to be usable in other machine learning and arithmetic computing systems, broadening the range of problems RRAM-based networks can solve.
    PhD, Materials Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146119/1/jihang_1.pd
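    The arithmetic role of the crossbar can be summarized numerically: with device conductances arranged in a matrix and read voltages applied to the rows, Ohm's and Kirchhoff's laws deliver the vector-matrix product as column currents in a single step. The sketch below shows only this ideal picture, ignoring device nonlinearity, wire resistance, and the incremental-conductance programming the thesis optimizes.

```python
import numpy as np

def crossbar_vmm(g, v):
    """Ideal analog vector-matrix multiply in an RRAM crossbar:
    row voltages v applied across conductance matrix g produce
    column currents i = g.T @ v (Ohm's law summed per column)."""
    return g.T @ v

# 4x3 crossbar: conductances in siemens, read voltages in volts.
g = np.random.uniform(1e-6, 1e-4, size=(4, 3))  # device conductances
v = np.array([0.2, 0.0, 0.2, 0.1])              # input voltage vector
print(crossbar_vmm(g, v))                        # column currents (amps)
```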

    An instruction systolic array architecture for multiple neural network types

    Modern electronic systems, especially sensor and imaging systems, are beginning to incorporate their own neural network subsystems. In order for these neural systems to learn in real time, they must be implemented using VLSI technology, with as much of the learning process incorporated on-chip as possible. The majority of current VLSI implementations literally implement a series of neural processing cells, which can be connected together in an arbitrary fashion. Many do not perform the entire neural learning process on-chip, instead relying on other external systems to carry out part of the computation requirements of the algorithm. The work presented here utilises two-dimensional instruction systolic arrays in an attempt to define a general neural architecture which is closer to the biological basis of neural networks: it is the synapses themselves, rather than the neurons, that have dedicated processing units. A unified architecture is described which can be programmed at the microcode level in order to facilitate the processing of multiple neural network types. An essential part of neural network processing is the neuron activation function, which can range from a sequential algorithm to a discrete mathematical expression. The architecture presented can easily carry out the sequential functions, and introduces a fast method of mathematical approximation for the more complex functions. This can be evaluated on-chip, thus implementing the entire neural process within a single system. VHDL circuit descriptions for the chip have been generated, and the systolic processing algorithms and associated microcode instruction set for three different neural paradigms have been designed. A software simulator of the architecture has been written, giving results for several common applications in the field.
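    The thesis's specific approximation scheme is not spelled out in the abstract, but the flavor of a fast on-chip activation function can be conveyed with a piecewise-linear sigmoid: a small table of breakpoints computed once, then one interpolation per input. The numpy version below is an illustrative stand-in, not the architecture's actual method.

```python
import numpy as np

def pwl_sigmoid(x, segments=8, lo=-6.0, hi=6.0):
    """Piecewise-linear sigmoid: exact values at a few breakpoints
    (a small on-chip table), straight lines in between. Evaluation
    costs one compare, one multiply and one add per input -- the kind
    of cheap approximation a systolic cell can afford."""
    xs = np.linspace(lo, hi, segments + 1)
    ys = 1.0 / (1.0 + np.exp(-xs))     # exact values at the breakpoints
    return np.interp(x, xs, ys)        # linear between (clamped outside)

x = np.linspace(-8, 8, 1000)
print(np.abs(pwl_sigmoid(x) - 1 / (1 + np.exp(-x))).max())  # worst error
```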