349 research outputs found

    On Real-Time AER 2-D Convolutions Hardware for Neuromorphic Spike-Based Cortical Processing

    In this paper, a chip that performs real-time image convolutions with programmable kernels of arbitrary shape is presented. The chip is a first experimental prototype of reduced size to validate the implemented circuits and system-level techniques. The convolution processing is based on the address–event-representation (AER) technique, a spike-based, biologically inspired image and video representation that allocates more communication bandwidth to pixels carrying more information. As a first test prototype, a 16x16 pixel array has been implemented with a programmable kernel size of up to 16x16. The chip has been fabricated in a standard 0.35-µm complementary metal–oxide–semiconductor (CMOS) process. The technique also allows larger images to be processed by assembling 2-D arrays of such chips. Pixel operation exploits low-power mixed analog–digital circuit techniques. Because of the low currents involved (down to nanoamperes or even picoamperes), a significant share of the pixel area is devoted to mismatch calibration. The rest of the chip uses digital circuit techniques, both synchronous and asynchronous. The fabricated chip has been thoroughly tested, both at the pixel level and at the system level. Specific computer interfaces have been developed for generating AER streams from conventional computers and feeding them as inputs to the convolution chip, and for grabbing AER streams coming out of the convolution chip and storing and analyzing them on computers. Extensive experimental results are provided. At the end of this paper, we provide discussions and results on scaling up the approach for larger pixel arrays and multilayer cortical AER systems. Funding: Commission of the European Communities IST-2001-34124 (CAVIAR); Commission of the European Communities 216777 (NABAB); Ministerio de Educación y Ciencia TIC-2000-0406-P4; Ministerio de Educación y Ciencia TIC-2003-08164-C03-01; Ministerio de Educación y Ciencia TEC2006-11730-C03-01; Junta de Andalucía TIC-141.
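    The abstract does not spell out the pixel mechanics, so the following is only a minimal software sketch of how an event-driven convolution is commonly modelled: each input address event adds the kernel, centred on the event address, to an array of integrator pixels, and a pixel that reaches a threshold emits an output event and resets. The function name `aer_convolve`, the reset-to-zero rule, and the positive-only threshold are assumptions made for illustration, not details taken from the chip.

```python
import numpy as np

def aer_convolve(events, kernel, shape=(16, 16), threshold=1.0):
    """Hypothetical event-driven convolution model (not the chip's circuit).

    events : iterable of (x, y) input addresses, in arrival order
    kernel : 2-D array added around each event address
    Returns the list of output events and the final pixel states.
    """
    state = np.zeros(shape)
    kh, kw = kernel.shape
    out_events = []
    for (x, y) in events:
        for dy in range(kh):
            for dx in range(kw):
                py = y + dy - kh // 2
                px = x + dx - kw // 2
                # Kernel taps that fall outside the array are dropped.
                if 0 <= py < shape[0] and 0 <= px < shape[1]:
                    state[py, px] += kernel[dy, dx]
                    # Assumed integrate-and-fire rule: fire, then reset to 0.
                    if state[py, px] >= threshold:
                        out_events.append((px, py))
                        state[py, px] = 0.0
    return out_events, state
```

    With a 3x3 kernel of 0.5 and two input events at the same address, each of the nine covered pixels reaches the threshold on the second event, so the output stream carries more events where the input and kernel overlap most strongly.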

    NASA Tech Briefs, March 2010

    Topics covered include: Software Tool Integrating Data Flow Diagrams and Petri Nets; Adaptive Nulling for Interferometric Detection of Planets; Reducing the Volume of NASA Earth-Science Data; Reception of Multiple Telemetry Signals via One Dish Antenna; Space-Qualified Traveling-Wave Tube; Smart Power Supply for Battery-Powered Systems; Parallel Processing of Broad-Band PPM Signals; Inexpensive Implementation of Many Strain Gauges; Constant-Differential-Pressure Two-Fluid Accumulator; Inflatable Tubular Structures Rigidized with Foams; Power Generator with Thermo-Differential Modules; Mechanical Extraction of Power From Ocean Currents and Tides; Nitrous Oxide/Paraffin Hybrid Rocket Engines; Optimized Li-Ion Electrolytes Containing Fluorinated Ester Co-Solvents; Probabilistic Multi-Factor Interaction Model for Complex Material Behavior; Foldable Instrumented Bits for Ultrasonic/Sonic Penetrators; Compact Rare Earth Emitter Hollow Cathode; High-Precision Shape Control of In-Space Deployable Large Membrane/Thin-Shell Reflectors; Rapid Active Sampling Package; Miniature Lightweight Ion Pump; Cryogenic Transport of High-Pressure-System Recharge Gas; Water-Vapor Raman Lidar System Reaches Higher Altitude; Compact Ku-Band T/R Module for High-Resolution Radar Imaging of Cold Land Processes; Wide-Field-of-View, High-Resolution, Stereoscopic Imager; Electrical Capacitance Volume Tomography with High-Contrast Dielectrics; Wavefront Control and Image Restoration with Less Computing; Polarization Imaging Apparatus; Stereoscopic Machine-Vision System Using Projected Circles; Metal Vapor Arcing Risk Assessment Tool; Performance Bounds on Two Concatenated, Interleaved Codes; Parameterizing Coefficients of a POD-Based Dynamical System; Confidence-Based Feature Acquisition; Algorithm for Lossless Compression of Calibrated Hyperspectral Imagery; Universal Decoder for PPM of any Order; Algorithm for Stabilizing a POD-Based Dynamical System; Mission Reliability Estimation for Repairable Robot Teams; Processing AIRS Scientific Data Through Level 3; Web-Based Requesting and Scheduling Use of Facilities; AutoGen Version 5.0; Time-Tag Generation Script; PPM Receiver Implemented in Software; Tropospheric Emission Spectrometer Product File Readers; Reporting Differences Between Spacecraft Sequence Files; Coordinating "Execute" Data for ISS and Space Shuttle; Database for Safety-Oriented Tracking of Chemicals; Apparatus for Cold, Pressurized Biogeochemical Experiments; Growing B Lymphocytes in a Three-Dimensional Culture System; Tissue-like 3D Assemblies of Human Broncho-Epithelial Cells; Isolation of Resistance-Bearing Microorganisms; Oscillating Cell Culture Bioreactor; and Liquid Cooling/Warming Garment.

    GPU data structures for graphics and vision

    Graphics hardware has in recent years become increasingly programmable, and its programming APIs use the stream processor model to expose massive parallelization to the programmer. Unfortunately, the inherent restrictions of the stream processor model, used by the GPU in order to maintain high performance, often pose a problem in porting CPU algorithms for both video and volume processing to graphics hardware. Serial data dependencies which accelerate CPU processing are counterproductive for the data-parallel GPU. This thesis demonstrates new ways of tackling well-known problems of large-scale video/volume analysis. In some instances, we enable processing on the restricted hardware model by re-introducing algorithms from early computer graphics research. On other occasions, we use newly discovered, hierarchical data structures to circumvent the random-access read/fixed write restriction that had previously kept sophisticated analysis algorithms from running solely on graphics hardware. For 3D processing, we apply known game graphics concepts such as mip-maps, projective texturing, and dependent texture lookups to show how video/volume processing can benefit algorithmically from being implemented in a graphics API. The novel GPU data structures provide drastically increased processing speed, and lift processing-heavy operations to real-time performance levels, paving the way for new and interactive vision/graphics applications.

    Reconfigurable Computing For Video Coding

    Video coding is widely used in our daily life. Due to its high computational complexity, hardware implementation is usually preferred. In this research, we investigate both ASIC and reconfigurable hardware design approaches for video coding applications. First, we present a unified architecture that can perform the Discrete Cosine Transform (DCT), Inverse Discrete Cosine Transform (IDCT), and DCT-domain motion estimation and compensation (DCT-ME/MC). Our proposed architecture is a Wavefront Array-based Processor with a highly modular structure consisting of 8x8 Processing Elements (PEs). By utilizing statistical properties and arithmetic operations, it can be used as a high-performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and executed through the pre-defined PEs. In addition to the simplified design process of the proposed architecture and savings in hardware resources, we also demonstrate that a high throughput rate can be achieved for IDCT and DCT-MC by fully utilizing the sparseness of the DCT coefficient matrix. Compared to a fixed hardware architecture using the ASIC design approach, the reconfigurable hardware design approach offers higher flexibility, lower cost, and faster time-to-market. We propose a self-reconfigurable platform that can reconfigure the architecture of DCT computations at run-time using dynamic partial reconfiguration. The scalable architecture for DCT computations can compute different numbers of DCT coefficients in zig-zag scan order to adapt to different requirements, such as power consumption, hardware resources, and performance. We propose a configuration manager, implemented in the embedded processor, that adaptively controls the reconfiguration of the scalable DCT architecture at run-time. In addition, we use the LZSS algorithm to compress the partial bitstreams and on-chip BlockRAM as a cache to reduce the latency overhead of loading the partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce external memory accesses by 69% and can achieve a reconfiguration rate of 400 MBytes/s. A prediction algorithm for zero-quantized DCT (ZQDCT) coefficients is used to control the run-time reconfiguration of the proposed scalable architecture, and 12 different modes of DCT computation, including zonal coding, multi-block processing, and parallel-sequential stage modes, are supported to reduce power consumption, required hardware resources, and computation time with a small quality degradation. Detailed trade-offs of power, throughput, and quality are investigated and used as a criterion for self-reconfiguration to meet the requirements set by the users.
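    The idea of computing only a chosen number of DCT coefficients in zig-zag scan order can be illustrated in software. The sketch below only shows which coefficients of an 8x8 block are kept for a given count; it says nothing about the hardware architecture itself, and the names `zigzag_order` and `truncate_coefficients` are hypothetical helpers introduced here.

```python
import numpy as np

def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def truncate_coefficients(block, k):
    """Keep the first k coefficients in zig-zag order and zero the rest."""
    out = np.zeros_like(block)
    for (r, c) in zigzag_order(block.shape[0])[:k]:
        out[r, c] = block[r, c]
    return out
```

    A smaller `k` keeps only the low-frequency coefficients near the top-left corner, which is the kind of trade-off between computation and quality that the abstract describes.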

    Efficiently and Transparently Maintaining High SIMD Occupancy in the Presence of Wavefront Irregularity

    Demand is increasing for high-throughput processing of irregular streaming applications; examples of such applications from scientific and engineering domains include biological sequence alignment, network packet filtering, automated face detection, and big graph algorithms. With wide SIMD, lightweight threads, and low-cost thread-context switching, wide-SIMD architectures such as GPUs allow considerable flexibility in the way application work is assigned to threads. However, irregular applications are challenging to map efficiently onto wide SIMD because data-dependent filtering or replication of items creates an unpredictable data wavefront of items ready for further processing. Straightforward implementations of irregular applications on a wide-SIMD architecture are prone to load imbalance and reduced occupancy, while more sophisticated implementations require advanced use of parallel GPU operations to redistribute work efficiently among threads. This dissertation will present strategies for addressing the performance challenges of wavefront-irregular applications on wide-SIMD architectures. These strategies are embodied in a developer framework called Mercator that (1) allows developers to map irregular applications onto GPUs according to the streaming paradigm while abstracting from low-level data movement and (2) includes generalized techniques for transparently overcoming the obstacles to high throughput presented by wavefront-irregular applications on a GPU. Mercator forms the centerpiece of this dissertation, and we present its motivation, performance model, implementation, and extensions in this work.
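    One common parallel operation for redistributing work after data-dependent filtering is scan-based stream compaction. The sketch below emulates it with NumPy on the CPU; it is a generic illustration of the technique, not Mercator's API, and the function name `compact` is hypothetical.

```python
import numpy as np

def compact(items, keep):
    """Pack the surviving items densely, preserving their order.

    keep[i] is truthy when items[i] passes the filter. On a GPU each
    thread would compute its own output slot from a parallel prefix sum
    and scatter its item there; NumPy stands in for that here.
    """
    items = np.asarray(items)
    keep = np.asarray(keep, dtype=bool)
    flags = keep.astype(np.int64)
    # Exclusive prefix sum: number of survivors before each position.
    slots = np.cumsum(flags) - flags
    out = np.empty(int(flags.sum()), dtype=items.dtype)
    out[slots[keep]] = items[keep]
    return out
```

    After compaction the surviving items occupy consecutive slots, so the next processing stage can assign them to consecutive SIMD lanes instead of leaving idle lanes where items were filtered out.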

    Signal Subspace Processing in the Beam Space of a True Time Delay Beamformer Bank

    A number of techniques for Radio Frequency (RF) source location for wide-bandwidth signals have been described that utilize coherent signal subspace processing, but they often suffer from limitations such as the requirement for preliminary source location estimation, the need to apply the technique iteratively, or computational expense. This dissertation examines a method that performs subspace processing of the data from a bank of true time delay beamformers. The spatial diversity of the beamformer bank alleviates the need for a preliminary estimate while simultaneously reducing the dimensionality of subsequent signal subspace processing, resulting in computational efficiency. The pointing direction of the true time delay beams is independent of frequency, which results in a mapping from element space to beam space that is wide bandwidth in nature. This dissertation reviews previous methods, introduces the present method, presents simulation results that demonstrate the assertions, discusses an analysis of performance in relation to the Cramér-Rao Lower Bound (CRLB) with various levels of noise in the system, and discusses computational efficiency. One limitation of the method is that in practice it may be appropriate only for systems that can tolerate a limited field of view. Electronic Intelligence is one such application: it calls for high resolution of closely spaced, very wide bandwidth sources and often does not require a wide field of view. In relation to system applications, this dissertation also discusses practical employment of the novel method in terms of antenna elements, arrays, platforms, engagement geometries, and other parameters. The true time delay beam space method is shown through modeling and simulation to be capable of resolving closely spaced, very wideband sources over a relevant field of view in a single algorithmic pass, requiring no coarse preliminary estimation, and exhibiting low computational expense superior to many previous wideband coherent integration techniques.

    NASA Tech Briefs, September 2011

    Topics covered include: Fused Reality for Enhanced Flight Test Capabilities; Thermography to Inspect Insulation of Large Cryogenic Tanks; Crush Test Abuse Stand; Test Generator for MATLAB Simulations; Dynamic Monitoring of Cleanroom Fallout Using an Air Particle Counter; Enhancement to Non-Contacting Stress Measurement of Blade Vibration Frequency; Positively Verifying Mating of Previously Unverifiable Flight Connectors; Radiation-Tolerant Intelligent Memory Stack - RTIMS; Ultra-Low-Dropout Linear Regulator; Excitation of a Parallel Plate Waveguide by an Array of Rectangular Waveguides; FPGA for Power Control of MSL Avionics; UAVSAR Active Electronically Scanned Array; Lockout/Tagout (LOTO) Simulator; Silicon Carbide Mounts for Fabry-Perot Interferometers; Measuring the In-Process Figure, Final Prescription, and System Alignment of Large Optics and Segmented Mirrors Using Lidar Metrology; Fiber-Reinforced Reactive Nano-Epoxy Composites; Polymerization Initiated at the Sidewalls of Carbon Nanotubes; Metal-Matrix/Hollow-Ceramic-Sphere Composites; Piezoelectrically Enhanced Photocathodes; Iridium-Doped Ruthenium Oxide Catalyst for Oxygen Evolution; Improved Mo-Re VPS Alloys for High-Temperature Uses; Data Service Provider Cost Estimation Tool; Hybrid Power Management-Based Vehicle Architecture; Force Limit System; Levitated Duct Fan (LDF) Aircraft Auxiliary Generator; Compact, Two-Sided Structural Cold Plate Configuration; AN Fitting Reconditioning Tool; Active Response Gravity Offload System; Method and Apparatus for Forming Nanodroplets; Rapid Detection of the Varicella Zoster Virus in Saliva; Improved Devices for Collecting Sweat for Chemical Analysis; Phase-Controlled Magnetic Mirror for Wavefront Correction; and Frame-Transfer Gating Raman Spectroscopy for Time-Resolved Multiscalar Combustion Diagnostics.

    NASA Tech Briefs, June 1996

    Topics: New Computer Hardware; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery/Automation; Manufacturing/Fabrication; Mathematics and Information Sciences; Books and Reports.