Search CORE

11 research outputs found

Recommended from our members

Scalable Emulation of Heterogeneous Systems

Author: Garcia Cota Emilio
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

The breakdown of Dennard's transistor scaling has driven computing systems toward application-specific accelerators, which can provide orders-of-magnitude improvements in performance and energy efficiency over general-purpose processors. To enable the radical departures from conventional approaches that heterogeneous systems entail, research infrastructure must be able to model processors, memory and accelerators, as well as system-level changes---such as operating system or instruction set architecture (ISA) innovations---that might be needed to realize the accelerators' potential. Unfortunately, existing simulation tools that can support such system-level research are limited by the lack of fast, scalable machine emulators to drive execution. To fill this need, in this dissertation we first present a novel machine emulator design based on dynamic binary translation that makes the following improvements over the state of the art: it scales on multicore hosts while remaining memory efficient, correctly handles cross-ISA differences in atomic instruction semantics, leverages the host floating point (FP) unit to speed up FP emulation without sacrificing correctness, and can be efficiently instrumented to---among other possible uses---drive the execution of a full-system, cross-ISA simulator with support for accelerators. We then demonstrate the utility of machine emulation for studying heterogeneous systems by leveraging it to make two additional contributions. First, we quantify the trade-offs in different coupling models for on-chip accelerators. Second, we present a technique to reuse the private memories of on-chip accelerators when they are otherwise inactive to expand the system's last-level cache, thereby reducing the opportunity cost of the accelerators' integration

Columbia University Academic Commons

Adaptive Distributed Architectures for Future Semiconductor Technologies.

Author: Pellegrini Andrea
Publication venue
Publication date: 01/01/2013
Field of study

Year after year semiconductor manufacturing has been able to integrate more components in a single computer chip. These improvements have been possible through systematic shrinking in the size of its basic computational element, the transistor. This trend has allowed computers to progressively become faster, more efficient and less expensive. As this trend continues, experts foresee that current computer designs will face new challenges, in utilizing the minuscule devices made available by future semiconductor technologies. Today's microprocessor designs are not fit to overcome these challenges, since they are constrained by their inability to handle component failures by their lack of adaptability to a wide range of custom modules optimized for specific applications and by their limited design modularity. The focus of this thesis is to develop original computer architectures, that can not only survive these new challenges, but also leverage the vast number of transistors available to unlock better performance and efficiency. The work explores and evaluates new software and hardware techniques to enable the development of novel adaptive and modular computer designs. The thesis first explores an infrastructure to quantitatively assess the fallacies of current systems and their inadequacy to operate on unreliable silicon. In light of these findings, specific solutions are then proposed to strengthen digital system architectures, both through hardware and software techniques. The thesis culminates with the proposal of a radically new architecture design that can fully adapt dynamically to operate on the hardware resources available on chip, however limited or abundant those may be.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102405/1/apellegr_1.pd

Deep Blue Documents at the University of Michigan

Verkkoliikenteen hajauttaminen rinnakkaisprosessoitavaksi ohjelmoitavan piirin avulla

Author: Korpinen Pekka
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009
Field of study

The expanding diversity and amount of traffic in the Internet requires increasingly higher performing devices for protecting our networks against malicious activities. The computational load of these devices may be divided over multiple processing nodes operating in parallel to reduce the computation load of a single node. However, this requires a dedicated controller that can distribute the traffic to and from the nodes at wire-speed. This thesis concentrates on the system topologies and on the implementation aspects of the controller. A field-programmable gate array (FPGA) device, based on a reconfigurable logic array, is used for implementation because of its integrated circuit like performance and high-grain programmability. Two hardware implementations were developed; a straightforward design for 1-gigabit Ethernet, and a modular, highly parameterizable design for 10-gigabit Ethernet. The designs were verified by simulations and synthesizable testbenches. The designs were synthesized on different FPGA devices while varying parameters to analyze the achieved performance. High-end FPGA devices, such as Altera Stratix family, met the target processing speed of 10-gigabit Ethernet. The measurements show that the controller's latency is comparable to a typical switch. The results confirm that reconfigurable hardware is the proper platform for low-level network processing where the performance is prioritized over other features. The designed architecture is versatile and adaptable to applications expecting similar characteristics.Internetin edelleen lisääntyvä ja monipuolistuva liikenne vaatii entistä tehokkaampia laitteita suojaamaan tietoliikenneverkkoja tunkeutumisia vastaan. Tietoliikennelaitteiden kuormaa voidaan jakaa rinnakkaisille yksiköille, jolloin yksittäisen laitteen kuorma pienenee. Tämä kuitenkin vaatii erityisen kontrolloijan, joka kykenee hajauttamaan liikennettä yksiköille linjanopeudella. Tämä tutkimus keskittyy em. kontrolloijan järjestelmätopologioiden tutkimiseen sekä kontrolloijan toteuttamiseen ohjelmoitavalla piirillä, kuten kenttäohjelmoitava järjestelmäpiiri (eng. field programmable gate-array, FPGA). Kontrolloijasta tehtiin yksinkertainen toteutus 1-gigabitin Ethernet-verkkoihin sekä modulaarinen ja parametrisoitu toteutus 10-gigabitin Ethernet-verkkoihin. Toteutukset verifioitiin simuloimalla sekä käyttämällä syntetisoituvia testirakenteita. Toteutukset syntetisoitiin eri FPGA-piireille vaihtelemalla samalla myös toteutuksen parametrejä. Tehokkaimmat FPGA-piirit, kuten Altera Stratix -piirit, saavuttivat 10-gigabitin prosessointivaatimukset. Mittaustulokset osoittavat, että kontrollerin vasteaika ei poikkea tavallisesta verkkokytkimestä. Työn tulokset vahvistavat käsitystä, että ohjelmoitavat piirit soveltuvat hyvin verkkoliikenteen matalantason prosessointiin, missä vaaditaan ensisijaisesti suorituskykyä. Suunniteltu arkkitehtuuri on monipuolinen ja soveltuu joustavuutensa ansiosta muihin samantyyppiseen sovelluksiin

Aaltodoc Publication Archive

Topical Workshop on Electronics for Particle Physics

Author: Claude Sandra
Publication venue: CERN
Publication date: 01/01/2007
Field of study

CERN Document Server

Diseño CMOS de un sistema de visión “on-chip” para aplicaciones de muy alta velocidad

Author: Jiménez Garrido Francisco José
Publication venue
Publication date: 08/02/2016
Field of study

Falta palabras claveEsta Tesis presenta arquitecturas, circuitos y chips para el diseño de sensores de visión CMOS con procesamiento paralelo embebido. La Tesis reporta dos chips, en concreto: El chip Q-Eye; El chip Eye-RIS_VSoC.. Y dos sistemas de visión construidos con estos chips y otros sistemas “off-chip” adicionales, como FPGAs, en concreto: El sistema Eye-RIS_v1; El sistema Eye-RIS_v2. Estos chips y sistemas están concebidos para ejecutar tareas de visión a muy alta velocidad y con consumos de potencia moderados. Los sistemas resultantes son, además, compactos y por lo tanto ventajosos en términos del factor SWaP cuando se los compara con arquitecturas convencionales formadas por sensores de imágenes convencionales seguidos de procesadores digitales. La clave de estas ventajas en términos de SWaP y velocidad radica en el uso de sensores-procesadores, en lugar de meros sensores, en la interface de los sistemas de visión. Estos sensores-procesadores embeben procesadores programables de señal-mixta dentro del pixel y son capaces tanto de adquirir imágenes como de pre-procesarlas para extraer características, eliminar información redundante y reducir el número de datos que se transmiten fuera del sensor para su procesamiento ulterior. El núcleo de la tesis es el sensor-procesador Q-Eye, que se usa como interface en los sistemas Eye-RIS. Este sensor-procesador embebe una arquitectura de procesamiento formada por procesadores de señal-mixta distribuidos por pixel. Sus píxeles son por tanto estructuras multi-funcionales complejas. De hecho, son programables, incorporan memorias e interactúan con sus vecinos para realizar una variedad de operaciones, tales como: Convoluciones lineales con máscaras programables; Difusiones controladas por tiempo y nivel de señal, a través de un “grid” resistivo embebido en el plano focal; Aritmética de imágenes; Flujo de programación dependiente de la señal; Conversión entre los dominios de datos: imagen en escala de grises e imagen binaria; Operaciones lógicas en imágenes binarias; Operaciones morfológicas en imágenes binarias. etc. Con respecto a otros píxeles multi-función y sensores-procesadores anteriores, el Q-Eye reporta entre otras las siguientes ventajas: Mayor calidad de la imagen y mejores prestaciones de las funcionalidades embebidas en el chip; Mayor velocidad de operación y mejor gestión de la energía disponible; Mayor versatilidad para integración en sistemas de visión industrial. De hecho, los sistemas Eye-RIS son los primeros sistemas de visión industriales dotados de las siguientes características: Procesamiento paralelo distribuido y progresivo; Procesadores de señal-mixta fiables, robustos y con errores controlados; Programabilidad distribuida. La Tesis incluye descripciones detalladas de la arquitectura y los circuitos usados en el pixel del Q-Eye, del propio chip Q-Eye y de los sistemas de visión construidos en base a este chip. Se incluyen también ejemplos de los distintos chips en operaciónThis Thesis presents architectures, circuits and chips for the implementation of CMOS VISION SENSORS with embedded parallel processing. The Thesis reports two chips, namely: Q-eye chip; Eye-RIS_VSoC chip, and two vision systems realized by using these chips and some additional “off-chip” circuitry, such as FPGAs. These vision systems are: Eye-RIS_v1 system; Eye-RIS_v2 system. The chips and systems reported in the Thesis are conceived to perform vision tasks at very high speed and with moderate power consumption. The proposed vision systems are also compact and advantageous in terms of SWaP factors as compared with conventional architectures consisting of standard image sensor followed by digital processors. The key of these advantages in terms of SWaP and speed lies in the use of sensors-processors, rather than mere sensors, in the front-end interface of vision systems. These sensors-processors embed mixed-signal programmable processors inside the pixel. Therefore, they are able to acquire images and process them to extract the features, removing the redundant information and reducing the data throughput for later processing. The core of the Thesis is the sensor-processor Q-Eye, which is used as front-end in the Eye-RIS systems. This sensor-processor embeds a processing architecture composed by mixed-signal processors distributed per pixel. Then, its pixels are complex multi-functional structures. In fact, they are programmable, incorporate memories and interact with its neighbors in order to carry out a set of operations, including: Linear convolutions with programmable linear masks; Time- and signal-controlled diffusions (by means of an embedded resistive grid); Image arithmetic; Signal-dependent data scheduling; Gray-scale to binary transformation; Logic operation on binary images; Mathematical morphology on binary images, etc. As compared with previous multi-function pixels and sensors-processors, the Q-Eye brings among other the following advantages: Higher image quality and better performances of functionalities embedded on chip; Higher operation speed and better management of energy budget; More versatility for integration in industrial vision systems. In fact, the Eye-RIS systems are the first industrial vision systems equipped with the following characteristics: Parallel distributed and progressive processing; Reliable, robust mixed-signal processors with handled errors; Distributed programmability. This Thesis includes detailed descriptions of architecture and circuits used in the Q-Eye pixel, in the Q-Eye chip itself and in the vision systems developed based on this chip. Also, several examples of chips and systems in operation are presented

idUS. Depósito de Investigación Universidad de Sevilla

Topical Workshop on Electronics for Particle Physics

Author: Dho Evelyne
Vasey François
Publication venue: CERN
Publication date: 01/01/2008
Field of study

The purpose of the workshop was to present results and original concepts for electronics research and development relevant to particle physics experiments as well as accelerator and beam instrumentation at future facilities; to review the status of electronics for the LHC experiments; to identify and encourage common efforts for the development of electronics; and to promote information exchange and collaboration in the relevant engineering and physics communities

CERN Document Server

Architectures and Design of VLSI Machine Learning Systems

Author: Wang Qian
Publication venue
Publication date: 22/09/2016
Field of study

Quintillions of bytes of data are generated every day in this era of big data. Machine learning techniques are utilized to perform predictive analysis on these data, to reveal hidden relationships and dependencies and perform predictions of outcomes and behaviors. The obtained predictive models are used to interpret the existing data and predict new data information. Nowadays, most machine learning algorithms are realized by software programs running on general-purpose processors, which usually takes a huge amount of CPU time and introduces unbelievably high energy consumption. In comparison, a dedicated hardware design is usually much more efficient than software programs running on general-purpose processors in terms of runtime and energy consumption. Therefore, the objective of this dissertation is to develop efficient hardware architectures for mainstream machine learning algorithms, to provide a promising solution to addressing the runtime and energy bottlenecks of machine learning applications. However, it is a really challenging task to map complex machine learning algorithms to efficient hardware architectures. In fact, many important design decisions need to be made during the hardware development for efficient tradeoffs. In this dissertation, a parallel digital VLSI architecture for combined SVM training and classification is proposed. For the first time, cascade SVM, a powerful training algorithm, is leveraged to significantly improve the scalability of hardware-based SVM training and develop an efficient parallel VLSI architecture. The parallel SVM processors provide a significant training time speedup and energy reduction compared with the software SVM algorithm running on a general-purpose CPU. Furthermore, a liquid state machine based neuromorphic learning processor with integrated training and recognition is proposed. A novel theoretical measure of computational power is proposed to facilitate fast design space exploration of the recurrent reservoir. Three low-power techniques are proposed to improve the energy efficiency. Meanwhile, a 2-layer spiking neural network with global inhibition is realized on Silicon. In addition, we also present architectural design exploration of a brain-inspired digital neuromorphic processor architecture with memristive synaptic crossbar array, and highlight several synaptic memory access styles. Various analog-to-digital converter schemes have been investigated to provide new insights into the tradeoff between the hardware cost and energy consumption

Texas A&M Repository

Miniature high dynamic range time-resolved CMOS SPAD image sensors

Author: Al Abbas Tarek
Publication venue: The University of Edinburgh
Publication date: 03/07/2019
Field of study

Since their integration in complementary metal oxide (CMOS) semiconductor technology in 2003, single photon avalanche diodes (SPADs) have inspired a new era of low cost high integration quantum-level image sensors. Their unique feature of discerning single photon detections, their ability to retain temporal information on every collected photon and their amenability to high speed image sensor architectures makes them prime candidates for low light and time-resolved applications. From the biomedical field of fluorescence lifetime imaging microscopy (FLIM) to extreme physical phenomena such as quantum entanglement, all the way to time of flight (ToF) consumer applications such as gesture recognition and more recently automotive light detection and ranging (LIDAR), huge steps in detector and sensor architectures have been made to address the design challenges of pixel sensitivity and functionality trade-off, scalability and handling of large data rates. The goal of this research is to explore the hypothesis that given the state of the art CMOS nodes and fabrication technologies, it is possible to design miniature SPAD image sensors for time-resolved applications with a small pixel pitch while maintaining both sensitivity and built -in functionality. Three key approaches are pursued to that purpose: leveraging the innate area reduction of logic gates and finer design rules of advanced CMOS nodes to balance the pixel’s fill factor and processing capability, smarter pixel designs with configurable functionality and novel system architectures that lift the processing burden off the pixel array and mediate data flow. Two pathfinder SPAD image sensors were designed and fabricated: a 96 × 40 planar front side illuminated (FSI) sensor with 66% fill factor at 8.25μm pixel pitch in an industrialised 40nm process and a 128 × 120 3D-stacked backside illuminated (BSI) sensor with 45% fill factor at 7.83μm pixel pitch. Both designs rely on a digital, configurable, 12-bit ripple counter pixel allowing for time-gated shot noise limited photon counting. The FSI sensor was operated as a quanta image sensor (QIS) achieving an extended dynamic range in excess of 100dB, utilising triple exposure windows and in-pixel data compression which reduces data rates by a factor of 3.75×. The stacked sensor is the first demonstration of a wafer scale SPAD imaging array with a 1-to-1 hybrid bond connection. Characterisation results of the detector and sensor performance are presented. Two other time-resolved 3D-stacked BSI SPAD image sensor architectures are proposed. The first is a fully integrated 5-wire interface system on chip (SoC), with built-in power management and off-focal plane data processing and storage for high dynamic range as well as autonomous video rate operation. Preliminary images and bring-up results of the fabricated 2mm² sensor are shown. The second is a highly configurable design capable of simultaneous multi-bit oversampled imaging and programmable region of interest (ROI) time correlated single photon counting (TCSPC) with on-chip histogram generation. The 6.48μm pitch array has been submitted for fabrication. In-depth design details of both architectures are discussed

Edinburgh Research Archive

Recommended from our members

Real time occupant detection in high dynamic range environments

Author: Koch C.
Publication venue
Publication date
Field of study

The aim of this thesis is to explore strategies for real-time image segmentation of non-rigid objects in a spatio-temporal domain with a stationary camera within an optical high dynamic range environment. Camera, illumination and segmentation techniques are discussed for image processing in environments which are characterized by large intensity fluctuations and hence a high optical dynamic range (HDR), in particular for vehicle interior surveillance. Since the introduction of the airbag in 1981 numberless lives were saved and bad injuries were avoided. But in recent years the airbag has frequently been in the headlines due to the increasing number of injuries caused by it. To avoid these injuries a new generation of ’smart airbags’ has been designed which shows the ability to inflate in multiple steps and with different volumes. In order to determine the optimal inflation mode for a crash it is necessary to consider information about the interior situation and the occupants of the vehicle. This thesis presents a real-time visual occupant detection and classification system for advanced airbag deployment, utilizing a custom CMOS camera and motion based image segmentation algorithms for embedded systems under adverse illumination conditions. A novel illumination method is presented which combines a set of images flashed with different radiant intensities, which significantly simplifies image segmentation in HDR environments. With a constant exposure time for the imager a single image can be produced with a compressed dynamic range and a simultaneously reduced offset. This makes it possible to capture a vehicle interior under adverse light conditions without using high dynamic range cameras and without losing image detail. The expansion of this active illumination experiment leads to a novel shadow detection and removal technique that produces a shadow-free scene by simulating an artificial infinite illuminant plane over the held of view. Finally a shadowless image without loss of texture details is obtained without any region extraction phase. Furthermore, a texture based segmentation approach for stationary cam-eras is presented which is neither effected by sudden illumination changes nor by shadow effects

City Research Online

Abstracts on Radio Direction Finding (1899 - 1995)

Author
Publication venue
Publication date: 01/01/2018
Field of study

The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography). Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM. The contents of these files are: 1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbookformat; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format]; 2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format]; 3) RDFA_CompleteBibliography.pdf: A human readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion

Calhoun, Institutional Archive of the Naval Postgraduate School