Search CORE

147 research outputs found

Automated Design of Approximate Accelerators

Author: Castro-Godínez Jorge
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 04/01/2021
Field of study

In den letzten zehn Jahren hat das Bedürfnis nach Recheneffizienz die Entwicklung neuer Geräte, Architekturen und Entwurfstechniken motiviert. Approximate Computing hat sich als modernes, energieeffizientes Entwurfsparadigma für Anwendungen herausgestellt, die eine inhärente Fehlertoleranz aufweisen. Wenn die Genauigkeit der Ergebnisse in aktuellen Anwendungen wie Bildverarbeitung, Computer Vision und maschinellem Lernen auf ein akzeptables Maß reduziert wird, können Einsparungen im Schaltungsbereich, bei der Schaltkreisverzögerung und beim Stromverbrauch erzielt werden. Mit dem Aufkommen dieses Approximate Computing Paradigmas wurden in der Literatur viele approximierte Funktionseinheiten angegeben, insbesondere approximierte Addierer und Multiplizierer. Für eine Vielzahl solcher approximierter Schaltkreise und unter Berücksichtigung ihrer Verwendung als Bausteine für den Entwurf von approximierten Beschleunigern für fehlertolerante Anwendungen, ergibt sich eine Herausforderung: die Auswahl dieser approximierten Schaltkreise für eine bestimmte Anwendung, die die erforderlichen Ressourcen minimieren und gleichzeitig eine definierte Genauigkeit erfüllen. Diese Dissertation schlägt automatisierte Methoden zum Entwerfen und Implementieren von approximierten Beschleunigern vor, die aus approximierten arithmetischen Schaltungen aufgebaut sind. Um dies zu erreichen, befasst sich diese Dissertation mit folgenden Herausforderungen und liefert die nachfolgenden neuartigen Beiträge: In der Literatur wurden viele approximierte Addierer und Multiplizierer vorgestellt, indem entweder approximierte Entwürfe aus genauen Implementierungen wie dem Ripple-Carry-Addierer vorgeschlagen oder durch Approximate Logic Synthesis (ALS) Methoden generiert wurden. Ein repräsentativer Satz dieser approximierten Komponenten ist erforderlich, um approximierte Beschleuniger zu bauen. In diesem Sinne präsentiert diese Dissertation zwei Ansätze, um solche approximierte arithmetische Schaltungen zu erstellen. Zunächst wird AUGER vorgestellt, ein Tool, mit dem Register-Transfer Level (RTL) Beschreibungen für einen breiten Satz von approximierten Addierern und Multiplizierer für unterschiedliche Datenbitbreiten- und Genauigkeitskonfigurationen generiert werden können. Mit AUGER kann eine Design Space Exploration (DSE) von approximierten Komponenten durchgeführt werden, um diejenigen zu finden, die für eine gegebene Bitbreite, einen gegebenen Approximationsbereich und eine gegebene Schaltungsmetrik Pareto-optimal sind. Anschließend wird AxLS vorgestellt, ein Framework für ALS, das die Implementierung modernster Methoden und den Vorschlag neuartiger Methoden ermöglicht, um strukturelle Netzlistentransformationen durchzuführen und approximierte arithmetische Schaltungen aus genauen Schaltungen zu generieren. Darüber hinaus bieten beide Werkzeuge eine Fehlercharakterisierung in Form einer Fehlerverteilung und Schaltungseigenschaften (Fläche, Schaltkreisverzögerung und Leistung) für jede von ihnen erzeugte approximierte Schaltung. Diese Informationen sind für das Untersuchungsziel dieser Dissertation von wesentlicher Bedeutung. Trotz der Fehlertoleranz müssen approximierte Beschleuniger so ausgelegt sein, dass sie Genauigkeitsvorgaben erfüllen. Für den Entwurf solcher Beschleuniger unter Verwendung von approximierten arithmetischen Schaltungen ist es daher unerlässlich zu bewerten, wie sich die durch approximierte Schaltungen verursachten Fehler durch andere Berechnungen ausbreiten, entweder genau oder ungenau, und sich schließlich am Ausgang ansammeln. Diese Dissertation schlägt analytische Modelle vor, um die Fehlerpropagation durch genaue und approximierte Berechnungen zu beschreiben. Mit ihnen wird eine automatisierte, compilerbasierte Methodik vorgeschlagen, um die Fehlerpropagation auf approximierten Beschleunigerdesigns abzuschätzen. Diese Methode ist in ein Tool, CEDA, integriert, um schnelle, simulationsfreie Genauigkeitsschätzungen von approximierten Beschleunigermodellen durchzuführen, die unter Verwendung von C-Code beschrieben wurden. Beim Entwurf von approximierten Beschleunigern benötigen sich wiederholende Simulationen auf Gate-Level und die Schaltungssynthese viel Zeit, um viele oder sogar alle möglichen Kombinationen für einen gegebenen Satz von approximierten arithmetischen Schaltungen zu untersuchen. Andererseits basieren aktuelle Trends beim Entwerfen von Beschleunigern auf High-Level Synthesis (HLS) Werkzeugen. In dieser Dissertation werden analytische Modelle zur Schätzung der erforderlichen Rechenressourcen vorgestellt, wenn approximierte Addierer und Multiplizierer in Konstruktionen von approximierten Beschleunigern verwendet werden. Darüber hinaus werden diese Modelle zusammen mit den vorgeschlagenen analytischen Modellen zur Genauigkeitsschätzung in eine DSE-Methodik für fehlertolerante Anwendungen, DSEwam, integriert, um Pareto-optimale oder nahezu Pareto-optimale Lösungen für approximierte Beschleuniger zu identifizieren. DSEwam ist in ein HLS-Tool integriert, um automatisch RTL-Beschreibungen von approximierten Beschleunigern aus C-Sprachbeschreibungen für eine bestimmte Fehlerschwelle und ein bestimmtes Minimierungsziel zu generieren. Die Verwendung von approximierten Beschleunigern muss sicherstellen, dass Fehler, die aufgrund von approximierten Berechnungen erzeugt werden, innerhalb eines definierten Maximalwerts für eine gegebene Genauigkeitsmetrik bleiben. Die Fehler, die durch approximierte Beschleuniger erzeugt werden, hängen jedoch von den Eingabedaten ab, die hinsichtlich der für das Design verwendeten Daten unterschiedlich sein können. In dieser Dissertation wird ECAx vorgestellt, eine automatisierte Methode zur Untersuchung und Anwendung feinkörniger Fehlerkorrekturen mit geringem Overhead in approximierten Beschleunigern, um die Kosten für die Fehlerkorrektur auf Softwareebene (wie es in der Literatur gemacht wird) zu senken. Dies erfolgt durch selektive Korrektur der signifikantesten Fehler (in Bezug auf ihre Größenordnung), die von approximierten Komponenten erzeugt werden, ohne die Vorteile der Approximationen zu verlieren. Die experimentelle Auswertung zeigt Beschleunigungsverbesserungen für die Anwendung im Austausch für einen leicht gestiegenen Flächen- und Leistungsverbrauch im approximierten Beschleunigerdesign

KITopen

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

Author: L'INSALATA NICOLA EUGENIO
Publication venue: 'Pisa University Press'
Publication date: 09/06/2048
Field of study

This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

Electronic Thesis and Dissertation Archive - Università di Pisa

Exploration of fine-grained self-healing concepts for approximate multipliers

Author: Peters Quinty C.
Publication venue
Publication date: 26/06/2022
Field of study

Pure OAI Repository

Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs

Author: Xu Wenbin
Publication venue
Publication date: 20/11/2019
Field of study

As the modern semiconductor technology approaches to nanometer era, integrated circuits (ICs) are facing more and more challenges in meeting performance demand and security. With the expansion of markets in mobile and consumer electronics, the increasing demands require much faster delivery of reliable and secure IC products. In order to improve the performance and evaluate the security of emerging circuits, we present three practical techniques on approximate computing, split manufacturing and analog layout automation. Approximate computing is a promising approach for low-power IC design. Although a few accuracy-configurable adder (ACA) designs have been developed in the past, these designs tend to incur large area overheads as they rely on either redundant computing or complicated carry prediction. We investigate a simple ACA design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. The simulation results show that our design dominates the latest previous work on accuracy-delay-power tradeoff while using 39% less area. One variant of this design provides finer-grained and larger tunability than that of the previous works. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff. Split manufacturing prevents attacks from an untrusted foundry. The untrusted foundry has front-end-of-line (FEOL) layout and the original circuit netlist and attempts to identify critical components on the layout for Trojan insertion. Although defense methods for this scenario have been developed, the corresponding attack technique is not well explored. Hence, the defense methods are mostly evaluated with the k-security metric without actual attacks. We develop a new attack technique based on structural pattern matching. Experimental comparison with existing attack shows that the new attack technique achieves about the same success rate with much faster speed for cases without the k-security defense, and has a much better success rate at the same runtime for cases with the k-security defense. The results offer an alternative and practical interpretation for k-security in split manufacturing. Analog layout automation is still far behind its digital counterpart. We develop the layout automation framework for analog/mixed-signal ICs. A hierarchical layout synthesis flow which works in bottom-up manner is presented. To ensure the qualified layouts for better circuit performance, we use the constraint-driven placement and routing methodology which employs the expert knowledge via design constraints. The constraint-driven placement uses simulated annealing process to find the optimal solution. The packing represented by sequence pairs and constraint graphs can simultaneously handle different kinds of placement constraints. The constraint-driven routing consists of two stages, integer linear programming (ILP) based global routing and sequential detailed routing. The experiment results demonstrate that our flow can handle complicated hierarchical designs with multiple design constraints. Furthermore, the placement performance can be further improved by using mixed-size block placement which works on large blocks in priority

Efficient fault tolerance for selected scientific computing algorithms on heterogeneous and approximate computer architectures

Author: Schöll Alexander
Publication venue
Publication date: 01/01/2018
Field of study

Scientific computing and simulation technology play an essential role to solve central challenges in science and engineering. The high computational power of heterogeneous computer architectures allows to accelerate applications in these domains, which are often dominated by compute-intensive mathematical tasks. Scientific, economic and political decision processes increasingly rely on such applications and therefore induce a strong demand to compute correct and trustworthy results. However, the continued semiconductor technology scaling increasingly imposes serious threats to the reliability and efficiency of upcoming devices. Different reliability threats can cause crashes or erroneous results without indication. Software-based fault tolerance techniques can protect algorithmic tasks by adding appropriate operations to detect and correct errors at runtime. Major challenges are induced by the runtime overhead of such operations and by rounding errors in floating-point arithmetic that can cause false positives. The end of Dennard scaling induces central challenges to further increase the compute efficiency between semiconductor technology generations. Approximate computing exploits the inherent error resilience of different applications to achieve efficiency gains with respect to, for instance, power, energy, and execution times. However, scientific applications often induce strict accuracy requirements which require careful utilization of approximation techniques. This thesis provides fault tolerance and approximate computing methods that enable the reliable and efficient execution of linear algebra operations and Conjugate Gradient solvers using heterogeneous and approximate computer architectures. The presented fault tolerance techniques detect and correct errors at runtime with low runtime overhead and high error coverage. At the same time, these fault tolerance techniques are exploited to enable the execution of the Conjugate Gradient solvers on approximate hardware by monitoring the underlying error resilience while adjusting the approximation error accordingly. Besides, parameter evaluation and estimation methods are presented that determine the computational efficiency of application executions on approximate hardware. An extensive experimental evaluation shows the efficiency and efficacy of the presented methods with respect to the runtime overhead to detect and correct errors, the error coverage as well as the achieved energy reduction in executing the Conjugate Gradient solvers on approximate hardware

Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs

Author: Xu Wenbin
Publication venue
Publication date: 20/11/2019
Field of study

Texas A&M Repository

Strategies of development and maintenance in supervision, control, synchronization, data acquisition and processing in light sources

Author: Fernández Carreiras David
Publication venue
Publication date: 01/01/2020
Field of study

Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións. 5032V01[Resumo] Os aceleradores de partículas e fontes de luz sincrotrón, evolucionan constantemente para estar na vangarda da tecnoloxía, levando os límites cada vez mais lonxe para explorar novos dominios e universos. Os sistemas de control son unha parte crucial desas instalacións científicas e buscan logra-la flexibilidade de manobra para poder facer experimentos moi variados, con configuracións diferentes que engloban moitos tipos de detectores, procedementos, mostras a estudar e contornas. As propostas de experimento son cada vez máis ambiciosas e van sempre un paso por diante do establecido. Precísanse detectores cada volta máis rápidos e eficientes, con máis ancho de banda e con máis resolución. Tamén é importante a operación simultánea de varios detectores tanto escalares como mono ou bidimensionáis, con mecanismos de sincronización de precisión que integren as singularidades de cada un. Este traballo estuda as solucións existentes no campo dos sistemas de control e adquisición de datos nos aceleradores de partículas e fontes de luz e raios X, ó tempo que explora novos requisitos e retos no que respecta á sincronización e velocidade de adquisición de datos para novos experimentos, a optimización do deseño, soporte, xestión de servizos e custos de operación. Tamén se estudan diferentes solucións adaptadas a cada contorna.[Resumen] Los aceleradores de partículas y fuentes de luz sincrotrón, evolucionan constantemente para estar en la vanguardia de la tecnología, y poder explorar nuevos dominios. Los sistemas de control son una parte fundamental de esas instalaciones científicas y buscan lograr la máxima flexibilidad para poder llevar a cabo experimentos más variados, con configuraciones diferentes que engloban varios tipos de detectores, procedimientos, muestras a estudiar y entornos. Los experimentos se proponen cada vez más ambiciosos y en ocasiones más allá de los límites establecidos. Se necesitan detectores cada vez más rápidos y eficientes, con más resolución y ancho de banda, que puedan sincronizarse simultáneamente con otros detectores tanto escalares como mono y bidimensionales, integrando las singularidades de cada uno y homogeneizando la adquisición de datos. Este trabajo estudia los sistemas de control y adquisición de datos de aceleradores de partículas y fuentes de luz y rayos X, y explora nuevos requisitos y retos en lo que respecta a la sincronización y velocidad de adquisición de datos, optimización y costo-eficiencia en el diseño, operación soporte, mantenimiento y gestión de servicios. También se estudian diferentes soluciones adaptadas a cada entorno.[Abstract] Particle accelerators and photon sources are constantly evolving, attaining the cutting-edge technologies to push the limits forward and explore new domains. The control systems are a crucial part of these installations and are required to provide flexible solutions to the new challenging experiments, with different kinds of detectors, setups, sample environments and procedures. Experiment proposals are more and more ambitious at each call and go often a step beyond the capabilities of the instrumentation. Detectors shall be faster, with higher efficiency, more resolution, more bandwidth and able to synchronize with other detectors of all kinds; scalars, one or two-dimensional, taking into account their singularities and homogenizing the data acquisition. This work examines the control and data acquisition systems for particle accelerators and X- ray / light sources and explores new requirements and challenges regarding synchronization and data acquisition bandwidth, optimization and cost-efficiency in the design / operation / support. It also studies different solutions depending on the environment

Repositorio da Universidade da Coruña

Embedded electronic systems driven by run-time reconfigurable hardware

Author: Fons Lluís Francisco
Publication venue: 'Universitat Rovira I Virgili'
Publication date: 01/01/2012
Field of study

Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional URV

Application of novel technologies for the development of next generation MR compatible PET inserts

Author: Konstantinou Georgios
Publication venue
Publication date: 01/01/2017
Field of study

Multimodal imaging integrating Positron Emission Tomography and Magnetic Resonance Imaging (PET/MRI) has professed advantages as compared to other available combinations, allowing both functional and structural information to be acquired with very high precision and repeatability. However, it has yet to be adopted as the standard for experimental and clinical applications, due to a variety of reasons mainly related to system cost and flexibility. A hopeful existing approach of silicon photodetector-based MR compatible PET inserts comprised by very thin PET devices that can be inserted in the MRI bore, has been pioneered, without disrupting the market as expected. Technological solutions that exist and can make this type of inserts lighter, cost-effective and more adaptable to the application need to be researched further. In this context, we expand the study of sub-surface laser engraving (SSLE) for scintillators used for PET. Through acquiring, measuring and calibrating the use of a SSLE setting we study the effect of different engraving configurations on detection characteristics of the scintillation light by the photosensors. We demonstrate that apart from cost-effectiveness and ease of application, SSLE treated scintillators have similar spatial resolution and superior sensitivity and packing fraction as compared to standard pixelated arrays, allowing for shorter crystals to be used. Flexibility of design is benchmarked and adoption of honeycomb architecture due to geometrical advantages is proposed. Furthermore, a variety of depth-of-interaction (DoI) designs are engraved and studied, greatly enhancing applicability in small field-of-view tomographs, such as the intended inserts. To adapt to this need, a novel approach for multi-layer DoI characterization has been developed and is demonstrated. Apart from crystal treatment, considerations on signal transmission and processing are addressed. A double time-over-threshold (ToT) method is proposed, using the statistics of noise in order to enhance precision. This method is tested and linearity results demonstrate applicability for multiplexed readout designs. A study on analog optical wireless communication (aOWC) techniques is also performed and proof of concept results presented. Finally, a ToT readout firmware architecture, intended for low-cost FPGAs, has been developed and is described. By addressing the potential development, applicability and merits of a range of transdisciplinary solutions, we demonstrate that with these techniques it is possible to construct lighter, smaller, lower consumption, cost-effective MRI compatible PET inserts. Those designs can make PET/MRI multimodality the dominant clinical and experimental imaging approach, enhancing researcher and physician insight to the mysteries of life.La combinación multimodal de Tomografía por Emisión de Positrones con la Imagen de Resonancia Magnética (PET/MRI, de sus siglas en inglés) tiene clara ventajas en comparación con otras técnicas multimodales actualmente disponibles, dada su capacidad para registrar información funcional e información estructural con mucha precisión y repetibilidad. Sin embargo, esta técnica no acaba de penetrar en la práctica clínica debido en gran parte a alto coste. Las investigaciones que persiguen mejorar el desarrollo de insertos de PET basados en fotodetectores de silicio y compatibles con MRI, aunque han sido intensas y han generado soluciones ingeniosas, todavía no han conseguido encontrar las soluciones que necesita la industria. Sin embargo, existen opciones todavía sin explorar que podrían ayudar a evolucionar este tipo de insertos consiguiendo dispositivos más ligeros, baratos y con mejores prestaciones. Esta tesis profundiza en el estudio de grabación sub-superficie con láser (SSLE) para el diseño de los cristales centelladores usados en los sistemas PET. Para ello hemos caracterizado, medido y calibrado un procedimiento SSLE, y a continuación hemos estudiado el efecto que tienen sobre las especificaciones del detector las diferentes configuraciones del grabado. Demostramos que además de la rentabilidad y facilidad de uso de esta técnica, los centelladores SSLE tienen resolución espacial equivalente y sensibilidad y fracción de empaquetamiento superiores a las matrices de centelleo convencionales, lo que posibilita utilizar cristales más cortos para conseguir la misma sensibilidad. Estos diseños también permiten medir la profundidad de la interacción (DoI), lo que facilita el uso de estos diseños en tomógrafos de radio pequeño, como pueden ser los sistemas preclínicos, los dedicados (cabeza o mama) o los insertos para MRI. Además de trabajar en el tratamiento de cristal de centelleo, hemos considerado nuevas aproximaciones al procesamiento y transmisión de la señal. Proponemos un método innovador de doble medida de tiempo sobre el umbral (ToT) que integra una evaluación de la estadística del ruido con el propósito de mejorar la precisión. El método se ha validado y los resultados demuestran su viabilidad de uso incluso en conjuntos de señales multiplexadas. Un estudio de las técnicas de comunicación óptica analógica e inalámbrica (aOWC) ha permitido el desarrollo de una nueva propuesta para comunicar las señales del detector PET insertado en el gantry a un el procesador de señal externo, técnica que se ha validado en un demostrador. Finalmente, se ha propuesto y demostrado una nueva arquitectura de análisis de señal ToT implementada en firmware en FPGAs de bajo coste. La concepción y desarrollo de estas ideas, así como la evaluación de los méritos de las diferentes soluciones propuestas, demuestran que con estas técnicas es posible construir insertos de PET compatibles con sistemas MRI, que serán más ligeros y compactos, con un reducido consumo y menor coste. De esta forma se contribuye a que la técnica multimodal PET/MRI pueda penetrar en la clínica, mejorando la comprensión que médicos e investigadores puedan alcanzar en su estudio de los misterios de la vida.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Andrés Santos Lleó.- Secretario: Luis Hernández Corporales.- Vocal: Giancarlo Sportell

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Automated Synthesis of Memristor Crossbar Networks

Author: Chakraborty Dwaipayan
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2019
Field of study

The advancement of semiconductor device technology over the past decades has enabled the design of increasingly complex electrical and computational machines. Electronic design automation (EDA) has played a significant role in the design and implementation of transistor-based machines. However, as transistors move closer toward their physical limits, the speed-up provided by Moore\u27s law will grind to a halt. Once again, we find ourselves on the verge of a paradigm shift in the computational sciences as newer devices pave the way for novel approaches to computing. One of such devices is the memristor -- a resistor with non-volatile memory. Memristors can be used as junctional switches in crossbar circuits, which comprise of intersecting sets of vertical and horizontal nanowires. The major contribution of this dissertation lies in automating the design of such crossbar circuits -- doing a new kind of EDA for a new kind of computational machinery. In general, this dissertation attempts to answer the following questions: a. How can we synthesize crossbars for computing large Boolean formulas, up to 128-bit? b. How can we synthesize more compact crossbars for small Boolean formulas, up to 8-bit? c. For a given loop-free C program doing integer arithmetic, is it possible to synthesize an equivalent crossbar circuit? We have presented novel solutions to each of the above problems. Our new, proposed solutions resolve a number of significant bottlenecks in existing research, via the usage of innovative logic representation and artificial intelligence techniques. For large Boolean formulas (up to 128-bit), we have utilized Reduced Ordered Binary Decision Diagrams (ROBDDs) to automatically synthesize linearly growing crossbar circuits that compute them. This cutting edge approach towards flow-based computing has yielded state-of-the-art results. It is worth noting that this approach is scalable to n-bit Boolean formulas. We have made significant original contributions by leveraging artificial intelligence for automatic synthesis of compact crossbar circuits. This inventive method has been expanded to encompass crossbar networks with 1D1M (1-diode-1-memristor) switches, as well. The resultant circuits satisfy the tight constraints of the Feynman Grand Prize challenge and are able to perform 8-bit binary addition. A leading edge development for end-to-end computation with flow-based crossbars has been implemented, which involves methodical translation of loop-free C programs into crossbar circuits via automated synthesis. The original contributions described in this dissertation reflect the substantial progress we have made in the area of electronic design automation for synthesis of memristor crossbar networks

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)