
    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by chip manufacturing processes that allow increasing numbers of computation units to be integrated onto a single die. The result, especially in the embedded domain, is often called a SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MPSoC). MPSoC design brings a large number of challenges to the foreground, one of the most prominent being the design of the chip interconnect. With the number of on-chip blocks currently in the tens and quickly approaching the hundreds, the question of how best to provide on-chip communication resources is keenly felt. NETWORKS-ON-CHIP (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet-switching paradigms they involve also help greatly in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
    • The NoC architecture must strike the best trade-off among performance, features and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate and optimize NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Given their global, distributed nature, it is essential to study the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
    This dissertation performs a design-space exploration of network-on-chip architectures, in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology as a whole. The exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics against one another and against early network-on-chip prototypes. The ultimate objective is to highlight the key advantages that NoC realizations provide over state-of-the-art communication infrastructures, and the challenges that must be overcome to make this new interconnect technology a reality. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and their effects. The above objectives were achieved by means of a NoC simulation environment for cycle-accurate modelling and simulation, and by means of a back-end facility for studying the physical implementation effects of NoCs. All results provided by this work have been validated on actual silicon layout.
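    As a rough illustration of the kind of design-space pruning described above, the following Python sketch sweeps a few 2D-mesh topology parameters against a toy area/latency cost model. The cost formulas, parameter ranges and the 4 mm² area budget are invented for illustration and are not taken from the dissertation.

```python
# Illustrative sweep over hypothetical 2D-mesh NoC parameters.
from itertools import product

def mesh_cost(rows, cols, flit_bits, buf_depth):
    """Return (area_mm2, latency_cycles) under a made-up cost model."""
    routers = rows * cols
    # Router area grows with buffering and link width (invented coefficients).
    area = routers * (0.02 + 0.0001 * flit_bits * buf_depth)
    # Mean hop count on a rows x cols mesh under uniform traffic ~ (rows+cols)/3.
    avg_hops = (rows + cols) / 3.0
    # Per-hop pipeline cost plus serialization of a 512-bit packet into flits.
    latency = avg_hops * (2 + 0.25 * buf_depth) + 512 / flit_bits
    return area, latency

best = None
for rows, cols, flit, depth in product([2, 3, 4], [2, 3, 4], [32, 64, 128], [2, 4, 8]):
    area, lat = mesh_cost(rows, cols, flit, depth)
    if area <= 4.0 and (best is None or lat < best[0]):  # hypothetical 4 mm^2 budget
        best = (lat, area, rows, cols, flit, depth)

print("latency, area, rows, cols, flit_bits, buf_depth =", best)
```

    A real flow would replace these closed-form proxies with cycle-accurate simulation and post-layout area and power numbers, which is precisely what the dissertation's simulation environment and back-end facility provide.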

    Methods and Description Languages for the Modelling and Verification of Circuits and Systems (MBMV 2015): Proceedings, Chemnitz, 3-4 March 2015

    The workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV 2015) is now taking place for the 18th time. This year it is hosted by the Chair of Circuit and System Design at Technische Universität Chemnitz and the Steinbeis-Forschungszentrum Systementwurf und Test. The workshop's goal is to discuss the latest trends, results and current problems in the field of methods for the modelling and verification of digital, analogue and mixed-signal circuits, as well as the associated description languages. It is thus intended as a forum for the exchange of ideas. Furthermore, the workshop offers a platform for exchange between research and industry, and for maintaining existing contacts and establishing new ones. It allows young scientists to present their ideas and approaches to a broad audience from academia and industry and to discuss them in depth during the event. Its long history has made it a fixture in many event calendars. Traditionally, the meetings of the ITG specialist groups are also attached to the workshop. This year, two projects funded by the Bundesministerium für Bildung und Forschung within the InnoProfile-Transfer initiative are using the workshop to present their research results to a broad audience in two dedicated tracks. Representatives of the projects Generische Plattform für Systemzuverlässigkeit und Verifikation (GPZV) and GINKO - Generische Infrastruktur zur nahtlosen energetischen Kopplung von Elektrofahrzeugen present parts of their current work. This enriches the workshop with additional focus topics and provides a valuable complement to the authors' contributions. [... from the preface]

    RISC-V Core Instruction Extension Sets M and F

    This thesis project presents the hardware design of components capable of implementing a 5-stage RV32I core, RV32IM with the integer multiplication and division extension, and RV32IMF with partial single-precision floating-point support. These have been developed in Verilog HDL, based on the RISC-V ISA. Furthermore, the designs have been verified and synthesised "bare-metal" on the FPGA of the DE0 development board. In addition, a custom set of division modules has been produced to offer a choice of trade-offs in operating frequency, resource allocation and clock cycles per division operation. This selection of modules provides implementation options that allow the product to be tailored to customer needs.
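    The thesis's Verilog division modules are not reproduced here, but the classic trade-off such modules explore can be sketched in Python: a bit-serial restoring divider spends one clock cycle per quotient bit, trading latency for minimal hardware. The model below is an illustrative assumption, not the thesis code.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 32):
    """Bit-serial restoring division: one quotient bit per clock cycle.

    Models the small-area/long-latency point in the divider design
    space (`width` cycles per operation). Unsigned operands only.
    """
    if divisor == 0:
        raise ZeroDivisionError
    remainder, quotient, cycles = 0, 0, 0
    for i in range(width - 1, -1, -1):   # one iteration ~ one clock cycle
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor             # trial subtraction
        if remainder < 0:
            remainder += divisor         # restore: quotient bit = 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1
        cycles += 1
    return quotient, remainder, cycles

print(restoring_divide(100, 7))  # (14, 2, 32): 100 // 7, 100 % 7, 32 cycles
```

    A higher-radix or pipelined divider would cut the cycle count at the cost of area and clock frequency, which is exactly the kind of performance diversity the thesis's module selection offers.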

    Closed-loop epileptogenic prediction system based on subdural arrays

    The human brain is the most complex organ in the human body, consisting of approximately 100 billion neurons. These cells effortlessly communicate across both hemispheres to deliver our everyday sensorimotor and cognitive abilities. Although the underlying principles of neuronal communication are not well understood, there is evidence to suggest that precise synchronisation and/or de-synchronisation of neuronal clusters could play an important role. Furthermore, new evidence suggests that these patterns of synchronisation could be used as an identifier for the detection of a variety of neurological disorders, including Alzheimer's disease (AD), schizophrenia (SZ) and epilepsy (EP), where neural degradation or hypersynchronous networks have been detected. Over the years many different techniques have been proposed for the detection of synchronisation patterns, in the form of spectral analysis, transform approaches and statistics-based studies. Nonetheless, most are confined to software implementations as opposed to hardware realisations due to their complexity. Furthermore, the few hardware implementations that do exist suffer from a lack of scalability in terms of brain-area coverage, throughput and power consumption. Here we introduce the design and implementation of a hardware-efficient algorithm, named Delay Difference Analysis (DDA), for the identification of patient-specific synchronisation patterns. The design is remarkably hardware-friendly compared with other algorithms: hardware requirements can be reduced by as much as 80% and power consumption by as much as 90% compared with the most common techniques. In terms of absolute sensitivity, the DDA achieves an average of more than 80% at a false-positive rate of 0.75 FP/h, rising to a maximum of 90% at 95% confidence levels. This thesis presents two integer-based digital processors for the calculation of phase synchronisation between neural signals. The approach is based on measuring the time periods between consecutive minima. The simplicity of the approach allows the use of elementary digital blocks, such as registers, counters and adders. The first processor was fabricated in a 0.18 μm CMOS process; it occupies only 0.05 mm² and consumes 15 nW from a 0.5 V supply at a signal input rate of 1024 S/s. These low-area and low-power features make the proposed circuit a valuable computing element in closed-loop neural prostheses for the treatment of neural disorders, such as epilepsy, or for measuring functional connectivity maps between different recording sites in the brain. A second VLSI implementation was designed and integrated as a 16-channel design. Incorporated into it were 16 individual synchronisation processors (15 on-line processors and 1 test processor), each with a dedicated training and calculation module, used to build a specialised epileptic detection system based on patient-specific synchrony thresholds. Each of the main processors is capable of calculating the phase synchrony between 9 independent electroencephalography (EEG) signals over 8 epochs, totalling 120 EEG combinations. Remarkably, the entire circuit occupies a total area of only 3.64 mm². This design was implemented with a multi-purpose focus in mind. Firstly, as a clinical aid to help physicians detect pathological brain states, where the small area would allow the patient to wear the device for home trials. Moreover, the low power consumption would allow it to run from standard batteries for long periods. The trials could produce important patient-specific information, which could be processed using mathematical tools such as graph theory. Secondly, the design is aimed at use as an in-vivo device that detects phase synchrony in real time for patients suffering from neurological disorders such as EP, which require constant monitoring and feedback. In future developments, this synchronisation device would be a good building block for a full system-on-chip for detection and stimulation.
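    The core idea, measuring and comparing the time periods between consecutive signal minima, can be sketched as follows. The function names, the tolerance parameter and the synchrony score are hypothetical simplifications; the actual DDA processors use patient-specific training and thresholds that are not modelled here.

```python
import numpy as np

def minima_periods(x):
    """Return the sample periods between consecutive local minima of x."""
    mins = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    return np.diff(mins)

def delay_difference_sync(a, b, tol=4):
    """Toy synchrony score: fraction of inter-minima periods that agree
    within `tol` samples between the two signals."""
    pa, pb = minima_periods(a), minima_periods(b)
    n = min(len(pa), len(pb))
    if n == 0:
        return 0.0
    return float(np.mean(np.abs(pa[:n] - pb[:n]) <= tol))

fs = 1024  # S/s, matching the reported signal input rate
t = np.arange(2 * fs) / fs
a = np.sin(2 * np.pi * 8 * t)          # 8 Hz oscillation
b = np.sin(2 * np.pi * 8 * t + 0.3)    # phase-shifted copy: high synchrony
print(delay_difference_sync(a, b))     # ~1.0 for these locked signals
```

    Because only counters, subtractions and comparisons are involved, the same computation maps naturally onto the elementary digital blocks (registers, counters, adders) that the abstract mentions.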

    Simulated annealing based datapath synthesis


    inSense: A Variation and Fault Tolerant Architecture for Nanoscale Devices

    Transistor technology scaling has been the driving force in improving the size, speed, and power consumption of digital systems. As devices approach atomic size, however, their reliability and performance are increasingly compromised due to reduced noise margins, difficulties in fabrication, and emergent nano-scale phenomena. Scaled CMOS devices, in particular, suffer from process variations such as random dopant fluctuation (RDF) and line edge roughness (LER), transistor degradation mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI), and increased sensitivity to single event upsets (SEUs). Consequently, future devices may exhibit reduced performance, diminished lifetimes, and poor reliability. This research proposes a variation and fault tolerant architecture, the inSense architecture, as a circuit-level solution to the problems induced by the aforementioned phenomena. The inSense architecture entails augmenting circuits with introspective and sensory capabilities which are able to dynamically detect and compensate for process variations, transistor degradation, and soft errors. This approach creates "smart" circuits able to function despite the use of unreliable devices and is applicable to current CMOS technology as well as next-generation devices using new materials and structures. Furthermore, this work presents an automated prototype implementation of the inSense architecture targeted to CMOS devices, which is evaluated via implementation in the ISCAS '85 benchmark circuits. The automated prototype implementation is functionally verified and characterized: it is found that error detection capability (with error windows from approximately 30-400 ps) can be added for less than 2% area overhead for circuits of non-trivial complexity. Single event transient (SET) detection capability (configurable with target set-points) is found to be functional, although it generally tracks the standard dual modular redundancy (DMR) implementation with respect to overheads.
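    One conceptual model of the error-window idea is to sample a node twice, once at the nominal instant and once after a short delay, and flag any disagreement as a suspected transient. The Python toy below illustrates only this principle; the waveform, timings and function names are invented and do not represent the actual inSense circuitry.

```python
def detect_transient(waveform, sample_t, window):
    """Double-sample a digital node: value at sample_t vs sample_t + window.

    waveform: function mapping time (ps) -> 0/1 logic value.
    A mismatch between the two samples flags a suspected single event
    transient falling inside the detection window (conceptual model only).
    """
    main = waveform(sample_t)
    shadow = waveform(sample_t + window)
    return main != shadow, main, shadow

# A hypothetical node output that glitches between t = 1000 and 1150 ps.
def glitchy_node(t_ps):
    return 1 if 1000 <= t_ps < 1150 else 0

error, main, shadow = detect_transient(glitchy_node, sample_t=1100, window=200)
print(error)  # True: the samples disagree, so a transient is flagged
```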

    Feasibility Study of High-Level Synthesis: Implementation of a Real-Time HEVC Intra Encoder on FPGA

    High-Level Synthesis (HLS) is an automated design process that seeks to improve productivity over traditional design methods by raising design abstraction from the register transfer level (RTL) to the behavioural level. Various commercial HLS tools have been available on the market since the 1990s, but only recently have they started to gain adoption across industry and academia. The slow adoption rate has mainly stemmed from lower quality of results (QoR) than obtained with conventional hardware description languages (HDLs). However, the latest HLS tool generations have substantially narrowed the QoR gap.
    This thesis studies the feasibility of HLS in video codec development. It introduces several HLS implementations for High Efficiency Video Coding (HEVC), the key enabling technology for numerous modern media applications. HEVC doubles the coding efficiency over its predecessor, the Advanced Video Coding (AVC) standard, for the same subjective visual quality, but typically at the cost of considerably higher computational complexity. Therefore, real-time HEVC calls for automated design methodologies that can be used to minimize the hardware (HW) implementation and verification effort. This thesis proposes using HLS throughout the whole encoder design process: from data-intensive coding tools, such as intra prediction and the discrete transforms, to more control-oriented tools, such as entropy coding. The C source code of the open-source Kvazaar HEVC encoder serves as the design entry point for the HLS flow, and it is also utilized in design verification. The performance results are gathered with and reported for field-programmable gate array (FPGA) technology. The main contribution of this thesis is an HEVC intra encoder prototype built on a Nokia AirFrame Cloud Server, equipped with dual 2.4 GHz 14-core Intel Xeon processors, and two Intel Arria 10 GX FPGA Development Kits, which can be connected to the server via peripheral component interconnect express (PCIe) generation 3 or 40 Gigabit Ethernet. The proof-of-concept system achieves real-time 4K coding speed of up to 120 fps, which can be scaled up further by adding practically any number of network-connected FPGA cards. Overcoming the complexity of HEVC and customizing its rich features for a real-time HW HEVC encoder is not a trivial task, as hardware development has traditionally proven very time-consuming. This thesis shows that HLS is able to shorten the development time, provide previously unseen design scalability, and still deliver competitive QoR and absolute performance compared with state-of-the-art hardware implementations.
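    To give a flavour of the data-intensive coding tools mentioned above, the sketch below implements HEVC-style DC intra prediction for an N×N block, omitting HEVC's boundary smoothing of the first row and column for brevity. It is a simplified illustration, not code from Kvazaar or the thesis.

```python
import numpy as np

def intra_dc_predict(top, left):
    """HEVC-style DC intra prediction for an NxN block (N a power of two).

    top, left: the N reconstructed reference samples above and to the
    left of the current block. The DC value is their rounded average
    over 2N samples; boundary filtering is omitted here.
    """
    n = len(top)
    assert len(left) == n
    # (sum + N) >> (log2(N) + 1) divides by 2N with rounding;
    # for a power-of-two N, n.bit_length() == log2(N) + 1.
    dc = (int(np.sum(top)) + int(np.sum(left)) + n) >> n.bit_length()
    return np.full((n, n), dc, dtype=np.int32)

top = np.array([100, 102, 104, 106])
left = np.array([98, 99, 101, 103])
print(intra_dc_predict(top, left))  # 4x4 block filled with the DC value 102
```

    HEVC defines 35 intra modes (planar, DC and 33 angular directions); DC is shown here only because it is the simplest to follow.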