3,686 research outputs found

    Design and Evaluation of a Hardware System for Online Signal Processing within Mobile Brain-Computer Interfaces

    Get PDF
    Brain-Computer Interfaces (BCIs) sind innovative Systeme, die eine direkte Kommunikation zwischen dem Gehirn und externen Geräten ermöglichen. Diese Schnittstellen haben sich zu einer transformativen Lösung nicht nur für Menschen mit neurologischen Verletzungen entwickelt, sondern auch für ein breiteres Spektrum von Menschen, das sowohl medizinische als auch nicht-medizinische Anwendungen umfasst. In der Vergangenheit hat die Herausforderung, dass neurologische Verletzungen nach einer anfänglichen Erholungsphase statisch bleiben, die Forscher dazu veranlasst, innovative Wege zu beschreiten. Seit den 1970er Jahren stehen BCIs an vorderster Front dieser Bemühungen. Mit den Fortschritten in der Forschung haben sich die BCI-Anwendungen erweitert und zeigen ein großes Potenzial für eine Vielzahl von Anwendungen, auch für weniger stark eingeschränkte (zum Beispiel im Kontext von Hörelektronik) sowie völlig gesunde Menschen (zum Beispiel in der Unterhaltungsindustrie). Die Zukunft der BCI-Forschung hängt jedoch auch von der Verfügbarkeit zuverlässiger BCI-Hardware ab, die den Einsatz in der realen Welt gewährleistet. Das im Rahmen dieser Arbeit konzipierte und implementierte CereBridge-System stellt einen bedeutenden Fortschritt in der Brain-Computer-Interface-Technologie dar, da es die gesamte Hardware zur Erfassung und Verarbeitung von EEG-Signalen in ein mobiles System integriert. Die Architektur der Verarbeitungshardware basiert auf einem FPGA mit einem ARM Cortex-M3 innerhalb eines heterogenen ICs, was Flexibilität und Effizienz bei der EEG-Signalverarbeitung gewährleistet. Der modulare Aufbau des Systems, bestehend aus drei einzelnen Boards, gewährleistet die Anpassbarkeit an unterschiedliche Anforderungen. Das komplette System wird an der Kopfhaut befestigt, kann autonom arbeiten, benötigt keine externe Interaktion und wiegt einschließlich der 16-Kanal-EEG-Sensoren nur ca. 56 g. Der Fokus liegt auf voller Mobilität. Das vorgeschlagene anpassbare Datenflusskonzept erleichtert die Untersuchung und nahtlose Integration von Algorithmen und erhöht die Flexibilität des Systems. Dies wird auch durch die Möglichkeit unterstrichen, verschiedene Algorithmen auf EEG-Daten anzuwenden, um unterschiedliche Anwendungsziele zu erreichen. High-Level Synthesis (HLS) wurde verwendet, um die Algorithmen auf das FPGA zu portieren, was den Algorithmenentwicklungsprozess beschleunigt und eine schnelle Implementierung von Algorithmusvarianten ermöglicht. Evaluierungen haben gezeigt, dass das CereBridge-System in der Lage ist, die gesamte Signalverarbeitungskette zu integrieren, die für verschiedene BCI-Anwendungen erforderlich ist. Darüber hinaus kann es mit einer Batterie von mehr als 31 Stunden Dauerbetrieb betrieben werden, was es zu einer praktikablen Lösung für mobile Langzeit-EEG-Aufzeichnungen und reale BCI-Studien macht. Im Vergleich zu bestehenden Forschungsplattformen bietet das CereBridge-System eine bisher unerreichte Leistungsfähigkeit und Ausstattung für ein mobiles BCI. Es erfüllt nicht nur die relevanten Anforderungen an ein mobiles BCI-System, sondern ebnet auch den Weg für eine schnelle Übertragung von Algorithmen aus dem Labor in reale Anwendungen. Im Wesentlichen liefert diese Arbeit einen umfassenden Entwurf für die Entwicklung und Implementierung eines hochmodernen mobilen EEG-basierten BCI-Systems und setzt damit einen neuen Standard für BCI-Hardware, die in der Praxis eingesetzt werden kann.Brain-Computer Interfaces (BCIs) are innovative systems that enable direct communication between the brain and external devices. These interfaces have emerged as a transformative solution not only for individuals with neurological injuries, but also for a broader range of individuals, encompassing both medical and non-medical applications. Historically, the challenge of neurological injury being static after an initial recovery phase has driven researchers to explore innovative avenues. Since the 1970s, BCIs have been at one forefront of these efforts. As research has progressed, BCI applications have expanded, showing potential in a wide range of applications, including those for less severely disabled (e.g. in the context of hearing aids) and completely healthy individuals (e.g. entertainment industry). However, the future of BCI research also depends on the availability of reliable BCI hardware to ensure real-world application. The CereBridge system designed and implemented in this work represents a significant leap forward in brain-computer interface technology by integrating all EEG signal acquisition and processing hardware into a mobile system. The processing hardware architecture is centered around an FPGA with an ARM Cortex-M3 within a heterogeneous IC, ensuring flexibility and efficiency in EEG signal processing. The modular design of the system, consisting of three individual boards, ensures adaptability to different requirements. With a focus on full mobility, the complete system is mounted on the scalp, can operate autonomously, requires no external interaction, and weighs approximately 56g, including 16 channel EEG sensors. The proposed customizable dataflow concept facilitates the exploration and seamless integration of algorithms, increasing the flexibility of the system. This is further underscored by the ability to apply different algorithms to recorded EEG data to meet different application goals. High-Level Synthesis (HLS) was used to port algorithms to the FPGA, accelerating the algorithm development process and facilitating rapid implementation of algorithm variants. Evaluations have shown that the CereBridge system is capable of integrating the complete signal processing chain required for various BCI applications. Furthermore, it can operate continuously for more than 31 hours with a 1800mAh battery, making it a viable solution for long-term mobile EEG recording and real-world BCI studies. Compared to existing research platforms, the CereBridge system offers unprecedented performance and features for a mobile BCI. It not only meets the relevant requirements for a mobile BCI system, but also paves the way for the rapid transition of algorithms from the laboratory to real-world applications. In essence, this work provides a comprehensive blueprint for the development and implementation of a state-of-the-art mobile EEG-based BCI system, setting a new benchmark in BCI hardware for real-world applicability

    Towards Neuromorphic Gradient Descent: Exact Gradients and Low-Variance Online Estimates for Spiking Neural Networks

    Get PDF
    Spiking Neural Networks (SNNs) are biologically-plausible models that can run on low-powered non-Von Neumann neuromorphic hardware, positioning them as promising alternatives to conventional Deep Neural Networks (DNNs) for energy-efficient edge computing and robotics. Over the past few years, the Gradient Descent (GD) and Error Backpropagation (BP) algorithms used in DNNs have inspired various training methods for SNNs. However, the non-local and the reverse nature of BP, combined with the inherent non-differentiability of spikes, represent fundamental obstacles to computing gradients with SNNs directly on neuromorphic hardware. Therefore, novel approaches are required to overcome the limitations of GD and BP and enable online gradient computation on neuromorphic hardware. In this thesis, I address the limitations of GD and BP with SNNs by proposing three algorithms. First, I extend a recent method that computes exact gradients with temporally-coded SNNs by relaxing the firing constraint of temporal coding and allowing multiple spikes per neuron. My proposed method generalizes the computation of exact gradients with SNNs and enhances the tradeoffs between performance and various other aspects of spiking neurons. Next, I introduce a novel alternative to BP that computes low-variance gradient estimates in a local and online manner. Compared to other alternatives to BP, the proposed method demonstrates an improved convergence rate and increased performance with DNNs. Finally, I combine these two methods and propose an algorithm that estimates gradients with SNNs in a manner that is compatible with the constraints of neuromorphic hardware. My empirical results demonstrate the effectiveness of the resulting algorithm in training SNNs without performing BP

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    A database accelerator for energy-efficient query processing and optimization

    Get PDF
    Data processing on a continuously growing amount of information and the increasing power restrictions have become an ubiquitous challenge in our world today. Besides parallel computing, a promising approach to improve the energy efficiency of current systems is to integrate specialized hardware. This paper presents a Tensilica RISC processor extended with an instruction set to accelerate basic database operators frequently used in modern database systems. The core was taped out in a 28 nm SLP CMOS technology and allows energy-efficient query processing as well as query optimization by applying selectivity estimation techniques. Our chip measurements show an 1000x energy improvement on selected database operators compared to state-of-the-art systems

    Reliable Sensor Intelligence in Resource Constrained and Unreliable Environment

    Get PDF
    The objective of this research is to design a sensor intelligence that is reliable in a resource constrained, unreliable environment. There are various sources of variations and uncertainty involved in intelligent sensor system, so it is critical to build reliable sensor intelligence. Many prior works seek to design reliable sensor intelligence by developing robust and reliable task. This thesis suggests that along with improving task itself, task reliability quantification based early warning can further improve sensor intelligence. DNN based early warning generator quantifies task reliability based on spatiotemporal characteristics of input, and the early warning controls sensor parameters and avoids system failure. This thesis presents an early warning generator that predicts task failure due to sensor hardware induced input corruption and controls the sensor operation. Moreover, lightweight uncertainty estimator is presented to take account of DNN model uncertainty in task reliability quantification without prohibitive computation from stochastic DNN. Cross-layer uncertainty estimation is also discussed to consider the effect of PIM variations.Ph.D

    Architecture and Circuit Design Optimization for Compute-In-Memory

    Get PDF
    The objective of the proposed research is to optimize computing-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As compute peripheries such as analog-to-digital converter (ADC) introduce significant overhead in CIM inference design, the research first focuses on the circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs involving different types of ADCs and investigate a new ADC design especially suited for the CIM, which performs the analog shift-add for multiple weight significance bits, improving the throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip design with a fully-analog data processing manner between sub-arrays, which can significantly improve the hardware performance over the conventional CIM designs and achieve near-software classification accuracy on ImageNet and CIFAR-10/-100 dataset. Secondly, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of CIM weight stationary dataflow, we propose the CIM training architectures with the transpose weight mapping strategy. The cell design and periphery circuitry are modified to efficiently support bi-directional compute. A novel solution of signed number multiplication is also proposed to handle the negative input in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance for DNN on-chip training based on silicon measurement results.Ph.D

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    Performance Anomalies in Concurrent Data Structure Microbenchmarks

    Get PDF
    Recent decades have witnessed a surge in the development of concurrent data structures with an increasing interest in data structures implementing concurrent sets (CSets). Microbenchmarking tools are frequently utilized to evaluate and compare the performance differences across concurrent data structures. The underlying structure and design of the microbenchmarks themselves can play a hidden but influential role in performance results. However, the impact of microbenchmark design has not been well investigated. In this work, we illustrate instances where concurrent data structure performance results reported by a microbenchmark can vary 10-100x depending on the microbenchmark implementation details. We investigate factors leading to performance variance across three popular microbenchmarks and outline cases in which flawed microbenchmark design can lead to an inversion of performance results between two concurrent data structure implementations. We further derive a set of recommendations for best practices in the design and usage of concurrent data structure microbenchmarks and explore advanced features in the Setbench microbenchmark

    Towards a centralized multicore automotive system

    Get PDF
    Today’s automotive systems are inundated with embedded electronics to host chassis, powertrain, infotainment, advanced driver assistance systems, and other modern vehicle functions. As many as 100 embedded microcontrollers execute hundreds of millions of lines of code in a single vehicle. To control the increasing complexity in vehicle electronics and services, automakers are planning to consolidate different on-board automotive functions as software tasks on centralized multicore hardware platforms. However, these vehicle software services have different and contrasting timing, safety, and security requirements. Existing vehicle operating systems are ill-equipped to provide all the required service guarantees on a single machine. A centralized automotive system aims to tackle this by assigning software tasks to multiple criticality domains or levels according to their consequences of failures, or international safety standards like ISO 26262. This research investigates several emerging challenges in time-critical systems for a centralized multicore automotive platform and proposes a novel vehicle operating system framework to address them. This thesis first introduces an integrated vehicle management system (VMS), called DriveOS™, for a PC-class multicore hardware platform. Its separation kernel design enables temporal and spatial isolation among critical and non-critical vehicle services in different domains on the same machine. Time- and safety-critical vehicle functions are implemented in a sandboxed Real-time Operating System (OS) domain, and non-critical software is developed in a sandboxed general-purpose OS (e.g., Linux, Android) domain. To leverage the advantages of model-driven vehicle function development, DriveOS provides a multi-domain application framework in Simulink. This thesis also presents a real-time task pipeline scheduling algorithm in multiprocessors for communication between connected vehicle services with end-to-end guarantees. The benefits and performance of the overall automotive system framework are demonstrated with hardware-in-the-loop testing using real-world applications, car datasets and simulated benchmarks, and with an early-stage deployment in a production-grade luxury electric vehicle

    SCV-GNN: Sparse Compressed Vector-based Graph Neural Network Aggregation

    Full text link
    Graph neural networks (GNNs) have emerged as a powerful tool to process graph-based data in fields like communication networks, molecular interactions, chemistry, social networks, and neuroscience. GNNs are characterized by the ultra-sparse nature of their adjacency matrix that necessitates the development of dedicated hardware beyond general-purpose sparse matrix multipliers. While there has been extensive research on designing dedicated hardware accelerators for GNNs, few have extensively explored the impact of the sparse storage format on the efficiency of the GNN accelerators. This paper proposes SCV-GNN with the novel sparse compressed vectors (SCV) format optimized for the aggregation operation. We use Z-Morton ordering to derive a data-locality-based computation ordering and partitioning scheme. The paper also presents how the proposed SCV-GNN is scalable on a vector processing system. Experimental results over various datasets show that the proposed method achieves a geometric mean speedup of 7.96Ă—7.96\times and 7.04Ă—7.04\times over CSC and CSR aggregation operations, respectively. The proposed method also reduces the memory traffic by a factor of 3.29Ă—3.29\times and 4.37Ă—4.37\times over compressed sparse column (CSC) and compressed sparse row (CSR), respectively. Thus, the proposed novel aggregation format reduces the latency and memory access for GNN inference
    • …
    corecore