2,370 research outputs found

    Hardware Accelerator for Convolutional Neural Networks

    In this thesis we discuss hardware accelerators for convolutional neural networks, which can be perceived as a link between two areas of computer science: the design of parallel systems and the design of machine learning algorithms. The former explores ways of mapping high-level programs onto hardware structures and has been growing since the rise of VLSI engineering over the last 40 years. The latter addresses the idea of creating dynamic algorithms that proceed in an iterative fashion, based on some task and some quantity of experience, in order to approximate solutions to problems that are extremely difficult (or even impossible) to solve using hard-coded programs. In the last 10 years machine learning has opened a vast world of possibilities and applications, especially through deep learning and deep neural networks (DNNs). Convolutional neural networks (CNNs) are a special case of DNNs used for image recognition and various other computer-vision tasks. A hardware accelerator is a special-purpose system dedicated to increasing the speed of the computationally intensive parts of the CNN algorithm, so that results are produced both quickly and energy-efficiently. In what follows, we approach the theoretical background of parallel computation and neural networks, and we present the FPGA design of the CNN accelerator. The implementation was written in VHDL, and all code for this work can be found at https://github.com/AggelosPsimitis/FPGA-hardware-accelerator-for-CNN/tree/master
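The computationally intensive kernel such an accelerator targets is the sliding-window multiply-accumulate of a convolutional layer. As a minimal illustration (a Python sketch, not the thesis's VHDL design), a naive 2D convolution looks like:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (valid padding, single channel).
    The inner multiply-accumulate is the operation a hardware
    accelerator replicates across many parallel units."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply-accumulate over one kernel window.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
print(conv2d(img, k))  # 2x2 output; each entry sums a 3x3 window
```

In hardware, the independent window computations and the multiplications within a window are what get spread across parallel multiply-accumulate units.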

    Engineered Materials to Measure and Regulate Cell Mechanotransduction

    The extracellular environment plays a key role in a wide array of cellular functions including migration, tissue formation, and differentiation. This thesis overviews the design of a molecular sensor to measure cellular forces and a hydrogel system to engineer angiogenic sprouting. We developed molecular force probes (FPs) that report traction forces of adherent cells with high spatial resolution, can be linked to virtually any surface, and do not require monitoring deformations of elastic substrates. FPs consist of DNA hairpins conjugated to fluorophore-quencher pairs that unfold and fluoresce when subjected to specific amounts of force. In chapter two we overview the synthetic strategies to produce these FPs from solid-state synthesis. We then demonstrate the chemical and physical characterization of these FPs. These data show that the FPs can be designed rationally from existing knowledge of the force-responsiveness of DNA hairpins. Chapter three summarizes our methods to affix these FPs to solid substrates to measure cellular traction forces. The silane chemistry to conjugate these FPs to glass coverslips is reported in detail. Then, the results of converting the fluorescence of these FPs to force values are given along with biological validation. Using this method, we find that cellular tractions are exerted at the distal ends of focal adhesions. In chapter four we present a versatile bioactive PEG hydrogel to study angiogenesis. This material is MMP-degradable and cell-adhesive. We show a microfabrication strategy to micromold these gels to pattern angiogenic sprouting from ex vivo tissue explants.

    System specification and performance analysis


    Dynamic EEG analysis during language comprehension reveals interactive cascades between perceptual processing and sentential expectations

    Available online 18 October 2020. Understanding spoken language requires analysis of the rapidly unfolding speech signal at multiple levels: acoustic, phonological, and semantic. However, there is not yet a comprehensive picture of how these levels relate. We recorded electroencephalography (EEG) while listeners (N = 31) heard sentences in which we manipulated acoustic ambiguity (e.g., a bees/peas continuum) and sentential expectations (e.g., Honey is made by bees). EEG was analyzed with a mixed effects model over time to quantify how language processing cascades proceed on a millisecond-by-millisecond basis. Our results indicate: (1) perceptual processing and memory for fine-grained acoustics are preserved in brain activity for up to 900 msec; (2) contextual analysis begins early and is graded with respect to the acoustic signal; and (3) top-down predictions influence perceptual processing in some cases; however, these predictions are available simultaneously with the veridical signal. These mechanistic insights provide a basis for a better understanding of the cortical language network. This work was supported by NIH grant DC008089 awarded to BM. This work was partially supported by the Basque Government through the BERC 2018–2021 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation SEV-2015-0490, as well as by a postdoctoral grant from the Spanish Ministry of Economy and Competitiveness (MINECO; reference FJCI-2016-28019), awarded to EK.
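The timepoint-by-timepoint analysis style described above can be sketched as a regression fit at every EEG sample. This is a simplified fixed-effects illustration on synthetic data (the study itself used mixed-effects models with per-listener random effects; the predictor names and numbers here are placeholders, not the authors' data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_times = 200, 50

# Hypothetical trial-level predictors: intercept, acoustic
# continuum step (ambiguity), and sentential expectation (0/1).
X = np.column_stack([
    np.ones(n_trials),
    rng.uniform(0.0, 1.0, n_trials),
    rng.integers(0, 2, n_trials).astype(float),
])
# Placeholder single-channel EEG amplitudes: trials x timepoints.
eeg = rng.normal(size=(n_trials, n_times))

# Fit an ordinary least-squares regression at each timepoint; the
# resulting coefficient time courses show when each predictor
# begins to influence the recorded signal.
betas = np.array([np.linalg.lstsq(X, eeg[:, t], rcond=None)[0]
                  for t in range(n_times)])
print(betas.shape)  # (n_times, n_predictors)
```

Plotting each column of `betas` against time would give the millisecond-by-millisecond effect trajectories that this kind of analysis quantifies.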

    Reconfigurable acceleration of Recurrent Neural Networks

    Recurrent Neural Networks (RNNs) have been successful in a wide range of applications involving temporal sequences, such as natural language processing, speech recognition and video analysis. However, RNNs often require a significant amount of memory and computational resources. In addition, the recurrent nature and data dependencies in RNN computations can lead to system stalls, resulting in low throughput and high latency. This work describes novel parallel hardware architectures for accelerating RNN inference using Field-Programmable Gate Array (FPGA) technology, taking into account the data dependencies and high computational costs of RNNs. The first contribution of this thesis is a latency-hiding architecture that utilizes column-wise matrix-vector multiplication instead of the conventional row-wise operation to eliminate data dependencies and improve the throughput of RNN inference designs. This architecture is further enhanced by a configurable checkerboard tiling strategy which accommodates large weight-matrix dimensions while supporting both element-based and vector-based parallelism. The presented reconfigurable RNN designs show significant speedup over CPU, GPU, and other FPGA designs. The second contribution of this thesis is a weight reuse approach for large RNN models with weights stored in off-chip memory, running with a batch size of one. A novel blocking-batching strategy is proposed to optimize the throughput of large RNN designs on FPGAs by reusing the RNN weights. Performance analysis is also introduced to enable FPGA designs to achieve the best trade-off between area, power consumption and performance. Promising power-efficiency improvements have been achieved in addition to speedups over CPU and GPU designs. The third contribution of this thesis is a low-latency design for RNNs based on a partially-folded hardware architecture. It also introduces a technique that balances the initiation intervals of multi-layer RNN inference to increase hardware efficiency and throughput while reducing latency. The approach is evaluated on a variety of applications, including gravitational wave detection and Bayesian RNN-based ECG anomaly detection. To facilitate the use of this approach, we open-source an RNN template which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools.
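The column-wise matrix-vector strategy mentioned above can be illustrated in a few lines. This is a behavioral Python sketch of the arithmetic reordering, not the thesis's FPGA implementation: instead of computing each output element as a full row dot product (which needs the entire input vector up front), the result is accumulated one scaled column at a time.

```python
import numpy as np

def mv_column_wise(W, x):
    """Column-wise matrix-vector multiply: accumulate x[j] * W[:, j]
    into the result. Each column update needs only one input element,
    so in hardware the accumulation can begin as soon as the first
    element of the recurrent state arrives, rather than waiting for
    the whole vector as a row-wise dot product would."""
    y = np.zeros(W.shape[0])
    for j in range(W.shape[1]):
        y += x[j] * W[:, j]  # one column processed per step
    return y

W = np.arange(6, dtype=float).reshape(2, 3)
x = np.array([1.0, 2.0, 3.0])
print(mv_column_wise(W, x))  # identical result to W @ x
```

The reordering produces exactly the same result as the row-wise formulation; the benefit is purely in when each input element is needed, which is what enables the latency hiding described in the abstract.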

    A pipelined code mapping scheme for static data flow computers

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1986. Microfiche copy available in Archives and Engineering. Bibliography: leaves 245-252. By Gao Guang Rong. Ph.D.