987 research outputs found

    Low Power system Design techniques for mobile computers

    Get PDF
    Portable products are being used increasingly. Because these systems are battery powered, reducing power consumption is vital. In this report we give the properties of low power design and techniques to exploit them on the architecture of the system. We focus on: min imizing capacitance, avoiding unnecessary and wasteful activity, and reducing voltage and frequency. We review energy reduction techniques in the architecture and design of a hand-held computer and the wireless communication system, including error control, sys tem decomposition, communication and MAC protocols, and low power short range net works

    Scalable Analysis, Verification and Design of IC Power Delivery

    Get PDF
    Due to recent aggressive process scaling into the nanometer regime, power delivery network design faces many challenges that set more stringent and specific requirements to the EDA tools. For example, from the perspective of analysis, simulation efficiency for large grids must be improved and the entire network with off-chip models and nonlinear devices should be able to be analyzed. Gated power delivery networks have multiple on/off operating conditions that need to be fully verified against the design requirements. Good power delivery network designs not only have to save the wiring resources for signal routing, but also need to have the optimal parameters assigned to various system components such as decaps, voltage regulators and converters. This dissertation presents new methodologies to address these challenging problems. At first, a novel parallel partitioning-based approach which provides a flexible network partitioning scheme using locality is proposed for power grid static analysis. In addition, a fast CPU-GPU combined analysis engine that adopts a boundary-relaxation method to encompass several simulation strategies is developed to simulate power delivery networks with off-chip models and active circuits. These two proposed analysis approaches can achieve scalable simulation runtime. Then, for gated power delivery networks, the challenge brought by the large verification space is addressed by developing a strategy that efficiently identifies a number of candidates for the worst-case operating condition. The computation complexity is reduced from O(2^N) to O(N). At last, motivated by a proposed two-level hierarchical optimization, this dissertation presents a novel locality-driven partitioning scheme to facilitate divide-and-conquer-based scalable wire sizing for large power delivery networks. Simultaneous sizing of multiple partitions is allowed which leads to substantial runtime improvement. Moreover, the electric interactions between active regulators/converters and passive networks and their influences on key system design specifications are analyzed comprehensively. With the derived design insights, the system-level co-design of a complete power delivery network is facilitated by an automatic optimization flow. Results show significant performance enhancement brought by the co-design

    Locality-Driven Parallel Static Analysis for Power Delivery Networks

    Full text link

    Doctor of Philosophy

    Get PDF
    dissertationDeep Neural Networks (DNNs) are the state-of-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend a vast majority of their runtime to runtime performing matrix-by-vector multiplications (MVM). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton.In ISAAC, we show that in-situ computing designs can outperform DNN digital accelerators, if they leverage pipelining, smart encodings, and can distribute a computation in time and space, within crossbars, and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to the analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to out-perform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Get PDF
    Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

    A three-threshold learning rule approaches the maximal capacity of recurrent neural networks

    Get PDF
    Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model has a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms. Here, by transforming the perceptron learning rule, we present an online learning rule for a recurrent neural network that achieves near-maximal storage capacity without an explicit supervisory error signal, relying only upon locally accessible information. The fully-connected network consists of excitatory binary neurons with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the memory patterns are presented online as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs. Synapses corresponding to active inputs are modified as a function of the value of the local fields with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. We simulated and analyzed a network of binary neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction. The storage capacity obtained through numerical simulations is shown to be close to the value predicted by analytical calculations. We also measured the dependence of capacity on the strength of external inputs. Finally, we quantified the statistics of the resulting synaptic connectivity matrix, and found that both the fraction of zero weight synapses and the degree of symmetry of the weight matrix increase with the number of stored patterns.Comment: 24 pages, 10 figures, to be published in PLOS Computational Biolog

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms

    Get PDF
    Technological advances of recent years laid the foundation consolidation of informatisationof society, impacting on economic, political, cultural and socialdimensions. At the peak of this realization, today, more and more everydaydevices are connected to the web, giving the term ”Internet of Things”. The futureholds the full connection and interaction of IT and communications systemsto the natural world, delimiting the transition to natural cyber systems and offeringmeta-services in the physical world, such as personalized medical care, autonomoustransportation, smart energy cities etc. . Outlining the necessities of this dynamicallyevolving market, computer engineers are required to implement computingplatforms that incorporate both increased systemic complexity and also cover awide range of meta-characteristics, such as the cost and design time, reliabilityand reuse, which are prescribed by a conflicting set of functional, technical andconstruction constraints. This thesis aims to address these design challenges bydeveloping methodologies and hardware/software co-design tools that enable therapid implementation and efficient synthesis of architectural solutions, which specifyoperating meta-features required by the modern market. Specifically, this thesispresents a) methodologies to accelerate the design flow for both reconfigurableand application-specific architectures, b) coarse-grain heterogeneous architecturaltemplates for processing and communication acceleration and c) efficient multiobjectivesynthesis techniques both at high abstraction level of programming andphysical silicon level.Regarding to the acceleration of the design flow, the proposed methodologyemploys virtual platforms in order to hide architectural details and drastically reducesimulation time. An extension of this framework introduces the systemicco-simulation using reconfigurable acceleration platforms as co-emulation intermediateplatforms. Thus, the development cycle of a hardware/software productis accelerated by moving from a vertical serial flow to a circular interactive loop.Moreover the simulation capabilities are enriched with efficient detection and correctiontechniques of design errors, as well as control methods of performancemetrics of the system according to the desired specifications, during all phasesof the system development. In orthogonal correlation with the aforementionedmethodological framework, a new architectural template is proposed, aiming atbridging the gap between design complexity and technological productivity usingspecialized hardware accelerators in heterogeneous systems-on-chip and networkon-chip platforms. It is presented a novel co-design methodology for the hardwareaccelerators and their respective programming software, including the tasks allocationto the available resources of the system/network. The introduced frameworkprovides implementation techniques for the accelerators, using either conventionalprogramming flows with hardware description language or abstract programmingmodel flows, using techniques from high-level synthesis. In any case, it is providedthe option of systemic measures optimization, such as the processing speed,the throughput, the reliability, the power consumption and the design silicon area.Finally, on addressing the increased complexity in design tools of reconfigurablesystems, there are proposed novel multi-objective optimization evolutionary algo-rithms which exploit the modern multicore processors and the coarse-grain natureof multithreaded programming environments (e.g. OpenMP) in order to reduce theplacement time, while by simultaneously grouping the applications based on theirintrinsic characteristics, the effectively explore the design space effectively.The efficiency of the proposed architectural templates, design tools and methodologyflows is evaluated in relation to the existing edge solutions with applicationsfrom typical computing domains, such as digital signal processing, multimedia andarithmetic complexity, as well as from systemic heterogeneous environments, suchas a computer vision system for autonomous robotic space navigation and manyacceleratorsystems for HPC and workstations/datacenters. The results strengthenthe belief of the author, that this thesis provides competitive expertise to addresscomplex modern - and projected future - design challenges.Οι τεχνολογικές εξελίξεις των τελευταίων ετών έθεσαν τα θεμέλια εδραίωσης της πληροφοριοποίησης της κοινωνίας, επιδρώντας σε οικονομικές,πολιτικές, πολιτιστικές και κοινωνικές διαστάσεις. Στο απόγειο αυτής τη ςπραγμάτωσης, σήμερα, ολοένα και περισσότερες καθημερινές συσκευές συνδέονται στο παγκόσμιο ιστό, αποδίδοντας τον όρο «Ίντερνετ των πραγμάτων».Το μέλλον επιφυλάσσει την πλήρη σύνδεση και αλληλεπίδραση των συστημάτων πληροφορικής και επικοινωνιών με τον φυσικό κόσμο, οριοθετώντας τη μετάβαση στα συστήματα φυσικού κυβερνοχώρου και προσφέροντας μεταυπηρεσίες στον φυσικό κόσμο όπως προσωποποιημένη ιατρική περίθαλψη, αυτόνομες μετακινήσεις, έξυπνες ενεργειακά πόλεις κ.α. . Σκιαγραφώντας τις ανάγκες αυτής της δυναμικά εξελισσόμενης αγοράς, οι μηχανικοί υπολογιστών καλούνται να υλοποιήσουν υπολογιστικές πλατφόρμες που αφενός ενσωματώνουν αυξημένη συστημική πολυπλοκότητα και αφετέρου καλύπτουν ένα ευρύ φάσμα μεταχαρακτηριστικών, όπως λ.χ. το κόστος σχεδιασμού, ο χρόνος σχεδιασμού, η αξιοπιστία και η επαναχρησιμοποίηση, τα οποία προδιαγράφονται από ένα αντικρουόμενο σύνολο λειτουργικών, τεχνολογικών και κατασκευαστικών περιορισμών. Η παρούσα διατριβή στοχεύει στην αντιμετώπιση των παραπάνω σχεδιαστικών προκλήσεων, μέσω της ανάπτυξης μεθοδολογιών και εργαλείων συνσχεδίασης υλικού/λογισμικού που επιτρέπουν την ταχεία υλοποίηση καθώς και την αποδοτική σύνθεση αρχιτεκτονικών λύσεων, οι οποίες προδιαγράφουν τα μετα-χαρακτηριστικά λειτουργίας που απαιτεί η σύγχρονη αγορά. Συγκεκριμένα, στα πλαίσια αυτής της διατριβής, παρουσιάζονται α) μεθοδολογίες επιτάχυνσης της ροής σχεδιασμού τόσο για επαναδιαμορφούμενες όσο και για εξειδικευμένες αρχιτεκτονικές, β) ετερογενή αδρομερή αρχιτεκτονικά πρότυπα επιτάχυνσης επεξεργασίας και επικοινωνίας και γ) αποδοτικές τεχνικές πολυκριτηριακής σύνθεσης τόσο σε υψηλό αφαιρετικό επίπεδο προγραμματισμού,όσο και σε φυσικό επίπεδο πυριτίου.Αναφορικά προς την επιτάχυνση της ροής σχεδιασμού, προτείνεται μια μεθοδολογία που χρησιμοποιεί εικονικές πλατφόρμες, οι οποίες αφαιρώντας τις αρχιτεκτονικές λεπτομέρειες καταφέρνουν να μειώσουν σημαντικά το χρόνο εξομοίωσης. Παράλληλα, εισηγείται η συστημική συν-εξομοίωση με τη χρήση επαναδιαμορφούμενων πλατφορμών, ως μέσων επιτάχυνσης. Με αυτόν τον τρόπο, ο κύκλος ανάπτυξης ενός προϊόντος υλικού, μετατεθειμένος από την κάθετη σειριακή ροή σε έναν κυκλικό αλληλεπιδραστικό βρόγχο, καθίσταται ταχύτερος, ενώ οι δυνατότητες προσομοίωσης εμπλουτίζονται με αποδοτικότερες μεθόδους εντοπισμού και διόρθωσης σχεδιαστικών σφαλμάτων, καθώς και μεθόδους ελέγχου των μετρικών απόδοσης του συστήματος σε σχέση με τις επιθυμητές προδιαγραφές, σε όλες τις φάσεις ανάπτυξης του συστήματος. Σε ορθογώνια συνάφεια με το προαναφερθέν μεθοδολογικό πλαίσιο, προτείνονται νέα αρχιτεκτονικά πρότυπα που στοχεύουν στη γεφύρωση του χάσματος μεταξύ της σχεδιαστικής πολυπλοκότητας και της τεχνολογικής παραγωγικότητας, με τη χρήση συστημάτων εξειδικευμένων επιταχυντών υλικού σε ετερογενή συστήματα-σε-ψηφίδα καθώς και δίκτυα-σε-ψηφίδα. Παρουσιάζεται κατάλληλη μεθοδολογία συν-σχεδίασης των επιταχυντών υλικού και του λογισμικού προκειμένου να αποφασισθεί η κατανομή των εργασιών στους διαθέσιμους πόρους του συστήματος/δικτύου. Το μεθοδολογικό πλαίσιο προβλέπει την υλοποίηση των επιταχυντών είτε με συμβατικές μεθόδους προγραμματισμού σε γλώσσα περιγραφής υλικού είτε με αφαιρετικό προγραμματιστικό μοντέλο με τη χρήση τεχνικών υψηλού επιπέδου σύνθεσης. Σε κάθε περίπτωση, δίδεται η δυνατότητα στο σχεδιαστή για βελτιστοποίηση συστημικών μετρικών, όπως η ταχύτητα επεξεργασίας, η ρυθμαπόδοση, η αξιοπιστία, η κατανάλωση ενέργειας και η επιφάνεια πυριτίου του σχεδιασμού. Τέλος, προκειμένου να αντιμετωπισθεί η αυξημένη πολυπλοκότητα στα σχεδιαστικά εργαλεία επαναδιαμορφούμενων συστημάτων, προτείνονται νέοι εξελικτικοί αλγόριθμοι πολυκριτηριακής βελτιστοποίησης, οι οποίοι εκμεταλλευόμενοι τους σύγχρονους πολυπύρηνους επεξεργαστές και την αδρομερή φύση των πολυνηματικών περιβαλλόντων προγραμματισμού (π.χ. OpenMP), μειώνουν το χρόνο επίλυσης του προβλήματος της τοποθέτησης των λογικών πόρων σε φυσικούς,ενώ ταυτόχρονα, ομαδοποιώντας τις εφαρμογές βάση των εγγενών χαρακτηριστικών τους, διερευνούν αποτελεσματικότερα το χώρο σχεδίασης.Η αποδοτικότητά των προτεινόμενων αρχιτεκτονικών προτύπων και μεθοδολογιών επαληθεύτηκε σε σχέση με τις υφιστάμενες λύσεις αιχμής τόσο σε αυτοτελής εφαρμογές, όπως η ψηφιακή επεξεργασία σήματος, τα πολυμέσα και τα προβλήματα αριθμητικής πολυπλοκότητας, καθώς και σε συστημικά ετερογενή περιβάλλοντα, όπως ένα σύστημα όρασης υπολογιστών για αυτόνομα διαστημικά ρομποτικά οχήματα και ένα σύστημα πολλαπλών επιταχυντών υλικού για σταθμούς εργασίας και κέντρα δεδομένων, στοχεύοντας εφαρμογές υψηλής υπολογιστικής απόδοσης (HPC). Τα αποτελέσματα ενισχύουν την πεποίθηση του γράφοντα, ότι η παρούσα διατριβή παρέχει ανταγωνιστική τεχνογνωσία για την αντιμετώπιση των πολύπλοκων σύγχρονων και προβλεπόμενα μελλοντικών σχεδιαστικών προκλήσεων
    corecore