
    PyGFI: Analyzing and Enhancing Robustness of Graph Neural Networks Against Hardware Errors

    Graph neural networks (GNNs) have recently emerged as a promising paradigm for learning from graph-structured data and have demonstrated wide success across domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed on sophisticated modern hardware systems as well as dedicated accelerators. However, despite the popularity of GNNs and the recent efforts to bring them to hardware, their fault tolerance and resilience have generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments on various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude across models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNNs to enhance their resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization.
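
    As a rough illustration of the kind of fault injection the paper performs (the function names here are hypothetical, not PyGFI's actual API), a single-bit fault can be injected into a randomly chosen PyTorch weight by flipping one bit of its IEEE-754 encoding:

```python
import random
import struct

import torch

def flip_random_bit(value: float) -> float:
    # Reinterpret the float as 32 raw bits, flip one bit at random, convert back.
    (bits,) = struct.unpack("I", struct.pack("f", value))
    bits ^= 1 << random.randrange(32)
    return struct.unpack("f", struct.pack("I", bits))[0]

def inject_weight_fault(model: torch.nn.Module) -> None:
    # Pick one parameter tensor, then one element of it, and corrupt it in place.
    target = random.choice(list(model.parameters()))
    flat = target.data.view(-1)
    idx = random.randrange(flat.numel())
    flat[idx] = flip_random_bit(flat[idx].item())
```

    Repeating such injections over many trials and comparing accuracy against a fault-free baseline yields per-model resilience statistics of the kind the study sweeps across models and datasets.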

    GitFL: Adaptive Asynchronous Federated Learning using Version Control

    As a promising distributed machine learning paradigm that enables collaborative training without compromising data privacy, Federated Learning (FL) has been increasingly used in AIoT (Artificial Intelligence of Things) design. However, due to the lack of efficient management of straggling devices, existing FL methods suffer greatly from low inference accuracy and long training time. Things become even worse when various uncertain factors (e.g., network delays, performance variance caused by process variation) in AIoT scenarios are taken into account. To address this issue, this paper proposes a novel asynchronous FL framework named GitFL, whose implementation is inspired by the well-known version control system Git. Unlike traditional FL, the cloud server of GitFL maintains a master model (i.e., the global model) together with a set of branch models indicating the trained local models committed by selected devices, where the master model is updated based on all the pushed branch models and their version information, and only the branch models after the pull operation are dispatched to devices. Using our proposed Reinforcement Learning (RL)-based device selection mechanism, a pulled branch model with an older version is more likely to be dispatched to a faster and less frequently selected device for the next round of local training. In this way, GitFL enables both effective control of model staleness and adaptive load balancing of versioned models among straggling devices, thus avoiding performance deterioration. Comprehensive experimental results on well-known models and datasets show that, compared with state-of-the-art asynchronous FL methods, GitFL can achieve up to 2.64X training acceleration and 7.88% inference accuracy improvement in various uncertain scenarios.
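
    A minimal sketch of a version-weighted merge in the spirit of GitFL's master/branch scheme; the linear weighting by version number is an illustrative assumption, not the paper's published update rule:

```python
from typing import Dict, List

import torch

def merge_branches(branches: List[Dict[str, torch.Tensor]],
                   versions: List[int]) -> Dict[str, torch.Tensor]:
    # Weight each branch model by its version so fresher (higher-version)
    # branches contribute more to the master model.
    weights = torch.tensor(versions, dtype=torch.float32)
    weights = weights / weights.sum()
    master = {}
    for key in branches[0]:
        stacked = torch.stack([b[key].float() for b in branches])  # (n, ...)
        shape = (-1,) + (1,) * (stacked.dim() - 1)                 # broadcast over params
        master[key] = (weights.view(shape) * stacked).sum(dim=0)
    return master
```

    The dispatch side would then hand older-version branches to faster, less recently used devices, which is the role the paper assigns to its RL-based selector.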

    Hive-Mind Space: A Meta-Design Approach for Cultivating and Supporting Collaborative Design

    The ever-growing complexity of design projects requires more knowledge than any individual can have and therefore needs the active engagement of all stakeholders in the design process. Collaborative design exploits synergies from multidisciplinary communities, encourages divergent thinking, and enhances social creativity. The research documented in this thesis supports and deepens the understanding of collaborative design in two dimensions: (1) it developed and evaluated socio-technical systems to support collaborative design projects; and (2) it defined and explored a meta-design framework focused on how these systems enable users, as active contributors, to modify and further develop them. The research is grounded in and simultaneously extends the following major dimensions of meta-design: (1) it exploits the contributions of social media and Web 2.0 as innovative information technologies; (2) it facilitates the shift from consumer cultures to cultures of participation; (3) it fosters social creativity by harnessing contributions that occur in cultures of participation; (4) it empowers end users to be active designers involved in creating situated solutions. In a world where change is the norm, meta-design is a necessity rather than a luxury, because it is impossible to design software systems at design time for problems that occur only at use time. The co-evolution of systems and users' social practices pursued in this thesis requires a software environment that can evolve and be tailored continuously. End-user development explores tools and methods to support end users who tailor software artifacts. However, it addresses this objective primarily from a technical perspective and focuses mainly on tailorability. This thesis, centered on meta-design, extends end-user development by creating social conditions and design processes for broad participation in design activities both at design time and at use time. It builds on previous research into meta-design that has provided a strategic overview of design opportunities and principles, and it addresses some shortcomings of meta-design, such as the lack of guidelines for building concrete meta-design environments that can be assessed by empirical evaluation. Given the goal of this research, to explore meta-design approaches for cultivating and supporting collaborative design, the overarching research question guiding this work is: How do we provide a socio-technical environment to bring multidisciplinary design communities together to foster creativity, collaboration, and design evolution? To answer this question, my research was carried out through four phases: (1) synthesizing concepts, models, and theories; (2) framing conceptual models; (3) developing several systems in specific application areas; and (4) conducting empirical evaluation studies. The main contributions of this research are:
    - The Hive-Mind Space model, a meta-design framework derived from the "software shaping workshop" methodology that integrates the "seeding, evolutionary growth, reseeding" model. The bottom-up approach inherent in this framework breaks down static social structures so as to support richer ecologies of participation, and it provides the means for structuring communication and appropriation. The model's open mediation mechanism tackles unanticipated communication gaps among different design communities.
    - MikiWiki, a structured programmable wiki I developed to demonstrate how the Hive-Mind Space model can be implemented as a practical platform that benefits users, and how its features and values can be specified so as to be empirically observable and assessable.
    - Empirical insights, such as those based on applying MikiWiki to different collaborative design studies, which provide evidence that different phases of meta-design represent different modes rather than discrete levels.

    Dependable Embedded Systems

    This open access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress made by the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or system level alone, this book addresses reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. It provides readers with the latest insights into novel cross-layer methods and models for the dependability of embedded systems; describes cross-layer approaches that can improve reliability through techniques proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts of self-organization for achieving error resiliency in complex, future many-core systems.

    Power and area efficient clock stretching and critical path reshaping for error resilience

    Process, voltage, and temperature variations are on the rise with technology scaling. Nano-scale technology requires large design margins to ensure reliable operation, and worst-case design margining consumes a significant amount of circuit and system resources. In-situ error detection or correction is an alternative method for cost-effective variation tolerance. However, existing in-situ error detection and correction circuits are power- and area-hungry since they use speculative error management, which yields smaller power savings at higher error rates. This paper proposes an error resilience technique utilizing the available slack in the design. The proposed method uses a clock stretching circuit to relax timing margins on selected critical paths that have sufficient consecutive stage slack. We also propose a power optimization method that reshapes the critical path logic in proportion to the consecutive stage slack. Experimental results show that the proposed method achieves power and area savings of 40% and 8%, respectively, compared with the worst-case design approach. Compared with the TIMBER error resilience approach, the proposed method saves more than 74% power and more than 13% area at design time.
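
    As a loose software illustration of the selection criterion (not the paper's actual circuit or algorithm), a pipeline stage qualifies for clock stretching when its timing overshoot fits within the slack left by the following stage, so the pair still meets timing across two cycles:

```python
def stretchable_stages(stage_delays, clock_period):
    # A critical stage i qualifies if its overshoot beyond the clock period
    # can be absorbed by the slack of the next (consecutive) stage.
    eligible = []
    for i in range(len(stage_delays) - 1):
        overshoot = stage_delays[i] - clock_period
        next_slack = clock_period - stage_delays[i + 1]
        if overshoot > 0 and next_slack >= overshoot:
            eligible.append(i)
    return eligible
```

    For example, stretchable_stages([1.2, 0.7, 1.0], clock_period=1.0) flags stage 0, which overshoots by 0.2 while its successor leaves 0.3 of slack.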

    An Inexact Ultra-low Power Bio-signal Processing Architecture With Lightweight Error Recovery

    The energy efficiency of digital architectures is tightly linked to the voltage level (Vdd) at which they operate. Aggressive voltage scaling is therefore mandatory when ultra-low power processing is required. Nonetheless, the lowest admissible Vdd is often bounded by reliability concerns, especially since static and dynamic non-idealities are exacerbated in the near-threshold region, imposing costly guard-bands to guarantee correctness under worst-case conditions. A striking alternative, explored in this paper, waives the requirement for unconditional correctness in favor of more relaxed constraints. First, after a run-time failure, processing correctly resumes at a later point in time. Second, failures induce only a limited Quality-of-Service (QoS) degradation. We focus our investigation on the practical scenario of embedded bio-signal analysis, a domain in which energy efficiency is key, while applications are inherently error-tolerant to a certain degree. Targeting a domain-specific multi-core platform, we present a study of the impact of inexactness on application-visible errors. Then, we introduce a novel methodology to manage them, which requires minimal hardware resources and a negligible energy overhead. Experimental evidence shows that, by tolerating 900 errors/hour, the resulting inexact platform can achieve an efficiency increase of up to 24%, with a QoS degradation of less than 3%.
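
    A minimal sketch of the recovery idea under stated assumptions: processing is window-based, and a window whose result fails a plausibility check is dropped so execution resumes cleanly at the next window. The helpers analyze and detect_error are hypothetical stand-ins for the platform's actual mechanisms:

```python
def process_stream(windows, analyze, detect_error):
    # Per-window processing with lightweight recovery: a corrupted window is
    # discarded (bounded QoS loss) instead of halting the whole computation.
    results, dropped = [], 0
    for window in windows:
        out = analyze(window)
        if detect_error(out):
            dropped += 1     # limited QoS degradation instead of a crash
            continue         # processing correctly resumes at the next window
        results.append(out)
    qos_degradation = dropped / max(len(windows), 1)
    return results, qos_degradation
```

    Under the paper's reported operating point, an error rate of 900 errors/hour would translate into only a small fraction of dropped windows, consistent with the sub-3% QoS degradation it cites.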

    The hare and the tortoise: the problems with the notion of action in ethics

    Wittgenstein once asked, "What is left over if I subtract the fact that my arm goes up from the fact that I raise my arm?" What would be left is, presumably, the quality of 'agency,' which differentiates legitimate actions from mere behaviors. In my dissertation I investigate the way we conceive of this quality and recommend replacing the prevalent model with one that is developed in a more empirically informed way. Most current work in ethics employs a historically acquired and folk-psychology-approved notion of agency. On this view, the distinction between actions and behaviors is fairly clear-cut. Actions proper are characteristic of human beings. They are 'rational' either in the deliberative process that preceded them or in terms of their efficacy; they are launched 'autonomously' by the agent's self rather than influenced by context, emotion, or habit. These and a few other conditions have to be fulfilled for an act to earn the badge of an action; falling short of that standard disqualifies it or, at the very least, renders it an imperfect, faulty instance of agency. An agent is thus typically viewed as a disembodied, rational source of conduct, who can withhold her desires and choose between different courses of action using some form of deliberation. I submit that this model survives neither because of its empirical adequacy nor because it is otherwise valuable for ethics (or, more generally, for understanding human behavior). Rather, I argue, a certain widespread philosophical attitude determines its persistence: a general longing for the stability of the self and an orderly, controllable relationship between the agent and the world. I call the proponents of this attitude "tortoises" and offer a critique of their main claims. I conclude that we must alter this model. Empirical results from psychology and neuroscience suggest that an agent is best viewed as a bundle of modules governed by different rules. None of them is "more" the agent than another, but all operate to achieve a state of homeostasis between the different processes within the agent and the environment.

    Quantitative Performance Evaluation of Uncertainty-Aware Hybrid AADL Designs Using Statistical Model Checking

    Architecture Analysis and Design Language (AADL) is widely used for the architecture design and analysis of safety-critical real-time systems. Based on the Hybrid Annex, which supports continuous behavior modeling, Hybrid AADL enables seamless interactions between embedded control systems and continuous physical environments. Although Hybrid AADL is promising for dependability prediction through analyzable architecture development, worst-case performance analysis of Hybrid AADL designs can easily lead to overly pessimistic estimates. So far, Hybrid AADL cannot be used to accurately quantify and reason about the overall performance of complex systems that interact intensively with uncertain external environments. To address this problem, this paper proposes a statistical model checking based framework that can quantitatively evaluate uncertainty-aware Hybrid AADL designs against various performance queries. Our approach extends Hybrid AADL to support the modeling of environment uncertainties. Furthermore, we propose a set of transformation rules that automatically translate AADL designs together with designers' requirements into Networks of Priced Timed Automata (NPTA) and performance queries, respectively. Comprehensive experimental results on the Movement Authority (MA) scenario of the Chinese Train Control System Level 3 (CTCS-3) demonstrate the effectiveness of our approach.
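
    The statistical core of such a framework can be sketched generically: estimate the probability that a random run of the model satisfies a performance query, with the sample count chosen from the Chernoff-Hoeffding bound. Here simulate_run and holds are hypothetical stand-ins for the NPTA simulator and the query; this is a sketch of the general technique, not the paper's toolchain:

```python
import math
import random

def smc_estimate(simulate_run, holds, epsilon=0.02, delta=0.01):
    # Monte Carlo estimation with additive error `epsilon` at confidence
    # 1 - `delta`: Hoeffding gives n >= ln(2/delta) / (2 * epsilon^2).
    n = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
    successes = sum(holds(simulate_run()) for _ in range(n))
    return successes / n

# Toy usage: probability that a run's total delay stays within a deadline
# (a purely illustrative stand-in for a movement-authority timing query).
if __name__ == "__main__":
    def simulate_run():
        return sum(random.expovariate(1.0) for _ in range(3))
    print(smc_estimate(simulate_run, lambda total_delay: total_delay <= 5.0))
```

    Statistical model checkers trade the exhaustiveness of symbolic analysis for this kind of sampling, which is what makes uncertain, hybrid models tractable to evaluate.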