
    SCV-GNN: Sparse Compressed Vector-based Graph Neural Network Aggregation

    Graph neural networks (GNNs) have emerged as a powerful tool for processing graph-based data in fields such as communication networks, molecular interactions, chemistry, social networks, and neuroscience. GNNs are characterized by the ultra-sparse nature of their adjacency matrices, which necessitates dedicated hardware beyond general-purpose sparse matrix multipliers. While there has been extensive research on dedicated hardware accelerators for GNNs, few works have explored the impact of the sparse storage format on accelerator efficiency. This paper proposes SCV-GNN, built on a novel sparse compressed vectors (SCV) format optimized for the aggregation operation. We use Z-Morton ordering to derive a data-locality-based computation ordering and partitioning scheme, and we show how SCV-GNN scales on a vector processing system. Experimental results over various datasets show that the proposed method achieves geometric mean speedups of 7.96× and 7.04× over compressed sparse column (CSC) and compressed sparse row (CSR) aggregation, respectively, and reduces memory traffic by factors of 3.29× and 4.37× over CSC and CSR, respectively. Thus, the proposed aggregation format reduces both latency and memory access for GNN inference.
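The two ingredients the abstract compares against and builds on can be sketched in a few lines: CSR-style neighbor aggregation (the baseline the SCV format is measured against) and Z-Morton bit interleaving (the locality-preserving ordering the partitioning scheme derives from). This is a generic illustration with invented names, not the paper's SCV format itself.

```python
import numpy as np

def csr_aggregate(indptr, indices, features):
    """Sum-aggregate neighbor features per node from a CSR adjacency.
    Row v's neighbors are indices[indptr[v]:indptr[v+1]]."""
    n, d = len(indptr) - 1, features.shape[1]
    out = np.zeros((n, d))
    for v in range(n):
        for u in indices[indptr[v]:indptr[v + 1]]:
            out[v] += features[u]  # irregular, row-by-row memory access
    return out

def z_morton(x, y, bits=16):
    """Interleave the bits of (x, y) into a Z-order (Morton) index,
    so nearby matrix blocks get nearby indices."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z
```

Traversing adjacency blocks in `z_morton` order keeps accesses to `features` clustered, which is the data-locality effect the paper exploits.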

    20th SC@RUG 2023 proceedings 2022-2023


    ACiS: smart switches with application-level acceleration

    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance first depended on ever faster and denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single-function (e.g., a network that transports data) to take on additional functionality (e.g., also operating on that data). More generally, integration enables compute-everywhere: not just in the CPU and accelerator, but also in the network and, more specifically, in the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline by introducing composable plugins that successively add ACiS capabilities. In the first work, we propose an inline accelerator within communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to remove this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line rate.
Functions requiring loops and state are addressed in this method. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework that compiles user-provided C/C++ code to the back-end instructions that configure the accelerator. In the third work, we extend ACiS to support fused collectives and the combination of collectives with map operations. We observe that there is an opportunity to fuse communication (collectives) with computation. Since the computation varies across applications, ACiS support must be programmable in this method. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior, then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, with a Graph Convolutional Network (GCN) application as a case study, we achieve an average speedup of 3.4× across five real-world datasets on 24 nodes compared to a CPU cluster without ACiS capabilities.
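The fused-collective idea from the third work can be modelled abstractly: instead of completing an allreduce and then making a second pass to apply a map, the map is applied as each reduced value is produced. The sketch below is a hypothetical software model of that fusion, not the thesis's actual in-switch hardware pipeline; `chunks` stands in for per-rank data arriving at the switch.

```python
def allreduce_then_map(chunks, fn):
    """Unfused baseline: reduce element-wise across ranks,
    then make a second pass to apply the map function."""
    total = [sum(vals) for vals in zip(*chunks)]
    return [fn(x) for x in total]

def fused_allreduce_map(chunks, fn):
    """Fused version: the map is applied as each reduced element
    is produced, eliminating the separate pass over the result."""
    return [fn(sum(vals)) for vals in zip(*chunks)]
```

Both return identical results; the fused form models where the latency saving comes from when the combining logic sits inside the switch.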

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the semiconductor landscape of the last 15 years has made power a first-class design concern. As a result, the computing systems community is forced to find alternative design approaches that deliver high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has emerged in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology, and principles, and it classifies and presents the technical details of state-of-the-art software and hardware approximation techniques.

    Comment: Under review at ACM Computing Surveys
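One of the best-known software approximation techniques the survey's scope covers is loop perforation: skipping a fraction of loop iterations to trade accuracy for work. A minimal sketch (our own illustration, not taken from the survey):

```python
def mean_exact(xs):
    """Exact mean: visits every element."""
    return sum(xs) / len(xs)

def mean_perforated(xs, skip=2):
    """Loop perforation: process only every `skip`-th element.
    Runs ~skip times fewer iterations at some cost in accuracy."""
    sampled = xs[::skip]
    return sum(sampled) / len(sampled)
```

For well-distributed data the perforated result stays close to the exact one, which is exactly the accuracy/effort trade-off approximate computing formalizes.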

    Tools for efficient Deep Learning

    In the era of Deep Learning (DL), there is fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools that address the challenges of designing DNNs that are efficient in time, resources, and power consumption. We first present Aegis and SPGC, which improve the memory efficiency of DL training and inference. Aegis makes mixed-precision training (MPT) more stable through layer-wise gradient scaling; empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm; common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. The thesis also addresses the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53× and 9.47× on Polybench/C, and POLSCA achieves a 1.5× speedup over hardware designs generated directly from high-level synthesis on Polybench/C. Moreover, the thesis presents Deacon, a framework that generates FPGA-based streaming DNN accelerators with advanced pipelining techniques, addressing the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2×/3.5× for MobileNets and 1.0×/2.8× for SqueezeNets. All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
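The gradient-scaling idea behind tools like Aegis can be illustrated with the standard loss-scaling trick from mixed-precision training: small gradients are scaled up before the fp16 cast so they do not flush to zero, then unscaled in fp32. This sketch shows the generic technique only; Aegis's layer-wise scheme is not reproduced here, and the function name is our own.

```python
import numpy as np

def scaled_backward(grads_fp32, scale=1024.0):
    """Scale gradients up before casting to fp16, then unscale in fp32.
    Without the scale, values below fp16's subnormal range underflow
    to zero and the update is lost. Layer-wise variants choose a
    different `scale` per layer."""
    grads_fp16 = (grads_fp32 * scale).astype(np.float16)  # survives the cast
    return grads_fp16.astype(np.float32) / scale          # unscale for the optimizer
```

A gradient of 1e-8 underflows to zero in a plain fp16 cast but survives (approximately) when scaled by 1024 first.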


    Exemplars as a least-committed alternative to dual-representations in learning and memory

    Despite some notable counterexamples, the theoretical and empirical exchange between the fields of learning and memory is limited. In an attempt to promote further theoretical exchange, I explored how learning and memory may be conceptualized as distinct algorithms that operate on the same representations of past experiences. I review representational and process assumptions in learning and memory, using the examples of evaluative conditioning and false recognition, and identify important similarities in the theoretical debates. Based on this review, I identify global matching memory models and their exemplar representation as a promising candidate for a common representational substrate that satisfies the principle of least commitment. I then present two cases in which exemplar-based global matching models, which take characteristics of the stimulus material and context into account, suggest parsimonious explanations for empirical dissociations in evaluative conditioning and false recognition in long-term memory. These explanations suggest reinterpretations of findings that are commonly taken as evidence for dual-representation models. Finally, I report that the same approach also provides a natural unitary account of false recognition in short-term memory, a finding which challenges the assumption that short-term memory is insulated from long-term memory. Taken together, this work illustrates the broad explanatory scope and the integrative yet parsimonious potential of exemplar-based global matching models.
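The global matching mechanism the abstract relies on can be made concrete with a MINERVA 2-style sketch: a probe is compared to every stored exemplar in parallel, each similarity is cubed (an odd power, so sign is preserved), and the results are summed into an "echo intensity" that drives old/new judgments. This is the generic model family, not the specific model fit in this work.

```python
def echo_intensity(probe, exemplars, power=3):
    """Summed, cubed probe-exemplar similarity (global matching).
    Features are -1/0/+1; zeros count as unstored and are ignored."""
    def similarity(a, b):
        active = [(x, y) for x, y in zip(a, b) if x != 0 or y != 0]
        return sum(x * y for x, y in active) / len(active)
    return sum(similarity(probe, e) ** power for e in exemplars)
```

A probe that matches a stored exemplar yields high intensity; an unrelated probe yields intensity near zero or below, so a single exemplar store can mimic "old" responses to similar lures, which is how these models explain false recognition.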

    The effectiveness of computer-based information systems: definition and measurement

    Determining and enhancing the effectiveness of computer-based information systems (I/S) in organisations remains a top priority of managers. This study shows that the essential nature and role of I/S is changing and that classic views of I/S effectiveness have become increasingly inappropriate. Drawing on the organisational effectiveness literature, it is argued that user perceptions provide a practical alternative and a conceptually sound basis for defining and measuring I/S effectiveness. A popular measure, User Information Satisfaction, is examined and empirical studies using this measure are critiqued. This reveals limited theoretical grounding or convergence but a growing emphasis on behavioural theory. Based on prior empirical work by the author and on expectancy and motivation theory, a model of I/S behaviours is offered. The model suggests that fit between the needs of the organisation and the capability of I/S to satisfy those needs is essential to achieving I/S effectiveness. Several hypotheses are formulated. The development and validation of a particular measurement instrument is traced. The instrument addresses 37 facets of the overall information systems function; respondents complete perceptual scales tapping the relative importance of these facets and how well each is performed. The instrument is used in a field survey of 1025 managers and I/S staff in eleven large organisations. Attitudes towards I/S are found to correlate with perceptions of fit between organisational needs and I/S capabilities. The survey is complemented by management interviews, document analysis, and an assessment of the dynamics of the relevant I/S groups. Cultural and other features associated with perceived I/S success are found. It is concluded that the perceptions of organisational members are central to the meaning of information systems effectiveness, but that the user information satisfaction construct and purely attitudinal measures are inadequate. Based on the notion of fit, a new definition of I/S effectiveness is proposed. Guidelines for measurement are presented and it is argued that the instrument used in this study is a satisfactory tool. Specific recommendations for management are made and rich opportunities for future research are identified.
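An instrument of the kind described, where each facet is rated for both importance and performance, lends itself to an importance-weighted score. The weighting rule below is our own illustrative choice, not the study's published scoring procedure.

```python
def fit_score(importance, performance):
    """Importance-weighted mean performance across facets.
    Each facet contributes in proportion to its rated importance,
    so weak performance on critical facets drags the score down."""
    assert len(importance) == len(performance)
    total_weight = sum(importance)
    return sum(w * p for w, p in zip(importance, performance)) / total_weight
```

With two facets rated importance 2 and 1 and performance 5 and 2, the score is (2·5 + 1·2) / 3, i.e. the high-importance facet dominates.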

    Metaverse. Old urban issues in new virtual cities

    Recent years have seen the rise of early attempts to build virtual cities, utopias, or affective dystopias in an embodied Internet, which in some respects appear to be the ultimate expression of the neoliberal city paradigm (even if virtual). Although there is an extensive disciplinary literature on the relationship between planning and virtual or augmented reality, linked mainly to the gaming industry, it often avoids questions of design and value. The observation of some of these early experiences - Decentraland, Minecraft, Liberland Metaverse, to name a few - poses important questions and problems that are gradually becoming inescapable for designers and urban planners, and allows us to offer some partial considerations on the risks and potential of these early virtual cities.