1,398 research outputs found

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing it to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5× additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second -- a >100× throughput improvement over a CPU-based baseline -- while maintaining model accuracy.
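
    The abstract does not spell out the PIR scheme itself, so as a rough illustration of the core idea, here is a minimal two-server XOR-based PIR sketch in Python (a hypothetical illustration, not the authors' GPU-accelerated scheme): each server receives a random-looking bit vector and learns nothing about which embedding row the client wants, yet XORing the two answers recovers the target row.

```python
import secrets
import numpy as np

def make_queries(db_size: int, index: int):
    """Split the target index into two random-looking selection vectors;
    each vector alone is uniformly random and reveals nothing."""
    q1 = np.frombuffer(secrets.token_bytes(db_size), dtype=np.uint8) & 1
    q2 = q1.copy()
    q2[index] ^= 1  # the two queries differ only at the target row
    return q1, q2

def server_answer(table: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Each server XORs together the rows selected by its query vector."""
    return np.bitwise_xor.reduce(table[query.astype(bool)], axis=0)

# Toy embedding table: 8 rows of 4 bytes each.
table = np.arange(32, dtype=np.uint8).reshape(8, 4)
q1, q2 = make_queries(db_size=8, index=5)
row = server_answer(table, q1) ^ server_answer(table, q2)
assert (row == table[5]).all()  # client privately recovers row 5
```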

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying the physical and chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, experiments generate data for a very large number of atoms, and researchers seek to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on the storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is handling analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be of interest to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We have also used it as a platform to study other data management issues such as security and compression.
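
    DCMS's actual schema and functions are not given above; as a minimal sketch of the database-centric idea (push simulation data into a relational store and express analyses as declarative queries), the snippet below uses a hypothetical table layout and query, and substitutes SQLite for self-containedness where DCMS itself builds on PostgreSQL.

```python
import sqlite3

# Hypothetical schema: one row per atom per simulation frame.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atoms (
        frame INTEGER, atom_id INTEGER,
        x REAL, y REAL, z REAL,
        PRIMARY KEY (frame, atom_id)
    );
    CREATE INDEX idx_atoms_pos ON atoms (frame, x, y, z);
""")
conn.executemany(
    "INSERT INTO atoms VALUES (?, ?, ?, ?, ?)",
    [(0, i, i * 0.5, i * 0.25, 0.0) for i in range(10)],
)

# Declarative analytical query: all atoms inside a box in frame 0,
# the kind of spatial-range query DCMS accelerates with custom indexes.
rows = conn.execute(
    "SELECT atom_id, x, y, z FROM atoms "
    "WHERE frame = 0 AND x BETWEEN 0 AND 2 AND y BETWEEN 0 AND 2"
).fetchall()
print(rows)
```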

    Gen-acceleration: Pioneering work for hardware accelerator generation using large language models

    Optimizing computational power is critical in the age of data-intensive applications and Artificial Intelligence (AI)/Machine Learning (ML). Conventional Von Neumann architectures face challenging bottlenecks when tasked with such huge workloads. Hardware accelerators are critical for deploying these technologies efficiently and have been explored extensively for edge devices. This study examines Gemmini, a state-of-the-art open-source hardware accelerator generator; using it, we developed a hardware accelerator that we compared against a non-Von-Neumann architecture. Gemmini is renowned for efficient matrix multiplication, but configuring it for specific tasks requires manual effort and expertise. We propose to reduce this manual intervention and need for domain expertise by leveraging Large Language Models (LLMs), which enable data-informed decision-making and make it easier to develop and deploy hardware accelerators that would otherwise be time-consuming to build. This work introduces an innovative method for hardware accelerator generation that uses Gemmini to generate optimized hardware accelerators for AI/ML applications, paving the way for automation and customization in the field.
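
    The generation pipeline's details are not given above; the sketch below is a hypothetical illustration of the LLM-in-the-loop idea: prompt a model for Gemmini-style systolic-array parameters, then sanity-check the suggestion before handing it to the hardware generator. Here `query_llm` is a stand-in for a real LLM API call, and the parameter names and allowed values only loosely mirror Gemmini's configuration knobs.

```python
import json

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned
    response so the sketch is runnable end to end."""
    return json.dumps({"meshRows": 16, "meshColumns": 16,
                       "dataflow": "WS", "inputType": "int8"})

PROMPT = (
    "Propose Gemmini systolic-array parameters for an int8 CNN "
    "inference workload. Reply with JSON keys meshRows, meshColumns, "
    "dataflow, inputType."
)

config = json.loads(query_llm(PROMPT))

# Validate the LLM's suggestion before emitting a hardware config,
# since generated parameters cannot be trusted blindly.
assert config["meshRows"] in (4, 8, 16, 32)
assert config["dataflow"] in ("WS", "OS")
print(f"Gemmini config candidate: {config}")
```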

    Challenges of Video Monitoring for Phenomenological Diagnostics in Present and Future Tokamaks

    With the development of heterogeneous camera networks working at different wavelengths and frame rates and covering a large surface of the vacuum vessel, the visual observation of a large variety of plasma and thermal phenomena (e.g., hot spots, ELMs, MARFE, arcs, dust, etc.) becomes possible. In the domain of machine protection, a phenomenological diagnostic is a key element for assessing the dangerousness of plasma/thermal events during real-time operation. It is also of primary importance to automate the extraction and storage of phenomena information for later off-line event retrieval and analysis, thus leading to better use of massive image databases for plasma physics studies. To this end, efforts have been devoted to the development of image processing algorithms dedicated to the recognition of specific events. But a need now arises for the integration of the techniques developed so far, in both hardware and software. In this paper we present our latest results in the field of real-time phenomena recognition and management through our image understanding software platform. This platform has been validated on Tore Supra during operation and is under evaluation for the foreseen imaging diagnostic of ITER.
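
    The platform's recognition algorithms are not detailed above; as a toy illustration of one ingredient (hot-spot detection on an infrared frame), here is a minimal threshold-and-label sketch using NumPy and SciPy, not the authors' actual method.

```python
import numpy as np
from scipy import ndimage

def detect_hot_spots(frame: np.ndarray, threshold: float) -> list[dict]:
    """Label connected regions above a temperature threshold and
    report each region's centroid, peak value, and pixel area."""
    labels, n = ndimage.label(frame > threshold)
    spots = []
    for region in range(1, n + 1):
        ys, xs = np.nonzero(labels == region)
        spots.append({
            "centroid": (float(ys.mean()), float(xs.mean())),
            "peak": float(frame[ys, xs].max()),
            "area_px": int(len(ys)),
        })
    return spots

# Synthetic IR frame with one injected hot spot.
frame = np.full((64, 64), 300.0)   # ~300 K background
frame[20:24, 30:35] = 950.0        # hot region
print(detect_hot_spots(frame, threshold=800.0))
```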

    Machines Learning - Towards a New Synthetic Autobiographical Memory

    Autobiographical memory is the organisation of episodes and contextual information from an individual’s experiences into a coherent narrative, which is key to our sense of self. The formation and recall of autobiographical memories is essential for effective, adaptive behaviour in the world, providing contextual information necessary for planning actions and for memory functions such as event reconstruction. A synthetic autobiographical memory system would endow intelligent robotic agents with many essential components of cognition through active compression and storage of historical sensorimotor data in an easily addressable manner. Current approaches neither fulfil these functional requirements nor build upon recent understanding of predictive coding, deep learning, or the neurobiology of memory. This position paper highlights desiderata for a modern implementation of synthetic autobiographical memory based on human episodic memory, and proposes that a recently developed model of hippocampal memory could be extended into a generalised model of autobiographical memory. Initial implementation will target social interaction, where current synthetic autobiographical memory systems have had success.
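
    As a concrete, if drastically simplified, illustration of "easily addressable" storage of sensorimotor history, the sketch below implements a toy content-addressable episodic store with cosine-similarity recall; it is a hypothetical illustration, not the hippocampal model the paper proposes to extend.

```python
import numpy as np

class EpisodicStore:
    """Toy episodic memory: store sensorimotor feature vectors with
    context tags and recall the closest stored episode from a cue."""

    def __init__(self):
        self.keys: list[np.ndarray] = []
        self.episodes: list[dict] = []

    def store(self, features: np.ndarray, context: dict) -> None:
        self.keys.append(features / np.linalg.norm(features))
        self.episodes.append(context)

    def recall(self, cue: np.ndarray) -> dict:
        # Cosine similarity against all stored keys; a noisy or
        # partial cue still retrieves the nearest episode.
        cue = cue / np.linalg.norm(cue)
        sims = np.stack(self.keys) @ cue
        return self.episodes[int(np.argmax(sims))]

mem = EpisodicStore()
mem.store(np.array([1.0, 0.0, 0.2]), {"event": "greeted visitor"})
mem.store(np.array([0.1, 1.0, 0.9]), {"event": "handed over object"})
print(mem.recall(np.array([0.9, 0.1, 0.1])))  # -> greeted visitor
```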

    Performance-Aware High-Performance Computing for Remote Sensing Big Data Analytics

    The incredible increase in data volumes accompanying recent technological developments has made analysis processes based on traditional approaches more difficult for many organizations. In particular, applications that require timely processing of big data, such as satellite imagery, sensor data, bank operations, web servers, and social networks, require efficient mechanisms for collecting, storing, processing, and analyzing these data. At this point, big data analytics, which encompasses data mining, machine learning, statistics, and similar techniques, comes to the aid of organizations for end-to-end management of the data. In this chapter, we introduce a novel high-performance computing system on a geo-distributed private cloud for remote sensing applications, which takes advantage of the network topology, exploits the utilization and workloads of CPU, storage, and memory resources in a distributed fashion, and optimizes resource allocation to realize big data analytics efficiently.
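
    The chapter's allocation mechanism is not specified above; the following sketch illustrates the flavour of performance-aware placement — score candidate nodes by CPU load and network distance to the data, subject to a memory constraint — with entirely hypothetical node attributes and weights.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_util: float      # fraction of CPU in use, 0..1
    free_mem_gb: float
    net_distance: float  # hops/latency to the data's home site

def placement_score(node: Node, job_mem_gb: float) -> float:
    """Lower is better: penalize busy CPUs and remote placement;
    disqualify nodes without enough free memory."""
    if node.free_mem_gb < job_mem_gb:
        return float("inf")
    return node.cpu_util + 0.5 * node.net_distance

def schedule(nodes: list[Node], job_mem_gb: float) -> Node:
    return min(nodes, key=lambda n: placement_score(n, job_mem_gb))

cluster = [
    Node("site-a/n1", cpu_util=0.9, free_mem_gb=64, net_distance=0),
    Node("site-b/n2", cpu_util=0.2, free_mem_gb=32, net_distance=2),
    Node("site-a/n3", cpu_util=0.4, free_mem_gb=32, net_distance=0),
]
print(schedule(cluster, job_mem_gb=24).name)  # -> site-a/n3
```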