1,398 research outputs found

Towards the Optimal Hardware Architecture for Computer Vision

Author: Alejandro Nieto
David López Vilarino
Víctor Brea Sánchez
Publication venue: 'IntechOpen'
Publication date: 23/03/2012

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

Author: Brooks David
Gupta Udit
Johnson Jeff
Lai Liangzhen
Lam Maximilian
Lee Hsien-Hsin S.
Leontiadis Ilias
Li Yang
Maeng Kiwan
Reddi Vijay Janapa
Rhu Minsoo
Suh G. Edward
Wei Gu-Yeon
Xiong Wenjie
Publication date: 25/09/2023

On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the order of 1-10 GBs of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than $20\times$ over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over $5\times$ additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100,000$ queries per second, a $>100\times$ throughput improvement over a CPU-based baseline, while maintaining model accuracy.
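
To make the idea of private information retrieval concrete, the following is a minimal sketch of a classic two-server XOR-based PIR lookup over a toy embedding table. It is not the GPU-accelerated scheme from the paper, whose construction the abstract does not spell out. The names `make_queries`, `answer`, `reconstruct`, `private_lookup` and `TABLE` are hypothetical, and the sketch assumes two servers that hold the same table and do not collude.

```python
import secrets

def make_queries(index, n_rows):
    """Client: build two bit vectors that differ only at `index`.
    Each vector on its own is uniformly random, so neither server learns the index."""
    q0 = [secrets.randbelow(2) for _ in range(n_rows)]
    q1 = list(q0)
    q1[index] ^= 1
    return q0, q1

def answer(table, query):
    """Server: XOR together every row whose query bit is set."""
    acc = bytes(len(table[0]))
    for row, bit in zip(table, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, row))
    return acc

def reconstruct(a0, a1):
    """Client: XOR the two answers; all rows cancel except the requested one."""
    return bytes(a ^ b for a, b in zip(a0, a1))

# Toy table: 8 rows of 4-byte quantized embeddings (illustrative data only).
TABLE = [bytes([i, i + 1, i + 2, i + 3]) for i in range(8)]

def private_lookup(index):
    q0, q1 = make_queries(index, len(TABLE))
    # In a real deployment each query would go to a different server.
    return reconstruct(answer(TABLE, q0), answer(TABLE, q1))
```

Each server scans the whole table for every query, which illustrates why PIR is computationally heavy and why the abstract turns to GPU acceleration.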

arXiv.org e-Print Archive

DCMS: A data analytics and management system for molecular simulation

Author: Anand Kumar
Joseph C Fogarty
Meryem Berrada
Sagar A Pandit
Vladimir Grupcev
Xingquan Zhu
Yi-Cheng Tu
Yuni Xia
Publication venue: Springer Nature
Publication date: 01/01/2014

Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and aim to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short in storing and handling MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
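
The database-centric idea can be illustrated with a small sketch: atom positions are stored as relational rows and a per-frame analysis is expressed declaratively in SQL. This is an assumption-laden illustration, not the DCMS schema or code. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table `atom_position`, its columns, and the helpers `build_db` and `center_of_geometry` are hypothetical.

```python
import sqlite3

def build_db():
    """Create an in-memory database with a hypothetical trajectory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atom_position ("
        " frame INTEGER, atom_id INTEGER, x REAL, y REAL, z REAL,"
        " PRIMARY KEY (frame, atom_id))"
    )
    rows = [  # (frame, atom_id, x, y, z), illustrative values only
        (0, 1, 0.0, 0.0, 0.0),
        (0, 2, 2.0, 0.0, 0.0),
        (1, 1, 1.0, 0.0, 0.0),
        (1, 2, 3.0, 2.0, 0.0),
    ]
    conn.executemany("INSERT INTO atom_position VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def center_of_geometry(conn):
    """Per-frame mean position, computed inside the database engine."""
    cur = conn.execute(
        "SELECT frame, AVG(x), AVG(y), AVG(z)"
        " FROM atom_position GROUP BY frame ORDER BY frame"
    )
    return cur.fetchall()
```

Pushing the aggregation into the query lets the engine choose the access path, which is the kind of benefit the abstract attributes to using a DBMS.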

IUPUIScholarWorks

Springer - Publisher Connector

PubMed Central

Complex Query Operators on Modern Parallel Architectures

Author: Zois Vasileios
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Identifying interesting objects from a large data collection is a fundamental problem for multi-criteria decision making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection is tasked with retrieving the k-highest ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples with attributes offering (Pareto) optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations by utilizing elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation involves supporting processing strategies which are geared towards early termination and incomparable tuple pruning. The rapid increase in memory capacity and decreasing costs have been the main drivers behind the development of main-memory database systems. Although the act of migrating query processing in-memory has created many opportunities to improve the associated query latency, attaining such improvements has been very challenging due to the growing gap between processor and main memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study in depth the Top-K and Skyline selection operators, in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory architectures (PIM), developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the specific details of early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data on variable query parameters and distributions for both of the aforementioned problems. The experimental results demonstrate several orders of magnitude better throughput and query latency, thus validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
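
The two operators described above can be sketched as a simple sequential baseline. This shows only their semantics, not the parallel, early-terminating algorithms developed in the thesis. The function names, the `HOTELS` data, and the convention that smaller attribute values are better for the skyline are assumptions made for illustration.

```python
import heapq

def top_k(tuples, k, score):
    """Return the k tuples with the highest value of the user-defined score."""
    return heapq.nlargest(k, tuples, key=score)

def dominates(a, b):
    """a dominates b if a is no worse in every attribute and strictly better
    in at least one (assuming smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(tuples):
    """Keep the tuples that no other tuple dominates (quadratic baseline)."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]

# Hypothetical relation: (price, distance_to_beach), both to be minimized.
HOTELS = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
```

Top-K needs a single aggregation function such as a weighted sum, whereas the skyline returns every Pareto-optimal trade-off without requiring the user to choose weights.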

eScholarship - University of California

Gen-acceleration: Pioneering work for hardware accelerator generation using large language models

Author: Vungarala Durga Lakshmi Venkata Deepak
Publication venue: Digital Commons @ NJIT
Publication date: 31/12/2023

Optimizing computational power is critical in the age of data-intensive applications and Artificial Intelligence (AI)/Machine Learning (ML). Conventional Von Neumann architectures face challenging bottlenecks that make running such large workloads seem nearly impossible. Hardware accelerators are critical to deploying these technologies efficiently and have been widely explored for edge devices. This study examines a state-of-the-art hardware accelerator, Gemmini, using the open-source tool. Furthermore, we developed a hardware accelerator in the study and compared it with the non-Von-Neumann architecture. Gemmini is renowned for efficient matrix multiplication, but configuring it for specific tasks requires manual effort and expertise. We propose reducing this manual intervention and the need for domain expertise, so that hardware accelerators, which are otherwise time-consuming to build, become easier to develop and deploy; Large Language Models (LLMs) enable data-informed decision-making that enhances performance. This work introduces a method for hardware accelerator generation that uses Gemmini to generate optimized hardware accelerators for AI/ML applications, paving the way for automation and customization in the field.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Challenges of Video Monitoring for Phenomenological Diagnostics in Present and Future Tokamaks

Author: Martin Vincent
Moncada Victor
Travere Jean-Marcel
Publication venue: HAL CCSD
Publication date: 30/06/2011

With the development of heterogeneous camera networks working at different wavelengths and frame rates and covering a large surface of the vacuum vessel, the visual observation of a large variety of plasma and thermal phenomena (e.g., hot spots, ELMs, MARFE, arcs, dusts, etc.) becomes possible. In the domain of machine protection, a phenomenological diagnostic is a key element in assessing how dangerous plasma/thermal events are during real-time operation. It is also of primary importance to automate the extraction and the storage of phenomena information for further off-line event retrieval and analysis, thus leading to a better use of massive image databases for plasma physics studies. To this end, efforts have been devoted to the development of image processing algorithms dedicated to the recognition of specific events. But a need arises now for the integration of techniques developed so far in both hardware and software directions. We present in this paper our latest results in the field of real-time phenomena recognition and management through our image understanding software platform. This platform has been validated on Tore Supra during operation and is under evaluation for the foreseen imaging diagnostic of ITER.

HAL-CEA

Machines Learning - Towards a New Synthetic Autobiographical Memory

Author: Evans MH
Fox CW
Prescott TJ
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Autobiographical memory is the organisation of episodes and contextual information from an individual’s experiences into a coherent narrative, which is key to our sense of self. Formation and recall of autobiographical memories is essential for effective, adaptive behaviour in the world, providing contextual information necessary for planning actions and memory functions such as event reconstruction. A synthetic autobiographical memory system would endow intelligent robotic agents with many essential components of cognition through active compression and storage of historical sensorimotor data in an easily addressable manner. Current approaches neither fulfil these functional requirements, nor build upon recent understanding of predictive coding, deep learning, nor the neurobiology of memory. This position paper highlights desiderata for a modern implementation of synthetic autobiographical memory based on human episodic memory, and proposes that a recently developed model of hippocampal memory could be extended as a generalised model of autobiographical memory. Initial implementation will be targeted at social interaction, where current synthetic autobiographical memory systems have had success

Crossref

White Rose Research Online

Performance-Aware High-Performance Computing for Remote Sensing Big Data Analytics

Author: Pektürk Mustafa Kemal
Ünal Muhammet
Publication venue: 'IntechOpen'
Publication date: 22/08/2018

The rapid increase in the volume of data that has accompanied recent technological developments has made analysis with traditional approaches more difficult for many organizations. In particular, applications that combine timely processing with big data, such as satellite imagery, sensor data, bank operations, web servers, and social networks, require efficient mechanisms for collecting, storing, processing, and analyzing these data. At this point, big data analytics, which comprises data mining, machine learning, statistics, and similar techniques, helps organizations manage the data end to end. In this chapter, we introduce a novel high-performance computing system on the geo-distributed private cloud for remote sensing applications, which takes advantage of network topology, exploits utilization and workloads of CPU, storage, and memory resources in a distributed fashion, and optimizes resource allocation for realizing big data analytics efficiently.

IntechOpen

Crossref