7 research outputs found

Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Authors: Ailamaki A. (Anastasia); Boncz P.A. (Peter); Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository

Proceedings of the 2019 International Conference on Management of Data

Publication date: 30/06/2019

CWI's Institutional Repository

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Authors: Gao Yunjun; Guo Rentong; Ke Xiangyu; Peng Zhangyang; Wang Mengzhao; Wu Songlin; Xie Charles; Xu Weizhi; Xu Xiaoliang; Yi Xiaomeng
Publication date: 02/03/2024

High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase in main memory requirements. A potential solution involves leveraging disk-based implementation, which stores and searches vector data on high-performance devices like NVMe SSDs. However, implementing HVSS for data segments proves to be intricate in vector databases where a single machine comprises multiple segments for system scalability. In this context, each segment operates with limited memory and disk space, necessitating a delicate balance between accuracy, efficiency, and space cost. Existing disk-based methods fall short as they do not holistically address all these requirements simultaneously. In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. It has two primary components: (1) a data layout incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality, reducing the search path length and minimizing disk bandwidth wastage; and (2) a block search strategy designed to minimize costly disk I/O operations during vector query execution. Through extensive experiments, we validate the effectiveness, efficiency, and scalability of Starling. On a data segment with 2GB memory and 10GB disk capacity, Starling can accommodate up to 33 million vectors in 128 dimensions, offering HVSS with over 0.9 average precision and top-10 recall rate, and latency under 1 millisecond. The results showcase Starling's superior performance, exhibiting 43.9× higher throughput with 98% lower query latency compared to state-of-the-art methods while maintaining the same level of accuracy.

Comment: This paper has been accepted by SIGMOD 202

arXiv.org e-Print Archive

The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey

Authors: Jacobsen Hans-Arno; Mayer Ruben; Vatter Jana
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/05/2023

Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph-structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry. With the rapidly growing size of real-world graphs, the need for efficient and scalable GNN training solutions has arisen. Consequently, many works proposing GNN systems have emerged throughout the past few years. However, there is an acute lack of overview, categorization and comparison of such systems. We aim to fill this gap by summarizing and categorizing important methods and techniques for large-scale GNN solutions. In addition, we establish connections between GNN systems, graph processing systems and DL systems.

Comment: Accepted at ACM Computing Survey

arXiv.org e-Print Archive