Hypergraphs have demonstrated their superiority over traditional graphs in modeling complex systems by directly capturing interactions among multiple entities. Hyperedge prediction, which aims to identify unobserved potential hyperedges, is a fundamental task in hypergraph analysis. A critical component of hyperedge prediction is sampling informative negative hyperedges from a candidate set that is far larger than in traditional graphs, so as to enhance model training. Most existing methods rely on predefined heuristics to sample negative hyperedges, which limits their generalizability. The current state of the art treats negative sampling as a generative task; however, existing generation-based approaches do not scale to large hypergraphs. Meanwhile, diffusion models have shown superior performance across many generative tasks, yet their application to negative hyperedge generation remains unexplored. Adapting diffusion models to this task is challenging because: (1) diffusion models are inherently designed to generate high-quality, well-defined positive samples rather than negative ones; and (2) diffusion models traditionally operate in continuous space, whereas negative sampling for hyperedge prediction operates in discrete space. To address these challenges, we introduce SEHP (Scalable and Effective Negative Sample Generation for Hyperedge Prediction), which employs a conditional diffusion model to iteratively generate and refine negative hyperedges, advancing them towards the decision boundary to improve model training. SEHP further improves scalability by sampling sub-hypergraphs and integrating global structural information into the diffusion model for batch training. Extensive experiments on real-world datasets demonstrate that SEHP surpasses state-of-the-art methods in both prediction accuracy and scalability. Our code is available at https://github.com/SLQu/SEHP.
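To make the high-level loop described above concrete, the following is a minimal, speculative sketch (not the authors' implementation): mini-batches of hyperedges stand in for sampled sub-hypergraphs, a simple greedy perturbation stands in for the conditional diffusion refinement, and generated negatives are pushed toward the predictor's decision boundary. All names (`score`, `refine_negatives`, the toy scorer and embeddings) are illustrative placeholders, not components of SEHP.

```python
# Conceptual sketch of boundary-seeking negative generation for hyperedge
# prediction. A greedy node-swap refinement approximates the role the abstract
# assigns to the conditional diffusion model; everything here is a toy setup.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, T = 100, 16, 5                  # T: number of refinement steps
node_emb = rng.normal(size=(num_nodes, dim))    # toy node embeddings
w = np.zeros(dim)                               # linear hyperedge scorer weights

def score(edge):
    """Predictor score for a hyperedge: sigmoid(w . mean of member embeddings)."""
    h = node_emb[list(edge)].mean(axis=0)
    return 1.0 / (1.0 + np.exp(-h @ w)), h

def refine_negatives(pos_edges, steps=T):
    """Start from corrupted positives and iteratively move them toward the
    decision boundary (score close to 0.5), i.e. harder negatives."""
    negs = []
    for e in pos_edges:
        neg = set(e)
        neg.remove(next(iter(neg)))             # corrupt: swap out one member
        neg.add(int(rng.integers(num_nodes)))
        for _ in range(steps):
            cand = set(neg)
            cand.remove(next(iter(cand)))
            cand.add(int(rng.integers(num_nodes)))
            # keep the candidate whose score is closer to the boundary
            if abs(score(cand)[0] - 0.5) < abs(score(neg)[0] - 0.5):
                neg = cand
        negs.append(neg)
    return negs

# toy hypergraph: each hyperedge is a set of node ids
hyperedges = [set(rng.choice(num_nodes, size=4, replace=False)) for _ in range(200)]

for epoch in range(10):
    batch = rng.choice(len(hyperedges), size=32, replace=False)   # sub-hypergraph batch
    pos = [hyperedges[i] for i in batch]
    neg = refine_negatives(pos)
    for e, y in [(e, 1.0) for e in pos] + [(e, 0.0) for e in neg]:
        p, h = score(e)
        w += 0.1 * (y - p) * h                  # logistic-regression gradient step
```

The sketch only illustrates the interplay the abstract describes: batch-wise training on sampled sub-hypergraphs, iterative refinement of negatives toward the decision boundary, and joint improvement of the hyperedge predictor.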