Neural Architecture Search (NAS) has become a widely used tool for automating
neural network design. While one-shot NAS methods have successfully reduced computational requirements, they still demand extensive supernet training. Zero-shot NAS, by contrast, uses training-free proxies to estimate a candidate architecture's test performance, but it has two limitations: (1) it cannot exploit the information a network gains as it improves with training, and (2) its performance is unreliable in complex domains such as recommender systems (RecSys), owing to multi-modal data inputs and complex architecture configurations. To synthesize
the benefits of both methods, we introduce a "sub-one-shot" paradigm that
serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the
supernet is trained using only a small subset of the training data, a phase we
refer to as "warm-up." Within this paradigm, we present SiGeo, a proxy founded on a novel theoretical framework that connects supernet warm-up with proxy efficacy. Extensive experiments show that SiGeo, with the
benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on
various established NAS benchmarks. When the supernet is warmed up, SiGeo achieves performance comparable to that of weight-sharing one-shot NAS methods at a significantly lower (∼60\%) computational cost.
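
To make the sub-one-shot workflow concrete, the following is a minimal sketch in PyTorch. It deliberately simplifies the setup described above: each candidate is warmed up independently rather than through a weight-sharing supernet, and the proxy is a generic gradient-norm stand-in, not the SiGeo formula; all names here (`warm_up`, `proxy_score`, the toy dataset and candidates) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-ins: a tiny synthetic dataset and two candidate "architectures".
train_set = TensorDataset(torch.randn(1000, 16), torch.randint(0, 4, (1000,)))
candidates = {
    "narrow": nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)),
    "wide": nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4)),
}

def warm_up(net, dataset, fraction=0.05, lr=0.01):
    """Sub-one-shot "warm-up": train on only a small fraction of the data."""
    loader = DataLoader(Subset(dataset, range(int(len(dataset) * fraction))),
                        batch_size=32, shuffle=True)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    net.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def proxy_score(net, batch):
    """Stand-in training-free proxy (summed gradient norms); NOT the SiGeo proxy."""
    x, y = batch
    net.zero_grad()
    nn.CrossEntropyLoss()(net(x), y).backward()
    return sum(p.grad.norm().item() for p in net.parameters() if p.grad is not None)

# Warm up each candidate briefly, then rank by proxy score.
batch = next(iter(DataLoader(train_set, batch_size=32)))
for name, net in candidates.items():
    warm_up(net, train_set)
    print(name, proxy_score(net, batch))
```

The key design point the sketch illustrates is the ordering: a brief warm-up on a small data subset precedes proxy evaluation, letting the proxy see early training signal without paying the full one-shot training cost.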