Neural Architecture Search (NAS) has become a widely used tool for automating
neural network design. While one-shot NAS methods have successfully reduced computational requirements, they still demand extensive supernet training. Zero-shot NAS, by contrast, uses training-free proxies to estimate a candidate architecture's test performance, but it has two limitations: (1) it cannot exploit the information a network gains as it improves with training, and (2) its performance is unreliable in complex domains such as recommender systems (RecSys), owing to multi-modal data inputs and complex architecture configurations. To synthesize
the benefits of both methods, we introduce a "sub-one-shot" paradigm that
serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the
supernet is trained using only a small subset of the training data, a phase we
refer to as "warm-up." Within this paradigm, we present SiGeo, a proxy founded on a novel theoretical framework that connects supernet warm-up with proxy efficacy. Extensive experiments show that SiGeo, with the
benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on
various established NAS benchmarks. When the supernet is warmed up, SiGeo achieves performance comparable to that of weight-sharing one-shot NAS methods at a significantly lower (∼60\%) computational cost.
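
To make the sub-one-shot workflow concrete, the following is a minimal sketch in PyTorch. It deliberately simplifies the setup described above: each candidate is warmed up independently rather than through a weight-sharing supernet, and the proxy is a generic gradient-norm stand-in, not the SiGeo formula; all names here (`warm_up`, `proxy_score`, the toy dataset and candidates) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-ins: a tiny synthetic dataset and two candidate "architectures".
train_set = TensorDataset(torch.randn(1000, 16), torch.randint(0, 4, (1000,)))
candidates = {
    "narrow": nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)),
    "wide": nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4)),
}

def warm_up(net, dataset, fraction=0.05, lr=0.01):
    """Sub-one-shot "warm-up": train on only a small fraction of the data."""
    loader = DataLoader(Subset(dataset, range(int(len(dataset) * fraction))),
                        batch_size=32, shuffle=True)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    net.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def proxy_score(net, batch):
    """Stand-in training-free proxy (summed gradient norms); NOT the SiGeo proxy."""
    x, y = batch
    net.zero_grad()
    nn.CrossEntropyLoss()(net(x), y).backward()
    return sum(p.grad.norm().item() for p in net.parameters() if p.grad is not None)

# Warm up each candidate briefly, then rank by proxy score.
batch = next(iter(DataLoader(train_set, batch_size=32)))
for name, net in candidates.items():
    warm_up(net, train_set)
    print(name, proxy_score(net, batch))
```

The key design point the sketch illustrates is the ordering: a brief warm-up on a small data subset precedes proxy evaluation, letting the proxy see early training signal without paying the full one-shot training cost.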