Large language models (LLMs) have become an integral component in solving a
wide range of NLP tasks. In this work, we explore a novel use case for LLMs:
building performance predictors (PP), i.e., models that, given a specific deep
neural network architecture, predict its performance on a downstream task. We
design PP prompts for LLMs consisting of: (i) role: description of the role
assigned to the LLM, (ii) instructions: set of instructions to be followed by
the LLM to carry out performance prediction, (iii) hyperparameters: a
definition of each architecture-specific hyperparameter, and (iv)
demonstrations: sample architectures along with their efficiency metrics and
'training from scratch' performance. For machine translation (MT) tasks, we
discover that GPT-4 with our PP prompts (LLM-PP) can predict the performance of
an architecture with a mean absolute error matching the SOTA and a marginal
degradation in rank correlation coefficient compared to SOTA performance
predictors. Further, we show that the predictions from LLM-PP can be distilled
to a small regression model (LLM-Distill-PP). Surprisingly, LLM-Distill-PP
models largely retain the performance of LLM-PP and can be a
cost-effective alternative for heavy use cases of performance estimation.
Specifically, for neural architecture search (NAS), we propose a Hybrid-Search
algorithm for NAS (HS-NAS), which uses LLM-Distill-PP for the initial part of
search, resorting to the baseline predictor for the rest of the search. We show
that HS-NAS performs very similarly to SOTA NAS across benchmarks, reduces search
hours by roughly 50%, and in some cases improves latency, GFLOPs, and model
size.
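
As an illustration only, the four-part PP prompt and the predictor switch in HS-NAS might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the names `Demonstration`, `build_pp_prompt`, `pick_predictor`, the prompt layout, and the `switch_fraction` default are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Architecture = Dict[str, float]
Predictor = Callable[[Architecture], float]


@dataclass
class Demonstration:
    """One sample architecture with its efficiency metrics and measured score."""
    architecture: Architecture
    efficiency: Dict[str, float]
    performance: float  # 'training from scratch' performance


def format_fields(fields: Dict[str, float]) -> str:
    # Sort keys so the same architecture always renders identically.
    return ", ".join(f"{key}={value}" for key, value in sorted(fields.items()))


def build_pp_prompt(
    role: str,
    instructions: List[str],
    hyperparameters: Dict[str, str],
    demonstrations: List[Demonstration],
    query: Architecture,
) -> str:
    """Assemble role, instructions, hyperparameter definitions, and demonstrations.

    The section headers and ordering are assumptions made for this sketch.
    """
    lines = [f"Role: {role}", "", "Instructions:"]
    lines += [f"{i}. {text}" for i, text in enumerate(instructions, start=1)]
    lines += ["", "Hyperparameters:"]
    lines += [f"- {name}: {meaning}" for name, meaning in hyperparameters.items()]
    lines += ["", "Demonstrations:"]
    for demo in demonstrations:
        lines.append(f"Architecture: {format_fields(demo.architecture)}")
        lines.append(f"Efficiency: {format_fields(demo.efficiency)}")
        lines.append(f"Performance: {demo.performance}")
    # The query architecture comes last, leaving the score for the LLM to fill in.
    lines += ["", f"Architecture: {format_fields(query)}", "Performance:"]
    return "\n".join(lines)


def pick_predictor(
    step: int,
    total_steps: int,
    distilled_pp: Predictor,
    baseline_pp: Predictor,
    switch_fraction: float = 0.5,  # assumed value; the switch point is a design choice
) -> Predictor:
    """Use the distilled predictor early in the search, the baseline afterwards."""
    if step < switch_fraction * total_steps:
        return distilled_pp
    return baseline_pp
```

In a search loop, `pick_predictor(step, total_steps, ...)` would be called once per iteration to score candidate architectures, so the cheaper distilled model handles the initial part of the search and the baseline predictor handles the remainder.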