137 research outputs found
Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search
Though recent years have witnessed remarkable progress in single image
super-resolution (SISR) tasks with the prosperous development of deep neural
networks (DNNs), the deep learning methods are confronted with the computation
and memory consumption issues in practice, especially for resource-limited
platforms such as mobile devices. To overcome the challenge and facilitate the
real-time deployment of SISR tasks on mobile, we combine neural architecture
search with pruning search and propose an automatic search framework that
derives sparse super-resolution (SR) models with high image quality while
satisfying the real-time inference requirement. To decrease the search cost, we
leverage the weight sharing strategy by introducing a supernet and decouple the
search problem into three stages, including supernet construction,
compiler-aware architecture and pruning search, and compiler-aware pruning
ratio search. With the proposed framework, we are the first to achieve
real-time SR inference (only tens of milliseconds per frame) at 720p
resolution with competitive image quality (in terms of PSNR and SSIM) on
mobile platforms (Samsung Galaxy S20).
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems
The rise of deep neural networks provides an important driver in optimizing
recommender systems. However, the success of recommender systems lies in
delicate architecture fabrication, and thus calls for Neural Architecture
Search (NAS) to further improve its modeling. We propose NASRec, a paradigm
that trains a single supernet and efficiently produces abundant
models/sub-architectures by weight sharing. To overcome the data multi-modality
and architecture heterogeneity challenges in the recommendation domain, NASRec
establishes a large supernet (i.e., search space) to search the full
architectures, with the supernet incorporating versatile operator choices and
dense connectivity minimizing human prior for flexibility. The scale and
heterogeneity in NASRec impose challenges in search, such as training
inefficiency, operator-imbalance, and degraded rank correlation. We tackle
these challenges by proposing single-operator any-connection sampling,
operator-balancing interaction modules, and post-training fine-tuning. Our
results on three Click-Through Rate (CTR) prediction benchmarks show that
NASRec can outperform both manually designed models and existing NAS methods,
achieving state-of-the-art performance.
Farthest Greedy Path Sampling for Two-shot Recommender Search
Weight-sharing Neural Architecture Search (WS-NAS) provides an efficient
mechanism for developing end-to-end deep recommender models. However, in
complex search spaces, distinguishing between superior and inferior
architectures (or paths) is challenging. This challenge is compounded by the
limited coverage of the supernet and the co-adaptation of subnet weights, which
restricts the exploration and exploitation capabilities inherent to
weight-sharing mechanisms. To address these challenges, we introduce Farthest
Greedy Path Sampling (FGPS), a new path sampling strategy that balances path
quality and diversity. FGPS enhances path diversity to facilitate more
comprehensive supernet exploration, while emphasizing path quality to ensure
the effective identification and utilization of promising architectures. By
incorporating FGPS into a Two-shot NAS (TS-NAS) framework, we derive
high-performance architectures. Evaluations on three Click-Through Rate (CTR)
prediction benchmarks demonstrate that our approach consistently achieves
superior results, outperforming both manually designed and most NAS-based
models.
Comment: 9 pages, 5 figures
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in
Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a
homogeneous design where the same number of experts of the same size are placed
uniformly throughout the network. Furthermore, existing MoE works do not
consider computational constraints (e.g., FLOPs, latency) to guide their
design. To this end, we develop AutoMoE -- a framework for designing
heterogeneous MoEs under computational constraints. AutoMoE leverages Neural
Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with
4x inference speedup (CPU) and FLOPs reduction over manually designed
Transformers, with parity in BLEU score over dense Transformer and within 1
BLEU point of MoE SwitchTransformer, on aggregate over benchmark datasets for
NMT. A heterogeneous search space with dense and sparsely activated Transformer
modules (e.g., how many experts? where to place them? what should be their
sizes?) allows for adaptive compute -- where different amounts of computations
are used for different tokens in the input. Adaptivity comes naturally from
routing decisions which send tokens to experts of different sizes. AutoMoE
code, data, and trained models are available at https://aka.ms/AutoMoE.
Comment: ACL 2023 Findings
Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale
Neural Architecture Search (NAS) has demonstrated its efficacy in computer
vision and potential for ranking systems. However, prior work focused on
academic problems, which are evaluated at small scale under well-controlled
fixed baselines. In industry systems, such as the ranking systems at Meta, it is
unclear whether NAS algorithms from the literature can outperform production
baselines because of: (1) scale - Meta ranking systems serve billions of users,
(2) strong baselines - the baselines are production models optimized by
hundreds to thousands of world-class engineers for years since the rise of deep
learning, (3) dynamic baselines - engineers may have established new and
stronger baselines during NAS search, and (4) efficiency - the search pipeline
must yield results quickly in alignment with the productionization life cycle.
In this paper, we present Rankitect, a NAS software framework for ranking
systems at Meta. Rankitect seeks to build brand new architectures by composing
low level building blocks from scratch. Rankitect implements and improves
state-of-the-art (SOTA) NAS methods for comprehensive and fair comparison under
the same search space, including sampling-based NAS, one-shot NAS, and
Differentiable NAS (DNAS). We evaluate Rankitect by comparing to multiple
production ranking models at Meta. We find that Rankitect can discover new
models from scratch, achieving a competitive tradeoff between Normalized Entropy
loss and FLOPs. When using a search space designed by engineers, Rankitect
can generate better models than engineers, achieving positive offline
evaluation and online A/B test results at Meta scale.
Comment: Wei Wen and Kuang-Hung Liu contribute equally
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
In this paper, we propose binary sparse convolutional networks called BSC-Net
for efficient point cloud analysis. We empirically observe that the sparse
convolution operation causes larger quantization errors than standard
convolution. However, conventional network quantization methods directly
binarize the weights and activations in sparse convolution, resulting in a
performance drop due to the significant quantization loss. In contrast, we
search for the optimal subset of convolution operations that activates the
sparse convolution at various locations to alleviate quantization error, and the
performance gap between real-valued and binary sparse convolutional networks is
closed without complexity overhead. Specifically, we first present the shifted
sparse convolution that fuses the information in the receptive field for the
active sites that match the pre-defined positions. Then we employ
differentiable search strategies to discover the optimal positions for active
site matching in the shifted sparse convolution, and the quantization errors
are significantly alleviated for efficient point cloud analysis. For fair
evaluation of the proposed method, we empirically select recent advances
that are beneficial for sparse convolution network binarization to construct a
strong baseline. The experimental results on ScanNet and NYU Depth v2 show
that our BSC-Net achieves significant improvement upon our strong baseline and
outperforms the state-of-the-art network binarization methods by a remarkable
margin without additional computation overhead for binarizing sparse
convolutional networks.
Comment: Accepted to CVPR202