80,443 research outputs found
Federated Neural Architecture Search
To preserve user privacy while enabling mobile intelligence, techniques have
been proposed to train deep neural networks on decentralized data. However,
training over decentralized data makes the design of neural architecture quite
difficult as it already was. Such difficulty is further amplified when
designing and deploying different neural architectures for heterogeneous mobile
platforms. In this work, we propose an automatic neural architecture search
into the decentralized training, as a new DNN training paradigm called
Federated Neural Architecture Search, namely federated NAS. To deal with the
primary challenge of limited on-client computational and communication
resources, we present FedNAS, a highly optimized framework for efficient
federated NAS. FedNAS fully exploits the key opportunity of insufficient model
candidate re-training during the architecture search process, and incorporates
three key optimizations: parallel candidates training on partial clients, early
dropping candidates with inferior performance, and dynamic round numbers.
Tested on large-scale datasets and typical CNN architectures, FedNAS achieves
comparable model accuracy as state-of-the-art NAS algorithm that trains models
with centralized data, and also reduces the client cost by up to two orders of
magnitude compared to a straightforward design of federated NAS
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
Load balancing, operator instance collocations and horizontal scaling are
critical issues in Parallel Stream Processing Engines to achieve low data
processing latency, optimized cluster utilization and minimized communication
cost respectively. In previous work, these issues are typically tackled
separately and independently. We argue that these problems are tightly coupled
in the sense that they all need to determine the allocations of workloads and
migrate computational states at runtime. Optimizing them independently would
result in suboptimal solutions. Therefore, in this paper, we investigate how
these three issues can be modeled as one integrated optimization problem. In
particular, we first consider jobs where workload allocations have little
effect on the communication cost, and model the problem of load balance as a
Mixed-Integer Linear Program. Afterwards, we present an extended solution
called ALBIC, which support general jobs. We implement the proposed techniques
on top of Apache Storm, an open-source Parallel Stream Processing Engine. The
extensive experimental results over both synthetic and real datasets show that
our techniques clearly outperform existing approaches
- …