Recent studies have shown that skeletonization (pruning parameters) of
networks \textit{at initialization} provides all the practical benefits of
sparsity at both inference and training time, while only marginally degrading
their performance. However, we observe that beyond a certain level of sparsity
(approximately 95\%), these approaches fail to preserve network performance
and, surprisingly, in many cases perform even worse than trivial random pruning.
To this end, we propose an objective to find a skeletonized network with
maximum {\em foresight connection sensitivity} (FORCE), whereby the
trainability of the pruned network, measured in terms of connection
sensitivity, is taken into consideration.
We then propose two approximate procedures to maximize this objective:
(1) Iterative SNIP, which allows parameters that were unimportant at
earlier stages of skeletonization to become important at later stages; and
(2) FORCE, an iterative procedure that additionally allows exploration by
letting already pruned parameters resurrect at later stages of skeletonization.
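The sketch below illustrates both procedures on a toy problem; it is an
assumed, illustrative approximation rather than the released implementation
(the two-layer model, the random data, and the exponential sparsity schedule
are placeholders). The saliency score is SNIP-style connection sensitivity,
$|\theta_i\,\partial L/\partial \theta_i|$, recomputed over several pruning
rounds.
\begin{verbatim}
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(128, 20)                 # toy inputs (placeholder)
y = torch.randint(0, 2, (128,))          # toy labels (placeholder)
weights = [0.1 * torch.randn(20, 64),    # 2-layer MLP at initialization
           0.1 * torch.randn(64, 2)]
masks = [torch.ones_like(w) for w in weights]

def loss_fn(ws):
    return F.cross_entropy(torch.relu(x @ ws[0]) @ ws[1], y)

def prune(weights, masks, final_sparsity=0.99, steps=10, force=True):
    n_total = sum(w.numel() for w in weights)
    for t in range(1, steps + 1):
        # Exponential schedule for the fraction of weights kept at step t.
        keep = (1.0 - final_sparsity) ** (t / steps)
        # Gradients taken at the pruned point theta * mask, for every
        # coordinate, including the currently zeroed ones.
        wbar = [(w * m).detach().requires_grad_()
                for w, m in zip(weights, masks)]
        grads = torch.autograd.grad(loss_fn(wbar), wbar)
        # force=True re-scores all weights (pruned ones may resurrect);
        # force=False (Iterative SNIP) scores surviving weights only.
        scores = [(w * g).abs() if force else (w * g).abs() * m
                  for w, g, m in zip(weights, grads, masks)]
        flat = torch.cat([s.flatten() for s in scores])
        k = max(1, int(keep * n_total))
        threshold = flat.topk(k).values[-1]  # k-th largest saliency
        masks = [(s >= threshold).float() for s in scores]
    return masks

masks = prune(weights, masks)
kept = sum(int(m.sum()) for m in masks)
print(kept, "of", sum(w.numel() for w in weights), "weights kept")
\end{verbatim}
With \texttt{force=True}, a weight zeroed at an early round can receive a
high score later and re-enter the mask; with \texttt{force=False}, the search
is restricted to progressively shrinking subsets of the surviving weights,
mirroring Iterative SNIP.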
Empirical analyses on a large suite of experiments show that our approach,
while performing at least as well as other recent approaches at moderate
pruning levels, yields remarkably improved performance at high pruning levels
(up to 99.5\% of the parameters can be removed while keeping the networks
trainable).
Code can be found at \url{https://github.com/naver/force}.