FreezeOut – Accelerate Training by Progressively Freezing Layers
The early layers of a deep neural net have the fewest parameters, but take up the most computation. We propose to train each hidden layer only for a set portion of the training run, freezing the layers out one-by-one and excluding them from the backward pass. We empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training at a 3% loss in accuracy for DenseNets on CIFAR.
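The core mechanism can be sketched as a per-layer schedule that decides when each layer stops receiving gradients. The sketch below is a minimal illustration with assumed names (`freeze_iterations`, `active_layers`, `first_frac`) and a simple linear spacing of freeze points; it is not necessarily the exact schedule used in the paper.

```python
def freeze_iterations(num_layers, total_iters, first_frac=0.5):
    """Iteration at which each layer is frozen out of the backward pass.

    Layers freeze one-by-one, earliest layer first: layer 0 freezes at
    first_frac * total_iters, and the final layer trains for the full
    run. Linear spacing between freeze points is an illustrative
    assumption, not necessarily the paper's schedule.
    """
    stops = []
    for i in range(num_layers):
        frac = first_frac + (1.0 - first_frac) * i / (num_layers - 1)
        stops.append(int(frac * total_iters))
    return stops

def active_layers(stops, t):
    """Indices of layers still included in the backward pass at iteration t."""
    return [i for i, stop in enumerate(stops) if t < stop]
```

In a framework such as PyTorch, freezing a layer would amount to setting `requires_grad=False` on its parameters (and dropping them from the optimizer) once the current iteration passes that layer's stop point, so the backward pass skips it entirely.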
Whether this tradeoff is acceptable is up to the user. If one is prototyping many different designs and simply wants to observe how they rank relative to one another, then employing higher levels of FreezeOut may be acceptable. If, however, one has settled on a network design and hyperparameters and simply wants to maximize performance on a test set, then a reduction in training time offers little benefit, and freezing layers is not a desirable technique to use.