This blog introduces the paper DARTS: Differentiable Architecture Search.
Introduction
Conventional architecture search algorithms are computationally demanding despite their remarkable performance. An inherent cause of inefficiency for the dominant approaches, based on reinforcement learning (RL), evolution, MCTS, SMBO, or Bayesian optimization, is that architecture search is treated as a black-box optimization problem over a discrete domain, which requires a large number of architecture evaluations.
In this work, the authors approach the problem from a different angle and propose a method for efficient architecture search called DARTS (Differentiable ARchiTecture Search). Instead of searching over a discrete set of candidate architectures, they relax the search space to be continuous, so that the architecture can be optimized with respect to its validation set performance by gradient descent. DARTS achieves performance competitive with the state of the art using orders of magnitude less computation. It is also simpler than many existing approaches, as it does not involve controllers, hypernetworks, or performance predictors.
DARTS is able to learn high-performance architecture building blocks with complex graph topologies within a rich search space. It is not restricted to any specific architecture family and is applicable to both convolutional and recurrent networks.
The contributions of this paper can be summarized as follows:
- Introduce a novel algorithm for differentiable network architecture search based on bilevel optimization, which is applicable to both convolutional and recurrent architectures.
- Achieve highly competitive results on CIFAR-10 and outperform the state of the art on PTB.
- Achieve remarkable efficiency improvement (reducing the cost of architecture discovery to a few GPU days), which the authors attribute to the use of gradient-based optimization as opposed to non-differentiable search techniques.
- Show that the architectures learned by DARTS on CIFAR-10 and PTB are transferable to ImageNet and WikiText-2, respectively.
Differentiable Architecture Search
Search Space
DARTS searches for a computation cell as the building block of the final architecture. The learned cell can either be stacked to form a convolutional network or recursively connected to form a recurrent network.
A cell is a directed acyclic graph consisting of an ordered sequence of $N$ nodes. Each node $x^{(i)}$ is a latent representation (e.g. a feature map in convolutional networks), and each directed edge $(i, j)$ is associated with some operation $o^{(i,j)}$ that transforms $x^{(i)}$.
- For convolutional cells, the input nodes are defined as the cell outputs in the previous two layers.
- For recurrent cells, these are defined as the input at the current step and the state carried from the previous step. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes.
Each intermediate node is computed based on all of its predecessors:

$$x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)$$

A special zero operation is also included to indicate a lack of connection between two nodes. The task of learning the cell therefore reduces to learning the operations on its edges.
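As a rough illustration (not the authors' code), the sketch below shows how a discrete cell computes its intermediate nodes: each node sums one operation applied to each of its predecessors, and the zero operation stands in for a missing edge. The helper names (`cell_forward`, `edge_ops`, `num_intermediate`) are illustrative assumptions.

```python
import torch

def zero(x):
    # "no connection" between two nodes
    return torch.zeros_like(x)

def identity(x):
    # skip connection
    return x

def cell_forward(inputs, edge_ops, num_intermediate):
    """inputs: the two cell input nodes; edge_ops[(i, j)]: the operation on edge i -> j."""
    states = list(inputs)
    for j in range(len(inputs), len(inputs) + num_intermediate):
        # each intermediate node aggregates all of its predecessors
        states.append(sum(edge_ops[(i, j)](states[i]) for i in range(j)))
    # cell output: concatenation of the intermediate nodes
    return torch.cat(states[len(inputs):], dim=1)
```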
Continuous Relaxation and Optimization
Let $\mathcal{O}$ be a set of candidate operations (e.g. convolution, max pooling, zero), where each operation represents some function $o(\cdot)$ to be applied to $x^{(i)}$. To make the search space continuous, relax the categorical choice of a particular operation to a softmax over all possible operations:

$$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})}\, o(x)$$

where the operation mixing weights for a pair of nodes $(i, j)$ are parameterized by a vector $\alpha^{(i,j)}$ of dimension $|\mathcal{O}|$.
At the end of the search, a discrete architecture can be obtained by replacing each mixed operation $\bar{o}^{(i,j)}$ with the most likely operation, i.e. $o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}$.
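A minimal sketch of the relaxation, assuming PyTorch and an illustrative `MixedOp` module (not the authors' implementation): each edge holds one architecture parameter per candidate operation and outputs the softmax-weighted mixture of all candidates; discretization keeps the single most likely operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, candidate_ops):
        # candidate_ops: list of nn.Module candidates (conv, pooling, zero, ...)
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # alpha^{(i,j)}: one mixing weight per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))

    def forward(self, x):
        # softmax-weighted mixture of all candidate operations
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def derive(self):
        # discretization: keep only the most likely operation on this edge
        return self.ops[self.alpha.argmax().item()]
```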
After relaxation, the goal is to jointly learn the architecture $\alpha$ and the weights $w$ within all the mixed operations.
Denote by $\mathcal{L}_{train}$ and $\mathcal{L}_{val}$ the training and validation loss, respectively. Both losses are determined not only by the architecture $\alpha$, but also by the weights $w$ in the network.
The goal of architecture search is to find $\alpha^*$ that minimizes the validation loss $\mathcal{L}_{val}(w^*, \alpha^*)$, where the weights $w^*$ associated with the architecture are obtained by minimizing the training loss, $w^* = \arg\min_w \mathcal{L}_{train}(w, \alpha^*)$.
This implies a bilevel optimization problem with $\alpha$ as the upper-level variable and $w$ as the lower-level variable:

$$\min_{\alpha} \; \mathcal{L}_{val}\big(w^*(\alpha), \alpha\big) \quad \text{s.t.} \quad w^*(\alpha) = \arg\min_w \mathcal{L}_{train}(w, \alpha)$$
Approximate Architecture Gradient
Propose a simple approximation scheme as follows:

$$\nabla_{\alpha} \mathcal{L}_{val}\big(w^*(\alpha), \alpha\big) \approx \nabla_{\alpha} \mathcal{L}_{val}\big(w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha), \alpha\big)$$

where $w$ denotes the current weights maintained by the algorithm and $\xi$ is the learning rate for a step of inner optimization.
The idea is to approximate $w^*(\alpha)$ by adapting $w$ using only a single training step, without solving the inner optimization completely by training until convergence. Applying the chain rule to the approximate architecture gradient yields

$$\nabla_{\alpha} \mathcal{L}_{val}(w', \alpha) - \xi \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha)$$

where $w' = w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha)$ denotes the weights after one step of gradient descent.
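A minimal sketch of the resulting alternating update, shown here in its simpler first-order form ($\xi = 0$): architecture parameters are updated on validation batches and weights on training batches. The function name and the optimizer setup (one optimizer per parameter group) are assumptions, not the authors' implementation.

```python
import torch

def search_step(model, w_optimizer, alpha_optimizer,
                train_batch, val_batch, loss_fn):
    # 1) update the architecture parameters alpha on the validation loss
    alpha_optimizer.zero_grad()
    x_val, y_val = val_batch
    loss_fn(model(x_val), y_val).backward()
    alpha_optimizer.step()

    # 2) update the network weights w on the training loss
    w_optimizer.zero_grad()
    x_train, y_train = train_batch
    loss_fn(model(x_train), y_train).backward()
    w_optimizer.step()
```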
The expression above contains an expensive matrix-vector product in its second term. The complexity can be substantially reduced using the finite difference approximation:

$$\nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha) \approx \frac{\nabla_{\alpha} \mathcal{L}_{train}(w^+, \alpha) - \nabla_{\alpha} \mathcal{L}_{train}(w^-, \alpha)}{2\epsilon}$$

where $\epsilon$ is a small scalar and $w^{\pm} = w \pm \epsilon \nabla_{w'} \mathcal{L}_{val}(w', \alpha)$.
Evaluating the finite difference requires only two forward passes for the weights and two backward passes for $\alpha$, reducing the complexity from $O(|\alpha||w|)$ to $O(|\alpha| + |w|)$.
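A sketch of the finite-difference estimate, assuming PyTorch: the weights are perturbed by $\pm\epsilon$ in the direction of the validation gradient, and the training-loss gradients with respect to $\alpha$ at the two perturbed points are differenced. The function name is illustrative, and `eps` is a fixed constant here for simplicity.

```python
import torch

def hessian_vector_product(train_loss_fn, weights, alphas, val_grads, eps=1e-2):
    """train_loss_fn: callable that recomputes L_train with the current weights.
    weights / alphas: lists of parameters; val_grads: grad_{w'} L_val(w', alpha)."""
    # w+ = w + eps * grad_{w'} L_val
    with torch.no_grad():
        for w, g in zip(weights, val_grads):
            w.add_(eps * g)
    grads_pos = torch.autograd.grad(train_loss_fn(), alphas)

    # w- = w - eps * grad_{w'} L_val
    with torch.no_grad():
        for w, g in zip(weights, val_grads):
            w.sub_(2 * eps * g)
    grads_neg = torch.autograd.grad(train_loss_fn(), alphas)

    # restore the original weights
    with torch.no_grad():
        for w, g in zip(weights, val_grads):
            w.add_(eps * g)

    # central difference estimate of the second-order term
    return [(gp - gn) / (2 * eps) for gp, gn in zip(grads_pos, grads_neg)]
```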
Deriving Discrete Architectures
Retain the top-$k$ strongest operations (from distinct nodes) among all non-zero candidate operations collected from all previous nodes, where the strength of an operation is defined as $\frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})}$.
To make the derived architecture comparable with those in the existing work, use $k = 2$ for convolutional cells and $k = 1$ for recurrent cells.
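A sketch of this discretization for a single intermediate node, assuming the architecture weights of its incoming edges are available as 1-D tensors; the `zero` label and the helper name `derive_node` are illustrative.

```python
import torch
import torch.nn.functional as F

def derive_node(alphas, op_names, k):
    """alphas: list of alpha vectors, one per incoming edge; returns (edge, op) pairs."""
    strengths = []
    for i, a in enumerate(alphas):
        probs = F.softmax(a, dim=-1)
        # rank each edge by its strongest non-zero operation
        best = max((p.item(), name) for p, name in zip(probs, op_names)
                   if name != "zero")
        strengths.append((best[0], i, best[1]))
    # retain the k strongest incoming edges, each with its best operation
    return [(i, name) for _, i, name in sorted(strengths, reverse=True)[:k]]
```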
Experiments and Results
Experiments on CIFAR-10 and PTB consist of two stages: architecture search and architecture evaluation.
- Search for the cell architectures using DARTS and determine the best cells based on their validation performance.
- Use these cells to construct larger architectures, and test their performance.
- Investigate the transferability of the best cells learned on CIFAR-10 and PTB by evaluating them on ImageNet and WikiText-2 (WT2), respectively.
Architecture Search
Architecture Evaluation
Conclusion
We presented DARTS, a simple yet efficient architecture search algorithm for both convolutional and recurrent networks. By searching in a continuous space, DARTS is able to match or outperform the state-of-the-art non-differentiable architecture search methods on image classification and language modeling tasks with remarkable efficiency improvement by several orders of magnitude. There are many interesting directions to improve DARTS further. For example, the current method may suffer from discrepancies between the continuous architecture encoding and the derived discrete architecture. This could be alleviated, e.g., by annealing the softmax temperature (with a suitable schedule) to enforce one-hot selection. It would also be interesting to investigate performance-aware architecture derivation schemes based on the shared parameters learned during the search process.
The paper: DARTS: Differentiable Architecture Search.
If you like this blog or find it useful, you are welcome to comment on it. You are also welcome to share this blog so that more people can read it. If any images used in this blog infringe your copyright, please contact the author to have them removed. Thank you!