This blog introduces the paper: AutoST: Efficient Neural Architecture Search for Spatio-Temporal Prediction.
Introduction
How to find the optimal neural architecture for the various scenarios in cities is still an unsettled problem:
- The spatio-temporal correlation differs from location to location.
- The spatio-temporal correlation is complex, including spatial dependency between regions and temporal correlation among timestamps.
- The spatio-temporal correlation is heterogeneous across different tasks.
Two important aspects are neglected in existing approaches.
First, different areas may prefer different spatial ranges, while the size of the convolution kernel, which models the range of neighborhood dependency, is usually fixed and empirically set.
To verify this, the optimal architecture found by neural architecture search (NAS) is selected and the search process is analyzed. Defining a convolution score for each architecture at every iteration (the weighted kernel sizes summed over all layers), Beijing achieves a slightly higher convolution score than Guiyang.
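A rough sketch of how such a convolution score could be computed (the per-layer softmax normalization is an assumption; the blog only says weighted kernel sizes are summed over all layers):

```python
import torch

def convolution_score(arch_weights, kernel_sizes=(3, 5, 3, 5)):
    """Sum the architecture-weighted kernel sizes over all layers.

    arch_weights: one tensor per layer, holding the architecture weights
    of the convolution candidates in that layer.
    kernel_sizes: kernel size of each candidate, here ordered as
    (Std_conv_3, Std_conv_5, Sep_conv_3, Sep_conv_5).
    """
    sizes = torch.tensor(kernel_sizes, dtype=torch.float32)
    score = 0.0
    for alpha in arch_weights:
        probs = torch.softmax(alpha, dim=0)  # normalize weights per layer
        score += float((probs * sizes).sum())
    return score
```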
Second, current approaches usually use residual networks to aggregate the features of adjacent layers, failing to fuse low- and high-level features. The more skip connections survive at the search stage, the more important the low-level features are. It turns out that different cities have distinct architecture preferences, and core cities usually pay more attention to the global spatial dependency (shown in Fig1(c)).
The authors propose a general network called AutoST, shown in Fig2.
The traditional process of ST prediction consists of three components:
- an initial ST feature extractor, which constructs ST features from raw ST data
- a feature learning network
- an external fusion and prediction module, which fuses the external factors with flow information and then predicts urban flows in the future
The contributions can be summarized in three aspects:
- Propose a novel model named AutoST for spatio-temporal prediction, which introduces the neural architecture search technique to dynamically capture various-range spatial correlations and to fuse multi-level features. In addition, AutoST is oriented to ST data rather than to specific application scenarios, so it can be easily applied to a series of deep models.
- Design an efficient and effective search space including two basic modules: i) a mix convolution block at each layer to capture various-range spatial correlations; ii) a mix skip connection block among layers to dynamically fuse multi-level features. Specifically, the mix convolution block consists of multiple convolution kernels, where a larger kernel size indicates longer-range correlations. Besides, the mix skip connection block includes a connect cell and a do-not-connect cell to dynamically learn to fuse low- and high-level features.
- Perform extensive experiments on four real-world spatio-temporal datasets ranging from taxi flow to crowd flow. The experimental results demonstrate that AutoST can significantly improve spatio-temporal prediction, and the proposed spatio-temporal NAS method is more efficient than existing NAS methods.
Preliminaries
DEFINITION 1. Spatial-Temporal Prediction: We partition a city into an $I \times J$ grid map based on longitude and latitude, where each grid denotes a region. There are many types of measures in a region for different ST applications, such as crowd flows, bike rent and return, and taxi pick-ups and drop-offs. The urban information at time $t$ can then be denoted as a tensor $X_t \in \mathbb{R}^{I \times J \times C}$, where $C$ is the number of measurement types.
DEFINITION 2. External Features: We denote the external factors (e.g., weather and holiday events) at time $t$ as $E_t$.
Problem Statement: Given the historically observed citywide urban flows $\{X_t \mid t = 0, \dots, n-1\}$ and external features, predict the flow $X_n$ for all locations at the next timestamp.
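For concreteness, a minimal sketch of the tensor shapes involved (variable names and sizes are illustrative, not from the paper):

```python
import numpy as np

I, J = 32, 32   # grid resolution: the city partitioned into I x J regions
C = 2           # measurement types per region, e.g., inflow and outflow
T = 48          # number of observed timestamps

# Historical citywide observations: one C-channel "image" per timestamp.
X = np.random.rand(T, C, I, J).astype(np.float32)

# External features per timestamp (weather, holiday flags, ...).
E = np.random.rand(T, 8).astype(np.float32)

# Goal: predict the flow map for all I*J regions at the next timestamp.
target_shape = (C, I, J)
```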
Methodology
Spatio-Temporal Search Space
Distinct from a neural network with a fixed architecture, a NAS network is composed of three modules, from simple to complex:
- a candidate cell module, which defines the search unit.
- an operation block module, which performs a weighted sum over all possible operations to make the search space continuous.
- a NAS network module, which consists of a series of mix operations.
Candidate Cells
The city is divided into a grid map where each grid denotes a region, and these grids make up an image. The candidate cells are selected based on the following three considerations:
- Since nearby regions may affect each other in ST prediction, the convolution operation is important. Convolutions with different kernel sizes model the spatial dependency at different ranges.
- Local correlation captured by low-level convolutions and global correlation encoded by high-level convolutions are both important for city-wide flow forecasting. Therefore, skip connections, which fuse multi-level correlations, are necessary.
- Distinct from common CNNs in the image domain, the ST prediction task does not need the pooling operation, because pooling would probably cause information loss.
Cells:
- Convolution operations: standard convolution with kernel size 3 (Std_conv_3), standard convolution with kernel size 5 (Std_conv_5), separable convolution with kernel size 3 (Sep_conv_3), and separable convolution with kernel size 5 (Sep_conv_5).
- Skip connection operations: a don't-connect (none) operation and a connect (identity) operation.
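A minimal PyTorch sketch of these six candidate cells (implementing separable convolution as depthwise + pointwise is an assumption about details the summary does not specify):

```python
import torch.nn as nn

class Zero(nn.Module):
    """The 'none' cell: don't connect (outputs zeros of the same shape)."""
    def forward(self, x):
        return x * 0.0

def sep_conv(channels, k):
    # Separable convolution: depthwise conv followed by a 1x1 pointwise conv.
    return nn.Sequential(
        nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
        nn.Conv2d(channels, channels, 1),
    )

def candidate_cells(channels):
    """The six candidate cells: four convolutions plus the two skip choices."""
    return nn.ModuleDict({
        "std_conv_3": nn.Conv2d(channels, channels, 3, padding=1),
        "std_conv_5": nn.Conv2d(channels, channels, 5, padding=2),
        "sep_conv_3": sep_conv(channels, 3),
        "sep_conv_5": sep_conv(channels, 5),
        "identity":   nn.Identity(),  # connect
        "none":       Zero(),         # don't connect
    })
```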
Operation Blocks
Gradient-based search strategies usually calculate a weighted sum over the outputs of all basic operations to avoid a discrete choice of the basic operation unit.
In DARTS, the authors simply add the softmax-weighted outputs of the eight basic cells as the final output of one operation block, which can be formulated as follows:
$$\bar{o}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o)}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'})}\, o(x),$$
where $\mathcal{O}$ is the set of candidate operations and $\alpha_o$ is the architecture weight of operation $o$.
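In code, this softmax-weighted mixture looks roughly like the following sketch (not the authors' implementation):

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS-style mixed operation: softmax-weighted sum over all candidates."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture parameters

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```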
However, calculating all basic convolution operations at each layer may cause large memory consumption.
To solve this memory inefficiency, the basic cells are grouped into two categories, defining two types of mix operations:
- a mix convolution block, represented as a blue arrow, which calculates the weighted sum of all convolution outputs;
- a mix connection block, represented as a green arrow, which multiplies the output of each layer by its connection probability.
The architecture parameters weight the candidate operations and are learned jointly with the network weights. The number of skip connections increases as more epochs are iterated, which is caused by the inherent unfair competition among operations. Therefore, the authors relax the choice of operations to be independent, so that each operation has an equal opportunity to develop its strength: they apply a sigmoid activation instead of softmax to generate the architecture weights, whose values then lie in (0, 1) independently for each operation.
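Putting the two grouped blocks together with sigmoid-activated architecture weights, a sketch might look like this (the exact parameterization is an assumption):

```python
import torch
import torch.nn as nn

def _sep_conv(c, k):
    # Separable convolution: depthwise followed by pointwise.
    return nn.Sequential(
        nn.Conv2d(c, c, k, padding=k // 2, groups=c),
        nn.Conv2d(c, c, 1),
    )

class MixConv(nn.Module):
    """Mix convolution block (blue arrow): independently weighted sum
    of the four convolution candidates."""
    def __init__(self, c):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(c, c, 3, padding=1),   # Std_conv_3
            nn.Conv2d(c, c, 5, padding=2),   # Std_conv_5
            _sep_conv(c, 3),                 # Sep_conv_3
            _sep_conv(c, 5),                 # Sep_conv_5
        ])
        self.alpha = nn.Parameter(torch.zeros(4))

    def forward(self, x):
        # Sigmoid instead of softmax: weights are independent, so operations
        # do not compete unfairly (which would inflate skip connections).
        w = torch.sigmoid(self.alpha)
        return sum(wi * conv(x) for wi, conv in zip(w, self.convs))

class MixConnection(nn.Module):
    """Mix skip connection block (green arrow): scales a lower layer's
    output by its learned connection probability (identity vs. none)."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return torch.sigmoid(self.beta) * x
```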
NAS Network
DARTS divides the network into two parts: the inner network, whose architecture is learned by NAS, and the outer network, which has a fixed architecture.
However, the fixed architecture of the outer network may limit performance. To make the network more efficient, the search space is constrained according to the characteristics of ST prediction in the following two aspects:
- There is only one convolution between adjacent layers, to capture large-range spatial dependency.
- There is no extra feature transformation when fusing multi-level features, so the outputs of previous layers are simply added to the current layer.
The output of layer $l$ can then be written as
$$x_l = \text{MixConv}\Big(x_{l-1} + \sum_{k=0}^{l-2} \sigma(\beta_{k,l})\, x_k\Big),$$
where $x_k$ is the output of layer $k$, $\text{MixConv}(\cdot)$ is the mix convolution block, and $\sigma(\beta_{k,l})$ is the learned connection probability from layer $k$ to layer $l$.
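Building on the MixConv and MixConnection sketches above, the constrained network could be assembled roughly as follows (layer count and indexing are illustrative):

```python
import torch.nn as nn

class AutoSTBackbone(nn.Module):
    """Stack of mix convolution blocks where each layer may additionally
    fuse, by simple addition, the output of every earlier layer, gated by
    a learned connection probability. Reuses MixConv / MixConnection from
    the sketch above."""
    def __init__(self, channels, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([MixConv(channels) for _ in range(num_layers)])
        # One connection gate per (earlier output -> later layer) pair.
        self.gates = nn.ModuleDict({
            f"{k}->{l}": MixConnection()
            for l in range(num_layers) for k in range(l)
        })

    def forward(self, x):
        outputs = [x]  # outputs[k] is the output feeding layer k
        for l, layer in enumerate(self.layers):
            h = outputs[-1]  # direct input from the previous layer
            for k in range(l):  # gated skips from all earlier outputs
                h = h + self.gates[f"{k}->{l}"](outputs[k])
            outputs.append(layer(h))
        return outputs[-1]
```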
The proposed network admits the following three advantages:
- it is more effective than a fixed architecture, since existing networks are a subset of AutoST
- it releases experts from hand-refining networks
- it is more efficient than recent neural architecture search methods, because the search space is carefully designed and constrained for the ST prediction task
AutoST for Spatio-Temporal Prediction
We evaluate AutoST on three popular spatio-temporal prediction models (STResNet, ST-3DNet and DeepSTN) to verify its efficiency and generality.
We follow the same CPT (closeness, period and trend) paradigm and take DeepSTN as an example to explain AutoST for spatio-temporal prediction.
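For reference, a sketch of how CPT inputs are typically assembled in this line of work (the offsets below are common defaults, not necessarily the paper's settings):

```python
import numpy as np

def build_cpt_inputs(X, t, lc=3, lp=1, lt=1, period=48, trend=48 * 7):
    """Gather closeness/period/trend frames for predicting timestamp t.

    X: array of shape (T, C, I, J). period=48 assumes half-hour intervals
    (one day); trend spans one week. t must be large enough that all
    referenced frames exist.
    """
    closeness = X[t - lc:t]                                    # most recent frames
    period_f  = np.stack([X[t - i * period] for i in range(1, lp + 1)])
    trend_f   = np.stack([X[t - i * trend] for i in range(1, lt + 1)])
    return closeness, period_f, trend_f
```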
Algorithm and Optimization
Experiments
The paper reports extensive experiments on four real-world spatio-temporal datasets, where AutoST consistently improves prediction accuracy and searches more efficiently than existing NAS methods.
If you like this blog or find it useful, you are welcome to comment on it. You are also welcome to share this blog, so that more people can participate. If any images used in the blog infringe your copyright, please contact the author to delete them. Thank you!