This blog introduces the paper STTN (Spatial-Temporal Transformer Networks for Traffic Flow Forecasting).
Introduction
Models built on a stationarity assumption are not practical for long-term forecasting, as traffic flows are highly dynamic in nature.
CNN-based and RNN-based models are limited for traffic forecasting in the following two aspects:
Fixed Spatial Dependencies
For each sensor, its correlated sensors vary with the forecasting time step. Fig1(a) shows that for the target node (purple), different correlated nodes (red) are considered to formulate the spatial dependencies at different time steps, according to the range (red circle) determined by the traffic speeds and distances. It is therefore important to capture these dynamical spatial dependencies: as shown in Fig1(b), spatial dependencies oscillate irregularly over time, due to the periodical effect of rush hours, varying weather conditions, and the unexpected occurrence of traffic accidents.
Limited-range Temporal Dependencies
Long-range temporal dependencies are usually ignored in existing methods. Fig1(a) illustrates spatial dependencies at various scales for different time steps, which implies that prediction performance degrades when the range of temporal dependencies is limited.
Furthermore, prediction errors are propagated and accumulated in long-term traffic forecasting with existing auto-regressive methods, whether trained with an individual loss for each time step or a joint loss over multiple time steps, as depicted in Fig2(a).
It is desirable to achieve accurate long-term prediction based on long-range temporal dependencies extracted from error-free temporal contexts, as shown in Fig2(b).
STTN is proposed to address the aforementioned challenges. The contributions of the paper are summarized below:
- Develop a spatial-temporal block to dynamically model long-range spatial-temporal dependencies.
- Present a new variant of GNNs, named spatial transformer, to model the time-varying directed spatial dependencies and dynamically capture the hidden spatial patterns of traffic flows.
- Design a temporal transformer to achieve multi-step prediction using long-range temporal dependencies.
Proposed Model
Formulate the traffic forecasting task as a spatial-temporal prediction problem on a graph. The model consists of two main components: spatial-temporal (ST) blocks and a prediction layer.
Problem Formulation
As in other traffic forecasting work, the traffic network can be represented as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)$, where $\mathcal{V}$ is the set of $N$ nodes (sensors), $\mathcal{E}$ is the set of edges, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix. The traffic status observed at time step $t$ is a graph signal $X_t \in \mathbb{R}^{N \times C}$ with $C$ features per node.
To achieve accurate prediction, the task is to learn a mapping $f$ from the previous $T$ observations to the next $T'$ steps: $(\hat{X}_{t+1}, \ldots, \hat{X}_{t+T'}) = f(X_{t-T+1}, \ldots, X_t)$.
Overall Architecture
Spatial-temporal Blocks
As shown in Fig3, the model stacks multiple spatial-temporal blocks, each consisting of a spatial transformer and a temporal transformer. The input to the first block is the sequence of historical graph signals, and each block passes its learned spatial-temporal features to the next.
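A minimal PyTorch sketch of how one ST block might compose the two transformers is below; the residual connections and layer normalization are assumptions following standard transformer practice, and the two sub-layers stand for the spatial and temporal transformer modules described in the sections that follow.

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    """Sketch of one spatial-temporal block: a spatial transformer
    followed by a temporal transformer. Both sub-layers are assumed to
    map a (batch, T, N, dim) tensor to the same shape."""
    def __init__(self, dim, spatial_layer, temporal_layer):
        super().__init__()
        self.spatial = spatial_layer    # models dependencies across nodes
        self.temporal = temporal_layer  # models dependencies across time steps
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, T, N, dim) -- T time steps, N nodes
        x = self.norm1(x + self.spatial(x))   # residual + norm (assumption)
        x = self.norm2(x + self.temporal(x))
        return x
```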
Prediction Layer
The prediction layer leverages two classical convolutional layers to make multi-step prediction based on the spatial-temporal features from the last spatial-temporal block. Its input is a 2-D tensor reshaped from those features, and the convolutions map the $T$ input time steps to the $T'$ prediction steps.
The mean absolute error (MAE) loss is adopted to train the model:
$$\mathcal{L}(\Theta) = \frac{1}{T'} \sum_{\tau=1}^{T'} \left\| \hat{X}_{t+\tau} - X_{t+\tau} \right\|_1,$$
where $\hat{X}_{t+\tau}$ is the prediction, $X_{t+\tau}$ is the ground truth, and $\Theta$ denotes all trainable parameters.
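A hedged PyTorch sketch of the prediction head and loss, assuming the two convolutions are 1x1 convolutions over the time-step channels; the hidden width is illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Sketch of the prediction head: two 1x1 convolutions mapping the
    T input time steps of the last ST block to T' future steps."""
    def __init__(self, in_steps, out_steps, hidden=256):
        super().__init__()
        self.conv1 = nn.Conv2d(in_steps, hidden, kernel_size=1)
        self.conv2 = nn.Conv2d(hidden, out_steps, kernel_size=1)

    def forward(self, x):
        # x: (batch, T, N, dim) features from the last ST block,
        # with T treated as the channel dimension
        return self.conv2(torch.relu(self.conv1(x)))  # (batch, T', N, dim)

# mean absolute error over all predicted steps and nodes
loss_fn = nn.L1Loss()
```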
Spatial Transformer
The spatial transformer consists of a spatial-temporal positional embedding layer, a fixed graph convolution layer, a dynamical graph convolution layer, and a gate mechanism for information fusion.
Spatial-Temporal Positional Embedding
The spatial dependencies of two nodes in the graph are determined by their distances and the observed time steps. The Transformer cannot capture the spatial (position) and temporal information of observations with its fully connected feed-forward structure.
Thus, a positional embedding is used beforehand to encode the 'position' information into the input sequences. Adopt a spatial-temporal positional embedding layer to learn spatial-temporal embeddings that are added into each node feature. The parameter matrices of the embedding layer are learned jointly with the rest of the network.
Feed the embedded features into the fixed and dynamical graph convolution layers.
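A minimal sketch of one plausible realization, assuming additive learned embeddings, one per node and one per time step; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class STPositionalEmbedding(nn.Module):
    """Sketch of a learnable spatial-temporal positional embedding:
    a learned vector per node (spatial position) and per time step
    (temporal position), added to the input node features."""
    def __init__(self, num_nodes, num_steps, dim):
        super().__init__()
        self.spatial_emb = nn.Parameter(torch.randn(1, 1, num_nodes, dim))
        self.temporal_emb = nn.Parameter(torch.randn(1, num_steps, 1, dim))

    def forward(self, x):
        # x: (batch, T, N, dim); broadcasting adds both embeddings
        return x + self.spatial_emb + self.temporal_emb
```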
Fixed Graph Convolutional Layer
Use graph spectral convolution based on Chebyshev polynomial approximation to capture the stationary spatial dependencies from the road topology.
Extract the structure-aware features:
$$y = \sum_{k=0}^{K-1} \theta_k \, T_k(\tilde{L}) \, x,$$
where $\tilde{L} = 2L/\lambda_{\max} - I_N$ is the scaled graph Laplacian, $\lambda_{\max}$ is the largest eigenvalue of the Laplacian $L$, $T_k(\cdot)$ is the Chebyshev polynomial of order $k$, and $\theta_k$ are the learnable coefficients.
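A sketch of the $K$-order Chebyshev graph convolution defined above, using the standard recurrence $T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L})$; `laplacian` is assumed to be the precomputed scaled Laplacian $\tilde{L}$.

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """Sketch of K-order Chebyshev graph convolution over node features."""
    def __init__(self, in_dim, out_dim, K):
        super().__init__()
        self.K = K
        # one learnable coefficient matrix theta_k per polynomial order
        self.theta = nn.Parameter(torch.randn(K, in_dim, out_dim) * 0.01)

    def forward(self, x, laplacian):
        # x: (N, in_dim) node features; laplacian: (N, N) scaled Laplacian
        tk_prev = torch.eye(x.size(0), device=x.device)  # T_0 = I
        tk = laplacian                                   # T_1 = L~
        out = x @ self.theta[0]
        if self.K > 1:
            out = out + (laplacian @ x) @ self.theta[1]
        for k in range(2, self.K):
            tk_next = 2 * laplacian @ tk - tk_prev       # Chebyshev recurrence
            out = out + (tk_next @ x) @ self.theta[k]
            tk_prev, tk = tk, tk_next
        return out
```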
Dynamical Graph Convolutional Layer
To capture the time-evolving hidden spatial dependencies, the authors propose a novel dynamical graph convolutional layer that performs training and modeling in high-dimensional latent subspaces.
Learn linear mappings that project the input features of each node into high-dimensional latent subspaces. Then utilize the dynamic nature of the self-attention mechanism on the projected features to efficiently model the dynamical spatial dependencies between nodes according to the varying graph signals.
The process is as follows:
$$Q = X W^Q, \quad K = X W^K, \quad V = X W^V,$$
$$S = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right), \qquad M = S\, V,$$
where $W^Q$, $W^K$, $W^V$ are the learned projection matrices, $d_k$ is the subspace dimension, $S$ encodes the time-varying spatial dependencies between nodes, and $M$ gathers the aggregated node features.
It is worth mentioning that multiple patterns of spatial dependencies can be learned with a multi-head attention mechanism by introducing multiple pairs of subspaces.
Furthermore, a shared three-layer feed-forward neural network with nonlinear activation is applied on each node:
$$M' = W_3\,\sigma\big(W_2\,\sigma(W_1 M + b_1) + b_2\big) + b_3,$$
where $M$ is the output of the attention step, $W_i$ and $b_i$ are learnable weights and biases, and $\sigma(\cdot)$ is the nonlinear activation.
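Putting the attention step and the shared feed-forward network together, a single-head sketch is below; the paper uses multiple heads, and the layer widths here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicalGraphConv(nn.Module):
    """Sketch of attention-based dynamical graph convolution: node
    features are projected to query/key/value subspaces, and a
    data-dependent 'adjacency' is computed per input via scaled
    dot-product attention."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # shared three-layer feed-forward network applied on each node
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # x: (batch, N, dim) node features at one time step
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)  # time-varying spatial dependencies
        return self.ffn(attn @ v)
```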
Gate Mechanism for Feature Fusion
The gate mechanism is developed to fuse the spatial features learned from the fixed and dynamical graph convolutional layers. The gate adaptively weighs the two feature streams:
$$g = \mathrm{sigmoid}\big(f_1(Y_F) + f_2(Y_D)\big), \qquad Y = g \odot Y_F + (1 - g) \odot Y_D,$$
where $Y_F$ and $Y_D$ are the outputs of the fixed and dynamical graph convolutional layers, $f_1$ and $f_2$ are linear mappings, and $\odot$ denotes element-wise multiplication.
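A sketch of one plausible gate parameterization, computing the gate from the concatenated features; the concatenation is an assumption, and only the sigmoid-gated convex combination is taken from the description above.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of the gate mechanism fusing fixed (y_f) and dynamical
    (y_d) graph convolution features with a learned sigmoid gate."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, y_f, y_d):
        # g in (0, 1) decides, per feature, how much of each stream to keep
        g = torch.sigmoid(self.gate(torch.cat([y_f, y_d], dim=-1)))
        return g * y_f + (1 - g) * y_d
```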
General Dynamical Graph Neural Networks
The spatial transformer can be viewed as a general message-passing dynamical graph neural network.
The authors demonstrate that the spatial transformer can be formulated as an iterative feature learning process of message passing and update over all nodes.
Denote by $h_v^{(l)}$ the feature of node $v$ at layer $l$ and by $\mathcal{N}(v)$ its neighborhood. The message-passing phase aggregates information from each node's neighbors:
$$m_v^{(l+1)} = \sum_{u \in \mathcal{N}(v)} M_l\big(h_v^{(l)}, h_u^{(l)}\big),$$
where $M_l$ is the message function; in the spatial transformer, the attention-weighted values play this role, with every node in the (dense) neighborhood of every other node. The update phase then refreshes each node state:
$$h_v^{(l+1)} = U_l\big(h_v^{(l)}, m_v^{(l+1)}\big),$$
where $U_l$ is the update function, realized by the shared feed-forward network.
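A pseudocode-level sketch of this reading, where the attention matrix supplies the messages and the shared network acts as the update function; `ffn` is a placeholder for any callable update module.

```python
import torch

def message_passing_step(h, attn, ffn):
    """One message-passing/update iteration as realized by the spatial
    transformer.

    h:    (N, dim) node states
    attn: (N, N) attention (message) weights
    ffn:  callable update function, e.g. a small torch.nn.Sequential
    """
    m = attn @ h   # aggregate messages from every node (dense neighborhood)
    return ffn(m)  # update each node state with the shared network
```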
Temporal Transformers
Similar to the spatial transformer, the temporal transformer applies self-attention, but along the time axis of each node, so that every output time step can directly attend to all input time steps and capture long-range temporal dependencies.
The process is as follows: for each node $v$ with feature sequence $X_v \in \mathbb{R}^{T \times d}$,
$$Q_t = X_v W_t^Q, \quad K_t = X_v W_t^K, \quad V_t = X_v W_t^V,$$
$$Y_v = \mathrm{softmax}\!\left(\frac{Q_t K_t^{\top}}{\sqrt{d_k}}\right) V_t,$$
where $W_t^Q$, $W_t^K$, $W_t^V$ are the learned temporal projection matrices and $d_k$ is the subspace dimension.
For each node, its output $Y_v$ is the attention-weighted combination of its features across all input time steps, which is then passed through a shared feed-forward network as in the spatial transformer.
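A single-head sketch of this layer, treating the node axis as part of the batch so that attention runs along time; widths and the two-layer feed-forward network are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalTransformer(nn.Module):
    """Sketch of the temporal transformer: for every node independently,
    self-attention is computed along the time axis, so each output step
    can attend to every input step (long-range temporal dependencies)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x):
        # x: (batch, T, N, dim) -> (batch, N, T, dim): nodes join the batch
        x = x.transpose(1, 2)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        out = self.ffn(F.softmax(scores, dim=-1) @ v)  # attention over time
        return out.transpose(1, 2)  # back to (batch, T, N, dim)
```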
Experiments
The paper evaluates STTN on real-world traffic datasets and reports competitive accuracy against strong baselines, with the advantage growing for long-term forecasting.
If you like this blog or find it useful, you are welcome to comment on it. You are also welcome to share this blog so that more people can read it. If the images used in the blog infringe your copyright, please contact the author to delete them. Thank you!