STTN: Spatial-Temporal Transformer Networks for Traffic Flow Forecasting

An introduction to STTN

Posted by Ccloud on 2023-02-05

This blog introduces the paper STTN (Spatial-Temporal Transformer Networks for Traffic Flow Forecasting).

Introduction

  • Models with a stationarity assumption are not practical for long-term forecasting, as traffic flows are highly dynamic in nature.

  • CNN-based and RNN-based models are restricted for traffic forecasting in the following two aspects:

    • Fixed Spatial Dependencies
      For each sensor, its correlated sensors vary with the forecasting time step. Fig. 1(a) shows that for the target node (purple), different correlated nodes (red) are considered to formulate the spatial dependencies at different time steps, according to the range (red circle) determined by the traffic speeds and distances.

      Furthermore, it is important to capture the dynamic spatial dependencies: they oscillate irregularly over time, due to the periodic effect of rush hours, varying weather conditions, and the unexpected occurrence of traffic accidents, as shown in Fig. 1(b).

    • Limited-range Temporal Dependencies
      Long-range temporal dependencies are usually ignored in existing methods.

      Fig. 1(a) illustrates spatial dependencies at various scales for different time steps, which implies that prediction performance would be degraded by limiting the range of temporal dependencies.
      Furthermore, prediction errors would be propagated and accumulated in long-term traffic forecasting with existing auto-regressive methods trained with either an individual loss for each time step or a joint loss over multiple time steps, as depicted in Fig. 2(a).
      It is desirable to achieve accurate long-term prediction based on long-range temporal dependencies extracted from error-free temporal contexts, as shown in Fig. 2(b).

  • STTN is proposed to address the aforementioned challenges. The contributions of the paper are summarized as follows:

    • Develop a spatial-temporal block to dynamically model long-range spatial-temporal dependencies.
    • Present a new variant of GNNs, named the spatial transformer, to model time-varying directed spatial dependencies and dynamically capture the hidden spatial patterns of traffic flows.
    • Design a temporal transformer to achieve multi-step prediction using long-range temporal dependencies.

Proposed Model

The traffic forecasting task is formulated as a spatial-temporal prediction problem. The model consists of two main components: the spatial-temporal (ST) blocks and the prediction layer.

Problem Formulation

As in other traffic forecasting work, the traffic network is represented as a graph $\mathcal{G} = (V, E, A)$, where $V$ is the set of $N$ nodes (sensors), $E$ is the set of edges, and $A \in \mathbb{R}^{N \times N}$ is the weighted adjacency matrix built from the connectivity and distances between sensors.

To achieve accurate prediction, the model captures the dynamic spatial dependencies and long-range temporal dependencies from the historical graph signals and the graph $\mathcal{G}$.
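To make the input/output shapes concrete, here is a minimal sketch of the forecasting setup in Python; the sizes (number of sensors, history length, horizon) and the random data are placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical sizes (not from the paper): N sensors, T past steps,
# T_pred future steps, C features per node (e.g. traffic speed).
N, T, T_pred, C = 207, 12, 12, 1

# Historical graph signals: features of the N nodes over the past T steps.
X = np.random.rand(N, T, C).astype(np.float32)

# Road graph: weighted adjacency built from sensor connectivity/distances.
A = np.random.rand(N, N).astype(np.float32)

# The forecasting task maps (X, A) to the next T_pred steps for all nodes:
# Y_hat = model(X, A)   # expected shape: (N, T_pred)
```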

Overall Architecture

(Figure: overall architecture of STTN, stacking spatial-temporal blocks followed by the prediction layer.)

Spatial-temporal Blocks

As shown in Fig. 3, the input to the $l$-th spatial-temporal block is a 3-D tensor of $C$-dimensional features for the $N$ nodes at $T$ time steps, extracted by the $(l-1)$-th spatial-temporal block. The spatial transformer and temporal transformer are stacked to generate the 3-D output tensor, and residual connections are adopted for stable training. In the $l$-th spatial-temporal block, the spatial transformer extracts spatial features from the input node features as well as the graph adjacency matrix $A$.

The resulting spatial features are combined with the block input to generate the input to the subsequent temporal transformer.
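Below is a minimal sketch of the data flow through one spatial-temporal block; the spatial and temporal transformer layers are passed in as placeholders for the layers described later, and the exact placement of the residual connections is an assumption.

```python
import torch.nn as nn

class STBlock(nn.Module):
    """Sketch of one spatial-temporal block: spatial transformer, then
    temporal transformer, with residual connections around each."""
    def __init__(self, spatial_layer, temporal_layer):
        super().__init__()
        self.spatial = spatial_layer    # attends across the N nodes
        self.temporal = temporal_layer  # attends across the T time steps

    def forward(self, x, adj):
        # x: (N, T, C) features from the previous block, adj: (N, N)
        y_s = self.spatial(x, adj) + x   # spatial features + residual
        y_t = self.temporal(y_s) + y_s   # temporal features + residual
        return y_t                       # (N, T, C) output of the block
```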

Prediction Layer

The prediction layer leverages two classical convolutional layers to make multi-step predictions based on the spatial-temporal features from the last spatial-temporal block. Its input is a 2-D tensor that consists of the $C$-dimensional spatial-temporal features of the $N$ nodes at the last time step. The multi-step prediction of future traffic conditions for the $N$ nodes is produced by passing these features through the two convolutional layers.

The mean absolute error (MAE) loss is adopted to train the model:

$\mathcal{L} = \frac{1}{N T'} \sum_{i=1}^{N} \sum_{t=1}^{T'} \left| \hat{y}_{i,t} - y_{i,t} \right|$

where $y_{i,t}$ is the ground-truth traffic speed of node $i$ at future step $t$ and $\hat{y}_{i,t}$ is the corresponding prediction.
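Here is a minimal sketch of such a prediction head together with the MAE loss; the use of 1x1 convolutions with a ReLU in between is an assumption for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Sketch: two convolutional layers that map the last-step features of
    all nodes to T_pred future steps."""
    def __init__(self, channels, t_pred):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv1d(channels, t_pred, kernel_size=1)

    def forward(self, h_last):
        # h_last: (N, C) spatial-temporal features at the last time step
        h = h_last.t().unsqueeze(0)        # (1, C, N) layout for Conv1d
        h = torch.relu(self.conv1(h))
        out = self.conv2(h)                # (1, T_pred, N)
        return out.squeeze(0).t()          # (N, T_pred) multi-step prediction

# MAE training loss against the ground-truth speeds y of shape (N, T_pred):
# loss = torch.mean(torch.abs(y_hat - y))
```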

Spatial Transformer

(Figure: structure of the spatial transformer.)

The spatial transformer consists of a spatial-temporal positional embedding layer, a fixed graph convolutional layer, a dynamical graph convolutional layer, and a gate mechanism for information fusion.

Spatial-Temporal Positional Embedding

The spatial dependencies of two nodes in the graph are determined by their distance and the observed time steps. With its fully connected feed-forward structure, the Transformer alone cannot capture the spatial (positional) and temporal information of the observations.

Thus, as in the original Transformer, a positional embedding is used to encode the "position" information into the input sequences. A spatial-temporal positional embedding layer is adopted to learn a spatial-temporal embedding that is injected into each node feature. Two learnable parameter matrices serve as the spatial and temporal embedding dictionaries.

The spatial dictionary is initialized with the graph adjacency matrix to account for the connectivity and distances between nodes when modeling spatial dependencies, while the temporal dictionary is initialized with a one-hot time encoding to inject the time step into each node.

The spatial and temporal embeddings are tiled along the temporal and spatial axes, respectively, so that both match the shape of the input tensor. They are then concatenated, and a convolutional layer decreases the dimension back to that of the node features.

The embedded features are fed into the fixed and dynamical graph convolutional layers for spatial feature learning. Since the graph convolution operations can be realized in parallel over the time steps via tensor operations, only the 2-D feature tensor of an arbitrary single time step is considered below for brevity.
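A minimal sketch of this embedding layer follows; the embedding dimension, the random initialization (instead of the adjacency-based and one-hot initialization described above), and the fusion by concatenation plus a 1x1 convolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class STPositionalEmbedding(nn.Module):
    """Sketch of the spatial-temporal positional embedding layer."""
    def __init__(self, num_nodes, num_steps, channels, emb_dim):
        super().__init__()
        # Learnable dictionaries (the paper initializes the spatial one from
        # the adjacency matrix and the temporal one from one-hot encodings).
        self.spatial_emb = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.temporal_emb = nn.Parameter(torch.randn(num_steps, emb_dim))
        # Convolution that reduces the concatenation back to C channels.
        self.proj = nn.Conv2d(channels + 2 * emb_dim, channels, kernel_size=1)

    def forward(self, x):
        # x: (N, T, C) node features
        n, t, _ = x.shape
        sp = self.spatial_emb.unsqueeze(1).expand(n, t, -1)   # tile over time
        tp = self.temporal_emb.unsqueeze(0).expand(n, t, -1)  # tile over nodes
        h = torch.cat([x, sp, tp], dim=-1)                    # (N, T, C + 2*emb_dim)
        h = h.permute(2, 0, 1).unsqueeze(0)                   # (1, C', N, T)
        h = self.proj(h)                                      # (1, C, N, T)
        return h.squeeze(0).permute(1, 2, 0)                  # (N, T, C)
```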

Fixed Graph Convolutional Layer

The graph spectral convolution based on the Chebyshev polynomial approximation is used to capture the stationary spatial dependencies implied by the road topology.

The structure-aware features are extracted as

$y_j = \sum_{k=0}^{K-1} \theta_{k,j} \, T_k(\tilde{L}) \, x_j$

where $x_j$ is the $j$-th channel (column) of the input features, $T_k(\tilde{L})$ is the $k$-th Chebyshev polynomial of the scaled graph Laplacian $\tilde{L}$ derived from $A$, and $\theta_{k,j}$ are the learned weights. Since $A$ is constructed based on the physical connectivity and distances between sensors, the stationary spatial dependencies determined by the road topology can be explicitly explored through the fixed graph convolutional layer.
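For concreteness, here is a sketch of a Chebyshev graph convolution on a single time step; the scaled Laplacian is assumed to be precomputed from the fixed adjacency matrix, and the weight shapes and initialization are illustrative.

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """Sketch of the fixed graph convolution (Chebyshev approximation)."""
    def __init__(self, in_channels, out_channels, k_order):
        super().__init__()
        # One weight matrix per polynomial order (theta_k above).
        self.theta = nn.Parameter(torch.randn(k_order, in_channels, out_channels) * 0.01)
        self.k_order = k_order

    def forward(self, x, l_tilde):
        # x: (N, C_in) node features at one time step, l_tilde: (N, N) scaled Laplacian
        t_prev = torch.eye(l_tilde.size(0), device=l_tilde.device)
        t_curr = l_tilde
        terms = [t_prev, t_curr]
        for _ in range(2, self.k_order):
            t_next = 2 * l_tilde @ t_curr - t_prev   # Chebyshev recurrence
            t_prev, t_curr = t_curr, t_next
            terms.append(t_curr)
        # sum_k T_k(L_tilde) @ x @ theta_k
        return sum(terms[k] @ x @ self.theta[k] for k in range(self.k_order))
```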

Dynamical Graph Convolutional Layer

To capture the time-evolving hidden spatial dependencies, the authors propose a novel dynamical graph convolutional layer that performs training and modeling in high-dimensional latent subspaces.


Linear mappings are learned to project the input features of each node into high-dimensional latent subspaces. The dynamic nature of the self-attention mechanism is then utilized on the projected features to efficiently model the dynamical spatial dependencies between nodes according to the varying graph signals.

The process is the standard scaled dot-product self-attention over nodes:

$Q = X W_Q, \quad K = X W_K, \quad V = X W_V, \qquad S = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right), \qquad M = S V$

where the attention matrix $S$ acts as a dynamically learned, time-varying adjacency between nodes.

It is worth mentioning that multiple patterns of spatial dependencies can be learned with the multi-head attention mechanism by introducing multiple pairs of subspaces.
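The sketch below illustrates this self-attention over nodes for a single time step and a single head; the projection dimension is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Sketch of the dynamical graph convolution as self-attention over nodes."""
    def __init__(self, channels, d_model):
        super().__init__()
        self.w_q = nn.Linear(channels, d_model)   # query projection
        self.w_k = nn.Linear(channels, d_model)   # key projection
        self.w_v = nn.Linear(channels, d_model)   # value projection
        self.scale = d_model ** 0.5

    def forward(self, x):
        # x: (N, C) node features at one time step
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        scores = q @ k.t() / self.scale    # (N, N) node-to-node scores
        attn = F.softmax(scores, dim=-1)   # dynamic "adjacency" matrix
        return attn @ v                    # (N, d_model) aggregated features
```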

Furthermore, a shared three-layer feed-forward neural network with nonlinear activations and a residual connection is applied to each node.

The outputs of the fixed and dynamical graph convolutional layers are then combined for feature fusion with the gate mechanism.

Gate Mechanism for Feature Fusion

The gate mechanism is developed to fuse the spatial features learned from the fixed and dynamical graph convolutional layers. The gate is derived from the outputs of both layers: linear projections transform the two outputs into 1-D vectors, from which the gate values used to weight the two sets of spatial features are computed.
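A minimal sketch of such a gated fusion is given below; the per-channel sigmoid gate and the projection shapes are assumptions (the paper describes projections into 1-D vectors), so this only illustrates the general idea.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of gated fusion of the fixed and dynamical graph conv outputs."""
    def __init__(self, channels):
        super().__init__()
        self.proj_fixed = nn.Linear(channels, channels)
        self.proj_dyn = nn.Linear(channels, channels)

    def forward(self, y_fixed, y_dyn):
        # y_fixed, y_dyn: (N, C) outputs of the two graph convolution branches
        g = torch.sigmoid(self.proj_fixed(y_fixed) + self.proj_dyn(y_dyn))  # gate
        return g * y_fixed + (1.0 - g) * y_dyn                              # fused features
```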

General Dynamical Graph Neural Networks

The spatial transformer can be viewed as a general message-passing dynamical graph neural network.

The author demonstrates that the spatial transformer can be formulated as an iterative feature learning process of message passing and update for all the nodes in a dynamical graph neural network.

Denote the input features of node $i$. At each step of the process, node $i$ receives messages from the other nodes in the graph, aggregated by a composite message-passing function that realizes the projection and attention equations above. Once the aggregated message is obtained, the feature of node $i$ is updated from the message and its current feature by an update function, which is the shared position-wise feed-forward network.
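The message-passing view can be condensed into a few lines; the sizes, the random attention matrix, and the specific update network below are placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Sketch of one message-passing / update step with arbitrary sizes.
N, C = 5, 8
x = torch.randn(N, C)                              # node features
attn = torch.softmax(torch.randn(N, N), dim=-1)    # dynamic edge weights (attention)

update_ffn = nn.Sequential(nn.Linear(C, C), nn.ReLU(), nn.Linear(C, C))

messages = attn @ x                    # message passing: weighted aggregation
x_updated = update_ffn(messages) + x   # node update with a residual connection
```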

Temporal Transformers

(Figure: structure of the temporal transformer.)

Similar to the spatial transformer, the input to the temporal transformer is obtained by concatenating the input features with the temporal embedding and applying a convolutional layer that yields a fixed-dimension vector for each node at each time step. Temporal dependencies are modeled in parallel over the nodes, so the 2-D tensor of spatial features of an arbitrary single node is considered below.

The process mirrors the spatial transformer: the features at different time steps are projected into query, key, and value subspaces, scaled dot-product self-attention is computed across the time steps, and a shared feed-forward network with residual connections produces the output.

For each node, the outputs at all time steps are collected to form the output tensor of the temporal transformer.
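A single-head sketch of this temporal self-attention for one node is shown below; the feed-forward and residual details are simplified and the dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalSelfAttention(nn.Module):
    """Sketch of the temporal transformer core: attention across time steps."""
    def __init__(self, channels, d_model):
        super().__init__()
        self.w_q = nn.Linear(channels, d_model)
        self.w_k = nn.Linear(channels, d_model)
        self.w_v = nn.Linear(channels, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.scale = d_model ** 0.5

    def forward(self, x):
        # x: (T, C) features of one node over T time steps
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        attn = F.softmax(q @ k.t() / self.scale, dim=-1)  # (T, T) step-to-step weights
        h = attn @ v                                      # long-range temporal mixing
        return self.ffn(h) + h                            # shared FFN + residual
```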

Experiments

(Figures/tables: experimental results reported in the paper.)

The paper: Spatial-Temporal Transformer Networks for Traffic Flow Forecasting (Xu et al., 2020).


If you like this blog or find it useful for you, you are welcome to comment on it. You are also welcome to share this blog, so that more people can participate in it. If the images used in the blog infringe your copyright, please contact the author to delete them. Thank you !