Aligning Time Series on Incomparable Spaces



Data is often gathered sequentially in the form of a time series, which consists of sequences of data points observed at successive time points. Dynamic time warping (DTW) defines a meaningful similarity measure between two time series. Often, the pairs of time series we are interested in are defined on different spaces: for instance, one might want to align a video with a corresponding audio wave, potentially sampled at different frequencies. In this work, we propose Gromov Dynamic Time Warping (GDTW), a distance between time series on potentially incomparable spaces, and apply it to various settings, including barycentric averaging, generative modeling, and imitation learning.

Time Series Alignment

Sakoe and Chiba1 consider the problem of aligning two time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \mathcal{X}^{T_y}$, where potentially $T_x \neq T_y$. This is formalized as

$$\operatorname{DTW}(\boldsymbol{x},\boldsymbol{y}) = \min_{\mathbf{A} \in \mathcal{A}(T_x,T_y)} \langle\mathbf{D}, \mathbf{A}\rangle_{\operatorname{F}},$$

where $D_{ij} = d_\mathcal{X}(x_i,y_j)$ is the pairwise distance matrix and $A_{ij}$ is $1$ if $x_i$ and $y_j$ are aligned, and $0$ otherwise. A DTW alignment of two time series is shown in Figure 1.

Dynamic Time Warping
Figure 1: Alignment of two time series via Dynamic Time Warping.
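For intuition, the minimization over alignment matrices can be carried out exactly by a classic dynamic program over the pairwise distance matrix. Below is a minimal numpy sketch for 1-D series; the function name and details are illustrative, not taken from the paper's code.

```python
import numpy as np

def dtw(x, y):
    """Dynamic time warping between two 1-D series, computed by the
    classic O(Tx * Ty) dynamic program over the distance matrix D."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    D = np.abs(np.subtract.outer(x, y))          # D[i, j] = d(x_i, y_j)
    R = np.full((len(x) + 1, len(y) + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            # An alignment path may extend diagonally (match) or along
            # either axis (repeat a point of one series).
            R[i, j] = D[i - 1, j - 1] + min(
                R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]
            )
    return R[-1, -1]
```

For example, `dtw([0, 1, 2], [0, 1, 1, 2])` is zero even though the series have different lengths, since the repeated `1` can be aligned twice.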

A practical drawback of DTW is the need for both time series $\boldsymbol{x}$ and $\boldsymbol{y}$ to live on the same space, equipped with a metric $d_{\mathcal{X}}$. This can cause the following issues.

  • Alignment can fail for time series that are only close up to isometries, such as rotations and translations.
  • The method doesn’t apply to time series which are defined on different spaces, such as location coordinates for $\boldsymbol{x}$ and pixel values for $\boldsymbol{y}$.

In such cases, defining a meaningful distance between samples from the two sequences is impractical, as it would require a detailed understanding of the objects we wish to study.

Dealing with Incomparable Spaces

Motivated by connections between DTW and optimal transport,2 we introduce a distance between time series $\boldsymbol{x} \in \mathcal{X}^{T_x}$ and $\boldsymbol{y} \in \mathcal{Y}^{T_y}$ defined on potentially incomparable metric spaces. The key idea is to define a loss function which compares pairwise distances within $\mathcal{X}^{T_x}$ to those within $\mathcal{Y}^{T_y}$. To this end, we define the Gromov dynamic time warping distance as

$$\operatorname{GDTW}(\boldsymbol{x},\boldsymbol{y})=\min_{\mathbf{A} \in \mathcal{A}(T_x,T_y)} \sum_{i,j,k,l} \mathcal{L}\big(d_{\mathcal{X}}(x_i,x_k),\,d_{\mathcal{Y}}(y_j,y_l)\big)\, A_{ij}A_{kl},$$

where $d_{\mathcal{X}}$ is a distance defined on $\mathcal{X}$ and $d_{\mathcal{Y}}$ a distance defined on $\mathcal{Y}$. We solve the optimization problem over the set of alignment matrices with a Frank–Wolfe-inspired algorithm. Figure 2 shows the resulting alignments under different rotations and translations of the original time series.

Figure 2: Alignment of two time series via Dynamic Time Warping and Gromov Dynamic Time Warping under different transformations.
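To make the objective concrete, the sketch below evaluates the GDTW cost for a fixed binary alignment matrix, using the squared loss for $\mathcal{L}$. Because only intra-series distances enter the sum, the cost is invariant to isometries such as rotations and translations of either series. This is a minimal illustration of the objective only, not the paper's Frank–Wolfe solver; the function names are ours.

```python
import numpy as np

def pairwise_dists(z):
    """Intra-series Euclidean distance matrix for a (T, d) array."""
    z = np.asarray(z, float)
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

def gdtw_objective(Dx, Dy, A):
    """Evaluate sum_{ijkl} (Dx[i,k] - Dy[j,l])^2 A[i,j] A[k,l] for a
    fixed binary alignment matrix A (squared loss for L). Written as an
    explicit double loop over aligned pairs for clarity, not speed."""
    total = 0.0
    idx = np.argwhere(A)                 # aligned index pairs (i, j)
    for i, j in idx:
        for k, l in idx:
            total += (Dx[i, k] - Dy[j, l]) ** 2
    return total

# A rotated and translated copy of a series has identical intra-series
# distances, so the identity alignment achieves (numerically) zero cost.
x = np.array([[0., 0.], [1., 0.], [2., 1.]])
c, s = np.cos(0.7), np.sin(0.7)
y = x @ np.array([[c, -s], [s, c]]).T + 3.0
A = np.eye(3)
cost = gdtw_objective(pairwise_dists(x), pairwise_dists(y), A)  # ~0
```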

Like DTW, GDTW behaves unpredictably when a time series lies close to a change point of the optimal alignment matrix, because its derivatives are discontinuous there. To remedy this, we introduce a softened version of the objective, mirroring the definition of soft DTW.3 The resulting smoother derivatives are useful when applying GDTW to, for instance, generative modeling of time series and imitation learning.
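Concretely, the softening replaces the hard minima in the dynamic program with a soft minimum, as in soft DTW. A minimal sketch of that operator (a generic log-sum-exp soft minimum; the name is ours):

```python
import numpy as np

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log sum_i exp(-v_i / gamma).

    As gamma -> 0 this recovers the hard minimum; for gamma > 0 it is
    differentiable everywhere, which is what keeps gradients well
    behaved near change points of the optimal alignment."""
    v = np.asarray(values, dtype=float)
    m = v.min()                      # shift for a stable log-sum-exp
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))
```

With a small `gamma`, `softmin([1.0, 2.0], 1e-3)` is essentially `1.0`; as `gamma` grows, the competing entries all contribute and the value dips smoothly below the hard minimum.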


Applications

We showcase Gromov DTW in three settings: barycentric averaging, generative modeling, and imitation learning.

  • Barycentric averaging: we extend the algorithm of Peyré et al.2 to the sequential setting via coordinate descent on the GDTW objective. We plot barycenters under several warping approaches in Figure 3.

    Barycenters of the Quickdraw dataset
    Figure 3: Barycenters of the Quickdraw dataset’s fishes via various time warping approaches.
  • Generative modeling: we extend the algorithm of Genevay et al.4 and Bunne et al.5 to the sequential setting by leveraging GDTW as the ground metric in an entropic Wasserstein objective. Generated samples are shown in Figure 4.

    Generated samples
    Figure 4: Samples generated by the time series GAN trained on sequential MNIST.
  • Imitation learning: we propose an approach to imitation learning when the agent and expert do not live on comparable spaces, which consists of minimizing the GDTW distance between expert demonstrations and agent rollouts. This is illustrated in Figure 5.

    Expert trajectory and agent policy
    Figure 5: Expert trajectory (left, sequence of pixel images) and policy of an agent (right, sequence of points in $\mathbb{R}^2$) learned by imitation learning.
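As a deliberately simplified illustration of the coordinate-descent structure behind barycentric averaging, the sketch below alternates between aligning each series to the current barycenter and re-averaging the aligned points, in the spirit of DTW barycenter averaging for 1-D series in a shared Euclidean space. The paper's actual algorithm uses GDTW alignments and operates on distance matrices; the function names here are ours.

```python
import numpy as np

def dtw_path(x, y):
    """DTW alignment path between 1-D series, via DP plus backtracking."""
    Tx, Ty = len(x), len(y)
    D = np.abs(np.subtract.outer(np.asarray(x, float), np.asarray(y, float)))
    R = np.full((Tx + 1, Ty + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, Tx + 1):
        for j in range(1, Ty + 1):
            R[i, j] = D[i - 1, j - 1] + min(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1])
    path, i, j = [], Tx, Ty          # backtrack the optimal path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def barycenter_step(series, bary):
    """One coordinate-descent update: align every series to the current
    barycenter, then set each barycenter point to the mean of all points
    aligned to it (a DBA-style update)."""
    sums = np.zeros(len(bary))
    counts = np.zeros(len(bary))
    for s in series:
        for i, j in dtw_path(bary, s):
            sums[i] += s[j]
            counts[i] += 1
    return sums / np.maximum(counts, 1)
```

Iterating `barycenter_step` to a fixed point yields a representative average sequence; the GDTW variant follows the same alternation but re-estimates the barycenter's intra-series distance matrix instead of its raw points.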


Conclusion

We introduce Gromov DTW, a distance between time series living on potentially incomparable spaces, which significantly broadens the range of applications of time-series metrics such as DTW. We also propose a smoothed version that alleviates the discontinuity of GDTW's gradient. Gromov DTW is a general tool for comparing two time series and applies to a number of problems, including barycentric averaging, generative modeling, and imitation learning.



References

1. H. Sakoe, S. Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978.
2. G. Peyré, M. Cuturi, J. Solomon. Gromov–Wasserstein Averaging of Kernel and Distance Matrices. ICML, 2016.
3. M. Cuturi, M. Blondel. Soft-DTW: A Differentiable Loss Function for Time-Series. ICML, 2017.
4. A. Genevay, G. Peyré, M. Cuturi. Learning Generative Models with Sinkhorn Divergences. AISTATS, 2018.
5. C. Bunne, D. Alvarez-Melis, A. Krause, S. Jegelka. Learning Generative Models Across Incomparable Spaces. ICML, 2019.