Learning models of physical systems can be tricky, but exploiting inductive biases about the nature of the system can speed up learning significantly. In the following, we give a brief overview of variational integrator networks and the key insights behind them.

When learning models of physical systems, we’re often dealing with nonlinear dynamics and learning from noisy or high-dimensional data with a limited number of samples. This is particularly relevant in robotics, where collecting more data is expensive. Expressive models like neural networks are great at handling high-dimensional data and learning complex functions. Using standard feed-forward or recurrent neural network architectures, we can learn to approximate physical systems if given enough data. However, two potential issues can make neural networks difficult to use in practice.

- Because they learn *approximate* physics, predictions can behave erratically. This is particularly the case when predicting iteratively to forecast the evolution of the system.
- Having to learn the physics requires more data, and data-efficiency can be crucial.

Error can accumulate over time, causing even an accurate short-term model, such as the recurrent residual network shown below, to do worse over the long term.

To address these issues, we propose variational integrator networks (VINs). VINs are expressive neural network architectures with built-in physics. Using VINs allows us to easily learn models with physical forecasting behaviour from noisy or even pixel data in a data-efficient way.

# From Residual Networks to Variational Integrator Networks

The idea is simple: if we view neural networks as dynamical systems^{1}^{2}^{3}—and discretize them in a manner that preserves qualitative physical properties^{4}—we can define network architectures that obey the laws of physics. A particularly salient example of the kind of inductive bias we are interested in is the presence of conservation laws, for instance conservation of energy or conservation of momentum.

A canonical description of classical physical dynamical systems is Lagrangian mechanics, where a system is completely characterized by its Lagrangian $L(q, \dot{q}, t)$, a scalar function that encodes underlying physical properties. The equations of motion for such a system are a set of second-order ODEs called the Euler-Lagrange equations. At the same time, a deep residual network can be viewed as a system of ODEs

$\frac{\mathrm{d}x}{\mathrm{d}t} = f_{\theta}(x, t)$

discretized using an Euler scheme,^{1}^{2}^{3} giving

$x_{t+1} = x_t + hf_{\theta}(x_t) .$
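The correspondence between a residual block and an Euler step can be sketched in a few lines of plain Python. This is only an illustration: the names `make_params`, `f_theta`, `residual_block` and `resnet_forward`, the single tanh layer, and the choice to share weights across blocks (a time-independent $f_{\theta}$) are assumptions, not details taken from the text.

```python
import math
import random


def make_params(dim, hidden, seed=0):
    """Random weights for a toy one-hidden-layer vector field (hypothetical setup)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(dim)]
    return w1, b1, w2


def f_theta(x, params):
    """Toy learned vector field: x -> W2 tanh(W1 x + b1)."""
    w1, b1, w2 = params
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]


def residual_block(x, params, h=0.1):
    """One residual block, i.e. one explicit Euler step x + h * f_theta(x)."""
    return [xi + h * fi for xi, fi in zip(x, f_theta(x, params))]


def resnet_forward(x0, params, num_blocks, h=0.1):
    """Stack blocks with shared weights and return the whole trajectory."""
    trajectory = [x0]
    for _ in range(num_blocks):
        trajectory.append(residual_block(trajectory[-1], params, h))
    return trajectory


params = make_params(dim=2, hidden=4)
trajectory = resnet_forward([1.0, 0.0], params, num_blocks=5)
print(len(trajectory), "states, final state:", trajectory[-1])
```

Giving each block its own weights would correspond to a time-dependent vector field $f_{\theta}(x, t)$; the shared-weight version is used here only to keep the sketch short.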

Inspired by this perspective, one can consider Euler discretising the equations of motion

$\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L_{\theta}}{\partial \dot{q}} - \frac{\partial L_{\theta}}{\partial q} = 0$

arising from Lagrangian mechanics, instead of the ODE corresponding to the residual network. A problem with this approach is that the Euler scheme ignores the underlying geometry and qualitative properties of the equations of motion, and hence the physics. This is why the dynamics spiral out of control in the video shown previously. To address this, we propose to use variational integrators,^{4} a class of structure-preserving integrators. The result is **Variational Integrator Networks (VINs)**. VINs facilitate accurate long-term predictions and data-efficient learning while remaining flexible enough to model complex behavior. An illustration of the architecture and example comparisons are given below.
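To make the idea of a variational integrator concrete, here is a short worked sketch for one common special case. It assumes a Lagrangian of the form kinetic minus potential energy, with a constant symmetric mass matrix $\mathbf{M}$ and a learned potential $U_{\theta}$, and it assumes a trapezoidal approximation $L_d$ of the action over one step of size $h$. These choices are illustrative assumptions rather than details stated above. Instead of discretizing the equations of motion, one discretizes the action and then requires it to be stationary:

$L_{\theta}(q, \dot{q}) = \frac{1}{2}\dot{q}^{\top}\mathbf{M}\dot{q} - U_{\theta}(q)$

$L_d(q_t, q_{t+1}) = h\left[\frac{1}{2}\left(\frac{q_{t+1} - q_t}{h}\right)^{\top}\mathbf{M}\left(\frac{q_{t+1} - q_t}{h}\right) - \frac{1}{2}\left(U_{\theta}(q_t) + U_{\theta}(q_{t+1})\right)\right]$

$\frac{\partial L_d(q_{t-1}, q_t)}{\partial q_t} + \frac{\partial L_d(q_t, q_{t+1})}{\partial q_t} = 0$

$q_{t+1} = 2q_t - q_{t-1} - h^2\mathbf{M}^{-1}\nabla U_{\theta}(q_t)$

The third line is the discrete Euler-Lagrange equation. Solving it for $q_{t+1}$ gives the last line, which is the Störmer-Verlet update. Because the update is derived from a discrete variational principle, it is symplectic, and the energy error typically stays bounded over long horizons instead of growing steadily as it does with the Euler scheme.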

**(a)** VAE, **(b)** Dynamic VAE, **(c)** Lie Group VAE, **(d)** VIN-$SO(2)$, **(e)** VIN-$SO(2)^*$, **(f)** Ground Truth

**Example embedded representations of an ideal pendulum system:** Black/colored dots represent embedded train/test images; gray lines connect points sequentially in time. The embeddings learned by the baseline models fail to capture the global structure (a)–(b) and/or are discontinuous with respect to the time dimension (c). The VIN-$SO(2)$ (d) learns an embedding that is consistent with the ground truth (f), particularly in (e), where the superscript $(\cdot)^*$ indicates that the non-identifiable latent mass matrix $\mathbf{M}$ is set to the true value.
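The difference between the Euler scheme and a structure-preserving integrator can also be checked numerically on the ideal pendulum with known dynamics, with no learning involved. The sketch below compares explicit Euler with Störmer-Verlet in its velocity form, a standard variational integrator for Lagrangians of the form kinetic minus potential energy. The unit mass, length and gravity, the step size, the initial angle and all function names are assumptions chosen for illustration.

```python
import math


def energy(q, v):
    """Total energy of the ideal pendulum with m = l = g = 1 (assumed units)."""
    return 0.5 * v * v - math.cos(q)


def euler_step(q, v, h):
    """Explicit Euler applied to q' = v, v' = -sin(q)."""
    return q + h * v, v - h * math.sin(q)


def verlet_step(q, v, h):
    """Stormer-Verlet in velocity form: half kick, drift, half kick."""
    v_half = v - 0.5 * h * math.sin(q)
    q_new = q + h * v_half
    v_new = v_half - 0.5 * h * math.sin(q_new)
    return q_new, v_new


def max_energy_drift(step, q0=1.0, v0=0.0, h=0.1, num_steps=1000):
    """Largest absolute deviation from the initial energy along a rollout."""
    q, v = q0, v0
    e0 = energy(q, v)
    drift = 0.0
    for _ in range(num_steps):
        q, v = step(q, v, h)
        drift = max(drift, abs(energy(q, v) - e0))
    return drift


euler_drift = max_energy_drift(euler_step)
verlet_drift = max_energy_drift(verlet_step)
print(f"explicit Euler max |E - E0|: {euler_drift:.4f}")
print(f"Stormer-Verlet max |E - E0|: {verlet_drift:.6f}")
```

Under these settings the Euler rollout steadily gains energy, while the Störmer-Verlet rollout keeps the energy error small and bounded. This is the qualitative behavior a VIN is designed to inherit, with the hand-written `sin(q)` force replaced by the gradient of a learned potential.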

# Concluding remarks

To summarize, learning approximate physics implicitly can lead to incorrect qualitative behavior and a decrease in accuracy. Variational integrator networks are a class of network architectures that encode physical laws explicitly, which improves data-efficiency and produces well-behaved forecasts, particularly over longer trajectories. Variational integrator networks can be used to learn from noisy observations of a physical system, or as an architecture for variational autoencoders, enabling them to learn from pixel observations.

# References

^{1} E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34(1):014004, 2017.

^{2} W. E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 5(1):1–11, 2017.

^{3} R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. NeurIPS, 2018.

^{4} J. E. Marsden, S. Pekarsky, S. Shkoller, and M. West. Variational methods, multisymplectic geometry and continuum mechanics. Journal of Geometry and Physics, 38(3–4):253–284, 2001.