Robust Video Super-Resolution with Learned Temporal Dynamics

Ding Liu1, Zhaowen Wang2, Yuchen Fan1, Xianming Liu3,
Zhangyang Wang4, Shiyu Chang5, Thomas Huang1

1University of Illinois at Urbana-Champaign 2Adobe Research
3Facebook 4Texas A&M University 5IBM Research


Video super-resolution (SR) is the task of estimating a high-resolution (HR) video sequence from a low-resolution (LR) one. Deep learning has been applied successfully to single-image SR, demonstrating the strong capability of neural networks to model spatial relations within a single image. The key to video SR is therefore how to exploit, efficiently and effectively, the temporal dependency among consecutive LR frames in addition to the spatial relations. This remains challenging because complex motion is difficult to model and can be detrimental if not handled properly. We tackle the problem of learning temporal dynamics from two aspects. First, we propose a temporal adaptive network that adaptively determines the optimal scale of temporal dependency. Inspired by the Inception module in GoogLeNet, filters on various temporal scales are applied to the input LR sequence, and their responses are then adaptively aggregated, in order to fully exploit the temporal relation among consecutive LR frames. Second, we reduce the complexity of motion among neighboring frames using a spatial alignment network that can be trained end-to-end with the temporal adaptive network; compared with competing image alignment methods, it is more robust to complex motion and more efficient. We provide a comprehensive evaluation of the temporal adaptation and spatial alignment modules. We show that the temporal adaptive design considerably improves SR quality over its plain counterparts, and that the spatial alignment network attains SR performance comparable to the sophisticated optical-flow-based approach while requiring much less running time. Overall, our proposed model with learned temporal dynamics achieves state-of-the-art SR results in terms of both spatial consistency and temporal coherence on public video datasets, compared with other recent video SR approaches.

Temporal Adaptive Network

temporal adaptive network
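
The adaptive aggregation described in the abstract can be illustrated with a minimal sketch. It assumes that each temporal branch has already produced an HR estimate of the same size and that a separate module has produced one unnormalized weight map per branch; the softmax normalization across branches and the names `aggregate_branches`, `branch_outputs` and `weight_logits` are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_branches(branch_outputs, weight_logits):
    """Fuse per-branch HR estimates with pixel-wise weights.

    branch_outputs[b][y][x]: HR estimate from temporal branch b (hypothetical input).
    weight_logits[b][y][x]:  unnormalized weight for branch b at pixel (y, x).
    Assumption: weights are normalized with a softmax across branches.
    """
    num_branches = len(branch_outputs)
    height = len(branch_outputs[0])
    width = len(branch_outputs[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = softmax([weight_logits[b][y][x] for b in range(num_branches)])
            fused[y][x] = sum(
                w * branch_outputs[b][y][x] for b, w in enumerate(weights)
            )
    return fused
```

With equal logits every branch contributes equally at that pixel; a large logit for one branch makes the fused value follow that branch, which is the sense in which the temporal scale is selected per pixel.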

Visualization of Weight Maps

weight maps

Spatial Alignment Network

spatial alignment network
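
To make the idea of reducing motion between neighboring frames concrete, the sketch below aligns a neighboring frame to a reference frame under the simplifying assumption of a single global integer translation. It is not the learned spatial alignment network: the translation is found here by brute-force search over small shifts, and `shift_frame`, `align_by_translation` and `max_shift` are hypothetical names chosen for illustration.

```python
def shift_frame(frame, dy, dx, fill=0.0):
    """Translate a 2-D frame by (dy, dx) pixels, padding exposed pixels with `fill`."""
    height, width = len(frame), len(frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sy, sx = y - dy, x - dx
            if 0 <= sy < height and 0 <= sx < width:
                out[y][x] = frame[sy][sx]
    return out


def align_by_translation(reference, neighbor, max_shift=2):
    """Find the integer shift of `neighbor` that best matches `reference`.

    Brute-force stand-in for a learned alignment module: every shift in
    [-max_shift, max_shift]^2 is scored by mean squared error over the
    overlapping region, and the best-scoring shift is applied.
    """
    height, width = len(reference), len(reference[0])
    best_err, best_dy, best_dx = float("inf"), 0, 0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, count = 0.0, 0
            for y in range(height):
                for x in range(width):
                    sy, sx = y - dy, x - dx
                    if 0 <= sy < height and 0 <= sx < width:
                        err += (reference[y][x] - neighbor[sy][sx]) ** 2
                        count += 1
            if count and err / count < best_err:
                best_err, best_dy, best_dx = err / count, dy, dx
    return shift_frame(neighbor, best_dy, best_dx), (best_dy, best_dx)
```

After alignment, the residual motion between the reference and its neighbors is smaller, so the later temporal filters have a simpler relation to model.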

Super-Resolved Video Results

foliage   walk   temple

Super-Resolved Frame Results

Vid4 dataset


Ding Liu, Zhaowen Wang, Yuchen Fan, Xianming Liu, Zhangyang Wang, Shiyu Chang and Thomas Huang, Robust Video Super-Resolution with Learned Temporal Dynamics. Proceedings of the IEEE International Conference on Computer Vision, 2017. [pdf][bib]

Ding Liu, Zhaowen Wang, Yuchen Fan, Xianming Liu, Zhangyang Wang, Shiyu Chang, Xinchao Wang and Thomas Huang, Learning Temporal Dynamics for Video Super-Resolution: A Deep Learning Approach. IEEE Transactions on Image Processing, 2018 (accepted). [pdf][bib]