Unsupervised Learning for Physical Interaction through Video Prediction