Why we have different representation for different domain, why can't we have shared or single representation of all modalities like image, text and speech. This paper is a baby step in this direction. Variational Auto Encoders for Multi Modal Generative Network.