Why do we have a different representation for each domain? Why can't we have a single, shared representation across all modalities like image, text, and speech? This paper is a baby step in that direction: Variational Auto Encoders for a Multi-Modal Generative Network.
Paper Link
Generalisation in Machine Learning Models
In supervised machine learning, we train the model on a training set and evaluate it on a validation/test set that is unseen during training.
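A minimal sketch of this workflow, assuming scikit-learn and a toy classifier (the dataset and model choice here are just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # learn only from the training set

print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))  # generalisation estimate
```

The gap between the two accuracies is what tells us how well the model generalises beyond the data it was fitted on.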
The idea of treating this problem as a sequential one is innovative, and having an mLSTM cell to encode both visual and linguistic features is also a good choice. It gives the model the ability to forget, early on, all those pixels that defy correspondence with the expression. My gut feeling is that this kind of model would work well even where there are small objects, because at each time step there is a reduction/change in the set of pixels that remain probable candidates for segmentation.
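A rough sketch of the idea, not the paper's exact mLSTM: at each word step the recurrent cell sees both linguistic and visual features, so the hidden state can progressively discard pixels that stop matching the expression. All shapes and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalLSTMSketch(nn.Module):
    def __init__(self, word_dim=300, vis_dim=512, hidden_dim=256):
        super().__init__()
        self.cell = nn.LSTMCell(word_dim + vis_dim, hidden_dim)
        self.seg_head = nn.Conv2d(vis_dim + hidden_dim, 1, kernel_size=1)  # per-pixel score

    def forward(self, word_embs, vis_feat):
        # word_embs: (B, T, word_dim) embeddings of the referring expression
        # vis_feat:  (B, vis_dim, H, W) CNN feature map of the image
        B, T, _ = word_embs.shape
        pooled = vis_feat.mean(dim=(2, 3))  # global visual summary fed at every step
        h = torch.zeros(B, self.cell.hidden_size, device=word_embs.device)
        c = torch.zeros_like(h)
        for t in range(T):  # one step per word: fuse language and vision recurrently
            x = torch.cat([word_embs[:, t], pooled], dim=1)
            h, c = self.cell(x, (h, c))
        # Broadcast the final multimodal state over the feature map and predict a mask.
        h_map = h[:, :, None, None].expand(-1, -1, *vis_feat.shape[2:])
        return self.seg_head(torch.cat([vis_feat, h_map], dim=1))

mask_logits = MultimodalLSTMSketch()(torch.randn(2, 8, 300), torch.randn(2, 512, 24, 24))
print(mask_logits.shape)  # torch.Size([2, 1, 24, 24])
```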
Natural Language Processing
I assume the reader is familiar with standard machine learning terminology and methods. Here we are mostly going to deal with supervised learning.
Since all machine learning models are mathematical function approximators, we can't feed a raw sentence into a mathematical model!
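A toy illustration of the point: before a sentence can go into any model, it has to be turned into numbers, for example integer ids or one-hot vectors (the tiny vocabulary here is just for demonstration).

```python
sentence = "the cat sat on the mat"
tokens = sentence.split()

# Build a tiny vocabulary that maps each word to an integer id.
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
ids = [vocab[w] for w in tokens]
print(vocab)  # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
print(ids)    # [4, 0, 3, 2, 4, 1]

# One-hot vectors: each word becomes a vector with a single 1.
one_hot = [[1 if i == idx else 0 for i in range(len(vocab))] for idx in ids]
print(one_hot[0])  # 'the' -> [0, 0, 0, 0, 1]
```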
We know that each channel in a convolution block represents some high-level feature; for a human, say, each channel would map to a body part (just as an example!). The idea of depthwise convolution is that, while identifying one body part, you don't mix in information from the other body parts' channels, since that would disturb the signal being processed.
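A minimal sketch of depthwise convolution in PyTorch: setting groups equal to the number of input channels makes each channel get its own filter, so channels don't mix during the spatial convolution. The pointwise 1x1 conv afterwards (as in depthwise-separable convolutions) is what recombines them; the shapes here are just illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)  # (batch, channels, H, W)

depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)  # one filter per channel
pointwise = nn.Conv2d(32, 64, kernel_size=1)                         # mixes channels afterwards

out = pointwise(depthwise(x))
print(out.shape)  # torch.Size([1, 64, 56, 56])

# Parameter comparison with a standard 3x3 convolution:
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(depthwise) + count(pointwise))  # 18496 vs 2432
```

The parameter count shows why this factorisation is attractive: the separable version does roughly the same job with far fewer weights.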