A Nice, Easy-To-Follow Tutorial On Capsule Networks, Based On Sabour, Frosst, And Hinton's Paper

Dec. 22, 2017, 4:52 p.m. By: Kirti Bakshi

Capsule Networks

In this tutorial video, Aurelien Geron clearly explains Capsule Networks, a new architecture for neural networks.

Geoffrey Hinton came up with the idea of Capsule Networks several years ago, and in 2011 he published a paper that introduced many of the key ideas, but he had a hard time making them work properly until now.

Later this year, in October, Sara Sabour, Nicholas Frosst, and Geoffrey Hinton published a paper called "Dynamic Routing Between Capsules" in which they reached state-of-the-art performance on the MNIST dataset and demonstrated considerably better results than Convolutional Neural Networks (CNNs) on highly overlapping digits.

So what are Capsule Networks, exactly? In computer graphics, you start with an abstract representation of a scene, in which each object type has various instantiation parameters (such as its position, rotation, and scale). You then call some rendering function and get an image.
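As a toy illustration of this forward, graphics direction, here is a minimal sketch (the renderer and its parameters are made up for the example):

```python
import numpy as np

def render(square):
    """Toy renderer: draw a filled square on an 8x8 canvas
    from its instantiation parameters (position and size)."""
    canvas = np.zeros((8, 8))
    x, y, size = square["x"], square["y"], square["size"]
    canvas[y:y + size, x:x + size] = 1.0
    return canvas

# Abstract scene representation -> rendering function -> image
scene = {"x": 2, "y": 1, "size": 3}
image = render(scene)
print(image.sum())  # 9.0: a 3x3 square of pixels is set
```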

Inverse graphics is just the reverse process: you start with an image and try to find which objects it contains and what their instantiation parameters are. A Capsule Network is basically a neural network that tries to perform inverse graphics. It is composed of many capsules. A capsule, in short, is any function that tries to predict both the presence and the instantiation parameters of a particular object at a given location.
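In the paper, a capsule outputs a vector whose direction encodes the instantiation parameters, and a "squash" nonlinearity shrinks that vector so its length lies in [0, 1) and can be read as the probability that the object is present. A minimal NumPy sketch:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squash nonlinearity from the paper: preserves the vector's
    direction (instantiation parameters) while mapping its length
    into [0, 1) so it can serve as a presence probability."""
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([3.0, 4.0]))  # long input -> length close to 1
print(np.linalg.norm(v))          # ~0.96 (= 25/26)
```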

One key feature of Capsule Networks is that they preserve detailed information about each object's location and pose throughout the network. In a regular Convolutional Neural Network (CNN), there are generally several pooling layers, and unfortunately these pooling layers tend to lose information such as the precise location and pose of the objects. That is not a big deal if you just want to classify the whole image, but it makes it challenging to perform accurate image segmentation or object detection, both of which require precise location and pose. The fact that capsules are equivariant makes them very promising for these applications.
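A toy example of how aggressive pooling discards position information (the feature maps below are made up for illustration):

```python
import numpy as np

# Two 4x4 feature maps with the same activation at different positions.
a = np.zeros((4, 4)); a[0, 0] = 1.0   # feature detected top-left
b = np.zeros((4, 4)); b[3, 3] = 1.0   # feature detected bottom-right

def max_pool(x):
    """Max pooling over the whole map: keeps only 'was it detected?'"""
    return x.max()

print(max_pool(a) == max_pool(b))  # True: the position is gone
```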

From here, the video follows the paper step by step, and also discusses the figures that appear in it.

The video ends with a summary of the pros and cons, which are listed below.

The positives are:

  • Capsule Networks have reached high accuracy on MNIST and show promise on CIFAR10

  • They require less training data

  • Position and pose information are preserved (equivariance)

  • This is promising for image segmentation and object detection

  • Routing by agreement is great for overlapping objects (explaining away)

  • Capsule activations nicely map the hierarchy of parts

  • Robustness to affine transformations

  • Activation vectors are easier to interpret (rotation, thickness, skew, etc.)

The negatives are:

  • Not state of the art on CIFAR10, though it is a good start

  • Not yet tested on larger images (e.g., ImageNet): will it work well?

  • Slow training, due to the inner loop in the routing-by-agreement algorithm

  • A CapsNet cannot see two very close, identical objects (this is called "crowding", and it has been observed in human vision as well)
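To make the slow-training point concrete, here is a minimal NumPy sketch of the routing-by-agreement procedure from the paper; the shapes and iteration count below are illustrative, not the paper's exact configuration:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_iters=3):
    """Routing by agreement. u_hat has shape (n_in, n_out, dim):
    each lower-level capsule i predicts the output vector of each
    higher-level capsule j. The inner loop below runs at every
    forward pass, which is what slows training down."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))              # routing logits
    for _ in range(n_iters):
        c = softmax(b, axis=1)               # coupling coefficients
        s = (c[..., None] * u_hat).sum(0)    # weighted sum per output capsule
        v = squash(s)                        # (n_out, dim) output vectors
        b += (u_hat * v[None]).sum(-1)       # agreement: dot(u_hat, v)
    return v

rng = np.random.default_rng(0)
v = route(rng.normal(size=(6, 3, 4)))
print(v.shape)  # (3, 4); each output vector's length is in [0, 1)
```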

Given below are links to the papers, implementations, and more:

NIPS 2017 Paper:

  • Dynamic Routing Between Capsules, by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

  • Link: arXiv.org

The 2011 paper:

  • Transforming Autoencoders by Geoffrey E. Hinton, Alex Krizhevsky and Sida D. Wang

  • Link: Click Here

CapsNet implementations:

  • TensorFlow implementation by Aurelien Geron (the one presented in the video): Link to GitHub: Click Here.

  • Keras w/ TensorFlow backend: GitHub

  • TensorFlow: GitHub

  • PyTorch: GitHub


Aurelien Geron's book:

  • Name: Hands-On Machine Learning with Scikit-Learn and TensorFlow (O'Reilly, 2017)

  • Amazon: Click Here.

About Aurelien Geron: