Announcing Tensor Comprehensions

Feb. 16, 2018, 8:04 a.m. By: Kirti Bakshi

Tensor Comprehensions

Facebook AI Research (FAIR), in keeping with its commitment to open science and to working with the machine learning community to push AI research further, has announced the release of Tensor Comprehensions, a mathematical language and C++ library that helps bridge the gap between researchers, who communicate in terms of mathematical operations, and engineers, who focus on the practical needs of running large-scale models on various hardware backends. The distinguishing feature of Tensor Comprehensions is its unique take on Just-In-Time compilation, which automatically produces, on demand, the high-performance code the machine learning community needs. The language is currently a collaboration between Facebook, Inria, ETH Zurich and MIT.

Now, moving on to order-of-magnitude productivity gains:

Creating a new high-performance machine learning (ML) layer typically spans days or weeks of engineering work, through a two-phase process:

  • A researcher writes a new layer at a NumPy level of abstraction, chaining existing operations in a deep learning library such as PyTorch, and tests it in small-scale experiments. The code implementing the validated idea then needs to be accelerated by an order of magnitude to run large-scale experiments.

An engineer then takes the layer and writes code that runs efficiently on GPUs and CPUs, which is hard for several reasons:

  • The engineer needs to be a high-performance computing expert, and such talent is in limited supply.

  • The engineer needs to acquire context, map out a strategy, and write and debug code.

  • Moving the code to the backend involves mundane tasks, such as verbose argument checking and adding boilerplate integration code.

The team, therefore, anticipates great practical value in open-sourcing a package that shortens this process from days or weeks to minutes. With Tensor Comprehensions, the vision is for researchers to write their idea out in mathematical notation, have the system compile and tune it automatically, and get back specialized code with good performance.
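As a rough illustration of that vision, the canonical matrix multiplication example is written in the Tensor Comprehensions notation roughly as follows (the exact syntax should be checked against the project's documentation); sizes such as M, K and N are taken from the input tensors, and +=! denotes a sum reduction whose accumulator is initialized to zero:

    def matmul(float(M, K) A, float(K, N) B) -> (C) {
        C(m, n) +=! A(m, k) * B(k, n)
    }

The comprehension reads like the underlying mathematics: C(m, n) is the sum over k of A(m, k) * B(k, n). The JIT compiler and autotuner, not the author, are responsible for turning it into efficient code for the sizes at hand.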

In this release, you are provided with:

  • A mathematical notation to express a broad family of ML ideas in a simple syntax (see the sketch after this list).

  • A C++ frontend for this mathematical notation, based on the Halide IR.

  • A polyhedral Just-In-Time (JIT) compiler based on the Integer Set Library (ISL).

  • A multi-threaded, multi-GPU autotuner based on evolutionary search.
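To sketch how these pieces fit together, a small fused layer, a fully connected matrix product followed by a bias and a ReLU, can be expressed as a short sequence of comprehensions and handed to the JIT compiler and autotuner. The example below follows the fused fcrelu layer used in the project's announcement materials, so treat the details as indicative rather than definitive:

    def fcrelu(float(B, M) I, float(N, M) W1, float(N) B1) -> (O1) {
        O1(b, n) +=! I(b, m) * W1(n, m)
        O1(b, n) = O1(b, n) + B1(n)
        O1(b, n) = fmax(O1(b, n), 0)
    }

Because the whole layer is visible to the polyhedral compiler at once, the reduction, the bias addition and the ReLU can be fused and mapped to the GPU without any hand-written kernel code.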

Moving on to related earlier work:

Halide is a recent language that has become popular in the adjacent field of high-performance image processing. Halide uses a similar high-level functional syntax to describe an image processing pipeline and then, in a separate block of code, schedules it explicitly onto the hardware, specifying in detail how operations are tiled, vectorized, parallelized, and fused. This makes it a very productive language for people with architectural expertise, but it is difficult to use for most ML practitioners. Automatic scheduling of Halide is an active research area, but there is no good solution yet for ML code running on a GPU.


Tensor Comprehensions and Halide:

Tensor Comprehensions uses the Halide compiler as a library. It builds on Halide's intermediate representation (IR) and analysis tools and pairs them with polyhedral compilation techniques, so that layers can be written in a similar high-level syntax without the need to say explicitly how they are going to run. The team also found ways to make the language even more concise, by eliminating the need to specify loop bounds for reductions, as illustrated in the sketch below.
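As a brief, hypothetical example of that conciseness (a simple 1-D convolution, not necessarily one of the shipped examples), the reduction index k below never receives an explicit loop bound: its range is inferred from the size K of the kernel, and the valid range of the output index i follows from the indexing expression input(i + k):

    def conv1d(float(N) input, float(K) kernel) -> (output) {
        output(i) +=! input(i + k) * kernel(k)
    }

In the equivalent hand-written C++ or CUDA, both the loop over k and the bounds on i would have to be spelled out and verified by hand.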

PERFORMANCE:

While many improvements are still in the works, on the performance side Tensor Comprehensions can already, in favourable cases, match or beat the performance of current ML frameworks integrated with hand-tuned libraries. This is mainly achieved by the ability to adapt code generation strategies to specific problem sizes.

The following bar chart illustrates performance gains that were observed when comparing kernels produced automatically by Tensor Comprehensions against existing alternatives in Caffe2 and ATen.

[Bar chart: results measured on a Pascal GPU]

As the team extends its work to more hardware backends, Tensor Comprehensions will also complement fast libraries written by hardware manufacturers such as Intel and NVIDIA, and will be used in conjunction with libraries such as MKL, cuDNN or NNPACK.

What can we expect next:

This release will allow researchers and programmers to communicate the intent of their programs concisely, by writing layers in a notation similar to the mathematics they use in their papers. They will also be able to translate that notation into a fast implementation in a matter of minutes rather than hours or days, as before. As the toolchain grows, usability and performance are expected to increase and benefit the whole community.

The release of PyTorch integration for Tensor Comprehensions is planned for a later date.

This work is currently in its early stages; the team is sharing it early and looks forward to improving it with feedback from the community.

For More Information: Facebook Research