Presenting A Modern Big Data Algorithm: HyperLearn, 50%+ Faster Machine Learning

Nov. 6, 2018, 10:37 a.m. By: Kirti Bakshi


As the first quarter of the 21st century has progressed, datasets have grown larger and wider. At the same time, running primitive and slow algorithms results in headaches as well as productivity and economic losses.

Therefore, by optimizing algorithms used in stock market predictions, climate change modelling, artificial intelligence and cancer research, the world can dramatically benefit from faster and more accurate numerical methods.

What is HyperLearn?

HyperLearn, started last month by Daniel Hanchen, is a still-unstable package that combines the functionality of Statsmodels with tools such as Pandas, PyTorch, NoGil, Numba, NumPy, SciPy and LAPACK, and is similar in style to Scikit-Learn.

Daniel aims to make Linear Regression, Ridge, PCA and LDA/QDA faster, which in turn speeds up the other algorithms that build on them. HyperLearn already reports staggering results: this Statsmodels combo incorporates novel algorithms that make it 50% faster and let it use 50% less RAM, alongside a leaner, GPU-capable Sklearn.

Apart from this, HyperLearn also has embedded statistical inference measures, and can be called with a syntax similar to that of Scikit-Learn.

What are the Key Methodologies and Aims of the HyperLearn project?

The key methodologies and aims of the project are:

1. Parallel For Loops:

  • HyperLearn's for loops will include Memory Sharing and Memory Management.

  • CUDA Parallelism will be made possible with the help of PyTorch & Numba.
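As a rough illustration of what a parallel for loop with shared memory can look like, here is a minimal sketch using Numba's prange. It is not HyperLearn's own code: the function name row_norms is hypothetical, and the small fallback only exists so the sketch still runs (serially) where Numba is not installed. The threads all read the same array X, so no copies are made.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without Numba.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True, nogil=True)
def row_norms(X):
    """Euclidean norm of every row; rows are split across threads."""
    n, p = X.shape
    out = np.empty(n)
    for i in prange(n):          # parallel loop over rows
        s = 0.0
        for j in range(p):       # ordinary inner loop
            s += X[i, j] * X[i, j]
        out[i] = np.sqrt(s)
    return out


X = np.random.default_rng(0).normal(size=(1000, 20))
norms = row_norms(X)
```

Each iteration writes to its own slot of out, which is what makes the loop safe to run in parallel.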

2. It is 50%+ faster and leaner:

  • Matrix operations that have been improved include: Matrix Multiplication Ordering, Element Wise Matrix Multiplication reducing complexity to O(n^2) from O(n^3), reducing Matrix Operations to Einstein Notation and Evaluating one-time Matrix Operations in succession to reduce RAM overhead.

  • Applying QR Decomposition and then SVD (Singular Value Decomposition) in some cases might be faster as well.

  • Utilization of the structure of the matrix in order to compute faster inverse

  • Computing SVD(X) and then getting pinv(X) is sometimes faster than pure pinv(X)
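Two of the ideas above can be sketched in plain NumPy. The first part shows why multiplication ordering matters: both expressions give the same vector, but one forms a full matrix product first. The second part builds a pseudoinverse from an SVD. The helper pinv_via_svd and its rcond parameter are illustrative assumptions, not HyperLearn's API, and since NumPy's own pinv is itself SVD-based, this shows the identity rather than a speed-up.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 500))
B = rng.normal(size=(500, 500))
v = rng.normal(size=500)

# Forms the 500 x 500 product A @ B first: on the order of n^3 operations.
slow = (A @ B) @ v
# Two matrix-vector products instead: on the order of n^2 operations.
fast = A @ (B @ v)


def pinv_via_svd(X, rcond=1e-15):
    """Moore-Penrose pseudoinverse computed from a thin SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rcond * s.max()       # drop negligible singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # pinv(X) = V @ diag(1/s) @ U.T
    return (Vt.T * s_inv) @ U.T


X_tall = rng.normal(size=(200, 30))
X_pinv = pinv_via_svd(X_tall)
```

If the SVD of X is needed anyway (for PCA, say), reusing it this way avoids decomposing the matrix a second time.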

3. Statsmodels is sometimes slow:

  • Prediction Intervals, Confidence, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.

  • Making use of Einstein Notation & Hadamard Products where possible.

  • Computing only what is necessary to compute (Diagonal of matrix only)

  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
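To make the "diagonal only" idea concrete, here is a small sketch, assuming an ordinary least squares setting. The leverage values of a linear model are the diagonal of the hat matrix X (XᵀX)⁻¹ Xᵀ. Building that full n × n matrix just to read its diagonal wastes time and RAM; an element-wise (Hadamard) product followed by a row sum, written with einsum, gives the same numbers. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
XtX_inv = np.linalg.inv(X.T @ X)

# Naive: build the full 300 x 300 hat matrix, then keep only the diagonal.
H = X @ XtX_inv @ X.T
leverage_naive = np.diag(H)

# Diagonal only: row-wise sum of the Hadamard product (X @ XtX_inv) * X.
# Never allocates anything larger than n x p.
leverage = np.einsum("ij,ij->i", X @ XtX_inv, X)
```

The second form needs O(n p²) work and O(n p) memory, compared with O(n² p) work and O(n²) memory for the naive version.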

4. Deep Learning Drop In Modules with PyTorch:

  • Making use of PyTorch to create Scikit-Learn like drop in replacements.
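A Scikit-Learn-like drop-in written in PyTorch might look roughly like the sketch below. The class TorchLinearRegression is hypothetical and is not taken from HyperLearn; it only mirrors the familiar fit / predict convention and the coef_ / intercept_ attribute names, solving the least squares problem with torch.linalg.lstsq.

```python
import torch


class TorchLinearRegression:
    """Hypothetical Scikit-Learn-style estimator backed by PyTorch."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        X = torch.as_tensor(X, dtype=torch.float64)
        y = torch.as_tensor(y, dtype=torch.float64).reshape(-1, 1)
        if self.fit_intercept:
            # Prepend a column of ones so the first weight is the intercept.
            ones = torch.ones(X.shape[0], 1, dtype=X.dtype, device=X.device)
            X = torch.cat([ones, X], dim=1)
        theta = torch.linalg.lstsq(X, y).solution.flatten()
        if self.fit_intercept:
            self.intercept_, self.coef_ = theta[0], theta[1:]
        else:
            self.intercept_ = torch.zeros((), dtype=theta.dtype)
            self.coef_ = theta
        return self

    def predict(self, X):
        X = torch.as_tensor(X, dtype=torch.float64)
        return X @ self.coef_ + self.intercept_


# Usage: recover y = 2x + 1 from four noise-free points.
X = torch.tensor([[0.0], [1.0], [2.0], [3.0]], dtype=torch.float64)
y = 2.0 * X[:, 0] + 1.0
model = TorchLinearRegression().fit(X, y)
pred = model.predict(X)
```

Because fit returns self, calls can be chained exactly as with a Scikit-Learn estimator.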

5. 20%+ Less Code along with Cleaner Clearer Code

  • Using Decorators & Functions wherever possible.

  • Intuitive Middle Level Function names like (isTensor, isIterable).

  • Handles Parallelism easily through hyperlearn.multiprocessing

6. Accessing Old and Exciting New Algorithms:

  • Matrix Completion algorithms – Non Negative Least Squares, NNMF

  • Batch Similarity Latent Dirichlet Allocation (BS-LDA)

  • Correlation Regression and many more!

Beyond this, Daniel has also published some preliminary timing results for a range of algorithms from PyTorch, NumPy, SciPy and MKL, alongside HyperLearn’s own methods and Numba JIT-compiled algorithms.

The key findings for HyperLearn are listed below:

  • HyperLearn’s Pseudoinverse has no speed improvement

  • HyperLearn’s PCA will have a speed improvement of over 200%.

  • HyperLearn’s Linear Solvers will be over 1 times faster.

Source And Information: GitHub