LightGBM: A Light Gradient Boosting Machine

Feb. 25, 2018, 8:45 a.m. By: Kirti Bakshi


Data science is one of the fastest-growing fields in the world. New algorithms are launched every day; some fail and some succeed. Today, we will discuss one of the most successful machine learning algorithms: LightGBM.

Before moving ahead, what actually is LightGBM?

LightGBM is a fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework that uses tree-based learning algorithms and is used for ranking, classification and many other machine learning tasks. It is developed under Microsoft's DMTK project.

Now, what are its advantages?

LightGBM, as we already know, is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:

  • Faster training speed and higher efficiency

  • Lower memory usage

  • Better accuracy

  • Support for parallel and GPU learning

  • Capability to handle large-scale data

Experiments performed on public datasets show that LightGBM can outperform other existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption.

In addition, the experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

Why is LightGBM gaining popularity so quickly?

The size of data is increasing day by day, and it is becoming difficult for traditional data science algorithms to produce results quickly. LightGBM is prefixed 'Light' because of its high speed. To its advantage, it can handle large amounts of data and requires less memory to run.

Another reason LightGBM is popular is that it focuses on the accuracy of results. It also supports GPU learning, which is why data scientists widely use it for developing data science applications.

Now, how does it differ from other tree-based algorithms?

While other algorithms grow trees level-wise (horizontally), LightGBM grows trees leaf-wise (vertically): to grow, it chooses the leaf with the maximum delta loss. When growing the same leaf, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.
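Leaf-wise growth is controlled mainly by the num_leaves parameter (with max_depth as an optional cap). A minimal sketch, assuming the scikit-learn style LGBMClassifier interface and synthetic data purely for illustration:

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# num_leaves caps how many leaves the leaf-wise trees may grow;
# max_depth (here unlimited, -1) can additionally bound tree depth.
clf = lgb.LGBMClassifier(num_leaves=31, max_depth=-1, n_estimators=100)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```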


Now that we know the concept of LightGBM, what about its implementation?

Implementing LightGBM is quite easy; the only part that turns out to be complicated is parameter tuning. LightGBM covers more than 100 parameters, and it is very important for an implementer to know at least the basic ones.
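As an illustration, here is a minimal sketch of training with the native Python API (lightgbm.Dataset and lightgbm.train); the parameter values are only example choices, not tuned recommendations:

```python
import lightgbm as lgb
import numpy as np

# Toy regression data purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=1000)

train_data = lgb.Dataset(X[:800], label=y[:800])
valid_data = lgb.Dataset(X[800:], label=y[800:], reference=train_data)

# A few of the basic parameters; LightGBM exposes many more.
params = {
    "objective": "regression",   # L2 loss
    "metric": "l2",
    "learning_rate": 0.05,
    "num_leaves": 31,
}

booster = lgb.train(params, train_data, num_boost_round=200, valid_sets=[valid_data])
preds = booster.predict(X[800:])
```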

Its Ultimate Plus Points:

  • Easy To Use

  • Growing

  • Multilanguage

Now, the main question is: Can we use LightGBM everywhere?

No, it is not advisable to use LightGBM on small datasets: it is sensitive to overfitting and can easily overfit small data. There is no fixed threshold on the number of rows, but it is generally better suited to larger datasets.
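When a smaller dataset must be used anyway, common practice is to constrain the trees. A minimal sketch of such settings (the specific values are illustrative assumptions, not tuned recommendations):

```python
import lightgbm as lgb

# Parameters that commonly help limit overfitting on small data.
params = {
    "objective": "binary",
    "num_leaves": 15,          # smaller trees
    "max_depth": 4,            # hard cap on depth
    "min_data_in_leaf": 20,    # require enough samples per leaf
    "feature_fraction": 0.8,   # subsample features per tree
    "bagging_fraction": 0.8,   # subsample rows
    "bagging_freq": 1,
    "lambda_l1": 0.1,          # L1 regularization
    "lambda_l2": 0.1,          # L2 regularization
}
# Used with lgb.train(params, train_data, ...) as in the earlier sketch.
```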

Metrics And Applications:

1. Some of the applications supported by the algorithm are:

  • Regression, where the objective function is L2 loss

  • Binary classification, where the objective function is log loss

  • Cross-entropy

  • Multi-classification

  • Lambdarank, where the objective function is lambdarank with NDCG

2. Some of the metrics supported are (a configuration sketch follows these lists):

  • L1 loss

  • L2 loss

  • Log loss

  • Classification error rate

  • AUC

  • NDCG

  • MAP
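As an illustration, both the objective and the metric are set through the parameter dictionary. A minimal sketch for a multi-class problem, using a small built-in dataset purely for illustration (the specific values are example choices):

```python
import lightgbm as lgb
from sklearn.datasets import load_iris

# Small built-in dataset purely for illustration
X, y = load_iris(return_X_y=True)
train_data = lgb.Dataset(X, label=y)

params = {
    "objective": "multiclass",                    # multi-classification application
    "num_class": 3,
    "metric": ["multi_logloss", "multi_error"],   # log loss and classification error rate
}
booster = lgb.train(params, train_data, num_boost_round=50)
```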

For more details on these, go through the link mentioned below.

LightGBM: Optimization in Parallel Learning:

LightGBM provides the following parallel learning algorithms (a configuration sketch follows the list):

  • Feature Parallel: Feature parallelism parallelizes the "Find Best Split" step of decision tree learning across features.

  • Data Parallel: Data parallelism parallelizes the whole decision tree learning by partitioning the data across machines.

  • Voting Parallel: Voting parallelism further reduces the communication cost of data parallel learning to a constant cost.
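The parallel mode is selected with the tree_learner parameter. A minimal sketch, assuming a distributed setup (machine list, ports, etc.) is already configured separately:

```python
import lightgbm as lgb

# tree_learner selects the parallel learning algorithm:
#   "serial"  - single machine (default)
#   "feature" - feature parallel
#   "data"    - data parallel
#   "voting"  - voting parallel
params = {
    "objective": "binary",
    "tree_learner": "data",
    "num_machines": 2,   # assumes a two-machine cluster is configured separately
}
# booster = lgb.train(params, train_data)  # train_data as in the earlier sketches
```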

With all of the above, one can definitely say that LightGBM is one of the most successful algorithms and worth trying at least once.

For More Information: GitHub