Naive Bayes - machine learning algorithm for classification problems

July 15, 2017, 2:35 p.m. By: Vishakha Jha

Naive Bayes

Collections of data, whether bundles of documents, files or web pages, keep growing, and handling such large volumes calls for automatic classification. Naive Bayes classification is one method that comes to the rescue. It is a very simple classification algorithm that makes a strong assumption: each input variable is independent of the others given the class. The algorithm often gives good results on text data. It is based on Bayes' theorem, named after Thomas Bayes, which describes conditional probability.

It is a classification algorithm for both two-class and multi-class problems. Instead of trying to calculate the probability of every combination of attribute values, it assumes the attributes are conditionally independent given the target value h, so the likelihood is calculated as P(d1|h) * P(d2|h) and so on. Broadly, there are three types of Naive Bayes algorithm: Gaussian, which assumes features follow a normal distribution; Multinomial, which applies to multinomially distributed data such as word counts; and Bernoulli, which requires binary feature values and models data with multivariate Bernoulli distributions.
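To make the product P(d1|h) * P(d2|h) concrete, here is a minimal Python sketch that multiplies a class prior by the per-attribute likelihoods and normalizes the result into posterior probabilities. The function name `naive_bayes_posteriors` and all the probability values are made up for illustration; they are not taken from any real dataset.

```python
from math import prod


def naive_bayes_posteriors(priors, likelihoods, observed):
    """Return P(h | d1..dn) for every class h, assuming the observed
    attributes are conditionally independent given the class."""
    scores = {}
    for h, prior in priors.items():
        # Unnormalized score: P(h) * P(d1|h) * P(d2|h) * ...
        scores[h] = prior * prod(likelihoods[h][d] for d in observed)
    total = sum(scores.values())
    # Dividing by the total turns the scores into probabilities that sum to 1.
    return {h: score / total for h, score in scores.items()}


# Hypothetical numbers, chosen only to illustrate the arithmetic.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.05, "meeting": 0.25},
}

posteriors = naive_bayes_posteriors(priors, likelihoods, ["free", "meeting"])
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

With these numbers the unnormalized scores are 0.4 * 0.30 * 0.05 = 0.006 for spam and 0.6 * 0.05 * 0.25 = 0.0075 for ham, so the sketch picks ham.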

The algorithm is fast and highly efficient because training comes down to counting. It can be easily tested on a small dataset and is expected to perform well when the input variables are categorical. But because it assumes all features are unrelated, it cannot learn relationships between features. It can also suffer from data scarcity: when a feature value rarely or never appears with a class in the training data, the estimated probabilities become unstable. Even though the conditional independence assumption rarely holds exactly, the algorithm performs well in various application domains.
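The counting idea can be sketched in a few lines of Python for categorical inputs. This is only an illustrative sketch, not a reference implementation: the helper names `train` and `predict`, the toy weather rows, and the add-one (Laplace) smoothing controlled by `alpha` are all assumptions introduced here. Smoothing is one common way to keep a zero count from wiping out a whole product when data is scarce.

```python
from collections import Counter, defaultdict
from math import log


def train(rows, labels):
    """Collect the counts a categorical naive Bayes model needs."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for the smoothing denominator.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the label with the highest log-posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        # Work in log space so many small probabilities do not underflow.
        score = log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing: unseen (value, label) pairs get a small,
            # non-zero probability instead of zero.
            score += log((count + alpha) / (n + alpha * len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
    ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "no", "yes", "yes"]

model = train(rows, labels)
print(predict(model, ("overcast", "mild")))
```

Note that "overcast" never occurs with the label "no" in this toy data. Without smoothing, that single zero count would force the "no" score to zero regardless of the other feature.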

It has notable applications in sentiment analysis, as used by Facebook; in document categorization, to find relevancy scores; and in email spam filtering, to classify email as spam or not, as adopted by Google. Implementations are available in the scikit-learn library for Python, and in R as well.

Example : Naive Bayesian