Naive Bayes – A classification algorithm in the supervised learning family, based on probabilistic logic. It is one of the simplest machine learning algorithms of all time. Interestingly, generative algorithms such as those in GANs can also be used as classifiers, though they can do much more than categorisation. Logistic regression is another classification algorithm; it models the posterior probability by learning an input-to-output mapping, which makes it a discriminative model.
This is another post in AILabPage's tutorial series, discussing the Naive Bayes classifier algorithm and explaining it in simple terms.
- What is the Naive Bayes algorithm?
- Why is it called "Naive"?
- How does it work?
- The mathematics behind the Naive Bayes algorithm
- When should you use it?
Let's Unfold – Naive Bayes
AILabPage defines machine learning as "A focal point where business, data, experience meets emerging technology and decides to work together". If you have not yet unfolded the machine learning jargon, please take a look at our machine learning post series library.
This probabilistic algorithm (probability being the science of uncertainty and data) is used to make predictions in a classification setting once the predictive model has been fully tested. The Naive Bayes classifier is a definitive and important milestone for anyone beginning their machine learning journey.
The Naive Bayes algorithm is built on a set of probabilities. For each attribute in each class, it uses probability to make predictions. The algorithm falls under the supervised machine learning approach. The data model that comes out of this effort is called a "predictive model", with probabilistic problems at its foundational level.
It belongs to a family of probabilistic algorithms that take advantage of probability theory and Bayes' theorem. As per Wikipedia, "Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event." For example, when the temperature is going to be good, there is a good probability that my son will play tennis.
Why Is It Naive?
It is called "naive" because of the assumptions it makes: by default, the algorithm assumes that all attributes are independent of each other. It is among the simplest algorithms to understand and implement. How does it work? Bayes' theory of probability is the answer.
The Naive Bayes classifier works especially well for text data. It is one of the simplest algorithms that can be applied to text data, thanks to its strong independence assumptions between features. In other words, it does well with categorical input variables compared to numerical ones. For instance, if our goal is to identify a fruit based on its colour, shape, and skin, then a spherical, orange-coloured, thick-skinned fruit would most likely be an orange.
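As a small sketch of what this looks like on text data (the sentences, labels and class names below are made up purely for illustration), scikit-learn's `MultinomialNB` works on bag-of-words counts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: label each sentence as "sport" or "weather"
texts = ["great tennis match today", "rain expected all day",
         "he scored a goal", "sunny with a light breeze"]
labels = ["sport", "weather", "sport", "weather"]

# Bag-of-words counts are exactly the categorical features Naive Bayes likes
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()          # multinomial variant suits word counts
clf.fit(X, labels)

print(clf.predict(vec.transform(["great tennis match"])))
```

With such a toy corpus the classifier simply learns which words co-occur with which label, which is the counting intuition developed later in this post.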
An Intuitive Explanation or More
While walking on a football ground, the first white round moving object we see, we label a football; on a cricket ground, the label changes to a cricket ball. In fact the object could be anything, so why not label it something else? This is where probability comes into play.
The human mind is programmed in such a way that it can quickly and easily classify objects by their features. Feature mapping is so tightly coupled in the brain that it is very unlikely to make a mistake: the brain extracts features at rapid speed and applies probability swiftly, so the object gets labelled as a ball.
In short, what we have described in the example above is a probabilistic classifier. The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular class.
Mathematics Behind Naive Bayes Algorithm
It is simply written as:

P(A|B) = P(B|A) × P(A) / P(B)

Here "A" represents the class, e.g. ball, and "B" represents the features, considered individually. Bayes' theorem makes calculating the posterior probability simple: it gives a method to obtain P(A|B) from P(A), P(B), and P(B|A).
- P(A|B) is the posterior probability of class A given the predictor (features).
- P(A) is the prior probability of class A.
- P(B|A) is the likelihood, i.e. the probability of the predictor given the class.
- P(B) is the prior probability of the predictor.
The assumption (a strong one) made by the Naive Bayes classifier is that the effect of the value of a predictor (B) on a given class (A) is independent of the values of the other predictors. Because of this assumption, it is also called class conditional independence.
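Plugging some numbers into Bayes' theorem makes the formula concrete. The values below are made up for illustration only:

```python
# Hypothetical counts: suppose 60% of the balls we see are footballs (class A),
# 80% of footballs are white (feature B), and 50% of all balls are white.
p_a = 0.60          # P(A): prior probability of class "football"
p_b_given_a = 0.80  # P(B|A): likelihood that a football is white
p_b = 0.50          # P(B): overall probability that any ball is white

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.96
```

So seeing a white ball raises our belief that it is a football from the 60% prior to a 96% posterior.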
Breaking each term down and rewriting our function with the example above in mind, let's work through a concrete case.
Imagine we have three sets of playing balls with the features: diameter in cm, red colour, and hard feel. Tabulated as sample training data, we have the following.
To train the classifier, we count various subsets of points and use them to compute the prior and conditional probabilities. As we now know, training a Naive Bayes classifier is just a matter of counting the frequency of each attribute against each class; in other words, how many times each attribute co-occurs with each class.
From our training data set, let's try to predict the class of another ball. Assume that for our prediction problem we have the following data:
- Diameter-22cm – Yes or 1
- Red – Yes or 1
- Hard – Yes or 1
Now, plugging these attribute values into the formula, it is simple to calculate our target label and make a prediction. Computing the probability for each class, i.e. testing all three parameters with the formula above, leads to the best prediction: the class with the highest probability wins.
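The counting approach above can be sketched directly in code. The original training table is not reproduced in this post, so the rows and class names below are illustrative stand-ins, with Laplace (add-one) smoothing to avoid zero probabilities:

```python
from collections import Counter, defaultdict

# Illustrative training rows: (large_diameter, red, hard) -> ball type.
# These values are made up; substitute your own training table.
rows = [
    ((1, 1, 1), "cricket"),
    ((1, 1, 1), "cricket"),
    ((1, 0, 1), "cricket"),
    ((0, 0, 0), "tennis"),
    ((0, 1, 0), "tennis"),
    ((1, 1, 1), "hockey"),
]

class_counts = Counter(label for _, label in rows)
# feature_counts[label][i] = how often feature i equals 1 within that class
feature_counts = defaultdict(lambda: [0, 0, 0])
for feats, label in rows:
    for i, v in enumerate(feats):
        feature_counts[label][i] += v

def posterior_scores(feats):
    """Unnormalised P(class) * prod_i P(feature_i | class), Laplace-smoothed."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)                              # prior P(class)
        for i, v in enumerate(feats):
            p1 = (feature_counts[label][i] + 1) / (n + 2)  # P(feat_i = 1 | class)
            score *= p1 if v == 1 else (1 - p1)
        scores[label] = score
    return scores

# Diameter 22 cm -> 1, red -> 1, hard -> 1, as in the prediction problem above
scores = posterior_scores((1, 1, 1))
print(max(scores, key=scores.get))  # the class with the highest score wins
```

With these made-up counts, the large, red, hard ball scores highest for the "cricket" class, exactly the "highest probability wins" rule described above.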
To implement and build a Naive Bayes algorithm in Python, the first step is to import all the necessary libraries.
```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
Steps to Implement Naive Bayes Algorithm
The main aim of the Naive Bayes algorithm is to calculate the conditional probability that an object with feature vector f1, f2, …, fn belongs to a particular class. Below are the steps to implement it.
- Data loading:
- Load the data from whichever file holds it.
- Format it as an xls or csv file.
- Split it into training and test data sets.
- Preparing data:
- Summarise the properties of the training dataset to calculate the probabilities needed for predictions.
- Generating a forecast:
- Use the training dataset from step 2; clean and organise it.
- Generate a single prediction on the test dataset.
- Performance evaluation:
- Evaluate the model.
- Iterate steps 2, 3 and 4 to improve the accuracy of predictions.
- Find the smallest error margin.
- Clean-up and finalisation:
- Clean, organise and polish the predictive model code.
- Use the predictive model, with all its elements, to present complete, quality code.
- Implement and keep the code for your algorithm.
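The steps above can be sketched end to end with the libraries imported earlier. Since the post does not ship a real data file, the sketch below generates a synthetic two-class data set in place of the loading step:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in for data loading: two Gaussian blobs, one per class
# (replace with your own csv/xls data in practice).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Step 1: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Steps 2-3: fit the model (summarise per-class means and variances
# from the training data) and predict on the test set
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 4: evaluate the predictions
print("accuracy:", accuracy_score(y_test, y_pred))
```

`GaussianNB` is used here because the synthetic features are numerical; for categorical or count features, `MultinomialNB` or `BernoulliNB` would be the more natural choices.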
If we have a machine learning classification problem in hand, the Naive Bayes algorithm is often a strong first choice. It is simple and easy to use, even with multiple features and classes.
Advantages of Naive Bayes
- The strongest advantage of Naive Bayes is its real-time and multi-class prediction.
- Naive Bayes learns a classifier quickly.
- It is a relatively easy algorithm to build and understand, and it can be trained with a small data set.
Books + Other readings Referred
- Research through Open Internet
- AILabPage (a group of self-taught engineers) members' hands-on lab work.
- Machine Learning: An Algorithmic Perspective
Feedback & Further Question
Do you have any questions about machine learning, data science or data analytics? Leave a comment or ask your question via email. I will try my best to answer it.
Points to Note:
All credits, if any, remain with the original contributors. We have kept our discussion limited to the Naive Bayes algorithm; for more details on machine learning, you can browse the blog.
Conclusion – In this post we have learnt that Naive Bayes is a supervised machine learning algorithm, used mainly on text-based data sets to learn and understand something about text data. It is called "naive" because of the strong assumption it makes: that each feature occurs independently of the occurrence of the other features.
From the example above, it is clear that once the algorithm is trained, it can assign a label without being told, predicting a class based on what it was taught. As we have seen, the training data is vital to the success of the model.