Naive Bayes – A classification algorithm in the supervised learning group, based on probabilistic logic. It is one of the simplest machine learning algorithms of all time. Generative models such as GANs can also be used as classifiers, though interestingly they can do much more than categorisation. Logistic regression is another classification algorithm; it models the posterior probability by learning a direct input-to-output mapping, which makes it a discriminative model.
This is another post in AILabPage's tutorial series, explaining the Naive Bayes classifier algorithm in simple terms.
- What is the Naive Bayes algorithm?
- Why is it called "Naive"?
- How does it work?
- Mathematics behind the Naive Bayes algorithm
- When to use it?
Let's Unfold – Naive Bayes
AILabPage defines machine learning as "A focal point where business, data, experience and emerging technology meet and decide to work together". If you have not unfolded the machine learning jargon already, please take a look at our machine learning post series library.
This probabilistic algorithm (probability being the science of uncertainty and data) is used to make predictions in a classification fashion once the predictive model has been fully tested. The Naïve Bayes classifier is a definitive and important milestone for anyone beginning their machine learning journey.
The Naive Bayes algorithm is built on a set of probabilities: for each attribute in each class, it uses probability to make predictions. The algorithm falls under the supervised machine learning approach, and the data model that comes out of this effort is called a "predictive model", with probabilistic reasoning at its foundation.
It belongs to a family of probabilistic algorithms that take advantage of probability theory and Bayes' theorem. As per Wikipedia, "Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event." For example, when the weather is going to be good, there is a good probability that my son will play tennis.
Why Is It Called Naive?
"Naive" because of the assumptions it takes: by default, the algorithm assumes that all attributes are independent of each other. It is one of the simplest algorithms to understand and implement. How does it work? To answer that question, Bayes' theory of probability is the only answer.
The Naive Bayes classifier works well, especially for text data. It is the simplest algorithm that can be applied to text data under strong independence assumptions between the features. In other words, it does well with categorical input variables compared to numerical ones. For instance, if our goal is to identify a fruit based on its colour, shape and skin, then a spherical, orange-coloured, thick-skinned fruit would most likely be an orange.
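To make the text-data claim concrete, here is a minimal sketch using scikit-learn's multinomial Naive Bayes on a tiny made-up set of labelled fruit descriptions (the sentences and labels are illustrative, not from a real dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "round orange fruit with thick skin",
    "yellow long fruit with soft skin",
    "small round red fruit",
    "long green fruit",
]
labels = ["orange", "banana", "cherry", "banana"]

# Turn each sentence into word-count features (bag of words).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# MultinomialNB treats each word count as an independent feature.
model = MultinomialNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["round fruit with thick skin"])))
# prints ['orange']
```

The independence assumption means word order is ignored entirely; only how often each word co-occurs with each label matters, which is exactly why Naive Bayes is a common baseline for text classification.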
An Intuitive Explanation or More
While walking on a football ground, we will label the first white round moving object we see as a football, but on a cricket ground the label changes to a cricket ball. As a matter of fact, the object could be anything, so why not something else? This is where probability comes into play.
The human mind is programmed in such a way that it can easily and quickly classify objects and label them by their features. Feature mapping is so tightly coupled in the brain that it is very unlikely to make a mistake: the brain extracts features at rapid speed and applies probability swiftly, so the object gets classified and labelled as a ball.
In short, what we have described in the above example is a probabilistic classifier. The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular class.
Mathematics Behind Naive Bayes Algorithm
It is simply written as

P ( A | B ) = P ( B | A ) × P ( A ) / P ( B )

Here "A" represents the class, e.g. ball or anything else, and "B" represents the features, calculated individually. Bayes' theorem makes calculating the posterior probability simple: it provides a method to obtain P ( A | B ) from P ( A ), P ( B ) and P ( B | A ).
- P ( A | B ) is the posterior probability of class A given predictor B (the features).
- P ( A ) is the prior probability of the class.
- P ( B | A ) is the likelihood, i.e. the probability of the predictor given the class.
- P ( B ) is the prior probability of the predictor.
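To make the four terms concrete, here is a small worked example with made-up numbers for the ball scenario above (all probabilities are illustrative):

```python
# Suppose 60% of balls on the field are footballs, and 90% of footballs
# are white and round. Overall, 70% of all balls are white and round.
p_a = 0.6          # prior P(A): P(football)
p_b_given_a = 0.9  # likelihood P(B|A): P(white & round | football)
p_b = 0.7          # evidence P(B): P(white & round)

# Bayes' theorem: posterior P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # prints 0.771
```

So seeing a white, round object raises our belief that it is a football from 60% (the prior) to about 77% (the posterior).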
The assumption, in fact a strong assumption, taken by the Naive Bayes classifier is that the effect of the value of a predictor ( B ) on a given class ( A ) is independent of the values of the other predictors. Because of this assumption, it is also called class conditional independence.
Breaking each term down and rewriting our function with the above example in mind, it can be written as:

P ( A | f1, f2, …, fn ) ∝ P ( A ) × P ( f1 | A ) × P ( f2 | A ) × … × P ( fn | A )
Let's put a real example here.
Imagine we have 3 sets of playing balls with features such as diameter in cm, whether they are red in colour, and whether they are hard to the feel. Tabulating this as sample training data, we have the following.
To train the classifier, we count up various subsets of points and use them to compute the prior and conditional probabilities. In other words, training a Naive Bayes classifier is just a matter of counting the frequency of each attribute against each class, i.e. how many times each attribute co-occurs with each class.
From our training data set, let's try to predict the class of another ball. Assume for our prediction problem we have the following data:
- Diameter-22cm – Yes or 1
- Red – Yes or 1
- Hard – Yes or 1
Now, plugging the above attributes into the formula, it is simple to calculate our target label and make a prediction. Calculating the probability for each class, i.e. testing all 3 parameters with the formula above, leads to the best prediction: the class with the highest probability wins the game.
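The counting approach can be sketched in a few lines of plain Python. Since the article's training table above is an image, the rows below are a hypothetical stand-in: each ball is described by three yes/no features (diameter over 20 cm, red, hard) plus its class:

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (big?, red?, hard?) -> ball type
data = [
    ((1, 1, 1), "basketball"),
    ((1, 1, 1), "basketball"),
    ((0, 1, 1), "cricket"),
    ((0, 1, 1), "cricket"),
    ((1, 0, 0), "football"),
]

# "Training" is just counting: class frequencies give the priors,
# per-feature counts within each class give the likelihoods.
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(lambda: defaultdict(int))
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[label][(i, value)] += 1

def predict(features):
    scores = {}
    for label, n in class_counts.items():
        score = n / len(data)  # prior P(class)
        for i, value in enumerate(features):
            # Laplace smoothing so unseen feature values don't zero out
            score *= (feature_counts[label][(i, value)] + 1) / (n + 2)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict((1, 1, 1)))  # big, red, hard -> prints basketball
```

The class whose product of prior and per-feature likelihoods is highest wins, exactly as described above.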
To implement and build a Naive Bayes algorithm in Python, the first step is to import all the necessary libraries.
```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
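Putting those imports to work, here is an end-to-end sketch on a small made-up fruit dataset (the column names and values are illustrative assumptions, not from the article):

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy dataset: colour and diameter predict the fruit type.
df = pd.DataFrame({
    "colour": ["orange", "red", "orange", "red", "orange", "red"] * 5,
    "diameter_cm": [8, 4, 9, 5, 7, 4] * 5,
    "fruit": ["orange", "apple", "orange", "apple", "orange", "apple"] * 5,
})

# Encode the categorical column as integers for the classifier.
le_colour = LabelEncoder()
df["colour"] = le_colour.fit_transform(df["colour"])

X = df[["colour", "diameter_cm"]].to_numpy()
y = df["fruit"].to_numpy()

# Hold out a test set to evaluate the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = GaussianNB().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(acc)
```

On such a cleanly separable toy dataset the accuracy will be very high; real data will of course be messier.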
Steps to Implement a Naive Bayes Algorithm
The main aim of the Naive Bayes algorithm is to calculate the conditional probability that an object with feature vector f1, f2, …, fn belongs to a particular class. Below are the steps to implement it.
- Data Loading:
- Load the data from the source file, formatted as an "xls" or "csv" file.
- Split it into training and test data sets.
- Preparing Data:
- Summarise the properties of the training dataset to calculate the probabilities needed for predictions.
- Generating predictions:
- Use the training dataset from step 2, cleaned and organised.
- Generate predictions on the test dataset.
- Performance evaluation:
- Evaluate the model
- Iterate steps 2, 3 and 4 to improve the accuracy of predictions.
- Find the least error margin.
- Clean-up and Finalisation:
- Clean, organise and polish the predictive model code.
- Use the predictive model with all its elements to present complete, quality code.
- Implement and keep the code for your algorithm.
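The "summarise, predict, evaluate" steps above can be sketched from scratch without scikit-learn, assuming numeric features: preparing the data means summarising each class by the mean and standard deviation of its features, and prediction multiplies Gaussian likelihoods by the class prior (the toy rows are illustrative):

```python
import math
from collections import defaultdict

def summarise(rows, labels):
    """Per class: prior probability, and (mean, std) per feature column."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    summary = {}
    for label, members in grouped.items():
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((x - mean) ** 2 for x in col) / len(col)
            stats.append((mean, math.sqrt(var) + 1e-9))  # avoid std == 0
        summary[label] = (len(members) / len(rows), stats)
    return summary

def gaussian(x, mean, std):
    # Gaussian probability density of x under N(mean, std^2)
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def predict(summary, row):
    best, best_score = None, float("-inf")
    for label, (prior, stats) in summary.items():
        # Sum log probabilities to avoid underflow when multiplying many terms.
        score = math.log(prior)
        for x, (mean, std) in zip(row, stats):
            score += math.log(gaussian(x, mean, std) + 1e-300)
        if score > best_score:
            best, best_score = label, score
    return best

rows = [[22.0, 1], [23.5, 1], [7.0, 0], [7.2, 0]]   # diameter_cm, hard?
labels = ["basketball", "basketball", "cricket", "cricket"]
summary = summarise(rows, labels)
print(predict(summary, [22.0, 1]))  # prints basketball
```

Iterating on accuracy (step 4 above) then amounts to refining the features and the per-class summaries until the error margin stops shrinking.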
If we have a machine learning classification problem in hand, the Naive Bayes algorithm is often the best first choice. It is simple and easy, even with multiple features and classes.
Advantages of Naive Bayes
- The strongest advantage of Naive Bayes is its real-time and multiclass prediction.
- Naive Bayes learns a classifier quickly, with speed as one of its hallmarks.
- It is a relatively easy algorithm to build and understand, and it can be trained using a small data set.
Books + Other readings Referred
- Research through Open Internet
- AILabPage (group of self-taught engineers) members hands-on lab work.
- Machine Learning: An Algorithmic Perspective
Feedback & Further Question
Do you have any questions about Machine Learning, Data Science or Data Analytics? Leave a comment or ask your question via email; I will try my best to answer it.
Points to Note:
All credits, if any, remain with the original contributor only. We have kept our discussion limited, i.e. only around the Naive Bayes algorithm; for more details on machine learning you can browse the blog.
Conclusion – In this post, we have learnt that Naive Bayes is a supervised machine learning algorithm, used mainly on text-based data sets for learning and understanding something about text data. It is called "Naive" because of the strong assumption at its core: that the occurrence of a certain feature is independent of the occurrence of the other features.
From the above example it is clear that once the algorithm is trained, it can assign a label without being told; it predicts the class based on what it was taught. As we have seen, the training data is vital to the success of the model.