Implementing a Naive Bayes Classifier from Scratch in Python
One topic I keep running into while diving into the machine learning world is Naive Bayes: the equation that is not so naive.
This article tries to simplify the math behind Multinomial Naive Bayes.
The complete code for this article can be found here.
What is Naive Bayes?
Naive Bayes is a probabilistic classifier, which means it classifies using probabilities estimated from observed frequencies.
A typical example (which we will test out) is ham/spam email classification: the classifier looks at our labeled spam and ham emails and classifies new emails based on the word frequencies it observed in them.
This “Naive” classifier somewhat mimics us humans in this scenario. How would I know that an email is spam? From previous emails! I know that my HR department's address is “hr@company.com” from previous interactions or prior knowledge, so seeing an email from “hr1@company.com” could raise suspicion.
The equation below represents Naive Bayes:
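Written out as a sketch consistent with the description that follows, the multinomial form reads as below. The notation is assumed here rather than taken from the original: $c$ is a class (ham or spam), $P(c)$ is its prior, and $w_1, \dots, w_n$ are the words of the email.

```latex
\hat{c} \;=\; \underset{c \,\in\, \{\text{ham},\, \text{spam}\}}{\arg\max}
\left[ \log P(c) \;+\; \sum_{i=1}^{n} \log P(w_i \mid c) \right]
```

The sum over words is the log of the email's likelihood, because the "naive" assumption treats the words as independent given the class, so the likelihood is a product of per-word probabilities.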
Breaking down the equation above: for each class, we add the log of the prior to the log of the likelihood of our email, then pick the class with the largest sum (the argmax).
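To make the log prior plus log likelihood rule concrete, here is a minimal sketch in plain Python. The function names `train` and `predict`, the whitespace tokenization, and the toy emails are all assumptions made for illustration, not the article's actual code. The sketch applies no smoothing, so a word never seen in a class sends that class's score to negative infinity.

```python
import math
from collections import Counter


def train(docs):
    """Estimate log priors and per-class word counts.

    docs is a list of (label, text) pairs; tokenization is a plain
    lowercase whitespace split (an assumption for this sketch).
    """
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, text in docs:
        word_counts[label].update(text.lower().split())
    total_docs = sum(class_counts.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_counts.items()}
    return log_priors, word_counts


def predict(text, log_priors, word_counts):
    """Return (best_class, scores), where each score is
    log prior + sum of log word likelihoods for that class."""
    scores = {}
    for c, log_prior in log_priors.items():
        total_words = sum(word_counts[c].values())
        score = log_prior
        for word in text.lower().split():
            count = word_counts[c][word]
            # No smoothing: an unseen word has probability 0, i.e. log = -inf.
            score += math.log(count / total_words) if count else float("-inf")
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy training set.
docs = [
    ("spam", "win money now"),
    ("spam", "win prize"),
    ("ham", "meeting at noon"),
    ("ham", "lunch at noon"),
]
log_priors, word_counts = train(docs)
label, scores = predict("win money", log_priors, word_counts)
print(label, scores)
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long emails.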
Smoothing
Now we understand how the Naive Bayes classifier works, but looking closely at it, we find that the classifier would only work on new emails when we…