Statistics and Machine Learning – Do statistics and ML walk together like partners? Is machine learning just a polished or glorified version of statistics? There are many questions like these, at least in my mind, and even today many of my fellow lab members ask me the same things. Many people still struggle to tell the two apart in their day-to-day work and carry this confusion with them.
Is Machine Learning a Computerised Version of Statistics?
Is machine learning a computerised or glorified version of statistics? In my personal opinion, the answer is simply “NO”. To my understanding, the two complement each other and work like partners: two friends from school days whose paths cross on some occasions.
There is also Leo Breiman's 2001 paper “Statistical Modeling: The Two Cultures”. It argued that statisticians rely too heavily on data modeling, and that machine learning techniques are making progress by instead relying on the predictive accuracy of models.
Are Statistics and Machine Learning synonyms?
Is there any difference between Statistics and Machine Learning jargon?
Let's leave data mining (DM) out of this discussion. The focus here is only on these two unfriendly friends; moreover, in my personal opinion, DM is not relevant here. Just as a reminder, some time back people also called blockchain a glorified and polished version of swarm intelligence, which is not the case anyway.
Statistics and Machine Learning
A classic example is linear regression (LR), which was developed in the field of statistics. It is studied as a model for understanding the relationship between numerical input and output variables, and it has since been borrowed by machine learning. It is therefore both a statistical algorithm and a machine learning algorithm.
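The dual identity of linear regression can be seen in a minimal sketch that fits one model and then reads it two ways: a statistician asks whether the slope is significantly different from zero (a t-statistic), while a machine learning practitioner asks how well the model predicts held-out data (mean squared error). The data points are hypothetical, and only a single input variable is assumed to keep the arithmetic transparent.

```python
import math


def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def slope_t_statistic(xs, ys, b0, b1):
    """Statistics view: t-statistic for the null hypothesis 'slope = 0'."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = rss / (n - 2)  # unbiased estimate of the noise variance
    standard_error = math.sqrt(residual_variance / sxx)
    return b1 / standard_error


def mean_squared_error(xs, ys, b0, b1):
    """Machine learning view: average squared prediction error on unseen data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a training set and a held-out test set.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x, test_y = [7, 8], [14.2, 15.9]

b0, b1 = fit_simple_ols(train_x, train_y)
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
print("t-statistic for slope:", round(slope_t_statistic(train_x, train_y, b0, b1), 1))
print("held-out MSE:", round(mean_squared_error(test_x, test_y, b0, b1), 3))
```

The fitted coefficients are identical in both readings; only the question asked of them changes.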
Although there are some methodological differences between machine learning and statistics, they don’t really divorce these two friends. The main difference is that machine learning emphasizes optimization and predictive performance, while statistics is concerned with samples, populations and hypotheses. Machine learning is more concerned with making predictions, even if those predictions cannot be explained well.
“How to Lie with Statistics” is a 1954 book by Darrell Huff that introduces statistics to the general reader. Huff was not a statistician but a journalist who wrote many “how to” articles as a freelancer.
Roles – Machine Learning, Statistics and Computer Science
- Machine Learning – Optimize a performance criterion using example data or past experience.
- Statistics – Draw inferences from sample data, with an emphasis on hypotheses.
- Computer science – Efficient algorithms to solve the optimization problem, and to represent and evaluate the model for inference.
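To make the “optimize a performance criterion using example data” role concrete, here is a minimal sketch, assuming a one-feature linear model and a tiny hypothetical data set. The performance criterion is mean squared error, and plain batch gradient descent (one of many possible optimizers, chosen here only for simplicity) plays the computer-science part of solving the optimization problem.

```python
def gradient_descent_fit(xs, ys, learning_rate=0.01, n_steps=5000):
    """Fit y = w * x + b by minimising mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the criterion.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """The performance criterion being optimized."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical example data.
xs, ys = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
w, b = gradient_descent_fit(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, training MSE = {mean_squared_error(xs, ys, w, b):.4f}")
```

The learning rate and step count are arbitrary choices that happen to converge for this toy data. Ordinary least squares also has a closed-form solution; gradient descent is shown only because it is the kind of general-purpose optimizer machine learning relies on when no closed form exists.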
To be a better Data Scientist, I should have started as a Statistician
Though it's not entirely true, to some extent statistics can be called a graphical branch of mathematics (in my personal opinion). According to the wiki, statistics is “the practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.”
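The “inferring proportions in a whole from those in a representative sample” part of this definition can be illustrated with a minimal sketch: a normal-approximation (Wald) 95% confidence interval for a population proportion. The counts used (120 “yes” answers out of 400 sampled people) are hypothetical, and the Wald interval is only one of several interval methods.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, sample_size, confidence=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    p_hat = successes / sample_size                  # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 120 of 400 sampled people answered "yes".
low, high = proportion_confidence_interval(120, 400)
print(f"95% CI for the population proportion: ({low:.3f}, {high:.3f})")
# prints: 95% CI for the population proportion: (0.255, 0.345)
```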
It is the science of data that deals with the collection, analysis, interpretation and presentation of masses of numerical data. In general, statistics speaks more in human language and with a human touch. For example, a statistician would say that a prediction for variable Y holds only if what variables a, b, c and I say holds true.
Machine learning is the much younger of the two disciplines. It is built on the foundations of statistics and has absorbed much of the statistical philosophy and many of its techniques over the years. It focuses on data that is too complex for humans to find meaningful regularities in, and random samples alone are not good enough for that task.
Statistics vs Machine Learning
When a model is expressed as a simulation, statistics and computation seem to converge beautifully. In short, statistics knows very well that in today’s world it would be silly to ignore the massive amounts of data available for analysis, and that statistics alone cannot do much with them.
To handle humongous amounts of data, statistics needs the support of machine learning and computing power. When analysing massive data, one needs to worry about storage and computation, most likely distributed computation.
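One way statistics adapts to distributed computation is by using summaries that can be computed on separate partitions and merged afterwards, so raw data never has to move between machines. Below is a minimal single-process sketch, assuming the data has already been split into partitions (here, hypothetical chunks of the numbers 1 to 100). It uses Welford's online update within each chunk and the pairwise merge formula of Chan et al. to combine chunk results into an overall mean and sample variance; the `Summary` and `summarize` names are illustrative.

```python
import statistics
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Mergeable running summary: count, mean and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's online update for a single value.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Pairwise combination of two partial summaries (Chan et al.).
        if other.n == 0:
            return Summary(self.n, self.mean, self.m2)
        if self.n == 0:
            return Summary(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    def sample_variance(self) -> float:
        return self.m2 / (self.n - 1)


def summarize(chunk):
    """Would run on one worker/partition; only the small Summary travels back."""
    s = Summary()
    for x in chunk:
        s.add(x)
    return s


# Hypothetical partitions standing in for data stored on different machines.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
total = reduce(Summary.merge, (summarize(c) for c in chunks), Summary())

# Compare the merged result with a single-machine computation.
print(total.n, total.mean, total.sample_variance(), statistics.variance(data))
```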
Approximate Bayesian Computation – ABC
In statistics, Approximate Bayesian Computation (ABC) is one approach to performing inference over some variables when the model can be simulated but its likelihood is hard to compute. Weather forecasting is an example of an area where all variables have clear semantic interpretations, and it is the statistician's task to perform inference over these variables. However, this demands a complex skill set and careful thought about where and how to spend the available computing power.
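The core idea of ABC can be shown with its simplest variant, rejection sampling: draw a parameter from the prior, simulate data with it, and keep the parameter only if the simulated data is close enough to what was observed. This minimal sketch assumes a deliberately tiny hypothetical problem (inferring a coin's bias from 7 heads in 20 flips, with a uniform prior) rather than a weather model. For this toy case the exact posterior is known, Beta(8, 14) with mean 8/22 ≈ 0.364, so the approximation can be checked.

```python
import random
import statistics


def abc_rejection(observed_heads, n_flips, n_draws, tolerance=0, seed=0):
    """ABC rejection sampler for a coin's bias p (hypothetical toy model)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the Uniform(0, 1) prior.
        p = rng.random()
        # 2. Simulate a data set of the same size using that candidate.
        simulated_heads = sum(rng.random() < p for _ in range(n_flips))
        # 3. Keep the candidate if the simulation is within `tolerance` of the data.
        if abs(simulated_heads - observed_heads) <= tolerance:
            accepted.append(p)
    return accepted


samples = abc_rejection(observed_heads=7, n_flips=20, n_draws=20000)
print(len(samples), "accepted; approximate posterior mean:", statistics.mean(samples))
```

Here the likelihood is actually easy to compute, so ABC is not needed; it is used only because the answer can be checked. Raising `tolerance` accepts more draws at the cost of a coarser approximation, which is one concrete form of deciding where and how to spend the available computing power.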
When the same inference task is tackled in machine learning, it is known as “probabilistic programming”, which additionally develops specialized programming languages (e.g. based on graphical models) in which to express these models.
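To give a flavour of probabilistic programming, here is a minimal embedded sketch in plain Python rather than a real probabilistic programming language. The model is written as an ordinary function that calls `sample` and `observe` hooks, and a generic inference routine (likelihood weighting, a simple importance-sampling method) runs it many times. All class and function names are hypothetical, and the toy coin model (7 heads in 20 flips, uniform prior, exact posterior mean 8/22 ≈ 0.364) is only for illustration.

```python
import math
import random


class Bernoulli:
    def __init__(self, p):
        self.p = p

    def sample(self, rng):
        return rng.random() < self.p

    def log_prob(self, x):
        prob = self.p if x else 1.0 - self.p
        return math.log(prob) if prob > 0 else -math.inf


class Uniform:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def sample(self, rng):
        return rng.uniform(self.lo, self.hi)


class Trace:
    """Execution context: sample() draws latent values, observe() adds log-likelihood."""

    def __init__(self, rng):
        self.rng = rng
        self.log_weight = 0.0

    def sample(self, dist):
        return dist.sample(self.rng)

    def observe(self, dist, value):
        self.log_weight += dist.log_prob(value)


def likelihood_weighting(model, data, n_runs, seed=0):
    """Generic inference: run the model many times, weight each run by its likelihood."""
    rng = random.Random(seed)
    results, log_weights = [], []
    for _ in range(n_runs):
        trace = Trace(rng)
        results.append(model(trace, data))
        log_weights.append(trace.log_weight)
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)  # posterior mean


def coin_model(trace, flips):
    """The 'program': a prior over the bias plus one observation per flip."""
    p = trace.sample(Uniform(0.0, 1.0))
    for flip in flips:
        trace.observe(Bernoulli(p), flip)
    return p


flips = [True] * 7 + [False] * 13
estimate = likelihood_weighting(coin_model, flips, n_runs=20000)
print("posterior mean estimate:", round(estimate, 3))
```

The inference routine knows nothing about coins: any model written against the same `sample`/`observe` hooks can reuse it, which is the separation between model and inference that probabilistic programming languages aim to provide.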
Statistics – It's a graphical branch of mathematics (in my personal opinion). It deals with the collection, analysis, interpretation and presentation of masses of numerical data.
A small project for learning!
- Machine learning is more passionate about predictions than about causality.
Within a small group of AILabPage members and other AILabPage lab fellows, we did a project for testing and learning purposes. Our goal was to perform a similar task on an available data set using traditional “Statistics Techniques” and “Machine Learning Techniques”, and then compare the final results.
- Predictions and online services – We built a data model that allows a seamless transition from training to prediction, targeting both online and batch prediction services. We had planned to integrate with Google global load balancing, but for various reasons, including a shortage of time, we did not use it. We still managed to achieve this goal.
- We obtained a scalable predictive analytics model.
- We were able to give a good demonstration of this promise by leveraging statistical breakthroughs.
- We designed and used an artificial neural network architecture (without an RNN, of course) to make the model transparent and easy to debug. It took longer than usual, but we managed to develop a fully working model.
After developing an experimental machine learning model using TensorFlow on Cloud Machine Learning Engine, we achieved 73% prediction accuracy. I have to confess that we used a neural network as well. What other options would enable us to automatically scale our machine learning application and reach our ultimate goal? If you have any suggestions, please leave them in the comment box below.
Points to Note:
All credits, if any, remain with the original contributors only. We have covered all the basics of data analytics for digital marketing analytics in chapter 1. In the upcoming chapters we will talk about implementation, usage and practical experience for markets.
Books + Other readings Referred
- Research through open internet, news portals, white papers and imparted knowledge via live conferences & lectures.
- Lab and hands-on experience of @AILabPage (self-taught learners group) members.
Feedback & Further Questions
Do you have any questions about AI, Machine Learning, Data Science or Big Data Analytics? Leave a question in the comments or ask via email. I will try my best to answer it.
Conclusion – With the rise of interest in machine learning, there are a couple of different perspectives out there on the similarities between statistics and machine learning. One moves from general to specific conclusions and the other vice versa, but as a matter of fact the two disciplines can't be divorced; they are better described as two sides of the same coin. They represent two key aspects of data science that should become integrated in the long run, so a combined discipline of statistical machine learning may emerge soon. Statistics departments can no longer run without people who have programming skills, so it seems reasonable to include computer science classes in a statistics curriculum. The two are taught in much the same way, using the same books and the same mathematics. Whether to choose an inductive or a deductive research methodology depends on the data and the research objective.
Thank you all for spending your time reading this post. Please share your feedback, comments, critiques, agreements or disagreements. For more details about posts, subjects and relevance, please read the disclaimer.