Statistics and Machine Learning – Do statistics and ML walk together like partners? Is machine learning just a polished or glorified version of statistics? I have many questions like these in my mind, and even today I hear them from many of my fellow lab members. Many people still struggle to tell the two apart in their day-to-day work and remain confused about how they relate.
Is Machine Learning a Computerised Version of Statistics?
Is machine learning a computerised or glorified version of statistics? In my personal opinion, the answer is a simple “NO”. To my understanding, the two complement each other and work as partners, like two friends from school days whose paths still cross on some occasions.
There was also the “Statistical Modeling: The Two Cultures” paper by Leo Breiman in 2001. This paper argued that statisticians rely too heavily on data modelling, and that machine learning techniques are making progress by instead relying on the predictive accuracy of models.
Are the terms like Statistics and Machine Learning synonyms?
Is there any difference between Statistics and Machine Learning jargon?
Let us leave Data Mining out of the picture here. The focus is only on these two on-and-off friends, and in my personal opinion DM is not relevant to this discussion. As a reminder, some time back people also called Blockchain a glorified and polished version of swarm intelligence, which is not the case in any way.
Statistics and Machine Learning
Linear regression (LR) is a classic example. It was developed in the field of statistics, where it is studied as a model for understanding the relationship between numerical input and output variables. Machine learning has since borrowed it, so it is both a statistical algorithm and a machine learning algorithm.
Though there are some methodological differences between machine learning and statistics, they really don’t divorce these two friends at all. The main difference is that machine learning emphasizes optimization and performance, while statistics is concerned with samples, populations and hypotheses. Machine learning cares more about making predictions, even if a prediction cannot be explained well.
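To make the two viewpoints concrete, here is a minimal sketch in plain Python that fits one linear regression and then reads it both ways. The synthetic data, the function names and the 150/50 split are assumptions made for illustration only, not something taken from a real project.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def slope_standard_error(xs, ys, intercept, slope):
    """Statistics view: how uncertain is the estimated slope?"""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = rss / (n - 2)  # two parameters were estimated
    return math.sqrt(residual_variance / sxx)


def mean_squared_error(xs, ys, intercept, slope):
    """Machine learning view: how well does the fit predict unseen data?"""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 points and hold out the last 50.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope = fit_simple_ols(train_x, train_y)
se = slope_standard_error(train_x, train_y, intercept, slope)
t_stat = slope / se  # statistic for testing the hypothesis "slope = 0"
test_mse = mean_squared_error(test_x, test_y, intercept, slope)

print(f"slope = {slope:.3f} +/- {se:.3f} (t = {t_stat:.1f})")
print(f"held-out MSE = {test_mse:.3f}")
```

The fitted line is the same in both readings. The statistician asks how sure we are about the slope, while the machine learning practitioner asks how well the line predicts data it has not seen.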
How to Lie with Statistics is a book written by Darrell Huff in 1954 presenting an introduction to statistics for the general reader. He was not a statistician but was a journalist who wrote many “how to” articles as a freelancer.
Roles – Machine Learning, Statistics and Computer Science
- Machine Learning – Optimize a performance criterion using example data or past experience.
- Statistics – Inference from sample data, with stress on hypotheses.
- Computer science – Efficient algorithms to solve the optimization problem and representing & evaluating the model for inference.
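As a rough illustration of the first and third roles, the sketch below optimizes a performance criterion (mean squared error) over example data with a simple algorithm (gradient descent). The data, the learning rate and the number of steps are assumptions picked for the example, and plain gradient descent stands in here for whatever optimizer a real system would use.

```python
import random


def mse_loss(w, b, xs, ys):
    """Performance criterion: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.1, steps=2000):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Example data (an assumption for illustration): y = 0.5 + 2x + small noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [0.5 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]

initial_loss = mse_loss(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
final_loss = mse_loss(w, b, xs, ys)

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"loss went from {initial_loss:.3f} to {final_loss:.3f}")
```

The statistics role would start where this stops, by asking what the fitted `w` and `b` let us infer about the population the sample came from.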
To be a better Data Scientist I should have started as a Statistician
Though it is not entirely true, to some extent statistics can be called a graphical branch of mathematics (in my personal opinion). As per wiki, statistics is “The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.”
It is the science of data, dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. In general, statistics speaks in a more human language and with a human touch. For example, a statistician would say that a prediction for variable Y is true only if what variables a, b, c and I are saying is true.
Machine learning is the much younger of the two disciplines. It is built on the foundations of statistics and has absorbed much of its philosophy and many of its techniques over the years. It focuses on data too complex for humans to work out the meaningful regularities on their own. Random samples are not good enough for the task here.
Statistics vs Machine Learning
Where a model is run as a simulation, statistics and computation seem to converge beautifully. In short, statistics knows very well that in today’s world it would be silly to ignore the massive amounts of data available for analysis, and that statistics alone cannot do much with it.
To handle such a humongous amount of data, it needs the support of machine learning and computing power. Analysing massive data means worrying about storage and computation, most likely computation in a distributed manner.
Approximate Bayesian Computation – ABC
In statistics, “ABC” (Approximate Bayesian Computation) refers to a family of methods for performing inference over some variables by simulating data from a model, typically when the likelihood is hard to evaluate directly. Weather forecasting, for example, is an area where all variables have clear semantic interpretations, and it is the task of the statistician to perform inference over them. However, this demands a complex skill set and careful thought on where and how to spend the available computation power.
In machine learning, the same kind of inference task is often tackled under the name “probabilistic programming”, which in addition develops specialized programming languages (e.g. based on graphical models) in which to express these models.
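For readers who have not seen ABC in action, below is a minimal sketch of its simplest variant, rejection sampling, in plain Python. The Normal model with known spread, the flat prior on `mu`, the tolerance and the number of draws are all assumptions chosen to keep the example small. A real problem such as weather forecasting would use a far richer simulator.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Simulator (assumed model): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands near the observed one."""
    observed_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-10.0, 10.0)            # draw from a flat prior
        fake = simulate(mu, len(observed), rng)  # simulate data under that draw
        if abs(statistics.fmean(fake) - observed_mean) <= tolerance:
            accepted.append(mu)                  # close enough: keep the draw
    return accepted


rng = random.Random(42)
observed = simulate(3.0, 50, rng)  # stand-in for real data; true mu is 3
posterior_sample = abc_rejection(observed, n_draws=20000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)

print(f"accepted {len(posterior_sample)} of 20000 draws")
print(f"approximate posterior mean of mu = {posterior_mean:.2f}")
```

Most draws are thrown away, which is why the text stresses careful thought about where the computation power goes. A tighter tolerance gives a better approximation but accepts even fewer draws.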
Statistics – It is a graphical branch of mathematics (in my personal opinion). It deals with the collection, analysis, interpretation, and presentation of masses of numerical data.
A small project for learning!
- Machine Learning cares more about prediction than about causality.
Within a small group of AILabPage members and other AILabPage lab fellows, we ran a project for testing and learning purposes. Our goal was to perform the same task on an available data set using traditional “Statistics Techniques” and “Machine Learning Techniques”, and then compare the final results.
- Predictions and online services – We built a data model for a seamless transition from training to prediction, targeting both online and batch prediction services. The idea was to integrate with Google global load balancing, but for some reasons and a shortage of time we did not use it. Still, we managed to achieve this.
- Got a predictive analytics model that is scalable.
- Were able to give a good demonstration of the promise by leveraging statistics breakthroughs.
- Designed and used an artificial neural network architecture to make the model transparent and easy to debug, without RNNs of course. It took longer than usual, but we managed to develop a fully working model.
After developing an experimental machine learning model using TensorFlow via Cloud Machine Learning Engine, we achieved 73% accuracy in predictions. I have to confess we did use a neural network as well. What other options would enable us to automatically scale our machine learning application and reach our ultimate goal? If you have any, please suggest them in the comment box below.
Points to Note:
All credit, if any, remains with the original contributors only. We have covered all the basics around data analytics for digital marketing analytics chapter-1. The upcoming chapters will talk about implementation, usage and practical experience for markets.
Books + Other readings Referred
- Research through open internet, news portals, white papers and imparted knowledge via live conferences & lectures.
- Lab and hands-on experience of @AILabPage (Self-taught learners group) members.
Feedback & Further Question
Do you have any questions about AI, Machine Learning, Data Science or Big Data Analytics? Leave a question in a comment or ask via email. I will try my best to answer it.
Conclusion – With the rise of interest in machine learning, there are a couple of different perspectives on its similarities with statistics. One reasons from the general to a specific conclusion and the other the reverse, but as a matter of fact the two disciplines cannot be divorced. They are better understood as two sides of the same coin, representing two key aspects of data science that should become integrated in the long run. So “Statistical Machine Learning” may come up soon. Statistics departments cannot run without people who have programming skills, so it seems reasonable to include computer science classes in a statistics curriculum. The two are taught the same way, using the same books and the same mathematics. Whether to choose an inductive or a deductive research methodology depends on the data and the research objective.
Thank you all for spending your time reading this post. Please share your feedback, comments, criticism, agreement or disagreement. For more details about posts, subjects and relevance, please read the disclaimer.