Do statistics and machine learning (ML) walk together like partners? Is machine learning just a polished and shined version of statistics? There are many questions like these, and at least in my mind they still come up today. Why do I keep these two in different areas of my mind?
Are terms like "statistics" and "machine learning" synonyms, or is there a real difference between the two?
Is machine learning a computerised or glorified version of statistics? In my personal opinion the answer to this question is simple: "no". To my understanding they complement each other and work like partners, like two friends from school days who cross paths on some occasions.
There was also the “Statistical Modeling: The Two Cultures” paper by Leo Breiman in 2001. This paper argued that statisticians rely too heavily on data modeling, and that machine learning techniques are making progress by instead relying on the predictive accuracy of models.
Let's not bring data mining (DM) into our minds here. The focus is only on these two on-and-off friends, and in my personal opinion DM is not relevant to the comparison. Just to remind you, some time back people also called blockchain a glorified and polished version of swarm intelligence, which is not the case either.
Machine Learning is more passionate about Predictions than about Causality
Statistics and Machine Learning
We have a classic example in linear regression (LR), which was developed in the field of statistics. Linear regression is studied as a model for understanding the relationship between input and output numerical variables. It has since been borrowed by machine learning, so it is both a statistical algorithm and a machine learning algorithm.
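As a minimal sketch of this double life, the snippet below fits the same straight line twice in plain Python: once with the closed-form least-squares formulas a statistics text would give (including a standard error for the slope), and once by gradient descent on mean squared error, as a machine learning text might. The synthetic data, the function names (`make_data`, `ols_fit`, `gd_fit`) and the learning rate are assumptions made for illustration only.

```python
import math
import random


def make_data(n=200, slope=2.0, intercept=1.0, noise=0.5, seed=0):
    """Synthetic data (an assumption for this sketch): y = intercept + slope*x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def ols_fit(xs, ys):
    """Statistics view: closed-form least squares plus a standard error for the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance estimate
    se_slope = math.sqrt(sigma2 / sxx)
    return slope, intercept, se_slope


def gd_fit(xs, ys, lr=0.01, steps=5000):
    """Machine learning view: minimise mean squared error by gradient descent."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs, ys = make_data()
    slope, intercept, se_slope = ols_fit(xs, ys)
    w, b = gd_fit(xs, ys)
    print("closed form     :", slope, intercept, "slope std. error:", se_slope)
    print("gradient descent:", w, b)
```

Both routes land on essentially the same line. What differs is the by-product: the closed-form route hands back an uncertainty estimate for the slope, while the gradient-descent route only cares about driving the error down.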
Though there are some methodological differences between machine learning and statistics, those don't really divorce these two friends at all. The difference between the two is that machine learning emphasizes optimization and performance, while statistics is concerned with samples, populations and hypotheses. Machine learning is more concerned with making predictions, even if the prediction cannot be explained well.
How to Lie with Statistics is a book written by Darrell Huff in 1954 presenting an introduction to statistics for the general reader. He was not a statistician, but was a journalist who wrote many “how to” articles as a freelancer.
Statistics is a graphical branch of mathematics (in my personal opinion). It deals with the collection, analysis, interpretation, and presentation of masses of numerical data. In general, statistics speaks with more of a human language and touch. A statistician would say, for example, that a prediction for variable Y is true only if what variables a, b, c and I are saying is true.
Machine learning is the much younger discipline of the two. It is built on the foundations of statistics, and it has managed to absorb much of the philosophy of statistics and many of its techniques over the years. It focuses on data that is too complex for humans to work out its meaningful regularities by themselves. Random samples are not good enough for the task here.
Roles of the key stakeholders
- Machine Learning – Optimize a performance criterion using example data or past experience.
- Statistics – Inference from sample data, with stress on hypotheses.
- Computer science – Efficient algorithms to solve the optimization problem and representing & evaluating the model for inference.
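To make the first two roles concrete, here is a small sketch that asks both kinds of question of the same two groups of numbers. The statistics-style question is whether the difference in means could be chance (a permutation test); the machine-learning-style question is how well a simple threshold rule predicts the group of unseen points (hold-out accuracy). The data, the function names and the threshold rule are all assumptions chosen for illustration.

```python
import random


def make_groups(n=100, shift=2.0, seed=0):
    """Two synthetic groups (an assumption): group b is shifted upwards by `shift`."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    return a, b


def permutation_p_value(a, b, n_perm=2000, seed=1):
    """Statistics view: test the hypothesis that both groups share the same mean."""
    rng = random.Random(seed)
    observed = abs(sum(b) / len(b) - sum(a) / len(a))
    pooled = a + b  # new list, so shuffling does not disturb a or b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_b) / len(perm_b) - sum(perm_a) / len(perm_a))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def holdout_accuracy(a, b, train_fraction=0.7):
    """ML view: learn a threshold on training data, score it on held-out data.

    Assumes group b tends to have the larger values.
    """
    cut_a = int(len(a) * train_fraction)
    cut_b = int(len(b) * train_fraction)
    # The "model" is the midpoint between the two training means.
    threshold = (sum(a[:cut_a]) / cut_a + sum(b[:cut_b]) / cut_b) / 2
    test = [(x, 0) for x in a[cut_a:]] + [(x, 1) for x in b[cut_b:]]
    correct = sum(1 for x, label in test if (x > threshold) == (label == 1))
    return correct / len(test)


if __name__ == "__main__":
    a, b = make_groups()
    print("permutation p-value:", permutation_p_value(a, b))
    print("hold-out accuracy  :", holdout_accuracy(a, b))
```

The first number speaks about the population behind the sample, the second about performance on new examples. Neither replaces the other.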
To be a better Data Scientist I should have started as a Statistician
Statistics vs Machine Learning
When the model is expressed as a simulation, statistics and computation seem to converge beautifully. In short, statistics knows very well that in today's world it would be silly to ignore the massive amounts of data available for analysis, and that statistics alone cannot do much with it.
To handle a humongous amount of data, it needs the support of machine learning and computing power. Anyone analysing massive data has to worry about storage and computation, most likely computation in a distributed manner.
Approximate Bayesian Computation – ABC
In statistics, Approximate Bayesian Computation (ABC) is a family of methods for performing inference over some variables when the model can be simulated but its likelihood is hard to evaluate. An example is weather forecasting, an area where all variables have clear semantic interpretations. It is the task of the statistician to perform inference over these variables. However, this demands a complex skill set and requires careful thought on where and how to spend the available computation power.
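A minimal sketch of the simplest ABC variant, rejection sampling, may help here. It infers the bias of a coin by drawing candidate biases from a prior, simulating coin flips for each candidate, and keeping only the candidates whose simulated result lands close to the observed one. The coin example, the uniform prior, the tolerance and the function name `abc_rejection` are assumptions for illustration; a real problem such as weather forecasting would use a far more expensive simulator.

```python
import random


def abc_rejection(observed_heads, n_flips, n_draws=20000, tolerance=2, seed=0):
    """Rejection ABC for a coin's bias (a toy example, not a production method).

    Returns the prior draws whose simulated head count is within
    `tolerance` of the observed head count.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw a candidate bias from a uniform prior on [0, 1)
        # Simulate the experiment under this candidate; no likelihood is evaluated.
        simulated_heads = sum(1 for _ in range(n_flips) if rng.random() < p)
        if abs(simulated_heads - observed_heads) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    samples = abc_rejection(observed_heads=70, n_flips=100)
    mean = sum(samples) / len(samples)
    print("accepted draws:", len(samples))
    print("approximate posterior mean of the bias:", mean)
```

Most of the computation goes into simulations that are thrown away, which is why the question of where to spend the available computation power matters so much for this kind of inference.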
When the same inference task is solved in machine learning, it is known as "probabilistic programming". In addition, that field develops specialized programming languages (e.g. based on graphical models) in which to express these models.
A small project for learning !!
Within a small group of friends we did a project for testing and learning purposes. Our goal was to perform a similar task on a set of data using traditional statistics techniques and machine learning, and eventually see the final results.
After developing an experimental machine learning model using TensorFlow via Cloud Machine Learning Engine, we achieved 73% accuracy in predictions. I have to confess we did use a neural network as well.
Our Next Goal – Predictions and online services – Any suggestions for:
- Getting a predictive analytics model that is scalable.
- Giving a good demonstration of the promise by leveraging breakthroughs in statistics.
- Architecting neural nets to make them transparent and easy to debug, without RNNs of course, as that would require some time and further development.
We are looking forward to building our own model for a seamless transition from training to prediction. We are aiming to use online and batch prediction services. Integration with Google global load balancing is one way to achieve this. What other options would let us automatically scale our machine learning application and reach our ultimate goal? If you have any, please suggest them in the comment box below.
Disclaimer – All credits if any remains on the original contributor only.
Conclusion - With the rise of interest in machine learning, there are a couple of different perspectives out there on how similar it is to statistics. The two disciplines will never, and cannot, divorce; they are better seen as two sides of the same coin. They represent two key aspects of data science that should become integrated in the long run, so "statistical machine learning" may come up soon. Statistics departments cannot run without people who have programming skills, so it seems reasonable to include computer science classes in a statistics curriculum. The two are taught the same way, using the same books and the same mathematics.
Thank you all for spending your time reading this post. Please share your feedback, comments, criticism, agreement or disagreement. For more details about posts, subjects and relevance, please read the disclaimer.