
Machine Learning – It all Boils Down to the Training Data

By V Sharma

Training Data – There is a famous punch line about data: “Data is not good enough if it’s not quality data”. If your model is not working or performing as expected, look first at your data and its source. Instead of struggling to find opportunities for performance tuning, look at improving the quality of the data fed into the model.


Machine Learning – Basics

AILabPage defines machine learning as “A focal point where business, data, experience meets emerging technology and decides to work together”.

ML instructs an algorithm to learn for itself by analyzing data. Algorithms learn to map inputs to outputs (supervised learning), to detect patterns (unsupervised learning), or to act on reward and punishment (reinforcement learning). In general, the more quality data it processes, the better the algorithm performs.

In other words, machine learning algorithms “learn” from observations. When exposed to more observations, an algorithm typically improves its predictive performance. The post below covers more of the basics of machine learning.

Machine Learning – How What and Why

Thanks to statistics, machine learning became very famous in the 1990s. Machine learning is about the use and development of fancy learning algorithms. The intersection of computer science and statistics gave birth to probabilistic approaches in AI, which shifted the field further toward data-driven approaches. Data science is more about extracting knowledge from data (knowledge discovery in databases, KDD) through algorithms, to answer a particular question or solve a particular problem.

It All Boils Down to the Training Data

Setting up the data factory, deciding what comes in and what does not, and running continuous quality checks are the most crucial steps for any business to get right at initial setup, since the success of any model depends heavily on its training data. The algorithms used are mostly off the shelf, and picking the correct one is another challenge. After sorting out the data, the model, and the choice of algorithm, the last challenge is creating a high-quality data flow pipeline with check gates.

The digital transformation of any modern business requires quality data, not just any data. The three key success factors that help a business grow with the correct digital marketing drive are:
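To make the idea of a check gate concrete, here is a minimal sketch in Python. The `Record` schema, the `ALLOWED_LABELS` set and the three checks are assumptions chosen for illustration; a real pipeline would define its own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One labelled training example (hypothetical schema for illustration)."""
    text: str
    label: str


ALLOWED_LABELS = {"spam", "not_spam"}  # assumed label set


def quality_gate(records):
    """Split records into (accepted, rejected) using three simple checks.

    Checks: non-empty text, a label from ALLOWED_LABELS, and no exact
    duplicate (after trimming and lower-casing) of an accepted record.
    """
    accepted, rejected, seen = [], [], set()
    for rec in records:
        text = rec.text.strip()
        key = (text.lower(), rec.label)
        if not text:
            rejected.append((rec, "empty text"))
        elif rec.label not in ALLOWED_LABELS:
            rejected.append((rec, "unknown label"))
        elif key in seen:
            rejected.append((rec, "duplicate"))
        else:
            seen.add(key)
            accepted.append(rec)
    return accepted, rejected


raw = [
    Record("Win a free lotto ticket", "spam"),
    Record("win a free lotto ticket ", "spam"),  # duplicate after normalising
    Record("", "not_spam"),                      # empty text
    Record("Lunch at noon?", "ham"),             # label outside the allowed set
    Record("Lunch at noon?", "not_spam"),
]
accepted, rejected = quality_gate(raw)
print(len(accepted), "accepted;", [reason for _, reason in rejected])
# 2 accepted; ['duplicate', 'empty text', 'unknown label']
```

Keeping the rejection reason next to each rejected record makes it possible to track which kinds of bad data reach the gate most often.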

Quality of Data collected.
Data Scientists.
Tools to visualise, analyse and summarise the data.

So if data is the new fuel of our time, then we must accept data scientists as the oil refineries and data tools as the important ingredients that help refine it and produce the desired results. Cognitive analytics provides a 360-degree view to help make the correct decision at the right time. The output of Big Data analytics paints an excellent picture in the categories below.

Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics

Continuously and consistently feeding high-quality data to train the algorithm of choice for business applications is another challenging part of machine learning. In short, what we are saying is that “building an ecosystem to choose, feed and train a correct algorithm, and to label/categorize data that gets integrated with data models” is what we need for machine learning. This is the core idea behind machine learning.

Model Regularisation – Poor Performance of Models

Algorithms work well for many applications but can suffer from overfitting or underfitting. Regularisation is used to rein in a model with high variance (overfitting), while a model with high bias (underfitting) usually needs more capacity or better features. The data set for prediction or classification problems is always critical, as accuracy becomes the make-or-break point. The implementation happens in two parts:

First, fitting the chosen model on the training data set.
Second, testing its accuracy on a held-out test data set.
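The two parts can be sketched in a few lines of Python. Everything here is an assumption for illustration: the synthetic one-feature data with three flipped labels, the 25% hold-out fraction, and the two toy models (a single cut-off and a 1-nearest-neighbour memoriser). The point is only that accuracy on the training data and accuracy on held-out data are different measurements.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def fit_threshold(train):
    """Simple model: pick the cut-off on x with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train))
    return lambda x: int(x >= best)


def fit_nearest_neighbour(train):
    """Memorising model: predict the label of the closest stored x."""
    def predict(x):
        _, nearest_label = min(train, key=lambda ex: abs(ex[0] - x))
        return nearest_label
    return predict


# Synthetic data: label is 1 when x >= 20, with three labels flipped as noise.
noisy = {5, 17, 31}
data = [(x, int(x >= 20) ^ int(x in noisy)) for x in range(40)]

train, test = train_test_split(data)
for name, fit in [("threshold", fit_threshold),
                  ("1-nearest-neighbour", fit_nearest_neighbour)]:
    model = fit(train)
    print(f"{name}: train={accuracy(model, train):.2f} "
          f"test={accuracy(model, test):.2f}")
```

The memorising model always scores perfectly on the data it was fitted on, noise included, so its training accuracy says nothing about how it handles unseen data. Only the held-out score does.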

The outcome, i.e. the measured accuracy, tells the data scientist whether the model needs further work, typically by adjusting feature selection, i.e. feature engineering. Poor performance usually comes down to one of two reasons:

The quality of the data needs to be re-examined.
The chosen model is too simple or too complex.

A word of caution: we will not always get the desired results, and we may get poor results as well. That’s machine learning for us.
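As a small illustration of what regularisation does, here is a sketch of ridge (L2) regularisation for the simplest possible model, a line through the origin. The function name `ridge_slope`, the toy data and the penalty values `lam` are assumptions for illustration, not something taken from a particular library.

```python
def ridge_slope(xs, ys, lam):
    """Slope w for y ~ w*x that minimises sum((y - w*x)**2) + lam * w**2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
for lam, w in zip((0.0, 1.0, 10.0, 100.0), slopes):
    print(f"lam={lam:>5}: slope={w:.3f}")
```

With `lam = 0` this is ordinary least squares. As the penalty grows, the slope shrinks toward zero: the model is deliberately made less flexible, which reduces variance at the cost of some bias. Too large a penalty pushes the model into underfitting.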

Data as Greatest Natural Resource – Data Intelligence

Data generation sources such as social media come first, and the winners there are doing an excellent job. Second is payment data, which is as big as social media. Mobile payments for e-commerce, online food orders, etc. are almost 30 to 50 times more in Africa and Asia combined than in the U.S. Of course, all this data is quality data for making more money as well as for improving the user experience. Data is also used as a yardstick for comparing algorithms.

So, coming back to our core discussion point: the importance of training data in machine learning, which we also call learning by choosing the best algorithm. Unless the data fed in for training is of the correct quality and standard, the machine will only give us garbage, and there would be no machine learning, maybe just machine spoiling. Machines execute algorithms on data as a fixed sequence of steps; on completing its task, a machine can evaluate and track the performance of the best algorithm. A machine’s performance increases over time with:

More and more quality data.
The machine stumbling upon the algorithm it needs.

This appears to the outside world as if the machine is gradually learning over time to master the task it has been assigned to.

Let’s take an example to demystify the jargon above: scanning email for spam. The filter processes each email and tags it as SPAM or NOT SPAM. The algorithm behind this picks up on words like “Lotto, Free, Casino, Next of Kin” etc. The more emails the filter processes, the stronger it gets, as it simply maps inputs to outputs. If, in the same case, the data it starts getting is full of words like “chocolate, candy, sweet or love”, you can imagine the performance. Machine learning can diligently evaluate millions of keywords or word-list variants to pick the ones that most accurately detect spam.
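Here is a minimal sketch of such a filter in Python. It uses a small naive Bayes classifier with add-one smoothing, which is a stand-in technique chosen for illustration rather than the method of any particular mail system. The six training emails are made up. The last step retrains on the same emails with every label wrong, to show what poor training data does.

```python
import math
from collections import Counter

LABELS = ("spam", "not_spam")


def train_naive_bayes(examples):
    """Count words per class from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def classify(text, model):
    """Return the class with the higher log-probability (add-one smoothing).

    Assumes every class appears at least once in the training data.
    """
    word_counts, class_counts = model
    vocab = set().union(*(word_counts[label] for label in LABELS))
    total = sum(class_counts.values())
    scores = {}
    for label in LABELS:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("free lotto win casino", "spam"),
    ("next of kin claim free money", "spam"),
    ("casino bonus free spins", "spam"),
    ("lunch meeting at noon", "not_spam"),
    ("project report attached", "not_spam"),
    ("see you at the meeting", "not_spam"),
]
model = train_naive_bayes(training)
print(classify("free casino money", model))       # spam
print(classify("report for the meeting", model))  # not_spam

# Same emails, every label wrong: the filter faithfully learns the mistake.
mislabelled = [(text, "not_spam" if label == "spam" else "spam")
               for text, label in training]
print(classify("free casino money", train_naive_bayes(mislabelled)))  # not_spam
```

The algorithm is identical in both runs. Only the quality of the labels changed, and with it the answer.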

Points to Note:

All credit, if any, remains with the original contributors. We have covered the basics of data models and the importance of quality data and training data. The next post will talk about implementation, usage, and practical experience for markets.

Books + Other readings Referred

Research through open internet, news portals, white papers and imparted knowledge via live conferences & lectures.
Lab and hands-on experience of @AILabPage (Self-taught learners group) members.


Conclusion – With the rise of interest in machine learning, there are a couple of different perspectives on the similarities between statistics and ML. One goes from the general to the specific and the other the reverse, but as a matter of fact the two disciplines can’t be divorced; they are better seen as two sides of the same coin. They represent two key aspects of data science that should become integrated in the long run.

So Statistical Machine Learning may come up soon. Statistics departments cannot run without people who have programming skills, so it seems reasonable to include computer science classes in a statistics curriculum. Both are taught the same way, using the same books and the same mathematics. Whether to choose an inductive or a deductive research methodology depends upon the data and the research objective.


By V Sharma

A seasoned technology specialist with over 22 years of experience, I specialise in fintech and possess extensive expertise in integrating fintech with trust (blockchain), technology (AI and ML), and data (data science). My expertise includes advanced analytics, machine learning, and blockchain (including trust assessment, tokenization, and digital assets). I have a proven track record of delivering innovative solutions in mobile financial services (such as cross-border remittances, mobile money, mobile banking, and payments), IT service management, software engineering, and mobile telecom (including mobile data, billing, and prepaid charging services). With a successful history of launching start-ups and business units on a global scale, I offer hands-on experience in both engineering and business strategy. In my leisure time, I'm a blogger, a passionate physics enthusiast, and a self-proclaimed photography aficionado.

