
Everything You Need To Know About Unsupervised Learning

Unsupervised learning helps find the hidden jewels in data by grouping similar things together. The data has no target attribute: the algorithm takes the training examples as a set of attributes/features alone. In this post I have summarised my whole upcoming book “Unsupervised Learning – The Unlabeled Data Treasure” in one page. This one-pager is meant to cover everything about unsupervised learning at a high level. For the details, wait for the book's release in June 2019.

 

Unsupervised Learning Demystified

Unsupervised learning is one of the three types of machine learning, i.e. supervised machine learning, unsupervised machine learning (UML) and reinforcement learning. The most common method in UML is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. The algorithms used here draw inferences from datasets consisting of input data without labels. In short, UML is:

  • A technique for exploring hidden gems / patterns.
  • A way to find some intrinsic structure in data.
  • A magnifier for things that can’t be seen with the naked eye.


In UML, systems are not trained by feeding them the intended answers, unlike in supervised learning. Rather, we let the algorithms infer patterns from a dataset without reference to known, or labeled, outcomes. There are two main ways to achieve unsupervised learning goals:

  • Clustering – Partitions data into distinct groups based on similarity, for example the distance of each point to the centroid of a cluster.
  • Association – Association rules are used to discover interesting patterns, such as items that frequently occur together.

UML can’t be applied directly to regression or classification problems, because there is no idea what the values of the output data might be. Unsupervised learning algorithms are trained completely differently from supervised ones. Instead, the algorithms here work as secret agents (yes, maybe 007 style) to discover the underlying structure of the data.

 

Clustering The Data

Clustering groups data points, i.e. it automatically splits the dataset into groups according to similarity. Algorithms in this technique are based on one principle: similarity / dissimilarity.

When a clustering algorithm is used, it places data points with similar properties and features in the same group, and those shared features are the common reason for the grouping. So the data points within each group are similar, while the features and properties of different groups (inter-group) are dissimilar to each other.
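To make the similarity / dissimilarity principle concrete, here is a small Python sketch that measures it with Euclidean distance (one common choice among many; the point coordinates are made up for illustration). Points inside each hand-made group sit close together, while points from different groups sit far apart.

```python
from itertools import combinations
from math import dist

# Two small, hand-made groups of 2-D points (illustrative data only)
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]


def mean_intra_distance(points):
    """Average Euclidean distance between all pairs of points inside one group."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(points_a, points_b):
    """Average Euclidean distance between points taken from two different groups."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))


intra = (mean_intra_distance(group_a) + mean_intra_distance(group_b)) / 2
inter = mean_inter_distance(group_a, group_b)
print(f"average within-group distance:  {intra:.2f}")
print(f"average between-group distance: {inter:.2f}")
```

A good clustering keeps the first number small and the second one large; most clustering algorithms are different strategies for achieving that trade-off.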


Because of this, clustering can often overestimate the similarity between groups, and such overestimates lead to poor quality and thus bad results. So while clustering may work well for customer segmentation, it may do poorly on targeting.

 

Common Clustering Algorithms

  1. K-Means Clustering: The most common and well-known clustering algorithm. It is super simple to understand, code and implement. The number of clusters k is chosen up front; the algorithm starts from k centre points (centroids), vectors of the same length as the data points, then repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its assigned points. It's pretty fast as well.
  2. Mean-Shift Clustering: This algorithm works with a baby-step strategy. It’s a sliding-window-based algorithm that finds dense areas of data points by repeatedly shifting each window towards the mean of the points inside it. It frees you from selecting the number of clusters, as it discovers them automatically.
  3. DBSCAN Clustering: Density-Based Spatial Clustering of Applications with Noise. It is based on density and starts with an arbitrary data point that has not been visited. It does not need a pre-set number of clusters.
  4. Expectation Maximization: This clustering method uses Gaussian Mixture Models (GMM). It starts by selecting the number of clusters and randomly initialising the GMM parameters for each cluster. GMMs are a lot more flexible in terms of cluster covariance, and a data point can belong to multiple clusters, each with a certain probability.
  5. Hierarchical Clustering: A cluster tree is built as a multilevel hierarchy of clusters. No assumption is made about the number of clusters. There are two approaches:
    • Agglomerative – Start with each point as its own cluster; at each step, merge the closest pair of clusters until only one cluster is left.
    • Divisive – Start with one all-inclusive cluster; at each step, split a cluster until each cluster contains a single point.
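The K-Means steps from item 1 can be sketched in plain Python as below. This is a minimal version of Lloyd's algorithm, assuming Euclidean distance and initial centroids picked at random from the data; the function name `kmeans` and its parameters are illustrative, not a library API. Production code would normally use a tested library and a smarter initialisation such as k-means++.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's-algorithm K-Means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise: pick k distinct data points as the starting centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep the old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Illustrative 2-D data with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the initialisation is random, different seeds can converge to different local optima on harder data, which is one reason K-Means is often run several times.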

 

Association Mining In Data

Association rule mining is another key unsupervised data mining method, after clustering, that finds interesting associations (relationships, dependencies) in large sets of data items. In contrast with sequence mining, association rule learning typically does not consider the order of items, either within a transaction or across transactions.

Association rule mining is used in unsupervised scenarios to discover interesting patterns, and it is used in supervised learning as well. It identifies sets of items that often occur together. In this way, data mining transforms data into useful information.

E-commerce companies use this commonly, almost every day, for shopping cart or market basket analysis. It helps analysts discover bundles of goods that are often purchased at the same time and develop more effective marketing strategies.
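Basket analysis usually rests on two measures: support (the fraction of all transactions that contain an itemset) and confidence (how often the consequent appears when the antecedent does). A minimal Python sketch over a few hypothetical baskets:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Count how often each pair of items is bought together
pair_counts = Counter(pair for t in baskets for pair in combinations(sorted(t), 2))
print(pair_counts.most_common(2))
print(support({"bread", "butter"}, baskets))       # bread and butter appear together in 3 of 5 baskets
print(confidence({"butter"}, {"bread"}, baskets))  # every basket with butter also has bread
```

A rule such as "butter → bread" would then be reported only if both its support and its confidence pass thresholds chosen by the analyst.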

Association rules are built on frequently occurring patterns in large transactional databases. For advanced learners in data mining, I suggest a comparative study of the following algorithms:

  • Apriori Algorithm
  • Frequent Pattern (FP) Growth Algorithm
  • Rapid Association Rule Mining (RARM)
  • ECLAT Algorithm
  • Associated Sensor Pattern Mining of Data Stream (ASPMS) – For more demanding frequent pattern mining needs, such as mining sensor data streams.
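As a starting point for that comparison, the core idea of Apriori (a level-wise search in which every subset of a frequent itemset must itself be frequent) can be sketched as follows. This is an unoptimised illustration with hypothetical baskets and an assumed minimum support of 0.4; it is not a reference implementation of Apriori or of the other algorithms listed.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Minimal Apriori sketch: return {frozenset(itemset): support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    scored = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s for s, sup in scored.items() if sup >= min_support}
    frequent = {s: scored[s] for s in current}
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        scored = {c: support(c) for c in candidates}
        current = {c for c, sup in scored.items() if sup >= min_support}
        frequent.update((c, scored[c]) for c in current)
        k += 1
    return frequent


# Hypothetical shopping baskets (illustrative data only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
result = apriori(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pruning step is what makes Apriori cheaper than counting every possible itemset: here the candidate {bread, butter, milk} is discarded without a database scan because its subset {butter, milk} is not frequent.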

Common Algorithms in Unsupervised Learning

An Angle for Unlabelled Data & Secret Labels

Unsupervised learning can be a challenging goal in itself. The training data consists of a set of input vectors x without any corresponding target values; hence it is known as learning / working without a supervisor.

The system discovers patterns, regularities, features etc. in the input data on its own. Discovering similarities and dissimilarities to form clusters, i.e. self-discovery, is the main target here.

The examples given to the learner are unlabeled, so there is no error or reward signal to evaluate a potential solution. Since no labels are given to the learning algorithm, it is left on its own to find structure in its input. This distinguishes unsupervised learning from supervised learning and reinforcement learning.

  • Pros
    • It can detect what human eyes cannot see.
    • Hidden patterns can be very powerful for a business and can even reveal extremely surprising facts, e.g. in fraud detection.
    • The output can point to unexplored territories and new ventures for businesses. Exploratory analytics can be applied to understand the financial, business and operational drivers behind what happened.
  • Cons
    • As the explanation above shows, unsupervised learning is harder than supervised learning.
    • It can be a costly affair, as we might need an external expert to look at the results for some time.
    • It is difficult to confirm whether the results are of any value, since no answer labels are available.

 

Guarantee to no guarantee

The only thing guaranteed in unsupervised learning is that there is no guarantee: after all the effort and hard work of massaging the data, we may not find anything inspiring or useful in it.

Since the outcomes are not known, there is no way to measure accuracy. This makes supervised machine learning more applicable to real-world problems. The best time to use unsupervised machine learning is when you don’t have data on desired outcomes, like determining a target market for an entirely new product that your business has never sold before.

 

Why is Unsupervised Machine Learning important?

One of the biggest advantages of unsupervised machine learning methods is their reusability for other learning methods. The patterns uncovered and detected with unsupervised methods come in handy when implementing supervised machine learning.

  • Anomaly Detection – This is the key feature for the automatic discovery of unusual data points in a given dataset. As shown in the picture above, an outlier can pinpoint fraudulent transactions/activities. Discovering faulty pieces of hardware or identifying an outlier caused by human error during data entry are also applications here.
  • Latent Variable Models – Data preprocessing happens in every business every day, and most of the time it involves a lot of similar data with the same features. Latent variable modelling helps with dimensionality reduction, i.e. reducing the number of features in a dataset, or with decomposing the dataset into multiple components.
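One simple way to flag unusual data points, shown here as a swapped-in illustration rather than a method from this post, is the modified z-score based on the median absolute deviation (MAD). The 0.6745 constant and the 3.5 cut-off follow the common Iglewicz and Hoaglin rule of thumb; the transaction amounts are made up.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (MAD-based) exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that a single extreme value barely moves
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical transaction amounts with one suspicious entry
amounts = [42.0, 38.5, 40.0, 41.2, 39.8, 43.1, 40.5, 980.0]
print(mad_outliers(amounts))
```

Using the median instead of the mean keeps the outlier itself from inflating the spread estimate, which is why this rule still works on a small sample like this one.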

 

Some Use Cases

Unsupervised learning is used to find anomalies in data or to cluster data items into groups that humans couldn’t identify themselves. Since the output variables are unspecified, the algorithms look for structure in the data to describe its hidden distribution. Some examples are:

  • Customer segmentation into different groups for specific interventions
  • Product Targeting
  • Market Categorisation
  • Recommendation Engines

 

Points to Note:

All credits, if any, remain with the original contributors only. We have covered unsupervised machine learning in this post, where we find hidden gems in unlabelled historical data. The last post was on supervised machine learning. The next upcoming post will talk about reinforcement machine learning.

 


Feedback & Further Questions

Do you have any questions about Deep Learning or Machine Learning? Leave a comment or ask your question via email. I will try my best to answer it.

 

Conclusion – Collecting and labelling a large set of sample patterns can be very expensive. Unsupervised learning helps businesses see potential that is usually hidden. The goal in unsupervised learning problems may be to discover groups of similar examples within the data, which is called clustering, or to determine how the data is distributed in the space, known as density estimation.
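Density estimation can be illustrated with a one-dimensional Gaussian kernel density estimate (KDE), one common technique among several. This is a minimal sketch: the samples and the bandwidth of 0.3 are assumptions chosen for illustration, and in practice the bandwidth has a large effect on the result.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x from 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))

    def density(x):
        # Sum one Gaussian bump centred on every sample, scaled so the total area is 1
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Illustrative 1-D data with two groups, around 1 and around 5
samples = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8]
f = gaussian_kde(samples, bandwidth=0.3)
for x in (1.0, 3.0, 5.0):
    print(x, round(f(x), 3))
```

The estimate is high around the two groups of samples and close to zero between them, which is the "how the data is distributed in the space" picture that density estimation aims for.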

 


Thank you all for spending your time reading this post. Please share your feedback / comments / criticism / agreement or disagreement. For more details about posts, subjects and relevance, please read the disclaimer.
