Big Data – Last Sunday I was at a big retail store in Harare. It was a busy day because it was month end and people had just been paid. Grocery shopping was in full swing, and I also bought some groceries for myself. While I was in the queue for payment and collection, I saw almost everyone paying either by swiping the magic plastic card or by punching a few numbers into their mobile handsets. The AILabPage team later held a brainstorming session on this, and the post below is based on it.
Cash Payments vs Electronic Payments – Sources of Big Data
A smart big data factory should take a smart approach to managing its data, a costly, sensitive and critical asset that needs constant maintenance.
During our in-lab brainstorming session, the following points became evident:
- The electronic payment queue moves faster than the cash payment queue.
- Card swipes and mobile payments generate huge amounts of data.
- The scene in the store looks like a data production factory running nonstop.
- What was happening in that store besides the payments?
- Data, more data, lots of data, the so-called BIG DATA, was being generated.
If we look at the points above again with information security and customer privacy in mind, we realise that without the right information security and encryption solutions in place, big data is also a very big problem. AILabPage proposes and advocates DataIntelligence with DataSecurity as the prime goals of any data factory.
What is Big Data?
What is big data? I am sure most of us know the answer already: “a term used for huge amounts of digital data, mostly in unorganised and unstructured formats because it is captured from many different sources”.
Big data is so big that it is difficult to analyse, and just as difficult to protect. For instance, cardholder data should be managed in a highly secured data vault, using multiple encryption keys with split knowledge and dual or triple control.
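Split knowledge means no single person ever holds a whole key. One common way to achieve it is XOR key splitting: the key is divided into components, one per custodian, and only all of them together rebuild it. The sketch below is a minimal illustration under that assumption, using a hypothetical 32-byte vault key; `split_key` and `combine_key` are hypothetical helper names, and a real vault would hold the components in separate hardware or smart-card custody rather than in one process.

```python
import secrets


def split_key(key: bytes, custodians: int = 2) -> list[bytes]:
    """Hypothetical helper: split a key into XOR components, one per custodian."""
    if custodians < 2:
        raise ValueError("split knowledge needs at least two custodians")
    # All but the last component are random bytes of the same length as the key.
    components = [secrets.token_bytes(len(key)) for _ in range(custodians - 1)]
    # The last component is the key XOR-ed with every random component.
    last = key
    for component in components:
        last = bytes(a ^ b for a, b in zip(last, component))
    components.append(last)
    return components


def combine_key(components: list[bytes]) -> bytes:
    """Hypothetical helper: rebuild the key by XOR-ing every component together."""
    key = bytes(len(components[0]))
    for component in components:
        key = bytes(a ^ b for a, b in zip(key, component))
    return key


master_key = secrets.token_bytes(32)  # hypothetical 256-bit data-vault key
parts = split_key(master_key, custodians=3)  # triple control: three custodians
assert combine_key(parts) == master_key  # only all three parts together recover it
```

Any set of fewer than all the components is just uniformly random bytes and reveals nothing about the key, which is the property split knowledge asks for.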
Artificial intelligence and its related technologies and techniques have made data security work and processes easier. AI can map dummy data to real data to mislead hackers, so a data thief would be unable to use information stolen from a database without also holding multiple levels of keys.
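Mapping dummy data to real data is close to what the payments industry calls tokenization: applications store a surrogate token, and only the secured vault can map it back to the real card number. The sketch below uses plain random tokens, not AI, and an in-memory dictionary stands in for the secured vault; `TokenVault`, `tokenize` and `detokenize` are hypothetical names chosen for illustration.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping surrogate tokens to real card numbers (PANs)."""

    def __init__(self) -> None:
        self._token_to_pan: dict[str, str] = {}
        self._pan_to_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        # The same card always gets the same token, so analytics can still group by card.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            # Random digits plus the real last four, so the token looks like a card number.
            token = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 4)) + pan[-4:]
            if token != pan and token not in self._token_to_pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can turn a token back into the PAN.
        return self._token_to_pan[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")  # a well-known test card number
orders_table = [{"card": token, "amount": 25.50}]  # what the application database stores
print(orders_table)
```

A thief who copies `orders_table` gets only tokens; without access to the vault they cannot recover the card numbers.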
Big data presents a tremendous opportunity for enterprises across multiple industries, especially those with tsunami-like data flows such as payments and social media.
The data we generate as customers or users on various platforms is stored and used at the data collector's discretion. As consumers, do we have any control over it? (Remember the well-known saying: “If you are not paying for the product, then you are the product.”) Some of the questions are:
- Who owns this data?
- What is the use of this data?
- How secure is this data?
FinTech, social media, InsureTech and MedTech are major data-generating industries, in effect a massive group of data factories. The cost of doing business is high in today’s competitive environment, so to keep a business alive it is important to make the best use of data. The point to note, however, is that the ethical use of personal and private data is part of data science engineering.
Data from Google shows that innovative, technology-based insurance companies pay $0.60 to $0.65 in claims for each dollar of premium, with the rest covering the costs of administration, marketing and reinsurance.
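Those figures describe a loss ratio (claims paid divided by premium collected) of 60 to 65 percent, which leaves 35 to 40 cents of every premium dollar for the other costs. A tiny sketch of that arithmetic; `loss_ratio` is a hypothetical helper name.

```python
def loss_ratio(claims_paid: float, premium_earned: float) -> float:
    """Hypothetical helper: claims paid out as a share of premium collected."""
    if premium_earned <= 0:
        raise ValueError("premium must be positive")
    return claims_paid / premium_earned


# The range quoted above: $0.60 to $0.65 of claims per $1.00 of premium.
for claims in (0.60, 0.65):
    ratio = loss_ratio(claims, 1.00)
    remainder = 1.00 - ratio  # left for admin, marketing and reinsurance
    print(f"loss ratio {ratio:.0%}, remainder ${remainder:.2f} per $1 of premium")
```

This prints `loss ratio 60%, remainder $0.40 per $1 of premium` and then `loss ratio 65%, remainder $0.35 per $1 of premium`.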
Information Security, Big Data and Artificial Intelligence
Is my payment data, with all my sensitive information, secure and in safe hands? What about the privacy of my sensitive information? Thousands of questions started spinning in my head. The scope of big data security is massive, and this presents a significant opportunity for disruption. Technology improves every day whether or not anyone demands it, and these improvements will reduce each of the cost items above: administration, marketing and reinsurance.
More startups are coming in to disrupt this massive and antiquated industry. Artificial intelligence is helping to reduce underwriting risk using big data and machine learning, and it also supports secure data migration to secured data vaults. Startups are automating policy administration and claims payouts to put a big smile on customers' faces, and are improving distribution via marketplaces. The wide variety and volume of data generated by FinTech, InsureTech and MedTech is inspiring for data scientists (I simply love this and would be very happy to play with it if I ever get access), executives, product managers and marketers.
Businesses can leverage data from many different platforms, such as CRM platforms, spreadsheets, enterprise planning systems, social media feeds like Facebook, Twitter, Instagram and LinkedIn, the feed section of the company website, video files and any other source. Mobile devices, tracking systems, RFID, sensor networks, Internet searches, automated record keeping, video archives and e-commerce all add to this data, and analysing all of it derives further information, which on its own creates another enormous data set.
Data Sources and Data Security
New data sources create more data, which requires new processes to handle, and those processes in turn create new security vulnerabilities. Over-the-air provisioning of payment credentials and applications is the biggest example: it gives attackers potential vectors through which eavesdroppers can steal and misuse customer data. It is essentially a process of infectious interaction with people and their environment.
In this era of dynamic digital payments and connected financial services, all businesses need to embrace evolution for growth and continuity. This isn’t just about one source of data from one payments system. To exploit the full advantages of big data, companies should draw on as many data sources as possible and leverage various forms of data, including structured data held in a range of heterogeneous applications and databases, and unstructured data that comes in many file types.
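Combining heterogeneous sources usually starts with mapping each one onto a common schema. The sketch below assumes two hypothetical inputs, a flat CSV export from a payments system and nested JSON events from a mobile channel, and normalises both into records with the same `id`, `amount` and `channel` fields; the field names, sample data and helper functions are all invented for illustration.

```python
import csv
import io
import json

# Hypothetical sample inputs: a structured CSV export and semi-structured JSON events.
CSV_EXPORT = """txn_id,amount,channel
T1,25.50,card
T2,10.00,cash
"""

JSON_EVENTS = '[{"id": "M1", "value": {"amount": "7.25"}, "source": "mobile"}]'


def from_csv(text: str) -> list[dict]:
    # Hypothetical helper: CSV rows are already flat, so only names and types change.
    return [
        {"id": row["txn_id"], "amount": float(row["amount"]), "channel": row["channel"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text: str) -> list[dict]:
    # Hypothetical helper: JSON events nest the amount, so flatten them into the same shape.
    return [
        {"id": event["id"], "amount": float(event["value"]["amount"]), "channel": event["source"]}
        for event in json.loads(text)
    ]


records = from_csv(CSV_EXPORT) + from_json(JSON_EVENTS)
for record in records:
    print(record)
```

Once every source lands in the same shape, downstream analytics and security controls only have to deal with one record format.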
A data warehouse can help a business make more informed decisions, plan for growth, discover new opportunities and design a roadmap for them, find opportunities for optimization, and deliver breakthrough innovations.
Big Data in FinTech and InsurTech
Today we don’t know where tomorrow's new data sources will come from, but we can be fairly certain that there will be more of them to contend with and more diversity to accommodate. Big data factories pursue analytics because it can be revelatory in spotting business trends, improving research quality and gaining insights in a variety of fields, from FinTech to InfoTech to InsureTech to MedTech to law enforcement and everything in between and beyond.
In big data frameworks powered by Hadoop, Teradata, MongoDB or other NoSQL systems, massive amounts of sensitive data may be managed at any given time. Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Sensitive assets don’t just live on big data nodes; they can also appear in system logs, configuration files, error logs and more. The data generation environment has its own challenges, including capture, curation, storage, search, sharing, transfer, analysis and visualization. Sensitive data can include personally identifiable information, payment card data, intellectual property, health records and much more.
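Because card numbers can leak into logs and error files, one common safeguard is to mask them before a log line is written. A minimal sketch, assuming card numbers are 13 to 19 digits, optionally separated by spaces or hyphens, and using the Luhn checksum (which card numbers carry) to leave other long numbers alone; `mask_pans` and `luhn_valid` are hypothetical helpers, not a library API.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Hypothetical helper: Luhn checksum, used to cut false positives on other numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def mask_pans(line: str) -> str:
    """Hypothetical helper: replace each Luhn-valid card number with its last four digits."""

    def _mask(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # not a card number, leave it untouched
        return "*" * (len(digits) - 4) + digits[-4:]

    return PAN_PATTERN.sub(_mask, line)


print(mask_pans("card=4111111111111111 status=declined"))  # → card=************1111 status=declined
print(mask_pans("order 1234567890123456 shipped"))  # fails the Luhn check, so unchanged
```

Masking at write time is only one layer; the log store itself still needs access controls, since pattern matching can miss unusual formats.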
Data Analytics with AI Techniques
Of the three well-known types of machine learning (a subset of AI), namely supervised, unsupervised and reinforcement learning, unsupervised learning, also called Unsupervised Machine Learning (UML), is the least used. In UML:
- The data is explored to look for hidden gems and patterns.
- The aim is to find some intrinsic structure in the data.
- What can't be seen with the naked eye needs a magnifier, and UML is that magnifier.
In unsupervised learning, the available data has no target attribute: the machine learning algorithm receives training examples as sets of attributes/features alone. The most common unsupervised learning method is cluster analysis, which follows two general strategies:
- Partitional clustering – Partitions data into distinct clusters based on each point's distance to the centroid of a cluster.
- Hierarchical clustering – Builds a cluster tree with a multilevel hierarchy of clusters. No assumption is made about the number of clusters. It comes in two forms:
  - Agglomerative – Starts with each point as an individual cluster; at each step, the closest pair of clusters is merged until only one cluster is left.
  - Divisive – Starts with one all-inclusive cluster; at each step, a cluster is split until each cluster contains a single point.
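The two strategies above can be sketched in plain Python. This is a minimal illustration, assuming 2-D points and Euclidean distance: `kmeans` is a basic k-means, the classic centroid-based partitional method, with a simple deterministic farthest-point start, and `agglomerative` is a naive single-linkage version of the bottom-up hierarchical method. Both function names and the sample data are hypothetical; libraries such as scikit-learn provide optimised implementations.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20) -> list[int]:
    """Hypothetical helper: label each point with the index of its nearest centroid."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start: first point, then repeatedly the point farthest from chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def agglomerative(points: list[Point], target_clusters: int = 1) -> list[list[Point]]:
    """Hypothetical helper: bottom-up hierarchical clustering with single linkage."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > max(target_clusters, 1):
        # Find the closest pair; single linkage = distance between their closest members.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected by the pop
    return clusters


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
print(kmeans(data, k=2))  # → [0, 0, 0, 1, 1, 1]
print(agglomerative(data, target_clusters=2))  # two clusters: the points near (1, 1) and near (8, 8)
```

Passing `target_clusters` stops the merging early, which is the same as cutting the cluster tree at that level; with the default of 1 it merges until only one cluster is left, as described above. Divisive clustering runs the idea in reverse, for example by repeatedly splitting a cluster in two with `kmeans(..., k=2)`.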
The system discovers patterns, regularities and features on its own. Self-discovery, that is, finding similarities and dissimilarities to form clusters, is the main target here. Since the examples given to the learner are unlabelled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning.
The purpose of unsupervised learning is to find natural partitions in the training set. To AILabPage, UML is the most appealing type of machine learning, and we consider it a treasure. Unsupervised learning is the main strength of the AILabPage team.
We should not forget the biggest rule here: the data generation sources, along with the data itself, need to be secured to meet security policies and compliance mandates.
Points to Note:
All credits, if any, remain with the original contributors only. We have covered the basics of data analytics for digital marketing analytics. In upcoming chapters we will talk about implementation, usage and practical experience in the markets.
Books and Other Readings Referred
- Research through the open internet, news portals, white papers, and knowledge imparted via live conferences and lectures.
- Lab and hands-on experience of @AILabPage (self-taught learners group) members.
Feedback & Further Questions
Do you have any questions about AI, Machine Learning, Data Science or Big Data Analytics? Leave a question in the comments or ask via email. I will try my best to answer it.
Conclusion – In short, big data has transformed artificial intelligence to an almost unreasonable degree. Blockchain technology could transform AI too, in its own particular ways, but that is for my next post. I believe my analysis is reasonable, but a conclusion at this time might be a bit premature.
Clearing houses receive real-time payment data to which they can apply their capabilities; they also have a vision beyond their current rails and the deep pockets to support it. Meanwhile, more data sources are being added all the time. Start and grow businesses, and prevent fraud, by putting security before innovation. Transactions, and the data generated from them, will then be safe, quick and easy.
====================== About the Author =================================
Read about the author at: About Me
Thank you all for spending your time reading this post. Please share your feedback, comments, critiques, agreements or disagreements. For more details about posts, subjects and relevance, please read the disclaimer.