
Thought leadership, insights, and stories from Brightaira

Updated: Feb 12, 2022



What is Big Data

Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important; it’s what organizations do with the data that matters. Big data can be analysed for insights that lead to better decisions and strategic business moves. In that sense, Big Data also refers to our ability to make sense of the vast amount of data that we generate every single second. As our world has become increasingly digitized, we produce more data than ever before, and the amount of data in the world is growing explosively.

The internet, more powerful computing, and cheaper data storage have made it possible to use data far better than ever before. Big Data means companies like Google can personalize our search results, and Netflix and Amazon can understand our choices as customers and recommend the right things for us. We can even use Big Data to analyse social media traffic around the world to spot trends.

Benefits of Big Data with AI

By bringing together big data and AI technology, companies can improve business performance and efficiency by:

  • Anticipating and capitalizing on emerging industry and market trends

  • Analyzing consumer behavior and automating customer segmentation

  • Personalizing and optimizing the performance of digital marketing campaigns

  • Using intelligent decision support systems fuelled by big data, AI, and predictive analytics

Realtime Examples of AI and Big Data in Business

Here are examples of companies that use AI with Big Data and have seen enormous success in their fields.

Case Study (a): Netflix – Big Data and AI

Netflix uses AI and Big Data extensively and has achieved great success as an organization. It has over 200 million subscribers around the world.

  • Generate Content: AI with big data helps Netflix understand consumers at an increasingly granular level, which in turn helps it produce content that closely matches consumers’ tastes. Other competitors reportedly have a 40% success rate, whereas Netflix enjoys an 80% success rate.

  • Recommend Programmes: Netflix uses AI to recommend new movies and television programmes to consumers. 80% of what consumers watch is driven by these AI recommendations. Netflix continually fine-tunes its algorithms to understand consumers better and to recommend programmes and movies to them.

  • Auto Generate Thumbnails: Netflix uses AI to auto-generate thumbnails. Consumers spend only a limited time, from a few seconds to a few minutes, choosing what to watch based on the thumbnails alone. Netflix understood how important thumbnails are to that choice, so thumbnails are generated dynamically, using Artificial Intelligence, based on each consumer’s interests.

  • Vary Streaming Speed: Netflix uses AI to predict the consumer’s internet speed. AI algorithms help to scale the streaming quality of movies up or down based on the consumer’s real-time internet bandwidth.

  • Assist Pre-production: Netflix uses AI in pre-production activities. It helps to find locations to shoot a movie (based on actors’ availability, actors’ locations, etc.).

  • Assist Post-production: Netflix uses AI widely in post-production activities as well. Although editing is manual, quality checks are driven by AI to avoid mistakes in post-production. Previously, mistakes crept in through negligence or a lack of time and resources during post-production. With AI algorithms, Netflix has been able to reduce these problems to a great extent.
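
The “Vary Streaming Speed” point can be illustrated with a very small sketch. This is not Netflix’s algorithm; it is a plain rule-based selection, and the bitrate ladder, the `safety` margin, and the function name `pick_bitrate` are all assumptions made for illustration. A learned model would replace the measured bandwidth with a prediction.

```python
# Hypothetical bitrate ladder in kbit/s (illustrative values only).
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]

def pick_bitrate(measured_kbps, safety=0.8):
    """Pick the highest bitrate that fits within a safety margin of the bandwidth."""
    budget = measured_kbps * safety
    candidates = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    # Fall back to the lowest rung when even that exceeds the budget.
    return max(candidates) if candidates else BITRATE_LADDER_KBPS[0]

# Scale the stream up or down as the measured bandwidth changes.
for bandwidth in (10000, 4000, 100):
    print(bandwidth, "->", pick_bitrate(bandwidth))
```

The safety margin keeps some headroom so that a small drop in bandwidth does not immediately stall playback.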

Case Study (b): Disney (Theme Park and Cinemas) – Big Data and AI

Disney uses Big Data and AI to give customers a more magical experience. Disney has always been a tech innovator in both its theme parks and its cinemas, giving customers a wonderful experience.

  • Magic band: Disney offers a magic band to customers when they enter the theme park. It is a wristband, similar to a fitness watch, that opens the hotel room and allows customers to pay. A location tracker in the band tracks where customers walk within the Disney theme park: where they go, which rides they spend time on, and how much time they spend in restaurants.

  • Better Operational Management: The tracking data helps Disney schedule workers to manage overcrowding at a single ride or restaurant within the park.

  • Better Customer Experience: Better crowd management and proper assistance within the park give customers a better experience. Staff might direct customers to other rides or restaurants to avoid delays in one place.

  • Realtime Sentiment Analysis: Disney’s research team started using AI to understand real-time reactions when people watch a live show or a film in the cinema. They do this with ‘Machine Vision’: AI coupled with a night-vision camera looking at the audience. The system performs sentiment analysis on the people at the show, interpreting their facial expressions to see whether they are sad, scared, having fun, and so on. This in turn helps Disney produce quality content for its shows and movies based on how customers respond.

Case Study (c): Big Data and AI with Motor Insurance

Motor insurance providers have started using AI with Big Data to offer dynamic, flexible insurance plans that suit different customers based on their driving skills, ability, and composure at different times.

  • Motor Insurance companies generally determined the premium based on the age of the vehicle. Providers then started to factor in how customers drive, using the driver’s age as a proxy. This rested on the perception that a person aged 18 would drive more rashly than a person aged 55, who would show more maturity in driving.

  • Tracking Card: Motor Insurance providers started providing a tracking card to insert in the vehicle, which helps them track and understand the customer’s driving ability. This helped providers understand the customer better.

  • Mobile App: The card is now being replaced by the customer’s GPS-connected mobile phone; providers only need to install a mobile app on it. The app helps providers collect information about how the customer drives. With AI and Big Data, providers can study the customer at a granular level: how the customer drives on a highway, on a rainy day, or on a hilly mountain road. This also addresses the obvious objection that some people aged 18 drive better than older people. With AI algorithms, over a period of time, providers can understand how each individual drives in the morning or late at night, on a rainy day or during peak hours. This granular detail helps insurance providers offer flexible plans based on the customer’s driving skills, not merely on the age of the vehicle or the age of the customer.

Conclusion

AI with big data is not hype, nor just another set of trendy technologies for the IT giants to boast about. It is used widely across sectors and industries, from big organizations to small businesses. Its implementation has proved a great success in many industries and has helped businesses to a great extent. As said in the beginning, the world is exploding with data at the moment. Together with the internet, more powerful computing, and cheaper data storage, Big Data with AI is what makes sense of that huge volume of data.

 
 
 

Updated: Feb 12, 2022



Disclaimer: The article does not assume that readers have a data science background and thus excludes and masks any complexities behind sentiment analysis or data science.

Opinion mining has reached its peak with the introduction of tools that facilitate sharing ideas and thoughts with the public. Although the subjectivity of opinions limits how factual the information is, sentiment analysis plays a huge role in studying a targeted group’s perception of a certain entity or event. To mention a few applications where sentiment analysis shines: discovering the public’s reaction to an event, improving the customer satisfaction process, and studying the reputation of a brand or entity. However, there is a huge disconnect between these valuable applications and sentiment analysis as it is commonly applied, so I will try to connect the dots here and illustrate how sentiment analysis should fulfil business needs. Let’s start with a brief explanation of how sentiment analysis works and then move on to satisfy the title’s claim.


Sentiment Analysis

Sentiment analysis, as a part of natural language processing, is the task of discovering the emotional tone of a text as perceived by readers. It receives a text and outputs how positive, negative, or neutral it is. Other category sets are used as well, such as [“Angry”, “Sad”, “Happy”, “Excited”] or [1, 2, 3, 4, 5], similar to a rating that goes from 1 (very negative) to 5 (very positive), and so on. I have chosen to group the techniques by their limitations and end results, which gives two groups.


Word-Level

Intuition

There are many words that we conceptually categorize as negative, positive, or neutral. This idea underlies the very first attempts at sentiment classification in the literature, which emerged right after the surge of work on subjectivity analysis (detecting whether a text is opinionated or not) in the 1990s, to which the paper “Recognizing subjective sentences: a computational investigation of narrative text” made a huge contribution.


Short Overview

Word-level models, at their core, check whether the text has more positive words/phrases than negative ones or vice versa, and then classify based on that. I won’t go deeper into how they do that, as there are many well-known approaches, such as looking at the morphology of a word, using hand-crafted rules, learning “rules” automatically through machine learning, or looking at the semantics of words. The important point is that such models operate only at the word level and don’t capture much of the whole text’s semantics. Now let’s see how that works in a straightforward way by focusing on only one category: “Negative” sentiment.


Figure 2 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!
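
The word-counting idea can be sketched in a few lines of Python. This is a minimal illustration rather than any particular system’s method: the tiny English lexicons, the `tokenize` helper, and the use of the English translation of the review are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicons; real systems use much larger, curated word lists.
POSITIVE = {"good", "great", "correct"}
NEGATIVE = {"bad", "wrong", "bitter"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_level_sentiment(text):
    """Classify a text by comparing its counts of positive and negative words."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

review = ("I told the cashier Khalid that I got the wrong order, and he said "
          "that he can't change it, what a bad service!")
print(word_level_sentiment(review))
```

Here “wrong” and “bad” are the only lexicon hits, so the review is classified as negative.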

The example is pretty simple here (“Wrong” & “Bad”), but what if the text negated a positive word, as in “Not Good” or “Not Correct”? Here we move to negation handling (still word-level), where we check the words surrounding a positive/negative word and see whether they negate its positivity/negativity.


Figure 3 — Translation: I told the cashier Khalid that my order is not correct, and he said that he can’t change it. The service is not good at all!
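
A hedged sketch of negation handling follows. It assumes a small hypothetical lexicon, a hypothetical list of negators, and a fixed look-back `window` of two tokens; all three are illustrative choices, not a description of a specific published method.

```python
import re

# Hypothetical toy lexicons and negators, for illustration only.
POSITIVE = {"good", "correct"}
NEGATIVE = {"bad", "wrong"}
NEGATORS = {"not", "no", "never"}

def sentiment_with_negation(text, window=2):
    """Word-level scoring that flips a word's polarity when a negator precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity == 0:
            continue
        # Look at the `window` tokens just before the sentiment word.
        preceding = tokens[max(0, i - window):i]
        if any(p in NEGATORS for p in preceding):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = ("I told the cashier Khalid that my order is not correct, and he said "
          "that he can't change it. The service is not good at all!")
print(sentiment_with_negation(review))
```

Both “correct” and “good” are preceded by “not”, so each contributes a negative score instead of a positive one.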

This solves the problem of negation. However, what if we have different examples like this:


Figure 4 — Translation: I got the wrong order but the cashier Khalid has solved my problem immediately


Figure 5 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!

Word-level approaches struggle with these kinds of examples. In Figure 4, a negative word precedes “but”, and the negativity is then cancelled by “solved my problem”, which turns the text positive. Figure 5, on the other hand, runs into a deeper issue: the Arabic word “مر” may mean “pass” or “bitter”, and the ambiguity can only be resolved by using Arabic diacritics, which not many people use, or by employing an extremely complicated parser. Both problems can be solved through the use of context and semantics.


Context-Level

Intuition

Words are never independent in a text; each word can change the meaning or opinion of the whole text. Although some natural language processing tasks can escape the burden of including context (a deeper dive into the semantics of words and their “interactions”), sentiment analysis cannot.

Time-Line Summary

Many past attempts used rule-based approaches along with word morphology in order to include some semantics. Then came a movement towards models that group similar words, so that documents/sentences are assigned multiple topics based on the words they mention (Topic Modeling), where Latent Dirichlet Allocation (2003) stands out as the strongest contributor. After that, deep learning took a long course, starting from word-level semantics, where the star was Word2Vec, introduced by Tomas Mikolov in the paper “Efficient Estimation of Word Representations in Vector Space”, then moving towards context-level semantics (contextualized embeddings), and finally reaching Transformers, which solved many efficiency and quality issues. The basic idea is that a long history of work has produced models that cater for the context and semantics of words within documents. (There is also a great deal of impressive work on interpreting gigantic deep learning architectures, so the idea that these models cannot be interpreted is not fully true, especially when analyzing the core concept of Transformers: attention.)

On to a quick, simple example in which the model uses a contextual representation of the text and can understand that the word “مر” means “bitter”, not “pass”.

Figure 6 — Translation: The pistachio latte’s taste is too bitter. Couldn’t finish it!!



Sentiment Analysis and Business Value Disconnect

Disconnection

Suppose we have millions of documents, which could come from App Store or Google Play comments on an app, Google reviews of a place, complaints about a company, tweets from a region or under a hashtag, and so on. Applying sentiment analysis and getting, say, 10% positive, 20% neutral, and 70% negative for an app or a Twitter hashtag is basically useless, because the result is not connected to any particular topic. Knowing that some hashtag is very negative only tells you the what, not the why.
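
The difference between the “what” and the “why” can be seen in a small sketch. It assumes, purely for illustration, that each review has already been labelled with an aspect and a sentiment; the sample data and the `share` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical reviews already labelled as (aspect, sentiment).
labelled = [
    ("delivery", "negative"), ("delivery", "negative"), ("delivery", "negative"),
    ("payment", "negative"),
    ("food", "positive"), ("food", "neutral"),
]

def share(labels):
    """Return the percentage of each label, rounded to whole numbers."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

# The "what": one overall distribution with no topic attached.
overall = share(sentiment for _, sentiment in labelled)

# The "why": the same distribution broken down per aspect.
grouped = defaultdict(list)
for aspect, sentiment in labelled:
    grouped[aspect].append(sentiment)
by_aspect = {aspect: share(sentiments) for aspect, sentiments in grouped.items()}

print(overall)
print(by_aspect)
```

The overall figure only says that most reviews are negative, while the per-aspect view shows that the negativity is concentrated in delivery.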

You might say, “I’ll just filter the text by a keyword”, but that keyword was chosen by you, not by the data! How many words are you going to account for? Are these words being used by customers? Heavily? The data (reviews, comments, tweets) should drive the process of deciding which aspects, or more precisely which collections of hundreds of keywords, you should look for. The key takeaway is that you need to know what the aspects are in order to know what exactly is so positive or negative about your place, app, Twitter marketing campaign, or, generally speaking, your business, and then improve.
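
As a rough sketch of letting the data suggest keywords, the snippet below counts how many negative reviews mention each term. It is a deliberately naive stand-in for real aspect detection; the sample reviews, the stopword list, and the function name `candidate_aspect_terms` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list and sample reviews, for illustration only.
STOPWORDS = {"the", "and", "was", "a", "i", "to", "when", "not", "could", "again", "took"}

negative_reviews = [
    "The delivery was late and the food was cold",
    "Late delivery again, the driver could not find the address",
    "App keeps crashing when I pay",
    "Delivery took two hours",
]

def candidate_aspect_terms(reviews, top_n=3):
    """Rank terms by the number of reviews that mention them at least once."""
    doc_freq = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z]+", review.lower())
        # Keep each term once per review so long reviews do not dominate.
        unique = list(dict.fromkeys(t for t in tokens if t not in STOPWORDS))
        doc_freq.update(unique)
    return doc_freq.most_common(top_n)

print(candidate_aspect_terms(negative_reviews, top_n=2))
```

The terms that surface here come from what customers actually wrote, not from a keyword picked in advance.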


Connection

We (Brightaira) have researched this subject in order to solve this problem with a methodology different from what is well known in the literature, for the following reasons:

  1. Scarce Arabic NLP literature

  2. Arabic NLP datasets are of low quality

  3. Off-the-shelf Arabic NLP base components have low quality

  4. Well-known algorithmic approaches are inherently domain-specific, which limits their practicality and generality

We have released our first Generalized Hybrid Aspect-Sentiment Detection and Tracking model; Figure 7 illustrates only its core capability. (The model is integrated within the Bloom System, which is part of the Customer-Success platform.)


Figure 7 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

One more thing to notice is that the sentiment goes through multiple layers of indexing and statistical calculations in order to be served as a metric comparable to the CSAT Score used in Customer-Success Management. However, this still does not fully address the issue!


Deeper Dive!

We have discovered that aspects alone are also not enough. We want a fine-grained specification of the problem behind each aspect given in Figure 7. What was bad about customer service above is “Order Exchange” and “Wrong Order”, which should be detected by looking at “cannot change it” (ما اقدر اغير) and “wrong order” (طلبي غلط). Hence, through a combination of contextualized modeling and graph theory (our first text representation layer to solve the issue), we are currently researching how to fully connect the dots until we reach the core of the problem, as Figure 8 illustrates:


Figure 8 — Translation: I told the cashier Khalid that I got the wrong order, and he said that he can’t change it, what a bad service!

By that, Brightaira can now discover:

  1. What the total CSAT Score is for a business

  2. Why the total CSAT Score is as such

  3. How to change the CSAT Score

and automatically generate an actionable well-defined recommendation that fits our Decision-Making Platform.

 
 
 

Updated: Feb 12, 2022



One of the life-changing decisions that you must have felt uneasy about is the career path to follow. You must have asked yourself: what will happen if I choose this path and it turns out not to interest me at all, or if I only realize that after a couple of years? In this article, I want to focus on choosing the path of being a data scientist, on the other side of data science that is not very well known to newcomers, and on what data and data science mean outside the scientific realm.

Dilemma of Choice

With the sudden peak in popularity that Harvard Business Review contributed to in 2012, when it labelled “Data Scientist” the sexiest job of the 21st century, businesses started looking for data scientists to employ (even when they sometimes don’t need to). Consequently, ambitious students started joining this wave of demand by choosing this path.

If you were to search Google now for “Why should I learn data science”, you would find multiple reasons, summarized as follows: to become good at problem-solving, to have a lucrative career path, or to benefit from very high market demand. These reasons are too broad, not exclusive to data science, and never guaranteed, and there may well be better alternatives. Yet they are repeated everywhere, missing one main point: people will never be great at something unless they are fully devoted to it, and people popularizing data science unknowingly mask some of the challenges that must be overcome to be successful. Hence the title of this article.

Concealed Side

There’s always a difficult side to any field. Let’s elaborate on the kinds of predicaments or challenges that data scientists might face but that are not usually well known.

Reading, Reading, and Reading

Not many people enjoy reading every day, and some of them are newcomers to data science. Data science is about reading books, academic literature, articles, and so on. To come up with truly valuable ideas that can improve your output, you must read a great deal. Following data scientists on social media platforms, subscribing to research organizations’ email lists (my favorite email list is DeepAI), and always being up to date are a must; your eyes must be everywhere. Most of what you think about is a byproduct of knowledge you have been introduced to, so be sure to have an abundance of it.

Furthermore, you have strong backup when trying to detect and fix programming errors: exceptions are raised, the program crashes, the output is clearly wrong, and so on. Not so much with “Theoretical Bugs”. These bugs are too good at hiding, and you will never catch them if you are not a dedicated reader; you must understand, to a great level, the inner workings of what you are aiming to apply. Theoretical Bugs sometimes get detected after days, weeks, or months, or never, and in the meantime the model’s true quality is nowhere near what has been reported.

Living Under Uncertainty

Imagine working for a whole month on a project and then throwing it all away. How would that make you feel? Many people cannot accept failure and never let go. They go into a spiral of bad performance or repeated attempts to revive a machine learning project that is already a lost cause. Data science is uncertain, and it always will be; that’s why it is distinguished by the word science. Managers must understand this uncertainty as well. To lead a successful data science project that is unique and valuable, you have to accept failure and be the first person to support the team, as failure is not easy to digest. To account for the risk of failure (for AI projects), I have briefly summarized some points that boost the probability of success or at least mitigate failure:

  • Switch your data science jargon off and accurately define and communicate the business requirements

  • Research heavily in order to define the algorithmic approaches and the model-quality KPI that align with business needs (e.g. based on these references, we’re confident in setting > 85% accuracy as a KPI for use-case X)

  • Be clear with stakeholders about requirements & KPIs. Communicate exactly what the quality metric means (further information in the Communication section).

  • Choose at least 3-5 fallback approaches in case the first approach fails, and make sure your timeline is buffered for this.

  • Fail fast, and let go if there’s no hope of achieving value or of extending the deadline


Communication

You must have heard the phrase “Explain it like I’m 5” before; data science communication is all about this. Translating extreme complexity into minimal simplicity is the hardest skill for data scientists to improve: the better you get, the more complexity you will face, and the harder it will be. To mention a few cases where proper (AI-specific) communication is a must:

  • Project Initiation: Convincing stakeholders to initiate a project requires that they grasp what the end goal is. You need to simulate what it looks like and always attach it to a business value. If your main goal is to directly support a decision-making process in a certain industry, for example, then when presenting the project you should focus on simulating a decision-making scenario that the data science project helps with.

  • Limitations: Limitations are unknown to stakeholders but very well studied by data scientists. They must be clarified from the beginning and documented with a focus on the cannots. For example: “The project cannot do X”.

  • Timeline: Project timeline choice should align with its value, and a proper Work Breakdown Structure must be prepared and communicated throughout the project life.

  • Performance Report and Continuous Monitoring: You must communicate your model’s KPI beforehand, and sometimes you have to bring examples, because people perceive numbers differently. 85% accuracy might sound great to a person, but when illustrated with an example it becomes, for the same person, garbage! (I usually like flipping the quality metric by saying, for example, that we will make 15 “mistakes” out of 100 “predictions” instead of saying 85% accuracy.) Also, when monitoring the model’s performance in production, mistakes can happen, and you always have to be ready to offer a proper defense or a proper retrospective when presented with them. One thing that is, unfortunately, often not included in a data science curriculum is interpretability. You need to know why the model predicted an “Apple” instead of an “Orange”, and this is where the conundrum peaks! Some projects are critical, and every prediction carries a burden of responsibility, so account for the need for interpretability if the project expects it.
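
Flipping the metric in the last point is simple arithmetic, and a tiny helper can keep the wording consistent in reports. The function names below are hypothetical and only meant as a sketch.

```python
def mistakes_per_hundred(accuracy, predictions=100):
    """Expected number of wrong predictions out of `predictions`."""
    return round((1 - accuracy) * predictions)

def describe(accuracy):
    """Phrase an accuracy figure as a count of mistakes, for stakeholders."""
    n = mistakes_per_hundred(accuracy)
    return f"about {n} mistakes out of 100 predictions"

print(describe(0.85))
```

Stating the same number as a count of mistakes tends to make the cost of errors easier to picture than a percentage does.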


Bright Side

Allow me to coat this field with fascination using my own definitions, sacrificing some of the scientific jargon.

“Data” in a Different Dimension

Data is our way to represent the real world around us in a slightly different format than what we’re used to. It is a way to share information with others more accurately, and a method that allows us to play easily with this information using a machine. It’s a technique to convince others with evidence, and a way to capture moments and occurrences of real-life events for later use. Your five senses can be considered data channels to your brain, just as you can consider your phone’s camera its sense of sight and its microphone its sense of hearing. Each type of computer has such channelling mechanisms, through which it can receive different data in different formats. What then? The data will sit there without any use. Here comes data science!

“Data Science” in a Different Dimension

“Data science is an inter-disciplinary field that uses scientific methods, statistics, mathematics, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.” (V. Dhar)

Let’s throw that away for a bit and go with a simpler overview. We previously mentioned that data is just a representation of the real world (texts, sounds, images, numbers, etc.), but on its own this has no value. Data science transforms this representation into another representation that people can relate to; it adds value and information, turning what was only vague data flowing around us into things that are easily understood. After that, it affects our decisions: it makes us realize things that we didn’t know before, it changes our actions, and it may even be used to predict what will happen if an action is changed. It might also tell us things that we could not have known without studying, or, even if we have studied them, tell us in a faster and more evidence-based way. Imagine that you spend some amount of money every day. Wouldn’t it be useful to see where you spend that money, on a monthly basis, broken down by type of spending? And if you could estimate your budget for next month, you would know whether you need to ask a friend for money or reduce how much you spend every day.
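
The spending example can be made concrete with a short sketch. The expense log, the categories, and the naive “average of past months” estimate are all assumptions chosen for illustration; a real budget forecast would need more data and a better model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical expense log: (date, category, amount).
expenses = [
    (date(2022, 1, 3), "food", 40.0),
    (date(2022, 1, 15), "transport", 20.0),
    (date(2022, 1, 20), "food", 60.0),
    (date(2022, 2, 2), "food", 50.0),
    (date(2022, 2, 10), "transport", 30.0),
]

def monthly_totals(rows):
    """Sum spending per (year, month, category)."""
    totals = defaultdict(float)
    for day, category, amount in rows:
        totals[(day.year, day.month, category)] += amount
    return dict(totals)

def estimate_next_month(rows):
    """Naive estimate: the average total spend over the observed months."""
    per_month = defaultdict(float)
    for day, _, amount in rows:
        per_month[(day.year, day.month)] += amount
    return sum(per_month.values()) / len(per_month)

print(monthly_totals(expenses))
print(estimate_next_month(expenses))
```

The first function turns raw daily records into a view people can relate to, and the second gives a rough figure to plan around.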

Why Learn Data Science?

“Why Learn Data Science?” is an interesting question… or (Questions Alert!) is it? How interesting is it? Why is it interesting? And for whom exactly is it interesting? How many people find it interesting? How many people find it boring? Can I compare how interesting that question is with respect to other questions? But wait, how can I represent the concept “Interesting”? Also, can I predict the number of people who would be interested in that question this year and in the coming year? Can I predict whether a person would be interested in that question or not before I ask?

Can I (Brainstorming Alert!) answer these questions by just seeing how many people searched for that question on Google? Or how many people have clicked on websites that have the answer to that question? Or by publishing a survey that contains related questions along with that exact question, then publishing the survey without that question and trying to predict whether a person would answer “I am interested in that question” based on their other answers? Or can I just calculate the number of junior data scientists in a region at a certain time?

Data Science will give you the ability to ask questions about anything you see, read, or listen to in your everyday life, whether as simple as the question above or as hard as the problems tackled at the Large Hadron Collider. It will make you capable of thinking of multiple approaches to overcome problems or answer questions. It will turn your thought process into that of an analytical thinker; it will change how you make decisions, how you receive factual claims from people, and how you assess the truthfulness of those claims. It will provide you with a logical, analytical framework within which you can tell when to accept a claim, reject a claim, or stay neutral.

Data Science is more of a lifestyle, and a philosophy, rather than just a career
 
 
 