How to account for diversity in your data and why it is important

 

One of the first things they teach you in any data-related class is the four dimensions of Big Data, otherwise known as the four “Vs”: volume, variety, velocity and veracity.

In the simplest terms, they mean:

  • Volume - scale of data

  • Variety - different forms of data

  • Velocity - analysis of streaming data

  • Veracity - uncertainty of data

In this article, we want to focus specifically on the second one: variety. Why is it important to have diverse data? Diversity in data is crucial because non-representative data sets are less likely to yield workable insights than those that cover all the factors relevant to the topic being researched. A key thing to remember is that insight can be found in unexpected places.



To account for diversity in your data, we suggest that you combine several data sources and formats and, most importantly, think beyond the data that your organization already has available. Instead of looking only at customer purchase behavior from your records, combine it with social media data from your Twitter, Facebook and Instagram accounts. Look at what your customers have said about your brand and make inferences from there.
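As a rough illustration of combining two sources, here is a minimal Python sketch. The records, field names and the `combine_sources` helper are all hypothetical, and the sketch assumes that social media mentions have already been matched to customer ids, which in practice is usually the hardest step.

```python
from collections import defaultdict

# Hypothetical purchase records from an internal system.
purchases = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 15.5},
    {"customer_id": 2, "amount": 99.0},
    {"customer_id": 3, "amount": 12.0},
]

# Hypothetical brand mentions collected from social media, assumed to be
# already matched to customer ids.
mentions = [
    {"customer_id": 1, "platform": "twitter", "sentiment": "positive"},
    {"customer_id": 3, "platform": "instagram", "sentiment": "negative"},
    {"customer_id": 3, "platform": "facebook", "sentiment": "negative"},
]


def combine_sources(purchases, mentions):
    """Build one profile per customer from both data sources."""
    profiles = defaultdict(lambda: {"total_spent": 0.0, "sentiments": []})
    for purchase in purchases:
        profiles[purchase["customer_id"]]["total_spent"] += purchase["amount"]
    for mention in mentions:
        profiles[mention["customer_id"]]["sentiments"].append(mention["sentiment"])
    return dict(profiles)


profiles = combine_sources(purchases, mentions)
for customer_id, profile in sorted(profiles.items()):
    print(customer_id, profile)
```

Customers who buy but never mention the brand still appear in the combined view with an empty sentiment list, so the silent part of your audience does not drop out of the analysis.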


So what could go wrong if diverse data is not present? Take a dive into “Invisible Women: Exposing data bias in a world designed for men” (Chatto & Windus, 2019: Amazon US). This book illustrates how much data bias is out there and the importance of addressing it in everyday work.


Examples of not accounting for diversity


A few of the examples listed in the book that really highlight the importance of having diverse data are:

  • Women are 50% more likely to be misdiagnosed following a heart attack

The reason is that during a heart attack, most women do not have the “Hollywood” symptoms of chest pain and numbness on the left side. Most women experience stomach pain and breathlessness, for which they are typically sent home after a medical consultation.

  • 75% of unpaid work is done by women

Almost all women have jobs - maybe not a typical job with a signed contract, but they are doing some type of work (taking care of children or the elderly). On average, women do three to six hours of unpaid work a day, while men do only 30 minutes to two hours.

  • Voice-Recognition Technology is 70% more likely to recognize male speech

The datasets on which voice-recognition technology is trained have typically consisted mostly of male voices, and as a result the technology recognizes female patterns of speech less reliably.
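One simple way to catch this kind of skew early is to compare the share of each group in your dataset with the share you would expect in the population you want to serve. The sketch below is only an illustration: the `representation_gaps` helper, the `speaker_gender` field and the 50/50 reference shares are assumptions of ours, not figures from the book.

```python
from collections import Counter


def representation_gaps(samples, field, reference_shares):
    """Return observed share minus expected share for each group.

    A negative value means the group is under-represented relative to
    the reference population.
    """
    counts = Counter(sample[field] for sample in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("samples must not be empty")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical metadata for a small set of voice recordings.
recordings = [{"speaker_gender": "male"}] * 7 + [{"speaker_gender": "female"}] * 3

gaps = representation_gaps(
    recordings, "speaker_gender", {"male": 0.5, "female": 0.5}
)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%}")
```

A check like this does not fix the bias, but it makes the gap visible before a model is trained on the data.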



If your company has avoided the perils of not having enough diverse data, here’s your next challenge. 



We have a diverse dataset, what next?

If you think your data is already very diverse, you can go one step further and look at intersectionality in your data. As intersectionality is a relatively recent term, there is still a lot that we all need to learn about it and about what we can do in practice.



What is intersectionality?

A really good definition of intersectionality, which we found on a Peakon blog, is: “Intersectionality is essentially a call for nuance, a call to listen to a person’s own experiences, and a call to recognise that generalisations are as potentially damaging as no action at all. True inclusion means providing a platform for every voice to be heard”. What this means is that, to better understand our customers and provide services tailored to their needs, we need to stop treating them as a single block. We need to take a deeper look at the characteristics of the group and try to understand whether there are subgroups that we need to account for.


Some of the factors to consider when addressing intersectionality in your analysis are:

  1. Disability

  2. Sexual Orientation

  3. Gender Identity and Expression

  4. Age

  5. Income

  6. Location

By looking at these factors, we can better understand the audience overall and deliver messages or products that better meet their needs. Depending on the subject of your analysis, other factors such as religion, mental health, caste or relationship status can also be considered.
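In practice, looking at intersectionality often means breaking a metric down by combinations of factors rather than by one factor at a time. The following Python sketch shows one possible way to do that. The survey responses, the field names and the `subgroup_summary` helper are hypothetical, and the `min_size` threshold is an arbitrary choice that you would set to suit your own data.

```python
from collections import defaultdict


def subgroup_summary(records, factors, metric, min_size=2):
    """Average `metric` for every combination of `factors`.

    Subgroups with fewer than `min_size` records are flagged, since an
    average over very few people is unreliable and may identify them.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[factor] for factor in factors)
        groups[key].append(record[metric])
    return {
        key: {
            "n": len(values),
            "mean": sum(values) / len(values),
            "too_small": len(values) < min_size,
        }
        for key, values in groups.items()
    }


# Hypothetical survey responses with a satisfaction score from 1 to 5.
responses = [
    {"age_band": "18-34", "location": "urban", "satisfaction": 4},
    {"age_band": "18-34", "location": "urban", "satisfaction": 5},
    {"age_band": "18-34", "location": "rural", "satisfaction": 2},
    {"age_band": "65+", "location": "urban", "satisfaction": 4},
    {"age_band": "65+", "location": "rural", "satisfaction": 1},
    {"age_band": "65+", "location": "rural", "satisfaction": 2},
]

summary = subgroup_summary(responses, ["age_band", "location"], "satisfaction")
for key, stats in sorted(summary.items()):
    print(key, stats)
```

In this made-up data, neither age nor location on its own tells the full story: the low scores only stand out once the two are combined. The `too_small` flag is a reminder that very fine subgroups need care, both statistically and for privacy.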



So why are diverse data and intersectionality important?

Whether you want to create a marketing campaign or draft a policy to improve the community, you need to make sure that you are hitting the right targets with your proposals. You want to make sure that your impact is high and inclusive for everyone.



Our job as data scientists is to address diversity and intersectionality by using our superpower: analyzing DATA. We should systematically collect granular data on as many relevant fields as possible, and by sifting through that data we can begin to apply the diversity lens.

 
Guest User