Mariamz

Removing big data bias

Posted on: May 5, 2013

In a recent post, Haowen Chan and Robin Morris warn that “the last thing you want to do is implement a [big data] system that develops and propagates data, only to learn it’s hopelessly biased.” All research and analysis has bias built in, simply because humans are involved. However, Chan and Morris offer four useful bias-quelling tactics for improving the big data science process:

  • Employ domain experts. Rely on them to help select relevant data and explore which features, inputs and outputs produce the best results. If heuristics are used to gain insights into smaller data sets, the data scientist should work with the domain expert to test the heuristics and ensure they actually produce better results. Like a pitcher and catcher in a baseball game, they are on the same team, with the same goal, but each brings a different skill set to a complementary role.
  • Look for white spaces. Data scientists who work with one data set for long periods risk complacency, which makes it easier to introduce bias that reinforces preconceived notions. Don’t settle for what you have; instead, look for the “white spaces” in your data sets and search for alternate sources to supplement “sparse data.”
  • Open a feedback loop. This helps data scientists react to changing business requirements with modified models that can be accurately applied to the new business conditions. Applying Lean Startup-style continuous delivery methodologies to your big data approach will help keep your model fresh.
  • Encourage your data scientists to explore. If you can afford your own team of data scientists, be sure they have the space and autonomy to explore freely. Some equate big data to the solar system, so get out there and explore this uncharted universe!
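
As a rough illustration of the “white spaces” tactic, here is a minimal Python sketch that flags groups which are missing from, or thinly represented in, a data set. Everything in it is an assumption made for illustration and does not come from Chan and Morris: the `find_white_spaces` helper, the `region` field, the list of expected groups and the 5% threshold are all hypothetical. A domain expert would normally supply the expected groups and a sensible threshold.

```python
from collections import Counter


def find_white_spaces(records, key, expected_groups, min_share=0.05):
    """Return the expected groups whose share of records is below min_share.

    Hypothetical helper: `expected_groups` and `min_share` are assumptions
    that a domain expert would normally provide.
    """
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    sparse = {}
    for group in expected_groups:
        # A group that never appears has a share of zero.
        share = counts.get(group, 0) / total if total else 0.0
        if share < min_share:
            sparse[group] = share
    return sparse


# Made-up sample: "east" is thinly covered and "west" is absent entirely.
records = (
    [{"region": "north"}] * 60
    + [{"region": "south"}] * 38
    + [{"region": "east"}] * 2
)

sparse = find_white_spaces(records, "region", ["north", "south", "east", "west"])
for group, share in sparse.items():
    print(f"{group}: {share:.0%} of records, consider supplementing")
```

Groups flagged this way are candidates for the alternate data sources mentioned above; the check only counts records and says nothing about whether the records that do exist are themselves accurate.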

We can also consider what bias we encourage when we develop systems, from social media plugins to smart objects, that collect ‘big data,’ or data that could be aggregated into big data analysis. Might we be painting an unfair picture of our data subjects, whether through what we represent or what we omit? Collection, processing and analysis all need careful consideration in the quest for useful and accurate big data outcomes.

Image of what the Internet looks like via Flowing Data – the work of Peer 1 Hosting & team

This blog is about utilizing and optimizing the social web for business, pleasure and social change

Creative Commons License
This work is licenced under a Creative Commons Licence.

The views in this blog do not reflect those of my employer