Does size matter? The role of big and small data in organisations

Knowledge is certainty, then, and probability is uncertainty. This brings us to the question of what certainty is. – Frederick F. Schmitt (2014, p.51).

Big data has become a prevalent topic of conversation in organisations lately. Given the recent financial successes that have accompanied the effective use of big data, it’s no wonder that organisations are starting to take it seriously. But why is big data such a hot topic at the moment? Perhaps it starts with a company called Target.

Predicting pregnancy using big data: Target Online Shopping

Imagine the following scenario: you are the parent of an 18-year-old high school girl, and you receive mail from your favourite retailer. Upon opening the envelope, which is covered in the retailer’s logo, you discover a set of coupons for nappies, bottles, and baby food. The accompanying note congratulates your daughter on her pregnancy. You scoff at the message, knowing full well that your daughter cannot possibly be pregnant. That day after work, your daughter and wife are waiting to talk to you. It appears your daughter is pregnant. To your horror, you realise that the retailer knew your daughter was pregnant before she did. You know this because the letter was posted a week ago and she only took the pregnancy test today.

This may sound like science fiction, but a scenario much like it has reportedly already happened. The retailer’s name is Target, and the statistician responsible for predicting the pregnancy is Andrew Pole (Hill, 2012). Target collects a massive amount of data from its customers and uses it to make behavioural predictions. Target realised, for example, that pregnant women have distinctive shopping habits and that they are less brand loyal (Hill, 2012). Why is this important? All retailers want to increase their market share, so getting customers to switch from their established brand to a Target brand is highly advantageous. It is especially advantageous if the organisation can identify people whose brand loyalty is weakening so that Target can, excuse the pun, target these individuals. Target therefore wanted to predict pregnancy amongst its female consumers so that it could reinforce its own brand with mothers-to-be and lure shoppers away from their established brands during this special period. Using demographic and shopping data that Target already held on its customers, Pole and his team were able to develop a predictive model that could determine the probability that a woman was pregnant. By Hill’s (2012) account, the model proved strikingly accurate.
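To make the idea of a model that outputs "the probability that a woman was pregnant" more concrete, here is a minimal sketch. The source does not describe how Pole's model actually works, so this uses a simple logistic scoring function as a stand-in. The product names, weights, and intercept are all invented for illustration and are not taken from Target.

```python
import math

# Hypothetical purchase signals and weights (invented for illustration only).
# Each weight says how much buying that product shifts the log-odds upwards.
WEIGHTS = {
    "product_a": 1.2,
    "product_b": 0.9,
    "product_c": 0.7,
}
INTERCEPT = -3.0  # hypothetical baseline log-odds when no signal is present


def pregnancy_probability(purchases):
    """Turn a collection of purchase signals into a probability between 0 and 1."""
    # Sum the weights of the signals that are present; unknown items add nothing.
    log_odds = INTERCEPT + sum(WEIGHTS.get(item, 0.0) for item in set(purchases))
    # The logistic function maps any log-odds value onto the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    print(pregnancy_probability([]))
    print(pregnancy_probability(["product_a", "product_b", "product_c"]))
```

In a real project the weights would be estimated from historical customer data rather than typed in by hand. The point of the sketch is only that each additional signal nudges the estimated probability upwards, and a retailer would act once that probability crosses a threshold of its choosing.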

The Target big data example is just one of many. Google used data on what people search for online to estimate where seasonal flu was occurring. Soon, Google was able to track the spread of seasonal flu more quickly than the Centers for Disease Control and Prevention could (Harford, 2014).

Big data in South Africa

Big data has been making headlines because of its predictive potential: human behaviour can be forecast much as the weather is forecast today. The South African health insurance company Discovery Health uses big data to mitigate financial risk (I.T. Web, 2013). By obtaining data from its clients through popular programmes like Vitality, Discovery Health knows things about its clients that many other health insurers don’t. If you’re a Vitality member, you know what I am talking about. A standard Vitality member with gold status will have told Discovery whether they smoke, how much they exercise per week, whether they get depressed or anxious, and whether they have a family history of heart disease (Vermeulen, 2013). Through retail partnerships, Discovery Health also knows which healthy and unhealthy foods a Vitality member buys from certain grocery stores, and whether they visit the gym regularly (I.T. Web, 2013).

Through free Vitality health assessments, which are a requirement for Vitality benefits, the insurer also knows your latest blood pressure, cholesterol, and blood sugar readings (Vermeulen, 2013). This information helps the organisation to forecast and mitigate financial risk and to actively improve the physical and psychological well-being of its members. Put simply, if health insurance members are healthy, there are fewer medical claims, and if there are fewer medical claims, the insurance company profits. Discovery Health is a sterling example of how big data can inform business about human behaviour and subsequently help business to change that behaviour for the better. Discovery Health also uses the biometric, psychological, and retail data of its members to predict whether an individual is at risk for certain diseases, whether they will make medical claims frequently or infrequently, and how long they will live (I.T. Web, 2013). All of this information helps to inform business strategy and marketing and to improve profitability. Discovery Health has reported that big data helped it identify fraudulent claims so accurately that it saved R250 million in 2012 alone (I.T. Web, 2013).

Using big and small data to make predictions

In short, big data is useful and profitable. But how do organisations tap into big data and use it to make predictions? And is it only big data that can help organisations become more profitable? The answers to these questions are by no means simple. Firstly, organisations don’t have to rely on big data alone. Sometimes, data obtained from small samples can be very useful. This is especially true when information is gathered on a particularly important group such as a board of directors, an executive management team, or high-profile clients. Information gathered on these so-called speciality groups can still inform practice, even though hardline researchers will often argue that the bigger the data, the better. It is important to remember, however, that not all findings need to be generalised to a larger population. In many cases information is sought from small populations that exist within a company. Any information garnered from such a small population (e.g., an executive board, disabled clients, or angry suppliers) needs to be generalisable only to that population itself.

Secondly, because of the sheer size of big data, organisations often implement people analytics software that captures, manages, and analyses data almost instantaneously. This is not always required. In many cases organisations are only interested in specific topics or aspects of the business (e.g., good leadership in the executive team, behavioural factors common to poor performers, or whether an assessment battery predicts performance). In these cases the organisation would seek out specific data and ask questions about only that data. Such projects may take weeks or months and be limited to a small number of variables. This does not mean that these smaller, more time-consuming data analytics projects provide less useful information than massive people analytics systems do. It is important, however, that organisations consciously decide what types of data they want to capture and how they will capture it. Small projects often make use of either convenience data (data captured passively as part of the running of the organisation) or project data (data actively sought out for the project at hand).

Finally, it is important to know that an analysis cannot extract information that is not inherent in the data. This comes down to the old saying “garbage in, garbage out”. Sometimes large people analytics systems capture data that is irrelevant, confounding, or practically useless. Researchers therefore need to specify which data is useful and informative enough to capture, and to streamline big data systems so that they yield information that benefits the organisation. Smaller, focused data analytics projects often do not have this problem, because they first select the variables considered important and then find or capture the data required for those variables. Unfortunately, these smaller projects often have the opposite problem: important data sources are sometimes ignored or neglected. In such cases, the data needed to answer a particular question may not be available or may not be feasible to capture.

In summary, small data can be as useful as big data if the right questions are asked and the right data is gathered to answer them. With all the benefits associated with big and small data, why not get your people analytics systems in place?

Need advice on people analytics in your organisation?
Contact the research department at JvR Psychometrics who will help you design and implement the people analytics solutions you need.


Hill, K. (2012). How Target figured out a teen girl was pregnant before her father did. Forbes. Retrieved from

I.T. Web (2013). Discovery Health finds the needles in its big data haystack. Retrieved from

Schmitt, F. F. (2014). Hume’s epistemology in the Treatise: A Veritistic interpretation. London, England: Oxford University Press.

Vermeulen, A. (2013). Health-record app puts patients’ health records at doctors’ fingertips. Engineering News. Retrieved from
