How to Avoid Data Pitfalls – by Self Spark's Chief Science Officer

In a corporate world where data is king and enormous volumes of it are generated every day, it is an essential duty of a product manager to filter out the right data in order to make accurate predictions about a product's lifespan. It is crucial not to fall for the fallacies that surface in large-scale data experimentation and analysis. Orin Davis, an experienced human capital consultant, rightly reminds us how often data pitfalls turn up in our day-to-day work.
A Spark of Genius – Orin Davis

Orin Davis is the principal investigator of the Quality of Life Laboratory and the Chief Science Officer of Self Spark. A passionate engineer with a degree in positive psychology, he works as a human capital and creativity consultant, helping budding startups with pitches, propositions, culture, and human capital.
Avoiding Data Pitfalls
In the ocean of information we wade through every day, it is important not to draw hasty conclusions from a data set. Mr. Davis warns against confounding the ‘what?’ with the ‘why?’: we often find the ‘what?’ of the data easily and then simply assume the ‘why?’. For instance, from the last US presidential election we can get data about who voted for Trump and who voted for Clinton. We know the ‘what?’ of the data, and we end up assuming the ‘why?’ along with it.
This is a blunder, because what ultimately matters is that the data and the conclusions drawn from it are meaningful.

As a senior mentor and human capital advisor, he is often asked, ‘How do we hire candidates for a company?’ He addresses this concern with a clarification: one should never hire a person based on personality or strengths alone, because almost any combination of personality traits or strengths can do any given job.
Mr. Davis adds that he has turned many firms upside down in meetings because their hiring criteria rested on basic surveys that were never checked for convergent validity. He calls this situation GIGO – Garbage In, Garbage Out.
The GIGO rule states that your conclusions are only as valid as the surveys that collect the data behind them.
Labels Lead to Validity
Validity questions the truthfulness of the data – that is, the degree to which the findings are ‘true’. While the truthfulness of data is being measured, it is important to establish Construct Validity…
Construct Validity – Did we measure what we said we would measure?

This question matters because there are limits when we operationalize variables: many variables are abstract, and we cannot cover every possible angle of a given variable.
There are several variants of Construct Validity:
- Face Validity – Does the measure look right on its face? E.g., do creative people “think outside the box”? Beware of generalizing that all creative people think in non-mundane ways
- Content Validity – Does the measure cover the whole construct, or only the parts we expected to see (confirmation bias)?
- Predictive Validity – Does the measure predict a later outcome? E.g., does a personality test predict performance when used for hiring?
- Concurrent Validity – Does the measure agree with an established measure taken at the same time?
- Convergent Validity – Does the measure correlate with things it should be related to, e.g., a creativity score correlating with openness to experience? (A rough check is sketched after this list.)
- Discriminant Validity – The measure is not related to things from which it should be independent
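To make convergent and discriminant validity concrete, here is a minimal sketch in Python. The data, column names, and correlation checks are purely illustrative assumptions, not part of Mr. Davis's talk: a hypothetical new creativity survey is compared against an established openness-to-experience scale (which it should correlate with) and against height (which it should not).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated scores for 200 respondents (placeholder data).
n = 200
openness = rng.normal(50, 10, n)                   # established openness scale
creativity = 0.6 * openness + rng.normal(0, 8, n)  # new measure, should converge
height_cm = rng.normal(170, 9, n)                  # should be unrelated

scores = pd.DataFrame(
    {"creativity": creativity, "openness": openness, "height_cm": height_cm}
)

# Convergent validity: the new measure should correlate with what it is
# theoretically related to (here, openness to experience).
convergent_r = scores["creativity"].corr(scores["openness"])

# Discriminant validity: it should NOT correlate with things it is supposed
# to be independent of (here, height).
discriminant_r = scores["creativity"].corr(scores["height_cm"])

print(f"convergent r  (creativity vs. openness): {convergent_r:.2f}")   # expect clearly positive
print(f"discriminant r (creativity vs. height):  {discriminant_r:.2f}")  # expect near zero
```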

Method Problems
Some of the factors that affect the validity of the information gathered are:
- History – some external event affects the result
- Fatigue – the survey gets too long and people stop thinking about the questions
- Instrumentation – changes in the instrument due to use and age
- Selection Bias – groups are not chosen randomly
- Dropout – people leave the survey without completing the questionnaire (a quick check for non-random dropout is sketched below)
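Dropout and selection bias in particular can be screened for in the data itself. The sketch below is a hypothetical illustration, not something from the talk: it assumes a survey data set with an ‘age’ column and a ‘completed’ flag, and uses a simple t-test to ask whether the people who dropped out look systematically different from those who finished.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical survey of 500 people: age plus a flag for finishing the survey.
n = 500
age = rng.normal(40, 12, n)
# Assumed pattern: older respondents are somewhat less likely to finish.
completed = rng.random(n) < 1 / (1 + np.exp((age - 45) / 10))

df = pd.DataFrame({"age": age, "completed": completed})

# If dropout were random, completers and non-completers should look alike.
t_stat, p_value = stats.ttest_ind(
    df.loc[df["completed"], "age"],
    df.loc[~df["completed"], "age"],
)
print(df.groupby("completed")["age"].mean().round(1))
print(f"t = {t_stat:.2f}, p = {p_value:.3f}  # a small p hints dropout is not random")
```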
When gathering information or data, validity and reliability depend on various factors, and reliability is closely tied to validity. There are several threats to obtaining data that is both valid and reliable (a rough reliability check is sketched after the list below). Some of them are:
- Extreme/moderate response patterns
- Experimenter expectations
- Mood of the participants
- Social desirability
- Language difficulty
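Reliability itself is usually quantified before worrying about these threats. As a purely illustrative sketch (the synthetic data and the 0.7–0.9 rule of thumb are assumptions, not from the talk), here is one common internal-consistency check, Cronbach's alpha:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = survey items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_variances = items.var(axis=0, ddof=1)  # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Synthetic 5-item survey answered by 100 people (placeholder data).
rng = np.random.default_rng(0)
true_score = rng.normal(0, 1, (100, 1))
responses = true_score + rng.normal(0, 0.7, (100, 5))  # correlated items + noise

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")  # values around 0.7-0.9 are usually considered acceptable
```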

Key Takeaways – Best Practices
- Track the failures too, not just the survivors. Tracking only the survivors gives us the pattern, the ‘what?’; it doesn’t give us the ‘why?’ (a small simulation of this survivorship bias follows the list).
- Avoid jumping to conclusions. Concentrate on generalizability, meaningful results, and valid statistics.
- Get clear on the difference between ‘what?’ and ‘why?’
- Survey ≠ experiment.
- Be careful with your words.
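To see why tracking only survivors misleads, here is a toy simulation. All numbers and the ‘preparation hours’ metric are made up for illustration: averaging a metric over the successes alone paints a very different picture than averaging over everyone who tried.

```python
import numpy as np

rng = np.random.default_rng(7)

# 10,000 hypothetical candidates: hours of preparation and a noisy outcome.
prep_hours = rng.gamma(shape=2.0, scale=20.0, size=10_000)
succeeded = rng.random(10_000) < 1 / (1 + np.exp(-(prep_hours - 60) / 15))

print(f"mean prep hours, survivors only: {prep_hours[succeeded].mean():.1f}")
print(f"mean prep hours, everyone:       {prep_hours.mean():.1f}")
print(f"success rate:                    {succeeded.mean():.1%}")
# Looking only at survivors tells you 'what' the winners did, not 'why'
# they won, nor how many did the same thing and still failed.
```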