**By Prof Utpal K De, Dr Arjun Kumar**

The uses of statistics are well known: data reveal how variables change over time and across communities, and they help correlate factors and shape welfare policies. However, statistics can also be abused. For example, a positive change in some indicator is often credited directly to the policy aimed at it.

Doing so ignores side effects and external factors that may have caused the change. Once these factors are accounted for, the policy by itself may turn out to have had minimal impact.

To discuss the fallacies and misinterpretation of statistics, Genalpha DC at IMPRI Impact and Policy Research Institute, New Delhi organised an IMPRI Special Lecture – The State of Statistics – #DataDiscourses on Uses and Abuses of Statistics by Prof Manoranjan Pal, a former Professor at the Economic Research Unit, and former Director of International Statistical Education Centre, Indian Statistical Institute (ISI), Kolkata.

Prof Pal’s research work is currently focused on measurement of poverty, inequality and segregation; applied econometrics; measurement of status of health and nutrition; gender bias and empowerment of women among others. He is also the co-author of “Basic Econometrics” (sixth edition) with DN Gujarati and DC Porter.

Prof Utpal K De, a Professor at North-Eastern Hill University (NEHU), Shillong, was the moderator of this session. He mentioned that there is scope for several mistakes and errors while collecting data and information, and these hinder the analysis of the data. Knowingly or unknowingly, mistakes can be committed in the analytical methods too.

If we use data to reach a conclusion without subject knowledge or common sense, we may well present a wrong interpretation of the data. This will, in turn, lead to wrong outcomes and policy formulation.

Growth from two rupees to four rupees is 100% growth, while growth from zero to one rupee gives an infinite (strictly, undefined) rate of growth because the base is zero. This shows that one has to be meticulous and methodical, and must understand the actual relationship among variables.
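The arithmetic behind the growth example can be sketched in a few lines of Python. This is only an illustration: the function name `growth_rate_percent` is hypothetical, and returning `None` for a zero base is one possible convention for an undefined rate.

```python
def growth_rate_percent(old, new):
    """Percentage growth from old to new; None when the base is zero."""
    if old == 0:
        # Division by zero: the growth rate is undefined for a zero base.
        return None
    return (new - old) / old * 100


print(growth_rate_percent(2, 4))  # two rupees to four rupees
print(growth_rate_percent(0, 1))  # zero to one rupee
```

A percentage change computed on a very small base deserves the same caution as one computed on a zero base, since a tiny absolute change can look dramatic.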

To quote a line commonly attributed to Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

Prof Pal did not completely agree with this famous quote, because a statistical result depends on how one interprets and applies the data. If one fails to apply the data correctly, one may arrive at false answers. He described both uses and abuses with reasoning and logical arguments.

**Deceptive Statistics**

According to Prof Pal, the way in which a question is worded matters a lot. It is difficult to say no to certain questions, for example, whether children should be given more open space to play freely or whether harassment of women is increasing day by day. These types of questions are called directed questions. He notes that one should avoid directed questions while collecting data.

The approach of the investigator to the respondent is also crucial. The background and the presence of other persons during the interview also matter. The response also depends on many other factors, such as the time the respondent has available, religious beliefs and social dictums.

It is less well known that the style of a question, such as “should” vs “should not” or “is” vs “is not”, also matters significantly.

**The Experiment**

He also gave examples of experiments and showed how the outcome changes with variation in sample size, pattern of question, mode of analysis, etc.

One experiment was conducted with these styles of questions. The researchers set 9 questions to get family related views and 14 questions to get social views on different aspects of gender violence from adult males to females. There were two styles of questions, each giving the opposite meaning.

For example, Type A, or the Affirmative Question: “Women have the right to express their opinion if they disagree with their partner.”

Type B, or the Negative Question, “Women do not have the right to express their opinion if they disagree with their partner.”

Both versions of a question should not be posed to the same person. Type A questions should be put to some respondents and Type B questions to others. This rule was followed in the experiment, and the results were presented.

**Results**

Out of 51 males, 49 agreed with the Type A question and 2 disagreed. For the Type B question, 36 out of 52 males disagreed and 16 agreed. Disagreement with Type B is equivalent to agreement with Type A, so if 49 out of 51 (about 96%) agreed with Type A, a similar share of the Type B group, roughly 50 of the 52, should have disagreed. Instead, only 36 (about 69%) did. This pattern was seen in almost all the questions.

Coming back to deceptive statistics, another major source of deception is the use of inappropriate methods of drawing samples and collecting data. Such methods tend to paint a false picture of a particular phenomenon.
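One way to check whether the gap between the two wordings could be chance is a two-proportion z-test on the reported counts. The sketch below is an assumption about how such a check might be done, not the method used in the experiment; the function name is hypothetical, and with only 2 disagreements in the Type A group the normal approximation is rough, so an exact test may be preferable in practice.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the hypothesis that both wordings behave alike.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Type A: 49 of 51 agreed. Type B: 36 of 52 disagreed (the equivalent answer).
p_a, p_b, z, p_value = two_proportion_z(49, 51, 36, 52)

# Count of Type B disagreements expected if wording made no difference.
expected_disagree = p_a * 52

print(f"Type A agreement rate:    {p_a:.1%}")
print(f"Type B disagreement rate: {p_b:.1%}")
print(f"Expected Type B disagreements: {expected_disagree:.1f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions the gap is far larger than sampling noise alone would usually produce, which is consistent with the point that wording itself shifts responses.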

**Random Sample**

To understand the true nature of a random sample, Prof Pal gave an example of a survey conducted among the readers of a particular magazine on whether they liked a certain political party to be in power. The readers were instructed to just fill the form and post it without disclosing their identity.

Suppose more than 60% say yes. Does the survey reflect the true percentage in the population? The answer is no. In a random sample, every individual in the population has a positive probability of being included in the sample, and this probability is known beforehand. Here, only readers of the magazine could respond, and only those who chose to post the form did. Suppose, in addition, that the magazine caters only to the higher income group. The sample is then not a random sample from the whole population.
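The effect of sampling only the magazine's readers can be illustrated with a small simulation. Every number here is invented for illustration: the 20% share of high-income people and the support rates of 70% and 40% are assumptions, not figures from the lecture.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of (is_high_income, supports_party) pairs.
population = []
for _ in range(100_000):
    high_income = random.random() < 0.20
    supports = random.random() < (0.70 if high_income else 0.40)
    population.append((high_income, supports))


def share_supporting(people):
    """Fraction of the given people who support the party."""
    return sum(supports for _, supports in people) / len(people)


# The magazine is assumed to reach only the high-income group.
readers = [person for person in population if person[0]]
reader_sample = random.sample(readers, 1000)

# A simple random sample gives every individual the same known chance.
random_sample = random.sample(population, 1000)

print(f"Whole population: {share_supporting(population):.1%}")
print(f"Reader sample:    {share_supporting(reader_sample):.1%}")
print(f"Random sample:    {share_supporting(random_sample):.1%}")
```

In this made-up setting the reader sample overstates support by a wide margin, while the simple random sample lands close to the population figure.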

**Sample Size**

Another important point is that the sample must be large enough to support a conclusion. For example, if a survey gauging the likeability of a product has a sample size of 10 and 7 respondents rate the product positively, it does not follow that this is the popular opinion. Even though 70% of the sample rated the product positively, the sample is too small to state the general opinion of the product with confidence.
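To see how much uncertainty a sample of 10 carries, one can compute a confidence interval for the 70% figure. The sketch below uses the Wilson score interval at roughly 95% confidence, which is one common choice and an assumption here; the comparison sample of 700 out of 1000 is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


lo_small, hi_small = wilson_interval(7, 10)       # the survey in the text
lo_large, hi_large = wilson_interval(700, 1000)   # hypothetical larger survey

print(f"7 of 10:     {lo_small:.1%} to {hi_small:.1%}")
print(f"700 of 1000: {lo_large:.1%} to {hi_large:.1%}")
```

With 10 respondents the interval is wide enough to include 50%, so the data cannot even establish that a majority likes the product; the same 70% from 1000 respondents gives a much narrower range.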

**Simple Solutions**

Sophisticated statistical techniques are not always needed to solve a problem. One example is the salt case. Just after Independence, many refugee camps were set up, especially in border states such as West Bengal. The refugees were given rice and pulses by the government of West Bengal.

However, the government suspected that the number of people actually in the camps was lower than the number reported by the contractors of those camps. Expert help was therefore requested to ascertain the real size of the population. Using the simple unitary method of dividing the total salt consumption of the group by the average amount of salt consumed per person, the experts were able to work out the actual number of refugees in the camps. Sometimes solutions can be found through simple methods.
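The unitary method described above amounts to a single division. The sketch below uses invented figures (150,000 g of salt consumed in total and 300 g per person over the same period) purely to show the calculation; the actual quantities from the camps are not given in the article.

```python
def estimate_headcount(total_salt_g, salt_per_person_g):
    """Estimate group size as total consumption / per-person consumption."""
    return total_salt_g / salt_per_person_g


# Hypothetical figures for one camp over one month.
total_salt_g = 150_000
salt_per_person_g = 300

headcount = estimate_headcount(total_salt_g, salt_per_person_g)
print(f"Estimated number of people: {headcount:.0f}")
```

The estimate is only as good as the per-person average, which is why a commodity with stable individual consumption, such as salt, suits this method.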

**Interpretation of Results**

Hypothetically, if the mean depth of a river is 3 ft, one might assume that it is safe to cross. However, factors such as the variation in depth at different points and the velocity of the flow also need to be taken into consideration. Therefore, while interpreting a result, one must take into account the associated factors.

A more common example is inflation. When the inflation rate goes down, many assume that prices are falling. In reality, prices are still increasing, only at a lower rate. Prices begin falling only when the inflation rate becomes negative.
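A few made-up depth readings show how a mean can hide the information that matters. The values below are hypothetical and chosen only so that the mean is 3 ft.

```python
import statistics

# Hypothetical depth readings (in feet) taken across the river.
depths_ft = [1, 1, 2, 2, 3, 9]

mean_depth = statistics.mean(depths_ft)
max_depth = max(depths_ft)
spread = statistics.pstdev(depths_ft)  # population standard deviation

print(f"Mean depth:         {mean_depth} ft")
print(f"Deepest point:      {max_depth} ft")
print(f"Standard deviation: {spread:.2f} ft")
```

The mean says 3 ft, yet one point is three times as deep, so reporting the range or spread alongside the mean gives a safer picture.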
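The difference between a falling inflation rate and falling prices can be traced with a short calculation. The starting price level of 100 and the yearly rates of 6%, 4%, 2% and -1% are hypothetical.

```python
def price_path(start_price, inflation_rates_percent):
    """Apply each period's inflation rate to the previous price level."""
    prices = [start_price]
    for rate in inflation_rates_percent:
        prices.append(prices[-1] * (1 + rate / 100))
    return prices


rates = [6, 4, 2, -1]  # hypothetical yearly inflation rates, in percent
prices = price_path(100.0, rates)

for rate, price in zip(rates, prices[1:]):
    print(f"inflation {rate:+d}% -> price level {price:.2f}")
```

The price level keeps rising through the years in which the rate drops from 6% to 2%, and it falls only in the final year, when the rate turns negative.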

**Correlation Coefficients**

A high correlation between two variables does not imply a direct causal relationship. Both variables might be responding to a common factor. For example, production of paddy in Assam may be highly correlated with the number of road accidents in Kolkata, yet no logic supports a link between the two. Thus, we should not read a relationship into every correlation.
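A small simulation shows how two unrelated series can be highly correlated simply because both grow over time. The series names echo the example in the text, but the data are synthetic and every parameter is an assumption made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_differences(series):
    """Period-to-period changes, which remove a linear time trend."""
    return [b - a for a, b in zip(series, series[1:])]


random.seed(1)  # fixed seed so the sketch is repeatable
periods = range(200)

# Two synthetic series that share nothing except an upward trend in time.
paddy = [2 * t + random.gauss(0, 5) for t in periods]
accidents = [5 * t + random.gauss(0, 10) for t in periods]

r_levels = pearson(paddy, accidents)
r_changes = pearson(first_differences(paddy), first_differences(accidents))

print(f"Correlation of levels:  {r_levels:.2f}")
print(f"Correlation of changes: {r_changes:.2f}")
```

The levels are strongly correlated because time drives both, while the period-to-period changes, with the common trend removed, show little relationship.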

After a very comprehensive presentation by Prof Pal on uses and abuses of statistics, the discussant, Dr Gour Gobinda Goswami, Professor at Department of Economics, School of Business and Economics, North-South University, Dhaka, Bangladesh mentioned a few more important points.

Dr Goswami spoke about the importance of the setup in a survey. He substantiated his point with an example in which most of the responses he received from a survey of a group of children were unreliable, because the children were talking to each other and not responding to the questions independently.

While studying the uses of statistics, it is equally important to study the abuses. If one only studies the uses and not the abuses, one might be misguided when it comes to conducting research or interpreting data. Information must be used in an appropriate manner in order to arrive at the most accurate possible conclusion.

—

*Acknowledgment: Chhavi Kapoor is a research intern at IMPRI and is pursuing a bachelor's degree in Political Science, Literature, and Economics at St. Xavier’s College, Mumbai*