Lying with Statistics

Introduction

Firms can “lie” with statistics to attract consumers. I put “lie” in quotation marks because they are not actually lying; they are choosing statistics that present true facts in the way consumers find most appealing. This is generally done for profit. Let’s begin by defining and discussing the median and the mean individually.

Median

The median, in simple words, is the ‘middle point’ of a data set. Let’s consider the setting of a gym. We record the personal best deadlifts of 5 people. The first four people lift the following weights (lbs): 100, 120, 80, 150. Suddenly, Eddie Hall walks in and deadlifts 1100 lbs. Sorted from smallest to largest, our data set consists of 5 numbers: 80, 100, 120, 150, 1100. Since the median is the middle value of the sorted data set, the median here is 120 lbs. This is reasonable, as we intuitively know that most of the people in our hypothetical experiment lifted somewhere around 120 lbs.
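As a quick check, here is a minimal Python sketch of the median for the five deadlifts above. It uses the standard library's `statistics` module; the variable names are my own and only for illustration.

```python
import statistics

# Personal-best deadlifts (lbs) from the gym example, in the order recorded
deadlifts = [100, 120, 80, 150, 1100]

# Sorting makes the middle value easy to see; with 5 values it is the 3rd one
ordered = sorted(deadlifts)
middle_value = ordered[len(ordered) // 2]

# statistics.median sorts internally and gives the same answer
median_lift = statistics.median(deadlifts)

print(ordered)       # [80, 100, 120, 150, 1100]
print(middle_value)  # 120
print(median_lift)   # 120
```

Picking the element at index `len(ordered) // 2` only works for an odd number of values; with an even number, `statistics.median` averages the two middle values instead.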

Mean

Shifting the focus to the mean, otherwise called the average, let’s consider the same data set as above. To calculate the mean, we sum all our numbers and divide by how many numbers there are. There are 5 numbers, which have a sum of 80 + 100 + 120 + 150 + 1100 = 1550 lbs, and 1550 lbs divided by 5 gives 310 lbs. Is this accurate? Yes. Is it reliable? No. This massive number is an overestimate for most of the people in our data set (and for many people generally), and an underestimate for Eddie Hall.
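The same arithmetic can be sketched in a few lines of Python. The exact figure works out to 310 lbs, and the last lines count how many of the five lifters actually fall below that "average". The variable names are my own.

```python
# Personal-best deadlifts (lbs) from the gym example
deadlifts = [80, 100, 120, 150, 1100]

# Mean = sum of the values divided by how many values there are
total = sum(deadlifts)
mean_lift = total / len(deadlifts)

print(total)      # 1550
print(mean_lift)  # 310.0

# How many lifters are below the mean?
below_mean = [w for w in deadlifts if w < mean_lift]
print(len(below_mean), "of", len(deadlifts))  # 4 of 5
```

Four of the five lifters sit below the mean, which is why it describes almost nobody in the room.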

Outliers & Which to Use

We see that the median is not affected by outliers (e.g. the 1100 lbs deadlift). So, even if Eddie Hall had hypothetically lifted a million lbs instead, it wouldn’t affect our median. Conversely, the mean considers every single value in the data set and so is affected by outliers.
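To see this in numbers, here is a small Python sketch that swaps the 1100 lbs lift for a hypothetical million-lbs lift and recomputes both statistics. The second data set is invented purely for illustration.

```python
import statistics

# Original gym data (lbs) and a version where the outlier is made absurdly large
original = [80, 100, 120, 150, 1100]
extreme = [80, 100, 120, 150, 1_000_000]  # hypothetical million-lbs lift

for label, data in [("original", original), ("extreme", extreme)]:
    median_lift = statistics.median(data)
    mean_lift = sum(data) / len(data)
    print(label, median_lift, mean_lift)

# original 120 310.0
# extreme 120 200090.0
```

The median stays at 120 lbs in both cases, while the mean jumps from 310 lbs to over 200,000 lbs.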

This is why, in many cases, we should use the median. It is not thrown off by, for example, a typo in a financial sheet. The median is especially useful when considering a skewed data set. Incomes are a good example: the distribution is heavily skewed by a small number of extremely high earners, such as billionaire business owners. Since the typical person works an ordinary job, the median is the more reliable tool here.

Conversely, the mean is a more reliable tool when considering data sets that are distributed symmetrically; the mean is almost always used when considering the normal distribution (the bell curve).

The Lie

With this in mind, firms can exploit this knowledge to create adverts for profit. For example, let’s create another hypothetical scenario: a weight-loss firm sells a product that they claim helps reduce weight. 8 people buy their product. 6 lose some weight, while the other 2 lose extraordinary amounts of weight. Perhaps these 2 were morbidly obese and bought the product, but also had weight-loss surgery the next day.

In this scenario, the firm wouldn’t tell consumers about the surgery those 2 customers had. They would simply report the mean weight lost by all 8 people, which would be quite large, since the 2 outliers increase the mean significantly. This attracts many consumers to buy the product, creating profit.
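To make the trick concrete, here is a Python sketch with made-up numbers for the 8 customers. The individual weight-loss figures are entirely hypothetical, since the scenario above does not specify any; only the split into 6 ordinary results and 2 outliers comes from the example.

```python
import statistics

# Hypothetical weight lost (lbs) by the 8 customers; all figures are invented
typical_losses = [2, 3, 1, 4, 2, 3]  # the 6 who lost "some" weight
surgery_losses = [90, 105]           # the 2 who also had surgery
all_losses = typical_losses + surgery_losses

advertised_mean = sum(all_losses) / len(all_losses)
honest_median = statistics.median(all_losses)

print(advertised_mean)  # 26.25
print(honest_median)    # 3.0
```

With these invented figures, an advert could truthfully claim that customers lost over 26 lbs on average, even though the typical customer lost about 3 lbs.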

Though perhaps unethical, lots of firms use this trick in addition to many other methods of statistically ‘lying’ to generate profit.

Book Suggestion

If you are interested in the topic of how we can lie with statistics, you can read the book How to Lie with Statistics by Darrell Huff. I haven’t read it yet, so I can’t recommend it with confidence, but it is definitely a future read for me.
