The Curse of Dimensionality:
Imagine a hypothetical life in a parallel universe: You sell a new product and want to find out which variable(s) correlate with consumer satisfaction. So you conduct a 400-question survey. The questions ask consumers to rate the price, design, usability, functionality, etc., on a scale of 1 to 5 stars. Then you wait for enough responses to come in.
After collecting all the responses, you analyze the data and find that 95% of the people rated the design 5 stars, a statistically significant finding. This leads you to conclude that product design not only correlates with consumer satisfaction but also causes it.
Now, for your next product, you spend millions of dollars on R&D to improve the design as much as possible. This time, however, you notice that your sales haven’t improved. All that extra money has gone to waste. But why? Product design and consumer satisfaction have a causal relationship, right? This is where the curse of dimensionality (CoD) comes in.
Too many dimensions
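You got a ‘statistically significant’ finding because there were too many dimensions (questions). With 400 dimensions in your survey, some rare-looking result was bound to happen. If each question is tested at the common 5% significance level, roughly 1 in 20 questions with no real relationship to satisfaction will still look significant by chance. So even if none of the 400 questions were truly related to satisfaction, you would expect about 20 of them to look significant. The more dimensions you test, the more false positives you should expect. In this case, the false positive was the relationship between product design and consumer satisfaction. Product design didn’t cause greater consumer satisfaction; it only appeared to correlate with it by random chance.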
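The sketch below makes "bound to happen" concrete. It computes the expected number of false positives and the chance of at least one. It then simulates a survey in which, by construction, no question is related to satisfaction, and counts how many questions still pass a 5% significance test. It is a sketch under assumptions, not a description of a real survey. The 1,000 respondents, the uniformly random 1 to 5 star ratings, the independence between questions, and the large-sample cutoff |r| > 1.96/√n are all illustrative choices. The function names are hypothetical.

```python
import math
import random
import statistics


def expected_false_positives(alpha, m):
    """Expected number of 'significant' results among m tests with no real effect."""
    return m * alpha


def prob_at_least_one(alpha, m):
    """Chance of at least one false positive, assuming the m tests are independent."""
    return 1 - (1 - alpha) ** m


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_false_positives(n_respondents=1000, n_questions=400, seed=0):
    """Simulate a survey where NO question is related to satisfaction and
    count how many questions still look significant at alpha = 0.05."""
    rng = random.Random(seed)
    # Satisfaction is drawn independently of every question,
    # so any correlation found below is pure noise.
    satisfaction = [rng.randint(1, 5) for _ in range(n_respondents)]
    # Large-sample approximation: with no true correlation, r is roughly
    # normal with standard error 1/sqrt(n), so |r| > 1.96/sqrt(n)
    # corresponds to a two-sided p-value below 0.05.
    threshold = 1.96 / math.sqrt(n_respondents)
    hits = 0
    for _ in range(n_questions):
        ratings = [rng.randint(1, 5) for _ in range(n_respondents)]
        if abs(pearson_r(ratings, satisfaction)) > threshold:
            hits += 1
    return hits


print(expected_false_positives(0.05, 400))  # 400 * 0.05
print(prob_at_least_one(0.05, 400))         # essentially 1
print(count_false_positives())
```

Under these assumptions, the simulated count usually lands somewhere near 20. The exact number depends on the random seed, but every one of those "significant" questions is a false positive.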
CoD in real life
Researchers set out to find whether specific genes were more common in people with a high IQ. They analyzed the genes of people with an IQ of 160 or more and of people with a below-average IQ. They found that the gene IGF2r was twice as common among those with a high IQ as among those with a below-average IQ.
However, only after publishing their research did they find out that the finding didn’t hold up. When they repeated the experiment, there was no significant correlation between that gene and IQ. The original association had appeared purely by random chance. Because they tested so many genes (that is, so many dimensions), some gene was expected to show a result that rare by chance alone.
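The discovery-then-replication pattern above can be reproduced in a toy simulation. This sketch assumes 1,000 hypothetical genes, 50 people per group, and a 30% carrier frequency that is identical in both groups. These are made-up numbers, not the actual study's design, and with them no gene is truly linked to IQ. The sketch picks the gene with the largest apparent difference in a "discovery" sample. It then measures that same gene again in a fresh, independent "replication" sample. The function names are hypothetical.

```python
import random


def carrier_rate(rng, n, freq):
    """Fraction of n simulated people carrying a variant with population frequency freq."""
    return sum(rng.random() < freq for _ in range(n)) / n


def discover_and_replicate(n_genes=1000, group_size=50, freq=0.3, seed=1):
    """Find the gene that looks most 'linked' to high IQ, then test it again
    on fresh samples. No gene has any real effect in this simulation."""
    rng = random.Random(seed)
    results = []
    for gene in range(n_genes):
        # Both groups are drawn with the SAME carrier frequency.
        high_iq = carrier_rate(rng, group_size, freq)
        low_iq = carrier_rate(rng, group_size, freq)
        results.append((high_iq - low_iq, gene, high_iq, low_iq))

    # "Discovery": keep only the gene with the largest apparent difference.
    diff, best_gene, high_iq, low_iq = max(results)

    # "Replication": draw new, independent samples for that same gene.
    rep_high = carrier_rate(rng, group_size, freq)
    rep_low = carrier_rate(rng, group_size, freq)

    return {
        "gene": best_gene,
        "discovery": (high_iq, low_iq),
        "discovery_diff": diff,
        "replication": (rep_high, rep_low),
        "replication_diff": rep_high - rep_low,
    }


print(discover_and_replicate())
```

In the discovery sample, the "winning" gene typically looks much more common in the high-IQ group, easily enough to appear twice as common. In the replication sample, its difference tends to fall back toward zero, because the original gap was selected out of noise.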
I learnt about the curse of dimensionality from the book “Everybody Lies” by Seth Stephens-Davidowitz. If you’d like to read more about the CoD, feel free to check out the book.