Data scientists at Facebook soon hit back with their own ‘study:’ “In keeping with the scientific principle (used by Princeton) ‘correlation equals causation,’ our research unequivocally demonstrated that Princeton may be in danger of disappearing entirely.”
Is it surprising that the original Princeton study found its way onto the front pages of newspapers and magazines across the world? Probably not: the fact is that statistical results with a causal interpretation have a stronger effect on our thinking than non-causal information.
What the data scientists at Princeton relied upon in presenting their paper was our individual human inability to think statistically. On the other hand, science and statistics have a well-known maxim: correlation does not imply causation. The Facebook researchers were in fact poking fun at the ‘silly use of correlation equals causation.’ Causality can be very complex.
This is where Machine Learning and Behavioral Economics meet Big Data.
Machine Learning is proving to be a very good (and much in demand) methodology for automating data analysis, especially using probability theory. The probabilistic approach to machine learning is closely related to the field of statistics, but differs ever so slightly in terms of its emphasis and terminology. Probability theory can be applied to any problem involving uncertainty.
Behavioral economics recognizes that people cannot instinctively understand the nonlinear aspect of probability. The very foundation of Behavioral Economics rests on Prospect Theory, the work of Daniel Kahneman and Amos Tversky. (Kahneman, a psychologist, won the Nobel Prize in Economics for their work; sadly, Tversky had passed away by the time the prize was awarded.)
Prospect theory describes the way people choose between probabilistic alternatives that involve risk and therefore uncertainty.
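To make the "nonlinear aspect of probability" concrete, here is a minimal sketch of the two curves at the heart of prospect theory: a value function that treats losses as weightier than gains, and a weighting function that inflates small probabilities and deflates large ones. The functional forms and the parameter estimates (0.88, 2.25, 0.61) are assumed here from Tversky and Kahneman's 1992 follow-up paper, not from anything stated above. The function names are illustrative, and the sketch uses the gains weighting for every probability to keep things short.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of a gain or loss x, measured from a reference point of 0.

    alpha/beta bend the curve (diminishing sensitivity); lam > 1 makes a
    loss hurt more than an equal gain pleases (loss aversion).
    Parameter values are assumed from Tversky & Kahneman (1992).
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)


def weight(p, gamma=0.61):
    """Decision weight attached to a stated probability p (inverse-S shape)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def prospect_value(outcome, p):
    """Value of a simple gamble: `outcome` with probability p, else nothing."""
    return weight(p) * value(outcome)


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.99):
        print(f"stated p = {p:.2f} -> decision weight = {weight(p):.3f}")
    print("value of +100:", round(value(100), 1))
    print("value of -100:", round(value(-100), 1))
```

Under these assumed parameters a 1% chance is weighted as if it were several times larger, a 99% chance is treated as noticeably less than a sure thing, and a loss of 100 is felt more than twice as strongly as a gain of 100.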
Machine learning helps computers to automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty.
Machine Learning is thus often about pattern recognition. Daniel Kahneman in his excellent book Thinking, Fast and Slow states: “We (humans) are pattern seekers.” He goes on to provide examples of: “the ease with which people see patterns where none exists.”
With respect to data analysis, psychologist Paul E. Meehl famously referred to this as the ‘Crud Factor’: pattern seeking leads us to believe there are real relationships in the data where in fact the linkage is insignificant.
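A quick way to feel how easily patterns appear where none exist is to search pure noise for a relationship. The sketch below is an illustration of that general multiple-comparisons effect, not a reconstruction of Meehl's own argument. It generates a random target and a couple of hundred equally random "features", then reports the strongest correlation it can find. The sample size, feature count, and function names are arbitrary choices made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def strongest_noise_correlation(n_rows=30, n_features=200, seed=0):
    """Correlate a random target with many random features; return the
    correlation with the largest magnitude. Every column is independent
    noise, so any 'pattern' found here is an accident of the search."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(feature, target)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print("strongest correlation found in pure noise:",
          round(strongest_noise_correlation(), 2))
```

With only 30 rows and 200 candidate features, the "best" feature will typically show a correlation that looks meaningful on a chart, even though nothing in the data is related to anything else. An analyst who knows we are pattern seekers will ask how many comparisons were made before trusting the winner.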
At its core, behavioral economics provides a tool for decision-making. A goal of Machine Learning is to find ‘interesting patterns’ in the data. Data science and data analysis are concerned with providing knowledge discovery for improved decision making.
Behavioral economics is at the forefront of research in human biases and heuristics, of which we have many. With Machine Learning, programmers design the algorithms to begin with and embed in them all their biases and, in many cases, wrong assumptions.
A central conceptual theme in Machine Learning is the use of Bayesian modeling to describe and build inference algorithms. Behavioral economists recognize the standard problems of Bayesian inference: the fact that something is more representative does not make it more likely. Knowledge of this helps data scientists better comprehend the data.
Big Data analysts and machine learning engineers will benefit significantly from a broad understanding of behavioral economics.
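The gap between "representative" and "likely" is easiest to see with Bayes' rule and a base rate. As an illustration, take the taxi-cab problem that Kahneman discusses in Thinking, Fast and Slow: 15% of a city's cabs are Blue and 85% are Green, and a witness who identifies colors correctly 80% of the time says the cab in a hit-and-run was Blue. The sketch below works out how likely it is that the cab really was Blue. The function and parameter names are illustrative choices for this example.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' rule for a yes/no hypothesis.

    prior            -- P(hypothesis) before seeing the evidence (the base rate)
    hit_rate         -- P(evidence | hypothesis is true)
    false_alarm_rate -- P(evidence | hypothesis is false)
    """
    evidence = prior * hit_rate + (1.0 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


# Cab problem: 15% of cabs are Blue; the witness is right 80% of the time,
# so they call a Green cab "Blue" 20% of the time.
p_blue = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)

if __name__ == "__main__":
    print(f"P(cab was Blue | witness says Blue) = {p_blue:.2f}")
```

The answer is about 0.41, not the 0.80 most people give. The testimony feels representative of a Blue cab, but the low base rate of Blue cabs means a Green cab misidentified as Blue is still the more probable story.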
Data analysis is complex and requires both humans and machines working together. Data scientists with knowledge of the biases, heuristics and work on decisions under uncertainty that behavioral economics provides will likely offer far more knowledgeable analysis than those without behavioral economics reasoning.
IBM, which earned a reported $4 billion from Big Data in 2013 and is investing significantly in its Watson Artificial Intelligence and Machine Learning system, recognizes this and recently invited Daniel Kahneman to speak at its ‘Cognitive Computing’ event.
The potential for conclusions to be affected by inaccurate, delayed, or misunderstood consumer data is real. Without expert human guidance and interpretation, the data may be little more than empty noise.
The following is an interesting video of Dr. Kahneman speaking at Google: