9 Discussion & Next Steps
Inspired by the observation that sub-second fluctuations in the levels of the neurotransmitter dopamine appear to reflect both factual and counterfactual information,[30] I proposed counterfactual predicted utility theory (CPUT) as a generative model of decision-making under risk. CPUT offers a testable framework for how the human brain might represent and integrate information from a risky prospect to guide choice behavior. I compared CPUT’s fit to behavioral data from a sure-bet or gamble task[35] with that of expected utility theory (EUT), a normative theory of decision-making often applied as a descriptive model.12
Largely because of observations that people systematically violate EUT’s predictions,[4,5] I expected CPUT to explain human choice data on the sure-bet or gamble task better than EUT. My results suggest the opposite, however. Across the three model comparison metrics discussed in Section 8.3, EUT explains the observed data better, may generalize better to unobserved data, and recapitulates choice behavior from recovered parameters more accurately than CPUT.13
Before concluding that EUT is the better-fitting model, however, I want to point out that my results do not reflect EUT in its original formulation. Classical economic theories like EUT assume ‘hyper-rational’ Homo economicus agents that act to maximize their expected utility.[50] Yet a consistent finding in human decision-making is the stochasticity, or randomness, of people’s actions: when presented with the same prospects, people often make different choices.[51,52] To account for this, computational modelers often use a ‘soft’ maximum (softmax) choice policy, which assumes that people usually (but not always) pick the option with the higher expected utility. If people were true utility maximizers, the softmax sensitivity parameter, \(\tau\), would be a very large positive number. This would indicate that people are highly sensitive to differences in utility and have a high probability of making the utility-maximizing choice.14 From this perspective, the predictions of EUT in its original form are not borne out in my analysis: the population-level 95% HDI for EUT’s \(\tau\), [1.51, 2.62] with a median of 2.05, leaves substantial room for stochasticity in choice.
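To make the role of \(\tau\) concrete, the sketch below computes the probability of choosing the gamble under a two-option softmax, which reduces to a logistic function of the expected-utility difference. It is a minimal illustration, not the model code used in this thesis. The power utility \(u(x) = x^{\rho}\), the parameter name `rho`, the helper names `expected_utility` and `p_choose_gamble`, and the example prospect ($5 for sure vs. a 50/50 gamble between $10 and $0) are all hypothetical assumptions. Only the sensitivity value 2.05 comes from the median reported above.

```python
import math


def expected_utility(outcomes, probs, rho=1.0):
    """Expected utility under a HYPOTHETICAL power utility u(x) = x**rho.

    The power form and `rho` are illustrative assumptions, not the utility
    function fit in this thesis. Assumes non-negative (gain-domain) outcomes.
    """
    return sum(p * (x ** rho) for x, p in zip(outcomes, probs))


def p_choose_gamble(eu_gamble, eu_sure, tau):
    """Softmax probability of choosing the gamble over the sure bet.

    With only two options, the softmax reduces to a logistic function of
    the utility difference scaled by the sensitivity parameter tau.
    """
    z = tau * (eu_gamble - eu_sure)
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical prospect: $5 for sure vs. a 50/50 gamble between $10 and $0.
eu_sure = expected_utility([5.0], [1.0], rho=0.8)
eu_gamble = expected_utility([10.0, 0.0], [0.5, 0.5], rho=0.8)

# Larger tau pushes choice toward deterministically picking the higher-EU option.
for tau in (0.5, 2.05, 50.0):
    p = p_choose_gamble(eu_gamble, eu_sure, tau)
    print(f"tau={tau:>5}: P(gamble) = {p:.3f}")
```

Because the concave utility (\(\rho < 1\)) gives the sure bet the higher expected utility here, increasing \(\tau\) drives the probability of gambling toward zero. A near-deterministic choice like this would be expected only for very large \(\tau\). A value near 2 leaves the less-preferred option with a non-trivial probability of being chosen.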
Moreover, in the model fit for CPUT, the population-level posterior estimates for \(\gamma\) are clustered around zero (95% HDI of [<0.001, 0.015] with a median of 0.006). I believe that such homogeneous estimates, which indicate that nearly no weight is placed on counterfactual information, may reflect a limitation of the task design. Although I was able to accurately recover simulated parameters (described in Section 8.1), I did so using all 252 prospects possible under the sure-bet or gamble design. In the data collected, no participant completed more than 158 unique choices, and some completed as few as 115. It is possible that choices on the unseen prospects would provide information needed to constrain the likelihood of \(\gamma\) taking a specific value, and that the absence of such data produced the tightly clustered parameter distribution. Further, no prospects were presented in the loss domain, which narrows the range of counterfactual information available.
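The 95% HDIs reported for \(\tau\) and \(\gamma\) summarize posterior samples as the narrowest interval containing 95% of the posterior mass. For a parameter bounded at zero, such as \(\gamma\), that interval need not be centered on the median. The sketch below shows the usual empirical estimator, assuming posterior draws are available as a flat list. The function name `hdi` and the simulated exponential draws are hypothetical: they are not the thesis's posterior samples or its exact routine. Established libraries such as ArviZ (`arviz.hdi`) provide a tested implementation of the same idea.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Highest density interval estimated from posterior samples.

    Returns the narrowest interval [lo, hi] spanning ceil(mass * n) of the
    sorted samples. This is the common empirical estimator for unimodal
    posteriors; it is an illustrative sketch, not the thesis's routine.
    """
    xs = sorted(float(s) for s in samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples each candidate interval must contain
    if not 1 <= k <= n:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# HYPOTHETICAL right-skewed draws bounded at zero, loosely mimicking a
# parameter whose posterior piles up near zero (not thesis data).
rng = random.Random(0)
draws = [rng.expovariate(1 / 0.006) for _ in range(4000)]
lo, hi = hdi(draws, 0.95)
median = sorted(draws)[len(draws) // 2]
print(f"95% HDI = [{lo:.4f}, {hi:.4f}], median = {median:.4f}")
```

For a posterior that is densest at its lower bound, the narrowest window starts at or near the smallest draws. This is why an HDI such as [<0.001, 0.015] can sit almost on zero while the median lies inside it rather than at its midpoint.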
For these reasons, I advocate for continued research into the neurobiological mechanisms that process factual and counterfactual information and into their downstream behavioral consequences. What might this look like? Well, unlike Homo economicus, we Homo sapiens make decisions that are necessarily influenced by emotions (see Lerner and colleagues[53] for a review). Do our emotions arise from the same neurobiological representations of factual and counterfactual information? Is the feeling of regret or relief following a decision simply a consequence of neurochemical fluctuations? Investigating the neurocomputational mechanisms underlying decision-making and reward processing may begin to address these questions.
One possible way forward is valence-partitioned reinforcement learning, a framework recently proposed by Kishida and Sands as part of a ‘Dynamic Affective Core.’[54] This framework makes explicit predictions about how the dopaminergic system could interact with an opponent system to inform not only our choice behavior in situations with multi-valenced outcomes but also our associated subjective experiences and feelings.[54] Valence partitioning has clear applications to asymmetric choice behavior in decisions involving gains and losses (e.g., prospect theory).[4]
To extend my thesis work, I plan to integrate the ideas of valence-partitioned reinforcement learning with the counterfactual signaling literature to better understand the neurobiological basis of decision-making under risk. For example, it’s possible (probable?) that counterfactual outcomes are represented contextually as both punishments (losses) and rewards (gains). Examining the predictions of CPUT on a multi-valenced decision-making task might offer such insights.