- Scott H Young - https://www.scotthyoung.com/blog -

Explore or Exploit: The Hidden Decision that Guides Your Life

Here’s a useful tip next time you go to a restaurant: The best dish on the menu is often the first dish you ever ordered while eating at that place.

Why?

Well, if you live in a big enough city, with enough places to eat out, you end up trying a lot of restaurants. If you have a lousy meal, you probably won’t go there again. If you have a great meal, you’ll go back often.

Any time you go to a restaurant for the first time, some of the dishes will be excellent, others mediocre. The restaurants you return to are the ones where that first meal was great, so the first dishes you ordered at them come disproportionately from the better end of the range.
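Here is a rough simulation of that selection effect. It is only a sketch: the quality model (normally distributed restaurant and dish scores), the return threshold and the function name simulate_first_dish are all assumptions made up for illustration, not data.

```python
import random

def simulate_first_dish(num_restaurants=20000, dishes_per_menu=10,
                        return_threshold=1.0, seed=0):
    """Compare the first dish ordered with the rest of the menu, but only
    at restaurants where the first meal was good enough to come back."""
    rng = random.Random(seed)
    first_dish_scores = []
    rest_of_menu_scores = []
    for _ in range(num_restaurants):
        # Assumed model: each restaurant has an overall quality, and each
        # dish on its menu varies around that quality.
        restaurant_quality = rng.gauss(0.0, 1.0)
        menu = [restaurant_quality + rng.gauss(0.0, 1.0)
                for _ in range(dishes_per_menu)]
        first_dish = menu[0]  # the first order is effectively a random pick
        if first_dish > return_threshold:  # only a great meal brings you back
            first_dish_scores.append(first_dish)
            rest_of_menu_scores.append(sum(menu[1:]) / (dishes_per_menu - 1))

    def average(scores):
        return sum(scores) / len(scores)

    return average(first_dish_scores), average(rest_of_menu_scores)

first, rest = simulate_first_dish()
print(f"first dish: {first:.2f}, rest of menu: {rest:.2f}")
```

Under these assumptions, the first dish at the restaurants you return to scores higher on average than the other dishes on the same menus, even though it was picked at random.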

What to eat at a restaurant is a simple decision, but it embodies one of the most common types of decisions you make in your life every day: explore or exploit.

Explore or Exploit

When you go to the restaurant you always have a choice: should you eat something you know is pretty good (exploit), or try something else which might be even better (explore)?

As I’ve argued above, I think the logic mostly favors picking the first thing you ordered at restaurants you like. This doesn’t mean you should never try new dishes, just that your first pick tends to be better than a random selection would imply.

However, explore/exploit decisions come up in every aspect of life:

Even micro-decisions, like whether to drive home the usual way or take a detour, are versions of the explore/exploit problem.

Solving the Explore/Exploit Problem

It would be nice if there were a handy rule for deciding when to explore and when to exploit. It turns out that no solution to these problems is known for the general case, and there’s reason to suspect we may never have one [1].

Even in the highly contrived situation known as a multi-armed bandit (where you choose from a finite list of opportunities whose values won’t change while you’re not using them), a solution is known, but it’s so difficult to compute that it’s unlikely any biological system could actually calculate the right choice [2].

Side note: Birds, when given a variation of this problem, seem to approximate the optimal solution, suggesting our heuristics for deciding when to explore and exploit may be pretty good [3].

So if explore/exploit problems are super common, yet a general solution is unknown, how do we do it?

One option is simply to make the “best” decision you know each time, given the information available (exploit), but have some randomness added so you sometimes try something different (explore). Order your favorite dish two thirds of the time, but one third of the time, pick a different one at random.
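In code, this rule might look like the sketch below. It is a variant of what is usually called an epsilon-greedy strategy. The function name choose_dish, the dish names and the ratings are all made up for illustration.

```python
import random

def choose_dish(average_ratings, explore_probability=1/3, rng=random):
    """Order the best-rated dish most of the time, a different one otherwise."""
    favorite = max(average_ratings, key=average_ratings.get)
    others = [dish for dish in average_ratings if dish != favorite]
    if others and rng.random() < explore_probability:
        return rng.choice(others)  # explore: a different dish at random
    return favorite                # exploit: the best dish known so far

# Hypothetical ratings from past visits.
ratings = {"pizza": 8.5, "linguini": 7.0, "risotto": 6.0}
rng = random.Random(42)
orders = [choose_dish(ratings, rng=rng) for _ in range(9)]
print(orders)
```

In a fuller version you would update the ratings after every meal, so a lucky exploration can promote a new favorite.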

Another option is to deliberately explore when you’ll have more time. In experiments, people are more likely to explore new options when they think they’ll have more time to act. Constrain the time, and the safer, known options are more likely to get picked. If you expect to go to a restaurant for years to come, you may want to try all the dishes. If you’re only visiting that city for a week, you probably won’t.
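A small back-of-the-envelope calculation shows why the number of visits left matters. This is only a sketch under invented assumptions: your usual dish is a 7 out of 10, the untried dish is equally likely to be anything from 0 to 10, and after trying it once you stick with whichever turned out better. The function name expected_totals is hypothetical.

```python
def expected_totals(visits_left, known_rating=7.0, low=0.0, high=10.0):
    """Expected total enjoyment of two plans over the remaining visits.

    Assumes the untried dish's rating is uniform on [low, high], that
    low <= known_rating <= high, and that visits_left is at least 1.
    """
    always_exploit = known_rating * visits_left
    expected_new = (low + high) / 2
    # After one trial you keep whichever dish turned out better.
    p_new_is_worse = (known_rating - low) / (high - low)
    expected_better = (p_new_is_worse * known_rating
                       + (1 - p_new_is_worse) * (known_rating + high) / 2)
    explore_once = expected_new + expected_better * (visits_left - 1)
    return always_exploit, explore_once

for visits in (2, 5, 6, 50):
    exploit, explore = expected_totals(visits)
    print(visits, round(exploit, 2), round(explore, 2))
```

With these made-up numbers, trying the new dish only pays off once you expect six or more visits. With fewer, the one disappointing meal you risk costs more than you can win back.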

A third option is to integrate information outside our personal experience to guess at how good our current opportunities are. If your friends are raving about the linguini, and your pizza is so-so, you may want to switch next time even if you haven’t tried it yet.

Age and Exploration

Time plays a crucial role in explore/exploit trade-offs. When people think they’ll have tons of time left to take advantage of whatever they encounter, they become much more willing to try new things. When they think time is limited, they stick to what they know best [4].

This impacts how we age. Children are the ultimate explorers. They’ll try things they’re bad at. Make new friends easily. Approach new situations with curiosity. (Interestingly, food may be an exception, as they may be hardwired to avoid accidentally ingesting something poisonous [5].)

In contrast, as we get older, we orient our lives around known rewards. We spend time with old friends and family, rather than meet new people. We stick to existing careers and hobbies. How young/old you feel may have more to do with your perceived long-term time horizon than your actual age or physical limitations.

Local Maxima Traps

A local maximum is a small hill next to a mountain. When you get to the top of the hill, the only way you can go higher is by going down first. But stay at the top of the hill and you’ll miss the views from the mountaintop.
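The hill-and-mountain picture can be sketched with a greedy climber that only ever steps uphill. The landscape below is invented for illustration: a small hill peaking at position 2 and a mountain peaking at position 10. The names hill_climb and landscape are hypothetical.

```python
def hill_climb(height, start, step=1):
    """Greedy climb: move to a neighboring spot only if it is strictly higher."""
    position = start
    while True:
        best_neighbor = max((position - step, position + step), key=height)
        if height(best_neighbor) <= height(position):
            return position  # no uphill step left: a peak, maybe only a local one
        position = best_neighbor

def landscape(x):
    # Assumed terrain: a hill of height 3 at x=2, a mountain of height 10 at x=10.
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 10))

print(hill_climb(landscape, start=0))  # starts near the hill
print(hill_climb(landscape, start=7))  # starts on the mountain's slope
```

Starting from 0, the climber stops on the small hill at 2, because every step from there leads down. Only a start that is already on the mountain's slope reaches 10.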

Failing to explore enough can result in local maxima traps. A woman I knew had a promising start as a medical student, but in her early days of college started making a lot of money bartending. It became harder to do both, her grades suffered and she ended up dropping out.

Being exposed to a particularly high-value opportunity early on can trick our brains. Having seen an opportunity so much better than anything else seen, it’s easy to prematurely exploit—choosing big tips at the expense of a better long-term career.

In my own life, I felt the opposite pattern was important. It took me years to earn enough money from writing that I could go full-time. But I was a student, so my alternatives for making money were less attractive. A different friend of mine had a successful start to his writing career, but he was making so much money as a contract programmer that, after a year’s work, he decided to quit writing.

Be wary of how you judge early successes or failures. Sometimes an early success can be a trap, forever pegging your expectations to a standard most good long-term opportunities will not meet.

Ambition and Exploration

Openness to new experiences seems obviously correlated with exploration. Some people may have a personality trait that pushes them to explore more, while others opt for taking the safe bet.

However, I suspect ambition is also a factor. Ambition seems to combine knowledge of what could potentially be achieved in the world with a certain confidence that you might actually be able to achieve it.

More ambitious people will likely explore more, turning down great jobs and gigs, since their baseline expectations for how good rewards are “out there” are much higher. This change in strategy itself may precipitate the success ambitious people experience. If my friend had seen past her (relatively) high earnings as a bartender, she might have stuck through medical school.

I remember turning down freelance writing gigs when I still had barely enough for basic expenses. Even though those gigs paid much better than anything else I had going on at the time, I knew I wanted to build my own business, not somebody else’s. That decision cost me in the moment, but it allowed me space to work on projects that would eventually bring success.

In other cases, exploration isn’t driven by a specific ambition, such as a belief that you ought to get paid more, but by a lowered sensitivity to traditional rewards in the first place. The project that has made the biggest impact on my career thus far was the MIT Challenge.

Yet I didn’t expect to profit much from it at all, and a close friend even strongly discouraged me from pursuing it, because he felt I had better opportunities. My decision wasn’t driven by expected reward, but simply by the fact that it was a bigger unknown than anything else I was considering.

Enabling Exploration

I mentioned before that time horizon plays a pretty important role in explore/exploit decisions. This isn’t just the amount of time you have left to live, but also, how quickly you think you need a payoff in order to keep going.

A heroin addict is the extreme exploiter. Someone who has a very high reward value for a known option (doing heroin) and who needs a fix right now. Addicts don’t generally dabble for the uncertain promise of future rewards.

Even if you’re not addicted to drugs, your life circumstances also shape the time horizon over which you weigh explore and exploit decisions. If you feel safe, comfortable and confident, you’ll be much more willing to quit your job and try a new career, switch majors in school, date someone different or try out a new business that may fail utterly.

More exploration isn’t always better, but having the space to explore more usually is. Our intuitions about when to exploit or explore tend to be fairly good, or at least better than a formal analysis. However, if that decision is forced—because getting by today demands sacrificing trying anything new—we may end up making worse decisions.

I think there are a couple of ways you can create this kind of space, psychologically, to at least allow for the possibility of more exploration in your life:

I don’t think spaciousness necessitates exploration. If you’ve got a great spouse, you probably won’t get divorced just because you can. Rather, it allows you to better make optimal decisions rather than stay stuck picking between crappy choices.

Cultivating spaciousness itself isn’t easy. If you feel trapped, no platitude will open the world up instantly for you. Instead, you make small changes that slowly increase the space you have. Improve your finances, health and time management so that constraints which felt overwhelming loosen up a little.

More than anything, however, I think cultivating space is a goal you should have. For many, the aim of life is to squeeze the space out of every moment–every dollar spent, every second scheduled. In this case, I think the attitude of optimization may actually be a weakness, if it closes you off to new opportunities that might be better than anything you’ve seen so far.

Footnotes

  1. Cohen, Jonathan D., Samuel M. McClure, and Angela J. Yu. “Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration.” Philosophical Transactions of the Royal Society B: Biological Sciences 362, no. 1481 (2007): 933.
  2. Ibid. p. 935
  3. Krebs, John R., Alejandro Kacelnik, and Peter Taylor. “Test of optimal sampling by foraging great tits.” Nature 275, no. 5675 (1978): 27.
  4. Wilson, Robert C., Andra Geana, John M. White, Elliot A. Ludvig, and Jonathan D. Cohen. “Humans use directed and random exploration to solve the explore–exploit dilemma.” Journal of Experimental Psychology: General 143, no. 6 (2014): 2074.
  5. Brown, Steven Daniel, and Gillian Harris. “A theoretical proposal for a perceptually driven, food-based disgust that can influence food acceptance during early childhood.” International Journal of Child Health and Nutrition 1, no. 1 (2012): 1-10.