Solving Montezuma's Revenge with Planning and Reinforcement Learning


How much domain knowledge do a SoTA planning algorithm, and a basic RL algorithm, need to solve Montezuma’s Revenge, the benchmark for sparse rewards in RL? It turns out that a few simple tweaks do it, but we speculate that it’s not straightforward to do equivalent tweaks automatically for other games.

Traditionally, methods for solving Sequential Decision Processes (SDPs) have not worked well with those that feature sparse feedback. Both planning and reinforcement learning, methods for solving SDPs, have trouble with it. With the rise to prominence of the Arcade Learning Environment (ALE) in the broader research community of sequential decision processes, one SDP featuring sparse feedback has become familiar: the Atari game Montezuma’s Revenge. In this particular game, the great amount of knowledge the human player already possesses, and uses to find rewards, cannot be bridged by blindly exploring in a realistic time. We apply planning and reinforcement learning approaches, combined with domain knowledge, to enable an agent to obtain better scores in this game. We hope that these domain-specific algorithms can inspire better approaches to solve SDPs with sparse feedback in general.

Solving Montezuma’s Revenge with Planning and Reinforcement Learning