PhD Defense: Applying Policy Gradient Methods to Open-Ended Domains
IRB-5105 or https://umd.zoom.us/j/93063337480
Deep reinforcement learning (RL) has been successfully used to train agents in complex video game environments, including StarCraft II, Dota 2, Minecraft, and Gran Turismo. Each of these projects used curriculum learning to train agents more efficiently. However, systematic investigations of curriculum learning are limited, and it is rarely studied outside of toy research environments. Modern RL methods still struggle in stochastic, sparse-reward environments with long planning horizons. This thesis studies these challenges from multiple perspectives to develop a stronger empirical understanding of curriculum learning in complex environments. By introducing novel visualization techniques for reward surfaces and empirically investigating key implementation details, it explores why policy gradient methods alone are insufficient for sparse-reward tasks. These findings motivate the use of curriculum learning to decompose problems into learnable subtasks and to prioritize learnable objectives. Building on these insights, the dissertation presents a general-purpose library for curriculum learning and uses it to evaluate popular automatic curriculum learning algorithms on challenging RL environments. Curricula have historically been effective for training reinforcement learning agents, and a fundamental understanding of automatic curriculum learning is an essential step toward developing generally capable agents in open-ended environments.