Dynamic Algorithm Configuration for Evolutionary Algorithms

In CMA-ES, the step size controls how fast or slow a population traverses through a search space. Large steps allow you to quickly skip over uninteresting areas (exploration), whereas small steps allow a more focused traversal of interesting areas (exploitation). Handcrafted heuristics usually trade off small and large steps given some measure of progress. Directly learning in which situation which step-size is preferable would allow us to act more flexible than a hand-crafted heuristic could. To learn such dynamic configuration policies, one approach is dynamic algorithm configuration through the use of reinforcement learning.

Through the use of DAC we could learn policies that are capable of generalizing beyond the training setting. DAC policies could transfer the learned step-size adaptation to functions of higher dimensions, over longer optimization trajectories of CMA-ES as well as other function classes.