The performance of a machine learning (ML) model depends heavily on its hyperparameter settings. Given a dataset and a task, the choice of the ML model and its hyperparameters is typically made manually. Hyperparameter Optimization (HPO) algorithms aim to take as much of this burden as possible off the human expert.
The design of an HPO algorithm depends on the nature of the task and its context, such as the optimization budget and the available information. Below are some of the different flavors of HPO.
Bayesian Optimization is widely recognized as one of the most popular approaches for HPO, thanks to its sample efficiency, flexibility, and convergence guarantees. The central idea is to treat all tuning decisions within an ML pipeline as the search space (domain) of a function. This function represents the evaluation of an ML pipeline under a fixed compute budget and yields a performance metric that is typically minimized. By iteratively suggesting promising configurations, HPO algorithms strive to converge toward the global optimum.
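The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by any particular tool: the surrogate is a zero-mean Gaussian process with a fixed RBF kernel, the acquisition function is expected improvement, and `toy_objective` is a hypothetical stand-in for the validation error of an ML pipeline with a single hyperparameter in [0, 1].

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)  # prior variance is 1 for RBF
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_objective(x):
    """Hypothetical validation error as a function of one hyperparameter."""
    return (x - 0.3) ** 2

def bayesian_optimization(objective, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    # Initial design: a few random evaluations.
    x_obs = rng.uniform(0.0, 1.0, size=n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Fit the surrogate on centered observations.
        offset = y_obs.mean()
        mean, std = gp_posterior(x_obs, y_obs - offset, candidates)
        # Suggest the candidate with the highest expected improvement.
        ei = expected_improvement(mean + offset, std, y_obs.min())
        x_next = candidates[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmin(y_obs))
    return float(x_obs[best]), float(y_obs[best])

best_x, best_y = bayesian_optimization(toy_objective)
print(f"best hyperparameter: {best_x:.3f}, error: {best_y:.5f}")
```

In practice the surrogate, kernel hyperparameters, and acquisition optimizer are all more elaborate; the sketch only shows the suggest, evaluate, update cycle.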
Combined Algorithm Selection and Hyperparameter Optimization (CASH)
An AutoML system needs to select not only the optimal hyperparameter configuration of a given model but also which model to use. This problem can be regarded as a single HPO problem with a hierarchical configuration space, where the top-level hyperparameter decides which algorithm to choose and all other hyperparameters depend on this one. To deal with such complex and structured configuration spaces, we use, for example, random forests as surrogate models in Bayesian Optimization.
- Auto-sklearn provides out-of-the-box supervised machine learning by modeling the search space as a CASH problem.
- Auto-PyTorch is a framework for automatically searching for a neural network architecture and its hyperparameters; it also makes use of a structured configuration space.
- SMAC implements a random forest as a surrogate model which can efficiently deal with structured search spaces.
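To make the hierarchical structure of a CASH space concrete, the following sketch samples configurations in which the top-level `algorithm` choice determines which child hyperparameters are active. The algorithms, hyperparameter names, and ranges in `SEARCH_SPACE` are hypothetical and chosen only for illustration; they are not the spaces used by Auto-sklearn, Auto-PyTorch, or SMAC.

```python
import math
import random

# Hypothetical hierarchical space: each algorithm owns its own hyperparameters.
SEARCH_SPACE = {
    "random_forest": {
        "n_estimators": ("int", 10, 500),
        "max_depth": ("int", 2, 20),
    },
    "svm": {
        "C": ("log_float", 1e-3, 1e3),
        "gamma": ("log_float", 1e-4, 1e1),
    },
    "knn": {
        "n_neighbors": ("int", 1, 50),
    },
}

def sample_value(spec, rng):
    """Draw one value for a hyperparameter described by (kind, low, high)."""
    kind, low, high = spec
    if kind == "int":
        return rng.randint(low, high)  # inclusive on both ends
    if kind == "log_float":
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unknown hyperparameter kind: {kind}")

def sample_configuration(rng):
    """Sample the top-level algorithm first, then only its active children."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    config = {"algorithm": algorithm}
    for name, spec in SEARCH_SPACE[algorithm].items():
        config[name] = sample_value(spec, rng)
    return config

rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))
```

A sampled configuration never contains hyperparameters of an algorithm that was not selected, which is the property a surrogate model for CASH has to cope with.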
Using multiple fidelities for early stopping in HPO
The black-box view adopted by Bayesian Optimization can be relaxed to a gray-box view, which allows access to intermediate states of the targeted machine learning model. That is, the function to be optimized can be evaluated at cheaper proxy states along one or more variables (fidelities), such as the number of training epochs or the dataset size, and these cheap evaluations are likely to indicate the performance at the target state. HPO algorithms that can leverage search over fidelities can provide better anytime performance.
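One simple way to exploit a fidelity is successive halving, sketched below as an illustration rather than as the method of any specific tool. All configurations are evaluated at a small budget, only the best fraction `1/eta` is kept, and the budget is multiplied by `eta` for the survivors. The function `toy_validation_loss` and the `floor`/`gap` fields are hypothetical stand-ins for partially training a model.

```python
import math
import random

def toy_validation_loss(config, budget):
    """Hypothetical learning curve: loss decays toward a config-specific floor."""
    return config["floor"] + config["gap"] / math.sqrt(budget)

def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Return the surviving configuration and the budget it was last evaluated at."""
    budget = min_budget
    survivors = list(configs)
    while True:
        # Rank all survivors at the current (cheap) fidelity.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0], budget
        # Keep the best 1/eta fraction and give them eta times more budget.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_budget)

rng = random.Random(0)
configs = [{"floor": rng.uniform(0.1, 0.5), "gap": rng.uniform(0.1, 1.0)}
           for _ in range(27)]
winner, final_budget = successive_halving(configs, toy_validation_loss)
print(winner, final_budget)
```

With 27 configurations and `eta=3`, the budgets visited are 1, 3, 9, and 27, so most configurations are discarded after very cheap evaluations. Note that a configuration that starts slowly but ends well can be eliminated early; this is the usual trade-off of early stopping.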
Speeding up HPO with learning curve extrapolation techniques
Many machine learning algorithms are trained iteratively and therefore yield learning curves. Different hyperparameter settings produce different learning curves. Exploiting the smooth trend of the learning curve of a partially trained model to predict its future performance is an active area of research with promising results.
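As a rough illustration of the idea, the sketch below fits a parametric curve to the first few epochs and extrapolates it to a later epoch. It assumes, purely for simplicity, that the loss follows a pure power law `loss(t) = a * t**(-b)` with no asymptote, which becomes a straight line in log-log space; published extrapolation methods use richer curve families and uncertainty estimates. The observed losses here are synthetic.

```python
import math
import numpy as np

def fit_power_law(epochs, losses):
    """Fit loss(t) = a * t**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(epochs), np.log(losses), deg=1)
    return math.exp(intercept), -slope  # (a, b)

def extrapolate(a, b, epoch):
    """Predict the loss at a future epoch under the fitted power law."""
    return a * epoch ** (-b)

# Synthetic partial learning curve (assumed values, for illustration only).
observed_epochs = np.arange(1, 11)
observed_losses = 2.0 * observed_epochs ** (-0.5)

a, b = fit_power_law(observed_epochs, observed_losses)
predicted = extrapolate(a, b, 100)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at epoch 100: {predicted:.3f}")
```

An HPO algorithm could use such a prediction to stop a run early when the extrapolated final loss is unlikely to beat the best configuration found so far.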
HPO with expert prior inputs
Although HPO can be seen as removing the human from the loop, the intuition and experience of the human expert offer valuable information as a guide for an HPO algorithm. The challenge then is to find suitable interfaces and principled methodologies to realize practical algorithms.
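One possible interface, sketched here under assumptions of our own choosing, is to let the expert state a best guess for a hyperparameter and to sample candidates from a mixture of a prior centered on that guess and a uniform distribution over the full range. The function name, the default values, and the mixture weight `prior_weight` are hypothetical; the uniform component keeps the search from depending entirely on a possibly wrong guess.

```python
import math
import random

def sample_learning_rate(rng, expert_guess=1e-3, prior_weight=0.7,
                         low=1e-5, high=1e-1, sigma=0.5):
    """Sample a learning rate in log-space, biased toward the expert's guess."""
    log_low, log_high = math.log(low), math.log(high)
    if rng.random() < prior_weight:
        # Prior component: Gaussian in log-space around the expert's guess,
        # clipped to the allowed range.
        log_value = rng.gauss(math.log(expert_guess), sigma)
        log_value = min(max(log_value, log_low), log_high)
    else:
        # Uninformed component: uniform in log-space over the whole range.
        log_value = rng.uniform(log_low, log_high)
    return math.exp(log_value)

rng = random.Random(0)
samples = [sample_learning_rate(rng) for _ in range(2000)]
near_guess = sum(abs(math.log(s) - math.log(1e-3)) < 1.0 for s in samples)
print(f"fraction of samples within a factor e of the guess: {near_guess / len(samples):.2f}")
```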
Benchmarks for reproducible research
Evaluation of AutoML and especially of HPO faces many challenges. For example, many repeated runs of HPO can be computationally expensive, the benchmarks can be fairly noisy, and it is often not clear which benchmarks are representative of typical HPO applications. Therefore, we develop HPO benchmark collections that improve reproducibility and decrease the computational burden on researchers.
- The book “AutoML: Methods, Systems, Challenges” provides a concise overview of HPO.
- For more focused HPO for DL, refer here.
- HPO to optimize for multiple objectives (MOO) can be found here.
- An overview of various HPO tools available as open source.
Please also check our blog posts for our work in HPO (including BO).