Hyperparameter Optimization (HPO) aims to find a well-performing hyperparameter configuration for a given machine learning pipeline on the dataset at hand. The configuration can cover the machine learning model, its hyperparameters, and other data processing steps. Thus, HPO frees the human expert from a tedious and error-prone manual tuning process.
The loss landscape of an HPO problem is typically unknown (i.e., we need to optimize a black-box function) and expensive to evaluate. Bayesian Optimization (BO) is designed as a global optimization strategy for expensive black-box functions. In each iteration, BO first estimates the shape of the target loss landscape with a surrogate model and then uses an acquisition function to suggest the configuration to be evaluated next. Because it trades off exploration and exploitation based on the surrogate model, BO is well known for its sample efficiency.
- SMAC is a versatile tool for optimizing algorithm hyperparameters, implementing different surrogate models, acquisition functions and model transformations.
- BOHB implements a variant of TPE as its BO component.
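The BO loop described above can be sketched in a few lines. This is only a minimal illustration and not how SMAC or BOHB are implemented: the `objective` function is a hypothetical stand-in for an expensive training run, the surrogate is a deliberately simple kernel-weighted average (not a Gaussian process, random forest, or TPE), and the acquisition function is a lower confidence bound minimized over random candidates.

```python
import math
import random


def objective(x):
    """Hypothetical expensive black-box loss over one hyperparameter in [0, 1]."""
    return (x - 0.3) ** 2 + 0.1 * math.sin(15 * x)


def surrogate(x, observations, bandwidth=0.1):
    """Toy surrogate: kernel-weighted mean plus a distance-based uncertainty."""
    prior_mean = sum(y for _, y in observations) / len(observations)
    weights = [math.exp(-((x - xi) / bandwidth) ** 2) for xi, _ in observations]
    eps = 1e-9  # falls back to the prior mean far away from all observations
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, observations))
    mean = (weighted_sum + eps * prior_mean) / (sum(weights) + eps)
    # Uncertainty is 0 at an evaluated point and approaches 1 far from all data.
    uncertainty = 1.0 - max(weights)
    return mean, uncertainty


def bayesian_optimization(n_init=3, n_iter=15, kappa=0.5, seed=0):
    """Return the list of (configuration, loss) pairs evaluated by the loop."""
    rng = random.Random(seed)
    # Initial design: a few random configurations.
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_init))]
    for _ in range(n_iter):
        candidates = [rng.random() for _ in range(200)]

        def lower_confidence_bound(x):
            mean, uncertainty = surrogate(x, history)
            # Low predicted loss (exploitation) or high uncertainty (exploration).
            return mean - kappa * uncertainty

        x_next = min(candidates, key=lower_confidence_bound)
        history.append((x_next, objective(x_next)))
    return history


history = bayesian_optimization()
best_x, best_y = min(history, key=lambda obs: obs[1])
print(f"best configuration: {best_x:.3f}, loss: {best_y:.4f}")
```

The parameter `kappa` (an assumption of this sketch) controls the trade-off: larger values favour unexplored regions, smaller values favour regions the surrogate already predicts to be good.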
Combined Algorithm Selection and Hyperparameter Optimization (CASH)
An AutoML system needs to select not only the optimal hyperparameter configuration of a given model, but also which model to use. This problem can be regarded as a single HPO problem with a hierarchical configuration space, where the top-level hyperparameter decides which algorithm to choose and all other hyperparameters depend on this one. To deal with such complex and structured configuration spaces, we use, for example, random forests as surrogate models in Bayesian Optimization.
- Auto-sklearn provides out-of-the-box supervised machine learning by modelling the search space as a CASH problem.
- Auto-PyTorch is a framework for automatically searching for a neural network architecture and its hyperparameters; it also makes use of a structured configuration space.
- SMAC implements a random forest as a surrogate model which can efficiently deal with structured search spaces.
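To make the hierarchical structure concrete, the following sketch samples from a small CASH-style configuration space in plain Python. The algorithm names, hyperparameter names, and ranges are hypothetical examples chosen for illustration; they are not the search spaces used by Auto-sklearn, Auto-PyTorch, or SMAC.

```python
import random

# Hypothetical search space: the top-level choice "algorithm" decides which
# child hyperparameters are active; all names and ranges are illustrative.
SEARCH_SPACE = {
    "random_forest": {
        "n_estimators": lambda rng: rng.randint(10, 500),
        "max_depth": lambda rng: rng.randint(2, 20),
    },
    "svm": {
        "C": lambda rng: 10 ** rng.uniform(-3, 3),  # log-uniform in [1e-3, 1e3]
        "kernel": lambda rng: rng.choice(["linear", "rbf"]),
    },
    "knn": {
        "n_neighbors": lambda rng: rng.randint(1, 30),
    },
}


def sample_configuration(rng):
    """Sample the top-level algorithm first, then only its own hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    config = {"algorithm": algorithm}
    for name, sampler in SEARCH_SPACE[algorithm].items():
        # Prefix child hyperparameters with their parent to keep names unique.
        config[f"{algorithm}:{name}"] = sampler(rng)
    return config


rng = random.Random(42)
for _ in range(3):
    print(sample_configuration(rng))
```

Each sampled configuration contains only the hyperparameters of the chosen algorithm, so a surrogate model has to cope with inputs in which different subsets of hyperparameters are active. This is the kind of structure that tree-based surrogates such as random forests handle naturally.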
Increasing data sizes and model complexity make it even harder to find a reasonable configuration within a limited computational or time budget. Multi-fidelity techniques approximate the true value of an expensive black-box function with a cheap (and possibly noisy) proxy evaluation and thus increase the efficiency of HPO approaches substantially. For example, we can use a small subset of the dataset or train a DNN for only a few epochs.
- Auto-sklearn increased its efficiency in version 2.0 by using multi-fidelity optimization.
- Auto-PyTorch was designed as a multi-fidelity approach from the start and demonstrates how important multi-fidelity optimization is for AutoDL.
- SMAC implements the approach of BOHB by combining Hyperband as a multi-fidelity approach with Bayesian Optimization.
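The multi-fidelity idea can be illustrated with successive halving, the subroutine at the core of Hyperband. The sketch below is not full Hyperband or BOHB, and it does not reflect the implementation in any of the tools above. The `evaluate` function is a hypothetical proxy whose noise shrinks as the budget (for example, the number of epochs) grows.

```python
import random


def evaluate(config, budget, rng):
    """Hypothetical proxy: the true loss plus noise that shrinks with the budget."""
    true_loss = (config - 0.7) ** 2
    return true_loss + rng.gauss(0.0, 0.1 / budget)


def successive_halving(configs, min_budget=1, max_budget=27, eta=3, seed=0):
    """Evaluate many configurations cheaply and promote only the best 1/eta."""
    rng = random.Random(seed)
    survivors = list(configs)
    budget = min_budget
    while True:
        # Rank all surviving configurations by their loss at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget, rng))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0]
        # Keep the top 1/eta fraction and give the survivors eta times more budget.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_budget)


rng = random.Random(1)
configs = [rng.random() for _ in range(27)]
best = successive_halving(configs)
print(f"selected configuration: {best:.3f}")
```

With 27 starting configurations and `eta=3`, only 3 configurations are ever evaluated at budget 9 and at most one at the full budget of 27, which is where the savings over evaluating every configuration at full budget come from.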
Evaluation of AutoML, and especially of HPO, faces many challenges. For example, many repeated runs of HPO can be computationally expensive, the benchmarks can be fairly noisy, and it is often not clear which benchmarks are representative of typical HPO applications. Therefore, we develop HPO benchmark collections that improve reproducibility and decrease the computational burden on researchers.
- HPOBench (formerly HPOlib) is a collection of HPO benchmarks.
- ACLib is a benchmark collection for algorithm configuration.
The book “AutoML: Methods, Systems, Challenges” provides a concise overview of HPO.