Blog

AutoML adoption in software engineering for machine learning

By Koen van der Blom, Holger Hoos, Alex Serban, Joost Visser

In our global survey among teams that build ML applications, we found ample room for increased adoption of AutoML techniques. While AutoML is adopted at least partially by more than 70% of teams in research labs and tech companies, for teams in non-tech and government organisations adoption is still below 50%. Full adoption remains limited to under 10% in research and tech, and to less than 5% in non-tech and government. (more…)

Read More

Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL


Auto-PyTorch is a framework for automated deep learning (AutoDL) that uses BOHB as a backend to optimize the full deep learning pipeline, including data preprocessing, network training techniques and regularization methods. Auto-PyTorch is the successor of AutoNet, which was one of the first frameworks to perform this joint optimization. (more…)
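
Below is a minimal usage sketch for trying Auto-PyTorch on tabular classification. It follows the tabular API of recent Auto-PyTorch releases, but exact module paths, argument names and defaults may differ between versions, so treat it as illustrative rather than canonical.

```python
# Minimal Auto-PyTorch sketch (illustrative; API details may vary between versions).
import sklearn.datasets
import sklearn.model_selection

from autoPyTorch.api.tabular_classification import TabularClassificationTask

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

# BOHB-style multi-fidelity search over the full deep learning pipeline.
api = TabularClassificationTask()
api.search(
    X_train=X_train, y_train=y_train,
    X_test=X_test, y_test=y_test,
    optimize_metric="accuracy",
    total_walltime_limit=300,       # overall budget in seconds
    func_eval_time_limit_secs=50,   # budget per pipeline evaluation
)

y_pred = api.predict(X_test)
print("test accuracy:", api.score(y_pred, y_test))
```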

Read More

NAS-Bench-301 and the Case for Surrogate NAS Benchmarks


The Need for Realistic NAS Benchmarks

Neural Architecture Search (NAS) is a logical next step in representation learning as it removes human bias from architecture design, similar to deep learning removing human bias from feature engineering. As such, NAS has experienced rapid growth in recent years, leading to state-of-the-art performance on many tasks. However, empirical evaluations of NAS methods are still problematic: different NAS papers often use different training pipelines and different search spaces, do not evaluate other methods under comparable settings, or cannot afford enough runs to report statistical significance. NAS benchmarks attempt to resolve this issue by providing architecture performances on a full search space, obtained with a fixed training pipeline, without requiring high computational cost. (more…)
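
To make the idea behind surrogate NAS benchmarks concrete, here is a small sketch of the general principle (not the actual NAS-Bench-301 code or API): a regression model is fitted on pairs of architecture encodings and observed validation accuracies, and NAS methods then query the cheap surrogate instead of training networks.

```python
# Sketch of the surrogate-benchmark idea (illustrative only; NAS-Bench-301 uses
# much stronger surrogate models and a proper encoding of the DARTS search space).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Pretend data: each architecture is encoded as a fixed-length vector, and its
# validation accuracy was obtained once with a fixed training pipeline.
arch_encodings = rng.random((500, 30))
val_accuracies = 0.9 - 0.2 * arch_encodings.mean(axis=1) + 0.01 * rng.standard_normal(500)

# Fit the surrogate once on the collected (architecture, accuracy) pairs.
surrogate = GradientBoostingRegressor().fit(arch_encodings, val_accuracies)

# A NAS method can now query predicted performance instead of training a network.
candidate = rng.random((1, 30))
print("predicted validation accuracy:", surrogate.predict(candidate)[0])
```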

Read More

Learning Step-Size Adaptation in CMA-ES


Comparison of example optimization trajectory and corresponding step-size of our approach (“LTO”) to a hand-crafted heuristic (“CSA”)

In a Nutshell

In CMA-ES, the step size controls how fast or slow a population traverses the search space. Large steps allow you to quickly skip over uninteresting areas (exploration), whereas small steps allow a more focused traversal of interesting areas (exploitation). Hand-crafted heuristics usually trade off small and large steps given some measure of progress. Directly learning which step size is preferable in which situation would allow us to act more flexibly than a hand-crafted heuristic could. One approach to learning such dynamic configuration policies is dynamic algorithm configuration (DAC) through the use of reinforcement learning. (more…)
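
As a toy illustration of dynamic step-size control, the sketch below uses the ask-and-tell interface of the pycma package and overrides sigma in every iteration with a simple hand-written placeholder policy; in the DAC setting described above, that placeholder would be replaced by a policy learned with reinforcement learning.

```python
# Toy sketch: setting the CMA-ES step size dynamically in each iteration.
# Requires the pycma package (pip install cma); the policy below is a hand-written
# placeholder, not the learned policy ("LTO") described in the post.
import cma


def sphere(x):
    return sum(xi ** 2 for xi in x)


def step_size_policy(es):
    # Placeholder: shrink the step size as the incumbent objective value improves.
    # A DAC/RL policy would instead map internal CMA-ES state to a step size.
    return max(0.01, min(0.5, float(es.best.f)))


es = cma.CMAEvolutionStrategy(8 * [1.0], 0.5)
while not es.stop():
    solutions = es.ask()
    es.tell(solutions, [sphere(x) for x in solutions])
    es.sigma = step_size_policy(es)  # override the default CSA step-size adaptation
print("best solution found:", es.result.xbest)
```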

Read More

Auto-Sklearn 2.0: The Next Generation


Since our initial release of auto-sklearn 0.0.1 in May 2016 and the publication of the NeurIPS paper “Efficient and Robust Automated Machine Learning” in 2015, we have spent a lot of time on maintaining, refactoring and improving code, but also on new research. Now, we’re finally ready to share the next version of our flagship AutoML system: Auto-Sklearn 2.0.

This new version is based on our experience from winning the second ChaLearn AutoML challenge @ PAKDD'18 (see also the respective chapter in the AutoML book) and integrates improvements we thoroughly studied in our upcoming paper. Here are the main insights:

(more…)
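
While the insights themselves are in the full post and the paper, here is a minimal usage sketch of the new release. It assumes the Auto-sklearn 2.0 interface that ships as an experimental module in recent auto-sklearn versions; the exact import path and defaults may change.

```python
# Minimal Auto-sklearn 2.0 sketch (illustrative; the 2.0 interface currently lives
# in an experimental module and its details may change between releases).
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

from autosklearn.experimental.askl2 import AutoSklearn2Classifier

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = AutoSklearn2Classifier(time_left_for_this_task=300)  # budget in seconds
automl.fit(X_train, y_train)

y_pred = automl.predict(X_test)
print("test accuracy:", sklearn.metrics.accuracy_score(y_test, y_pred))
```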

Read More

Dynamic Algorithm Configuration


A versatile DAC can handle any situation at any time. (Image credit: Ina Lindauer)

 

Motivation

When designing algorithms, we want them to be as flexible as possible so that they can solve as many problems as possible. Solving a specific family of problems well, however, requires finding well-performing hyperparameter configurations, which in turn demands either extensive domain knowledge or extensive resources. The latter is especially true if we want to use algorithms that we did not author ourselves: we most likely know nothing about the internals of the algorithm, so to figure out what makes this black box tick, researchers of all flavors far too often fall back on “Graduate Student Descent”.

Automated hyperparameter optimization and algorithm configuration offer far better alternatives, allowing for more structured and less error-prone optimization in potentially infinitely large search spaces. Throughout the search, various useful statistics are logged, which can further allow us to gain insights into the algorithm, its hyperparameters, and the tasks the algorithm has to solve. At the end of the search, they present us with a hyperparameter configuration that was found to work well, e.g., one that quickly finds a solution or that achieves the best solution quality.

However, the static configurations these approaches find exhibit two shortcomings:

  1. The configuration will only work well when the configured algorithm is used to solve tasks similar to those it was configured for.
  2. The iterative nature of most algorithms is ignored, and a single configuration is assumed to work best at every iteration of the algorithm.

To address Shortcoming 1, a combination of algorithm configuration and algorithm selection can be used: first, search for well-performing but potentially complementary configurations (which solve different problems best), and then learn a selection mechanism to determine which configuration to use for a given problem. However, even this more general form of optimization (called per-instance algorithm configuration) is not able to address Shortcoming 2. (more…)
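
To make the contrast concrete, the hypothetical sketch below frames DAC as a policy that observes the algorithm's state at every iteration and sets a hyperparameter on the fly, instead of committing to one static configuration per instance. All names are illustrative and do not correspond to an actual DAC library API.

```python
# Hypothetical sketch of the DAC view (illustrative names, not a real API).
# Static algorithm configuration fixes one value before the run; a DAC policy
# instead chooses the hyperparameter anew at every iteration of the algorithm.
from dataclasses import dataclass


@dataclass
class AlgorithmState:
    iteration: int
    recent_improvement: float  # e.g., how much the incumbent improved last step


def dynamic_policy(state: AlgorithmState) -> float:
    """Toy policy: use a large value while progress is made, then shrink it."""
    if state.recent_improvement > 0.01:
        return 1.0
    return 1.0 / (1.0 + state.iteration)


def run_iterative_algorithm(policy, iterations: int = 20) -> float:
    incumbent = 100.0  # stand-in objective value of the current best solution
    for t in range(iterations):
        candidate = 100.0 / (t + 2)  # stand-in for the algorithm's real progress
        state = AlgorithmState(t, max(0.0, incumbent - candidate))
        hyperparameter = policy(state)  # DAC: one decision per iteration
        # ... one iteration of the target algorithm would use `hyperparameter` here ...
        incumbent = min(incumbent, candidate)
    return incumbent


print(run_iterative_algorithm(dynamic_policy))
```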

Read More

RobustDARTS


Understanding and Robustifying Differentiable Architecture Search

Searching for neural network architectures was initially defined as a discrete optimization problem, which intrinsically required training and evaluating thousands of networks. This, of course, required huge amounts of computational power, which only a few institutions could afford. One-shot neural architecture search (NAS) democratized this tedious search process by training only one large “supergraph” that subsumes all possible architectures as its subgraphs. The NAS problem then boils down to finding the optimal path in this big graph, reducing the search cost to a small factor of a single function evaluation (one neural network training and evaluation). Notably, Differentiable ARchiTecture Search (DARTS) has been widely appreciated in the NAS community due to its conceptual simplicity and effectiveness. To relax the discrete space to a continuous one, DARTS linearly combines different operation choices to create a mixed operation. Afterwards, one can apply standard gradient descent to optimize in this relaxed architecture space. (more…)
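
The continuous relaxation is easy to state in code: the output of an edge is a softmax-weighted sum over all candidate operations, which makes the architecture choice differentiable in the architecture parameters. The PyTorch sketch below illustrates this mixed operation; it is a minimal rendering of the idea, not the official DARTS implementation.

```python
# Minimal sketch of the DARTS mixed operation (illustrative, not the official code).
import torch
import torch.nn as nn


class MixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # A tiny candidate set; the real DARTS search space contains more operations.
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        ])
        # Architecture parameters alpha, optimized by gradient descent
        # alongside the network weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


mixed = MixedOp(channels=8)
out = mixed(torch.randn(1, 8, 16, 16))
print(out.shape)  # the mixed operation preserves the input shape
```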

Read More