2020 is over. Time to look back at the major new features we introduced in Auto-Sklearn.
By Koen van der Blom, Holger Hoos, Alex Serban, Joost Visser
In our global survey among teams that build ML applications, we found ample room for increased adoption of AutoML techniques. While AutoML is adopted at least partially by more than 70% of teams in research labs and tech companies, for teams in non-tech and government organisations adoption is still below 50%. Full adoption remains limited to under 10% in research and tech, and to less than 5% in non-tech and government. (more…)
Auto-PyTorch is a framework for automated deep learning (AutoDL) that uses BOHB as a backend to optimize the full deep learning pipeline, including data preprocessing, network training techniques and regularization methods. Auto-PyTorch is the successor of AutoNet, which was one of the first frameworks to perform this joint optimization. (more…)
The Need for Realistic NAS Benchmarks
Neural Architecture Search (NAS) is a logical next step in representation learning, as it removes human bias from architecture design, similar to deep learning removing human bias from feature engineering. As such, NAS has experienced rapid growth in recent years, leading to state-of-the-art performance on many tasks. However, empirical evaluations of NAS methods are still problematic: different NAS papers often use different training pipelines and different search spaces, do not evaluate other methods under comparable settings, or cannot afford enough runs to report statistical significance. NAS benchmarks attempt to resolve this issue by providing the performance of every architecture in a full search space, obtained with a fixed training pipeline, without requiring high computational cost. (more…)
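To illustrate the idea, here is a minimal sketch of how such a tabular benchmark can be used; the architecture encodings and accuracy numbers below are purely illustrative, while real benchmarks like NAS-Bench-101 store precomputed results for hundreds of thousands of architectures:

```python
# Hypothetical precomputed results: architecture encoding -> validation
# accuracy, all obtained with one fixed training pipeline.
BENCHMARK_TABLE = {
    ("conv3x3", "conv3x3", "maxpool"): 0.921,
    ("conv3x3", "conv1x1", "maxpool"): 0.913,
    ("conv1x1", "conv1x1", "maxpool"): 0.887,
}

def query(architecture):
    """Return the precomputed accuracy instead of training from scratch."""
    return BENCHMARK_TABLE[architecture]

# A NAS method can now be evaluated in milliseconds rather than GPU-days;
# here, trivially, exhaustive search over the (tiny) space.
best = max(BENCHMARK_TABLE, key=query)
```

Because every method queries the same table, comparisons are fair by construction, and statistically meaningful numbers of repetitions become affordable.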
In CMA-ES, the step size controls how fast or slow a population traverses the search space. Large steps allow you to quickly skip over uninteresting areas (exploration), whereas small steps allow a more focused traversal of interesting areas (exploitation). Handcrafted heuristics usually trade off small and large steps based on some measure of progress. Directly learning which step size is preferable in which situation would allow us to act more flexibly than a hand-crafted heuristic can. One approach to learning such dynamic configuration policies is dynamic algorithm configuration (DAC) through the use of reinforcement learning. (more…)
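As a point of reference for what DAC would replace, here is a sketch of a classic hand-crafted step-size heuristic, the 1/5th success rule in a simple (1+1)-ES (not CMA-ES itself; the objective and constants are illustrative):

```python
import random

def one_plus_one_es(f, x, sigma=1.0, iters=200, seed=0):
    """(1+1)-ES with the 1/5th success rule for step-size control:
    grow sigma after a successful step (exploration still pays off),
    shrink it after a failure (focus on the current region)."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iters):
        # Mutate every coordinate with Gaussian noise scaled by sigma.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
            sigma *= 1.5            # success: take larger steps
        else:
            sigma *= 1.5 ** -0.25   # failure: smaller, more focused steps
    return x, fx, sigma

sphere = lambda v: sum(vi * vi for vi in v)
x, fx, sigma = one_plus_one_es(sphere, [5.0, -3.0])
```

A DAC policy trained with reinforcement learning would replace the fixed multiply-by-1.5 rule with a learned mapping from the optimizer's state to a step-size adjustment.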
Since our initial release of auto-sklearn 0.0.1 in May 2016 and the publication of the NeurIPS paper “Efficient and Robust Automated Machine Learning” in 2015, we have spent a lot of time on maintaining, refactoring and improving code, but also on new research. Now, we’re finally ready to share the next version of our flagship AutoML system: Auto-Sklearn 2.0.
This new version is based on our experience from winning the second ChaLearn AutoML challenge@PAKDD’18 (see also the respective chapter in the AutoML book) and integrates improvements we thoroughly studied in our upcoming paper. Here are the main insights:
When designing algorithms, we want them to be as flexible as possible so that they can solve as many problems as possible. To solve a specific family of problems well, however, we need to find well-performing hyperparameter configurations, which requires either extensive domain knowledge or substantial computational resources. The latter is especially true if we want to use algorithms we did not author ourselves: we most likely know nothing of the algorithm's internals, so to figure out what makes this black box tick, far too often, researchers of all flavors fall back on "Graduate Student Descent".
Automated hyperparameter optimization and algorithm configuration offer far better alternatives, allowing for more structured and less error-prone optimization in potentially infinitely large search spaces. Throughout the search, various useful statistics are logged, which can further allow us to gain insight into the algorithm, its hyperparameters and the tasks the algorithm has to solve. At the end of the search, these methods present us with a hyperparameter configuration that was found to work well, e.g., one that quickly finds a solution or achieves the best solution quality.
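The workflow can be sketched with the simplest possible configurator, random search; the objective and search space below are toy stand-ins for a real algorithm's performance and hyperparameters:

```python
import random

def random_search(objective, space, budget=50, seed=42):
    """Minimal random-search configurator. Besides the best configuration,
    it returns the full search history, which can later be mined for
    insights about the hyperparameters and the tasks being solved."""
    rng = random.Random(seed)
    history = []  # log of (configuration, score) pairs
    for _ in range(budget):
        config = {name: rng.choice(values) for name, values in space.items()}
        history.append((config, objective(config)))
    best_config, best_score = min(history, key=lambda cs: cs[1])
    return best_config, best_score, history

# Toy objective standing in for measured "solution quality" (lower is better).
def objective(config):
    return (config["lr"] - 0.1) ** 2 + (config["layers"] - 3) ** 2

space = {"lr": [0.001, 0.01, 0.1, 1.0], "layers": [1, 2, 3, 4, 5]}
best, score, history = random_search(objective, space, budget=50)
```

Real configurators such as SMAC replace the uniform sampling with a model that learns which regions of the space are promising, but the interface, a search space in, a configuration plus a search history out, stays the same.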
However, the resulting static configurations these approaches are able to find exhibit two shortcomings:
To address Shortcoming 1, a combination of algorithm configuration and algorithm selection can be used: first, search for well-performing but potentially complementary configurations (each solving a different kind of problem best); then, learn a selection mechanism that determines which configuration to use for a given problem. However, even this more general form of optimization (called per-instance algorithm configuration) is not able to address Shortcoming 2. (more…)
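The two-step scheme can be sketched as follows; the portfolio, instance features and training data are all made up for illustration, and the selector here is a plain 1-nearest-neighbour rule rather than the learned models used in practice:

```python
# Step 1 (configuration): assume the configurator found these two
# complementary configurations, each strongest on a different kind of instance.
portfolio = {
    "aggressive": {"restarts": 1, "noise": 0.0},
    "cautious":   {"restarts": 8, "noise": 0.2},
}

# Step 2 (selection): training data mapping instance features to the
# configuration that performed best on that instance.
training_data = [
    ((0.1, 0.9), "aggressive"),
    ((0.2, 0.8), "aggressive"),
    ((0.9, 0.2), "cautious"),
    ((0.8, 0.1), "cautious"),
]

def select(features):
    """Pick a configuration for a new instance via 1-nearest-neighbour."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(training_data, key=lambda fb: sqdist(fb[0], features))
    return portfolio[label]

config = select((0.85, 0.15))  # a new instance resembling the "cautious" ones
```

Note that whichever configuration is selected still stays fixed for the whole run, which is exactly why this scheme cannot address Shortcoming 2.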
Neural Architecture Search (NAS) is a very hot topic in AutoML these days, and our group is publishing very actively in this area. We published seven NAS papers in 2019, which may make us one of the world's most active groups in NAS (only closely surpassed by a small company called Google ;-). Here are the papers and quick blog posts about them: (more…)
Understanding and Robustifying Differentiable Architecture Search
The search for neural network architectures was initially formulated as a discrete optimization problem, which intrinsically required training and evaluating thousands of networks. This, of course, demanded huge amounts of computational power, available only to a few institutions. One-shot neural architecture search (NAS) democratized this tedious search process by training only one large "supergraph" subsuming all possible architectures as its subgraphs. The NAS problem then boils down to finding the optimal path in this big graph, reducing the search cost to a small factor of a single function evaluation (one neural network training and evaluation). Notably, Differentiable ARchiTecture Search (DARTS) was widely appreciated in the NAS community for its conceptual simplicity and effectiveness. To relax the discrete space into a continuous one, DARTS linearly combines the different operation choices to create a mixed operation. One can then apply standard gradient descent to optimize in this relaxed architecture space. (more…)
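The mixed operation at the heart of DARTS can be sketched in a few lines of plain Python; the candidate operations, alpha values and scalar "features" here are purely illustrative, whereas real DARTS mixes convolutions and pooling applied to tensors:

```python
import math

# Illustrative candidate operations on one edge of the supergraph.
ops = {
    "identity": lambda x: x,
    "double":   lambda x: 2.0 * x,
    "zero":     lambda x: 0.0,
}

def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """DARTS-style continuous relaxation: instead of picking one operation,
    output the softmax-weighted sum of all candidates. The architecture
    parameters `alphas` can then be optimized by gradient descent
    alongside the ordinary network weights."""
    weights = softmax([alphas[name] for name in ops])
    return sum(w * op(x) for w, op in zip(weights, ops.values()))

# With equal alphas, every operation contributes equally (uniform average).
y = mixed_op(3.0, {"identity": 0.0, "double": 0.0, "zero": 0.0})

# Discretization: after the search, keep the operation with the largest
# alpha on each edge, e.g. for these (made-up) learned values:
learned = {"identity": 0.3, "double": 1.7, "zero": -0.5}
best_op = max(learned, key=learned.get)
```

As the alphas concentrate on one operation, the mixed output approaches that operation's output, which is what makes the final discretization step reasonable.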