Review of the Year 2023 – AutoML Hannover

by the AutoML Hannover Team

The year 2023 was our most successful so far as a (still relatively young) AutoML group in Hannover. With the start of several big projects, including the ERC Starting Grant on interactive and explainable AutoML and a BMUV-funded project on Green AutoML, the group has grown and we were able to publish our first papers on these topics. Furthermore, we structured the group into four sub-topics to sharpen our focus: core AutoML, human-centered AutoML, green AutoML and AutoRL.

Core AutoML

Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Hyperparameter optimization (HPO) is one of the core fields of AutoML. It is the basic first step towards getting better performance out of an existing ML model or pipeline. In our survey, we discuss (nearly) the full variety of existing HPO methods, from simple random search to state-of-the-art techniques. It is meant as an easy-to-digest introduction to the field that nevertheless provides all the guidance necessary to understand the state of the art.

[Paper | Preprint]

Self-Adjusting Weighted Expected Improvement for Bayesian Optimization

Bayesian optimization (BO) encompasses a class of surrogate-based, sample-efficient algorithms for optimizing black-box problems with small evaluation budgets. However, BO itself involves numerous design decisions regarding the initial design, the surrogate model and the acquisition function (AF). Which configuration works best depends on the problem at hand, and this choice directly impacts robustness and sample efficiency. In this work we propose SAWEI, Self-Adjusting Weighted Expected Improvement, to adjust the exploration-exploitation trade-off online. SAWEI works out-of-the-box, adapts to any problem landscape and achieves a better sample efficiency.
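
To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a commonly used weighted Expected Improvement formulation for minimization. The function names and the example numbers are our own illustration, and the weight alpha is fixed by hand here; the online adjustment rule that makes SAWEI self-adjusting is not reproduced.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weighted_ei(mu, sigma, f_best, alpha):
    """Weighted Expected Improvement for a minimization problem.

    mu, sigma: surrogate mean and standard deviation at a candidate point.
    f_best:    best (lowest) objective value observed so far.
    alpha:     weight in [0, 1]; alpha=1 keeps only the exploitation term,
               alpha=0 keeps only the exploration term, and alpha=0.5
               is standard EI scaled by 0.5.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: only the exploitation term can be non-zero.
        return alpha * max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    exploitation = (f_best - mu) * normal_cdf(z)
    exploration = sigma * normal_pdf(z)
    return alpha * exploitation + (1.0 - alpha) * exploration


# Two illustrative candidates (made-up numbers):
# one close to the incumbent with low uncertainty, one far away but uncertain.
f_best = 0.2
near_and_certain = {"mu": 0.1, "sigma": 0.05}
far_and_uncertain = {"mu": 0.5, "sigma": 1.0}

for alpha in (0.0, 0.5, 1.0):
    a = weighted_ei(f_best=f_best, alpha=alpha, **near_and_certain)
    b = weighted_ei(f_best=f_best, alpha=alpha, **far_and_uncertain)
    print(f"alpha={alpha:.1f}  near/certain={a:.4f}  far/uncertain={b:.4f}")
```

With a large alpha the candidate near the incumbent scores higher, with a small alpha the uncertain one does, which is exactly the knob that an online adjustment scheme would turn during optimization.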

[Paper | GitHub | Blogpost]

AutoML in Heavily Constrained Applications

Tailoring an ML pipeline to a specific task requires meticulous configuration of hyperparameters and other design decisions, typically facilitated by an AutoML system. However, the effectiveness of such systems can vary significantly based on their own second-order meta-configuration (i.e., the hyperparameters of the AutoML system itself), and they often lack the ability to automatically adapt to diverse use cases. Additionally, incorporating user-defined constraints on pipeline efficiency poses a challenge for current AutoML systems. We introduce CAML, which leverages meta-learning to autonomously adjust its own AutoML parameters, including search strategy, validation strategy, and search space, for a given task. Unlike existing systems, CAML dynamically adapts to user-defined constraints, ensuring the generation of pipelines that not only satisfy specific criteria but also deliver high predictive performance. CAML thus addresses limitations of current systems and provides an adaptive, user-friendly approach to hyperparameter optimization.

[Paper | Preprint | GitHub]

Application of machine learning for fleet-based condition monitoring of ball screw drives in machine tools

Ball screws are frequently used as drive elements in the feed axes of machine tools. Failing ball screws cause high downtimes and costs for manufacturing companies. Data-based monitoring approaches derive the ball screw condition from sensor data in cases where no knowledge is available to derive a physical model-based approach. Condition assessment requires the availability of fault data. Previously, fault patterns were often created artificially; our dataset, in contrast, originates from a machine tool fleet used in series production in the automotive industry and was collected over 8 months. We present ball screw drive monitoring approaches for machine tool fleets based on machine learning for two scenarios: first, supervised anomaly detection with AutoML, and second, semi-supervised anomaly detection using outlier scores. Our solutions for both scenarios outperform methods commonly used in industry.

[Paper | Builds on Auto-Sklearn]

Automated Machine Learning for Remaining Useful Life Predictions

Predicting the remaining useful life (RUL) of systems is a crucial task in engineering, prognostics and health management. Traditional model-based approaches are giving way to data-driven methods due to their effectiveness, reducing the need for deep knowledge of the system’s underlying physics. However, this shift often requires expertise in machine learning (ML), posing a challenge for domain experts lacking ML proficiency. AutoML offers a remedy by automating the entire ML pipeline, so that even those without deep ML expertise can create their own predictive models. For RUL, we introduce AutoRUL, an AutoML-driven approach specifically designed for automatic RUL predictions. This method combines fine-tuned standard regression techniques into an ensemble with high predictive power. We put AutoRUL to the test on eight diverse datasets, both real-world and synthetic, against state-of-the-art hand-crafted models. The results showcase AutoML as a viable alternative for RUL predictions, eliminating the need for intricate ML know-how in constructing data-driven models.

[Paper | GitHub]

AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks

The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. In NLP, Large Language Models (LLMs) in particular have experienced a rapid series of breakthroughs very recently. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work and critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both NLP and AutoML. By highlighting conceivable synergies but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.


Human-centered AutoML

PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

The optimal configuration of hyperparameters in Deep Learning (DL) pipelines is essential for achieving high performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, their associated costs are often impractical for modern DL practice, pushing users towards manual hyperparameter tuning. With PriorBand, we addressed this misalignment between HPO algorithms and DL researchers, ensuring that the approach fulfills all the key requirements for HPO methods suitable for current DL: strong performance under low compute budgets, integration of cheap proxy tasks, consideration of expert beliefs, handling of mixed search spaces, simplicity of implementation, and scalability to parallelism.

[Preprint | GitHub]

Symbolic Explanations for Hyperparameter Optimization

The vast majority of existing HPO tools work in a black-box fashion without offering any form of explanation of the optimization process or the returned configuration, and thus lack insight and transparency. To alleviate this situation and move towards a more human-centered and interpretable HPO process, we proposed to make use of symbolic explanations: by applying genetic-programming-based symbolic regression to the meta-data collected during HPO, this approach yields simple and interpretable explanations in the form of closed-form analytic expressions. These expressions provide insights into how the hyperparameter configuration influences the model performance. We believe that this is particularly valuable for ML researchers who need to understand the complex behavior of their models.
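
To give a feeling for what a closed-form explanation of HPO meta-data can look like, here is a deliberately simplified sketch. It does not use genetic programming; instead it swaps in a much simpler technique, namely fitting a handful of fixed expression templates by least squares and keeping the best one. The templates, function names and the synthetic meta-data are all assumptions made for illustration.

```python
import math

# Candidate expression templates of the form a * g(x) + b.
# A genetic-programming approach would search a far richer expression space.
TEMPLATES = {
    "a * x + b": lambda x: x,
    "a * log10(x) + b": lambda x: math.log10(x),
    "a * x**2 + b": lambda x: x * x,
}


def fit_linear(features, targets):
    """Ordinary least squares for targets ~ a * features + b; returns (a, b, mse)."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    variance = sum((f - mean_f) ** 2 for f in features)
    covariance = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    a = covariance / variance
    b = mean_t - a * mean_f
    mse = sum((a * f + b - t) ** 2 for f, t in zip(features, targets)) / n
    return a, b, mse


def explain(hyperparameter_values, losses):
    """Return (template, a, b, mse) for the template with the lowest error."""
    best = None
    for name, transform in TEMPLATES.items():
        features = [transform(x) for x in hyperparameter_values]
        a, b, mse = fit_linear(features, losses)
        if best is None or mse < best[3]:
            best = (name, a, b, mse)
    return best


# Synthetic meta-data (made up): loss as a function of the learning rate.
learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]
losses = [0.2 * math.log10(lr) + 1.0 for lr in learning_rates]

template, a, b, mse = explain(learning_rates, losses)
print(f"loss ~ {template}  with a={a:.3f}, b={b:.3f} (mse={mse:.2e})")
```

The returned expression is something a researcher can read directly, for example "the loss changes linearly in the logarithm of the learning rate", which is the kind of insight a black-box optimizer alone does not provide.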

[Paper | GitHub | Blogpost]

Green AutoML

Learning Activation Functions for Sparse Neural Networks

Sparse Neural Networks (SNNs) are an efficient way to deploy large networks on edge applications. Unfortunately, they suffer from a decrease in accuracy, and minimizing this accuracy drop is a significant challenge. In this work, we use the AutoML perspective to tackle this head-on by focusing on the often-overlooked aspects of Sparse Networks – hyperparameters and activation functions. We discovered that the reliance on ReLU as a universal activation function and the fine-tuning of SNNs using hyperparameters designed for dense networks contribute significantly to this accuracy drop. To address these issues, we introduced an innovative approach of tuning activation functions specifically for sparse networks, combined with a separate hyperparameter optimization regime. Our experiments with popular DNN models like LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0, using datasets such as MNIST, CIFAR-10, and ImageNet-16, demonstrated remarkable improvements in accuracy. These findings underscore the importance of bespoke activation functions and hyperparameter settings for optimizing SNNs, marking a significant step forward in neural network efficiency.

[Paper | GitHub | Blogpost]

Green-AutoML for Plastic Litter Detection

In tackling the pressing issue of ocean plastic pollution, we’ve harnessed the potential of Green-AutoML to enhance a plastic detection system. Using a pre-existing plastic waste dataset and system as our starting point, we trained five standard neural architectures for image classification. What sets this research apart is the comparison of their performance and carbon footprints to the well-known Efficient Neural Architecture Search, a key player in AutoML. The results are very promising. Our Green-AutoML approach surpasses the original plastic detection system by 1.1% in accuracy, all while requiring 33 times fewer floating point operations at inference. Notably, the carbon footprint is reduced to just 29% of the best-known baseline, illustrating the potential of AutoML in addressing climate-change-related challenges.



AutoRL

Hyperparameters in Reinforcement Learning and How to Tune Them

In order to improve reproducibility, deep reinforcement learning has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies widely across papers, which makes it challenging to compare RL algorithms fairly. Here we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed, which may lead to overfitting. We therefore compare multiple state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts, demonstrating that HPO approaches often have higher performance and lower compute overhead. As a result of our findings, we recommend a set of best practices for the RL community, which should result in stronger empirical results at lower computational cost, better reproducibility, and thus faster progress.
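
One way to guard against overfitting to the tuning seed is to keep tuning seeds and test seeds strictly separate. The sketch below illustrates this idea with plain random search on a toy objective; it is an assumption about how such a protocol could look, not necessarily the exact procedure from the paper. In particular, train_and_evaluate is a hypothetical stand-in for a real RL training run, and all seeds and ranges are made up.

```python
import math
import random
import statistics

TUNING_SEEDS = [0, 1, 2, 3, 4]
TEST_SEEDS = [100, 101, 102, 103, 104]  # never touched during tuning


def train_and_evaluate(learning_rate, seed):
    """Hypothetical stand-in for one RL training run; returns a noisy final return."""
    # Deterministic noise per (learning_rate, seed) pair, so runs are reproducible.
    rng = random.Random(f"{learning_rate!r}|{seed}")
    signal = -(math.log10(learning_rate) + 3.0) ** 2  # toy landscape, best at 1e-3
    return signal + rng.gauss(0.0, 1.0)


def mean_return(learning_rate, seeds):
    """Average final return of one configuration over several seeds."""
    return statistics.mean(train_and_evaluate(learning_rate, s) for s in seeds)


def random_search(n_candidates, tuning_seeds, sampler_seed=0):
    """Sample learning rates log-uniformly and pick the best on the tuning seeds."""
    sampler = random.Random(sampler_seed)
    candidates = [10 ** sampler.uniform(-5.0, -1.0) for _ in range(n_candidates)]
    best = max(candidates, key=lambda lr: mean_return(lr, tuning_seeds))
    return best, candidates


best_lr, candidates = random_search(n_candidates=20, tuning_seeds=TUNING_SEEDS)
print(f"selected learning rate: {best_lr:.2e}")
print(f"return on tuning seeds: {mean_return(best_lr, TUNING_SEEDS):.3f}")
print(f"return on test seeds:   {mean_return(best_lr, TEST_SEEDS):.3f}")
```

Reporting the test-seed number rather than the tuning-seed number avoids crediting a configuration for noise it happened to profit from during tuning.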

[Paper | GitHub | Blogpost]

AutoRL Hyperparameter Landscapes

We delve into the complexities of Automated Reinforcement Learning (AutoRL) and its dependency on hyperparameters. Our primary focus is on the dynamic nature of hyperparameter landscapes and their evolution over time. Using return landscapes, we rigorously analyze various RL algorithms, including DQN, PPO, and SAC, across diverse environments such as Cartpole, Bipedal Walker, and Hopper. Our findings reveal a pivotal insight: the necessity of dynamically adjusting hyperparameters during training. This approach is not merely theoretical; we provide robust empirical evidence supporting the dynamic adjustment of hyperparameters. Such an understanding is crucial for advancing AutoRL methodologies, as it opens up possibilities for more refined and efficient learning strategies, and potentially building better optimizers for Reinforcement Learning.

[Paper | GitHub | Blogpost]

Contextualize Me — The Case for Context in Reinforcement Learning

While Reinforcement Learning (RL) has achieved remarkable progress, it still struggles with adaptability to subtle environmental changes. Contextual Reinforcement Learning (cRL) is an important framework designed to model these changes systematically. This approach not only allows for precise task specification but also enhances interpretability, offering a promising avenue for addressing challenges in RL. Our focus is on demonstrating how the cRL framework contributes to the enhancement of zero-shot generalization in RL. Through rigorous benchmarks and structured reasoning on generalization tasks, we showcase the pivotal role of context information in achieving optimal behavior within cRL. To empirically validate these insights, we introduce various context-extended versions of common RL environments as part of the innovative benchmark library, CARL. CARL, serving as a testbed, features cRL extensions of popular benchmarks, opening doors for further exploration of general agents.
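
The role of context can be illustrated with a toy example. The sketch below does not use the CARL API; the environment, the Context class and both policies are hypothetical and written in plain Python. It contrasts a policy that observes the context with one that silently assumes the training context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    goal: float       # target position on the line
    step_size: float  # distance moved per unit action


class ContextualLineWorld:
    """Toy 1D environment whose task is fully specified by a context."""

    def __init__(self, context, horizon=20):
        self.context = context
        self.horizon = horizon
        self.position = 0.0
        self.t = 0

    def reset(self):
        self.position = 0.0
        self.t = 0
        return self.position

    def step(self, action):
        """action is -1, 0 or +1; the reward is the negative distance to the goal."""
        self.position += action * self.context.step_size
        self.t += 1
        reward = -abs(self.context.goal - self.position)
        return self.position, reward, self.t >= self.horizon


def move_towards(position, goal, step_size):
    """Greedy controller: step towards the goal, stop once close enough."""
    diff = goal - position
    if abs(diff) <= step_size / 2:
        return 0
    return 1 if diff > 0 else -1


def context_aware_policy(position, context):
    return move_towards(position, context.goal, context.step_size)


TRAIN_CONTEXT = Context(goal=1.0, step_size=0.5)


def context_blind_policy(position, context):
    # Ignores the actual context and assumes the training context.
    return move_towards(position, TRAIN_CONTEXT.goal, TRAIN_CONTEXT.step_size)


def episode_return(policy, context):
    env = ContextualLineWorld(context)
    position, total, done = env.reset(), 0.0, False
    while not done:
        position, reward, done = env.step(policy(position, context))
        total += reward
    return total


unseen_context = Context(goal=-2.0, step_size=0.5)
for name, policy in [("aware", context_aware_policy), ("blind", context_blind_policy)]:
    print(name, episode_return(policy, TRAIN_CONTEXT), episode_return(policy, unseen_context))
```

Both policies behave identically on the training context, but only the context-aware one transfers zero-shot to the unseen goal, which is the kind of generalization gap that context information is meant to close.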

[Paper | GitHub | Blogpost]

A Patterns Framework for Incorporating Structure in Deep Reinforcement Learning

Applying Reinforcement Learning (RL) to real-world problems is challenging because of factors such as sparse rewards, complex dynamics, noise, and large state and action spaces. Commonly, RL pipelines grapple with issues like Data Inefficiency and Poor Generalization, often resulting in models that are uninterpretable and potentially unsafe. In this work, we present an innovative framework for categorizing RL research through the lens of design patterns, which have been wildly successful in Software Engineering. Using this framework, we survey the RL literature to categorize methods based on the type of structural information they incorporate and their benefits. Our survey not only illuminates what has already been done but also highlights the areas of research that might be immediately lucrative. By adopting the design pattern approach, we lay the groundwork for future research in RL algorithms that are potentially learnable from data, enhancing their capability to adeptly handle the subtle and unpredictable nuances of real-world environments.


POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning

The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain, such that the sample-efficiency on supervised downstream tasks is improved. It is still an open question how an optimal pretrained prior policy can be achieved in practice. In this work, we present POLTER (Policy Trajectory Ensemble Regularization) – a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms. It utilizes an ensemble of policies that are discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Under a fair comparison with tuned baselines and tuned POLTER, we establish a new state-of-the-art for model-free methods on the Unsupervised Reinforcement Learning Benchmark (URLB).

[Paper | Code (Download)]