With machine learning becoming ever more prevalent in society, it’s important that the methods we use are fair and non-discriminatory, and that practitioners are aware of the biases present both in their data and in the methods they use. We highlight the word fair because there is no single notion of fairness we can optimize in machine learning that captures all the subtleties of how the output of a machine learning model is actually used. So what can AutoML do? We wrote a blog post on this very topic at the intersection of AutoML and fairness, and you can find the full paper here.
To highlight, what AutoML can do is be fairness-aware, enabling the experts who really know which notions of fairness they care about to stay in charge. AutoML systems take the human out of the loop, but there is an increasing precedent for putting the human back in the loop, putting them in the driver’s seat, so to speak. How can AutoML do this?
Multi-Objective and Constrained Optimization
AutoML provides the tools for optimizing several metrics at once, such as a computable fairness metric alongside accuracy. Such metrics are often at odds, and there is a trade-off to consider. With constrained optimization, AutoML can find performant models that meet a minimum threshold on what is deemed an acceptable fairness metric, but this does not let the domain expert really understand the trade-off. AutoML tools can instead provide what is called a Pareto front, a set of possible models that captures the trade-off inherent to these metrics, allowing the domain expert to carefully consider which trade-off is suitable for their use case. This puts the domain expert in the driver’s seat, as only they, not AutoML, can know the downstream implications of their model.
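To make the Pareto-front idea concrete, here is a minimal sketch in plain Python of extracting the non-dominated set from a pool of candidate models scored on accuracy and a fairness violation. The model names and scores are purely hypothetical; a real AutoML system would compute this over the configurations it evaluated during the search.

```python
# Minimal sketch: keep only the candidates not dominated on both objectives.
# The candidate values below are illustrative, not results from any real run.

def pareto_front(candidates):
    """Return the non-dominated candidates.

    Each candidate is (name, accuracy, fairness_violation); we want to
    maximize accuracy and minimize the fairness violation.
    """
    front = []
    for name, acc, viol in candidates:
        dominated = any(
            (a >= acc and v <= viol) and (a > acc or v < viol)
            for _, a, v in candidates
        )
        if not dominated:
            front.append((name, acc, viol))
    # Sort by accuracy so the trade-off curve is easy to read.
    return sorted(front, key=lambda c: c[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical models: (name, validation accuracy, fairness-violation gap)
    models = [
        ("model_a", 0.91, 0.18),
        ("model_b", 0.89, 0.09),
        ("model_c", 0.86, 0.04),
        ("model_d", 0.84, 0.07),  # dominated by model_c
        ("model_e", 0.80, 0.03),
    ]
    for name, acc, gap in pareto_front(models):
        print(f"{name}: accuracy={acc:.2f}, fairness gap={gap:.2f}")
```

The domain expert then picks a point on this front rather than accepting a single model chosen by the system.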
Fair Architectures for Deep Learning
Deep learning is applied to a wide variety of socially consequential domains, e.g., credit scoring, fraud detection, hiring decisions, criminal recidivism, loan repayment, and face recognition, with many of these applications impacting people’s lives more than ever, often in biased ways. Dozens of formal definitions of fairness have been proposed, and many algorithmic techniques have been developed for debiasing according to these definitions. Many debiasing algorithms fall into one of three categories: pre-processing, in-processing, or post-processing. Most of these bias mitigation strategies start by selecting a network architecture and set of hyperparameters that are optimal in terms of accuracy, and then apply a mitigation strategy to reduce bias while minimally impacting accuracy. Contrary to this, our paper posits that the architecture and hyperparameters of the selected model are much more important and impactful than the mitigation strategy itself, a claim we establish through a thorough empirical investigation on face-recognition datasets.
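As a concrete illustration of one such computable fairness definition (just a sketch, not the specific metric used in our paper), the snippet below measures the gap in error rates between protected groups on a held-out set; the toy labels and group assignments are placeholders.

```python
import numpy as np

# Sketch of one common computable fairness notion: the gap in error rates
# between protected groups (related to "accuracy parity"). The arrays below
# are placeholders; in practice they come from a held-out evaluation set.

def error_rate_gap(y_true, y_pred, groups):
    """Largest difference in error rate across protected groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = [
        np.mean(y_pred[groups == g] != y_true[groups == g])
        for g in np.unique(groups)
    ]
    return max(rates) - min(rates)

# Toy example with a binary protected attribute (0/1).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(error_rate_gap(y_true, y_pred, groups))  # 0.0: equal error rates here
```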
Towards Fairer Face Recognition
Conventional belief in the fairness community is that one should first find the highest-performing model for a given problem and then apply a bias mitigation strategy: one starts with an existing model architecture and hyperparameters, and then adjusts the model weights, learning procedure, or input data to make the model fairer using a pre-, in-, or post-processing bias mitigation technique. In our paper we observe that finding an architecture that is inherently fairer offers significant gains compared to conventional bias mitigation strategies in the domain of face recognition, a task that is notoriously challenging to de-bias. Our paper first conducts a large-scale analysis of the fairness-accuracy trade-off of different handcrafted architectures, ranging from convolutional networks to transformers. We further apply black-box NAS+HPO to automatically discover architectures that are fairer than the handcrafted ones while remaining highly accurate on large-scale datasets such as CelebA and VGGFace2. An overview of our analysis is presented below:

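For intuition, the following is a minimal sketch of what such a black-box, multi-objective NAS+HPO loop can look like, here with a toy search space, random sampling, and a placeholder evaluation function; none of these names or values are taken from the paper.

```python
import random

# Rough sketch of a black-box NAS+HPO loop: sample joint
# architecture/hyperparameter configurations, evaluate each for accuracy and
# a fairness violation, and hand the non-dominated configurations (the Pareto
# front from the earlier sketch) to the domain expert.
# `train_and_evaluate` is a stand-in for the expensive training step.

SEARCH_SPACE = {
    "backbone": ["resnet", "vit", "dpn"],        # architecture choice
    "width_multiplier": [0.5, 1.0, 2.0],         # architecture hyperparameter
    "learning_rate": [1e-4, 3e-4, 1e-3],         # training hyperparameter
    "head": ["arcface", "cosface", "magface"],   # face-recognition loss head
}

def sample_config(rng):
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def train_and_evaluate(config):
    # Placeholder: in practice this trains the candidate on a dataset such as
    # CelebA and returns (validation accuracy, fairness violation).
    rng = random.Random(str(sorted(config.items())))
    return rng.uniform(0.80, 0.95), rng.uniform(0.0, 0.3)

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        config = sample_config(rng)
        accuracy, violation = train_and_evaluate(config)
        results.append((config, accuracy, violation))
    return results

if __name__ == "__main__":
    for config, accuracy, violation in random_search():
        print(config, f"accuracy={accuracy:.3f}", f"violation={violation:.3f}")
```

In practice the random sampler would be replaced by a multi-objective Bayesian optimization or evolutionary strategy, and the Pareto front of the evaluated configurations is what gets handed back to the practitioner.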
As observed below, joint NAS+HPO does indeed find architectures that are robust while being both fairer and more accurate on the CelebA dataset. This demonstrates the immense promise that NAS+HPO holds for fairness in deep learning applications.

Furthermore, the discovered architectures transfer to other face-recognition datasets with the same and different protected attributes (e.g., race in RFW) without any further finetuning, while maintaining significant improvements over handcrafted architectures and hyperparameter choices.

We expect future work in this direction to focus on studying different multi-objective algorithms and NAS techniques to search for inherently fairer models more efficiently. Further, it would be interesting to study how the properties of the discovered architectures translate across different demographics and populations. Another promising direction is to incorporate experts’ priors and beliefs about fairness in society, integrating their knowledge to further improve and aid NAS+HPO methods for fairness.