Synthetic data has become an increasingly popular tool in machine learning, particularly in supervised learning and reinforcement learning (RL). Synthetic data is artificially generated data that mimics the characteristics of real data. It offers several advantages: it can be generated in large amounts with labels included, and it can simulate a wide range of scenarios for training and evaluation.
At AutoML.org, we currently use synthetic data to train neural networks in two main areas of research: Prior-data Fitted Networks (PFNs) and learned synthetic RL environments. In the former, we use synthetic data-generating processes to (meta-)pretrain PFNs. In the latter, we learn the synthetic data (that is, the RL environments) themselves through meta-learning so that RL agents can be trained efficiently. For more details, see the blog posts below: