Abstract: Data distillation is the problem of reducing the volume of training data while retaining only the necessary information. In this paper, we explore in greater depth a recently proposed data distillation algorithm originally designed for image data. Our experiments with tabular data show that a model trained on the distilled samples can outperform a model trained on the original dataset. One shortcoming of the considered algorithm is that the produced data generalizes poorly to models with different hyperparameters. We show that using multiple architectures during distillation helps overcome this problem.