Abstract: Adapting large language models (LLMs) to morphologically rich languages such as Russian remains a major challenge, as multilingual models often transfer poorly due to predominantly English-centric pre-training. This study investigates knowledge distillation (KD) as a more effective alternative to supervised fine-tuning (SFT) for the final calibration stage of language adaptation. We introduce an efficient offline top-K distillation approach that transfers knowledge from a 32B Russian-adapted teacher model to a 4B student model via tokenizer alignment and direct logit transfer. Experimental results demonstrate that KD consistently surpasses SFT, achieving up to a 4.22% performance improvement; top-100 distillation yields the highest average gain (3.27%), albeit with increased memory consumption (62 GB versus 7 GB for top-10). Moreover, the advantages of KD are most pronounced for student models with lower adaptive capacity (i.e., smaller LoRA α values). These findings underscore the efficacy of KD as a practical and scalable approach for language adaptation, while emphasizing the need to balance performance gains against computational cost.
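To make the offline top-K idea concrete, the following is a minimal sketch (not the paper's actual implementation) of a top-K distillation loss: the teacher's top-K logits and token ids are precomputed offline, and the student's distribution restricted to those same K token ids is matched to the teacher's via KL divergence. All function names, shapes, and the temperature parameter `T` are illustrative assumptions.

```python
import numpy as np

def _softmax(x):
    # numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def topk_distill_loss(student_logits, teacher_topk_vals, teacher_topk_idx, T=1.0):
    """KL(teacher || student) over the teacher's stored top-K tokens.

    student_logits:    (batch, vocab)  full student logits
    teacher_topk_vals: (batch, K)      teacher logits saved offline
    teacher_topk_idx:  (batch, K)      token ids of those logits
    Both distributions are renormalized over the K entries only,
    which is the approximation that makes offline storage cheap
    (K values per position instead of the full vocabulary).
    """
    # select student logits at the teacher's top-K token ids
    student_sel = np.take_along_axis(student_logits, teacher_topk_idx, axis=-1)
    p_teacher = _softmax(teacher_topk_vals / T)
    p_student = _softmax(student_sel / T)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    # T^2 scaling keeps gradient magnitudes comparable across temperatures
    return float(kl.mean() * T * T)
```

With top-10 only 10 logit/id pairs per token position are stored, versus 100 for top-100, which is the storage trade-off behind the 7 GB vs. 62 GB figures reported above.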