Speaker diarization, the task of segmenting an audio recording and attributing each segment to a speaker, is critical for applications such as automatic speech recognition, transcription, and the analysis of multi-speaker recordings such as meetings and podcasts. Overlapping speech, prevalent in datasets such as AMI and CALLHOME, poses a significant challenge because it complicates accurate speaker segmentation. This work addresses the issue by investigating the linearity of biometric embeddings: the property that the embedding of overlapping speech can be represented as a linear combination of the individual speaker embeddings. This property is essential for robust diarization, particularly in cascaded schemes built on Target-Speaker Voice Activity Detection (TSVAD). We propose a novel fine-tuning method for the ECAPA-TDNN model that enhances embedding linearity, using a synthetic dataset derived from VoxCeleb and a modified loss function combining AAM-Softmax with a linearity term. Integrated into a cascaded TSVAD-based diarization framework, the approach supports both full-context and streaming modes. Experiments on standard benchmarks (AMI, DIHARD, VoxConverse) demonstrate a reduced Diarization Error Rate (DER) compared to state-of-the-art methods, indicating improved handling of overlapping speech. The proposed method fills a gap in optimizing embedding linearity and offers practical benefits for real-world multi-speaker scenarios.
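The linearity property at the core of the abstract can be illustrated with a minimal sketch. The formulation below is hypothetical (the paper's exact linearity term is not given here): it measures how far the embedding of an overlapped segment deviates from a weighted combination of the individual speaker embeddings, using cosine distance. Such a term would be added to the AAM-Softmax classification objective during fine-tuning.

```python
import numpy as np

def linearity_loss(mix_emb, spk_embs, weights):
    """Cosine distance between the embedding of overlapping speech and
    the weighted sum of individual speaker embeddings.

    mix_emb  : (d,)   embedding of the mixed (overlapped) segment
    spk_embs : (k, d) embeddings of the k individual speakers
    weights  : (k,)   mixing weights (e.g. relative energies)

    NOTE: hypothetical illustration of a 'linearity term'; the paper's
    actual loss formulation may differ.
    """
    target = np.sum(weights[:, None] * spk_embs, axis=0)
    cos = np.dot(mix_emb, target) / (
        np.linalg.norm(mix_emb) * np.linalg.norm(target)
    )
    return 1.0 - cos

# Toy example: two orthogonal unit speaker embeddings.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
w = np.array([0.5, 0.5])

# A perfectly linear mixture embedding yields (near-)zero loss.
mix = 0.5 * e1 + 0.5 * e2
print(linearity_loss(mix, np.stack([e1, e2]), w))
```

A fine-tuning objective in this spirit would be a weighted sum, e.g. `L = L_aam + lambda * linearity_loss(...)`, where `lambda` balances speaker-discrimination against linearity; `lambda` here is an assumed hyperparameter name, not one taken from the paper.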