Controlled Generation of Synthetic Data for Tasks of Diarization (article)
The article was published in a journal from the RSCI Web of Science list.
Citation information for the article was obtained from Scopus.
The article was published in a journal from the VAK list (Russian Higher Attestation Commission).
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: January 23, 2026.
Abstract: This article presents a methodological approach to generating synthetic data for speech processing tasks, including diarization, overlapping speech detection, speaker change detection, and voice activity detection. Because real data are scarce and expensive to label, synthetic data are a promising alternative: they allow real acoustic scenarios to be simulated reproducibly. The generation method creates audio tracks that simulate complex speech situations, with flexibly adjustable parameters such as noise level, pauses, overlap rate, and variability of the acoustic environment. The presented benchmark includes high-quality annotated data for training and testing models, as well as a programming interface for evaluation with popular metrics. The experimental results confirmed the effectiveness of the synthetic data: models trained on the proposed benchmark performed comparably to, or better than, modern methods on real data. The comparison was carried out on well-known corpora (AMI, DIHARD, VoxConverse, CallHome), where the models showed lower error rates in diarization and audio segmentation. The proposed approach makes it possible to accelerate the development of speech processing systems while maintaining high accuracy, and it reduces dependence on expensive labeled datasets. In addition to the generation method, the work publishes an open synthetic benchmark, giving the research community a tool for further standardization and development of the field.
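The abstract does not describe the generator's internals. As a rough illustration of what controllable parameters such as pauses, overlap rate and noise level can mean in practice, the following is a minimal sketch, assuming a simple timeline model: utterances are placed one after another with either a random pause or a random overlap, mixed with noise at a target SNR, and frame-level targets for diarization, voice activity, overlapped speech and speaker change are all derived from the same timeline. All names and values here (`place_segments`, `frame_labels`, `mix`, `overlap_prob`, the 10 ms frame hop, white-noise stand-ins for real speech, the simplified speaker-change definition) are hypothetical and are not the authors' implementation or benchmark API.

```python
import numpy as np

# Hypothetical sketch only: not the authors' generator or API.


def place_segments(durations, speakers, pause_range, overlap_prob, overlap_range, rng):
    """Lay utterances out on a timeline (times in seconds).

    Each utterance either starts after a random pause following the current end
    of the timeline or, with probability `overlap_prob` (only when the speaker
    differs from the previous utterance's speaker), starts before that end so
    that it overlaps the tail of the timeline.
    """
    segments = []  # list of (start, end, speaker_index)
    cursor = 0.0   # latest end time placed so far
    for dur, spk in zip(durations, speakers):
        if segments and spk != segments[-1][2] and rng.random() < overlap_prob:
            prev_start, prev_end, _ = segments[-1]
            # never shift back further than the previous utterance's length
            shift = min(rng.uniform(*overlap_range), prev_end - prev_start)
            start = cursor - shift
        else:
            start = cursor + rng.uniform(*pause_range)
        segments.append((start, start + dur, spk))
        cursor = max(cursor, start + dur)
    return segments


def frame_labels(segments, n_speakers, hop=0.01):
    """Derive per-frame targets for four tasks from one timeline."""
    n_frames = int(np.ceil(max(end for _, end, _ in segments) / hop))
    activity = np.zeros((n_frames, n_speakers), dtype=bool)  # diarization target
    for start, end, spk in segments:
        activity[int(round(start / hop)):int(round(end / hop)), spk] = True
    n_active = activity.sum(axis=1)
    vad = n_active > 0       # voice activity target
    overlap = n_active > 1   # overlapped speech target
    # simplified speaker-change target (an assumption): the set of active
    # speakers differs from that of the previous frame
    change = np.zeros(n_frames, dtype=bool)
    change[1:] = np.any(activity[1:] != activity[:-1], axis=1)
    return activity, vad, overlap, change


def mix(sources, segments, noise, snr_db, sr):
    """Sum sources at their segment offsets, then add noise at a target SNR.

    SNR is measured against the power of the whole clean track, silences included.
    Returns (noisy_mixture, clean_mixture).
    """
    n = int(np.ceil(max(end for _, end, _ in segments) * sr))
    clean = np.zeros(n)
    for (start, _, _), src in zip(segments, sources):
        a = int(round(start * sr))
        piece = src[: n - a]  # guard against rounding past the track end
        clean[a:a + len(piece)] += piece
    noise = np.resize(noise, n)  # loop or trim the noise to the track length
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + gain * noise, clean


rng = np.random.default_rng(0)
sr, n_utts, n_speakers = 16000, 8, 2
speakers = [i % n_speakers for i in range(n_utts)]  # alternating turns
durations = rng.uniform(1.5, 3.0, size=n_utts)
segments = place_segments(durations, speakers, pause_range=(0.1, 0.5),
                          overlap_prob=0.5, overlap_range=(0.3, 1.0), rng=rng)
# placeholder "speech": white-noise bursts standing in for real utterances
sources = [0.1 * rng.standard_normal(int(round(d * sr))) for d in durations]
noisy, clean = mix(sources, segments, rng.standard_normal(sr), snr_db=10.0, sr=sr)
activity, vad, overlap, change = frame_labels(segments, n_speakers)
print(f"overlapped fraction of speech frames: {overlap.sum() / vad.sum():.2f}")
```

Because every label layer is computed from the same timeline, the diarization, VAD, overlap and change targets are mutually consistent by construction, which is one plausible reason synthetic data can be annotated exactly and cheaply. In a realistic generator the white-noise stand-ins would be replaced by real utterances, and room simulation would add the acoustic-environment variability the abstract mentions.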