Evaluating Three Corpus-based Semantic Similarity Systems for Russian: conference talk | ISTINA (Intelligent System for Thematic Investigation of Scientometric Data)

Authors: Lesota O.O., Arefyev N.V.
International conference: Dialogue 2015, International Conference on Computational Linguistics and Intellectual Technologies
Conference dates: 27-30 May 2015
Talk date: 28 May 2015
Talk type: Plenary
Speaker: Arefyev N.V.
Venue: Moscow, Russian State University for the Humanities (RGGU), Russia
Abstract:
This paper reports the results of our participation in the first shared task on Russian Semantic Similarity Evaluation (RUSSE). We compare three corpus-based systems that measure semantic similarity between words. The first uses lexico-syntactic patterns to retrieve sentences that indicate a particular semantic relation between words. The second applies the traditional context-window approach to Google N-Grams data, taking advantage of the huge corpora from which that data was collected. The third uses word2vec trained on the large lib.rus.ec book collection. word2vec is one of the state-of-the-art methods for English, and our initial experiments showed that it also yields the best results for Russian compared with the other two systems considered in this paper. We therefore focus on studying word2vec meta-parameters and investigate how the training corpus affects the quality of the resulting word vectors. Finally, we propose a simple but useful technique for dealing with out-of-vocabulary words.
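The idea behind the second system (building context-window co-occurrence vectors from n-gram counts and comparing them) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the paper's method: the toy English n-grams and their frequencies are invented, the middle token of each 5-gram is taken as the target word with the other four tokens as its ±2 context, raw frequencies serve as weights, and cosine is used as the similarity measure. The abstract does not specify the weighting scheme or similarity function the authors actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data in the shape of Google N-Grams records:
# (5-gram tokens, corpus frequency). Invented for illustration only.
NGRAMS = [
    (("drink", "hot", "tea", "with", "lemon"), 40),
    (("a", "strong", "tea", "with", "sugar"), 10),
    (("drink", "hot", "coffee", "with", "milk"), 35),
    (("a", "strong", "coffee", "with", "sugar"), 20),
    (("park", "the", "car", "near", "home"), 30),
    (("drive", "the", "car", "to", "work"), 25),
]


def context_vectors(ngrams):
    """Build sparse context-count vectors from n-gram frequencies.

    Assumption: the middle token of each n-gram is the target word and all
    other tokens are its context (a +/-2 window for 5-grams). Using only the
    middle position avoids counting the same co-occurrence in overlapping
    n-grams more than once.
    """
    vectors = defaultdict(Counter)
    for tokens, freq in ngrams:
        mid = len(tokens) // 2
        target = tokens[mid]
        for j, ctx in enumerate(tokens):
            if j != mid:
                vectors[target][ctx] += freq
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(vectors, w1, w2):
    """Similarity of two words; a word with no vector gets score 0.0."""
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))


if __name__ == "__main__":
    vecs = context_vectors(NGRAMS)
    print("tea/coffee:", round(similarity(vecs, "tea", "coffee"), 3))
    print("tea/car:", round(similarity(vecs, "tea", "car"), 3))
```

On this toy data "tea" and "coffee" share contexts such as "drink", "hot" and "with", so they score well above "tea" and "car", which share none. Returning 0.0 for a word without a vector is only a placeholder; the paper proposes its own out-of-vocabulary technique, which the abstract does not describe.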
Added to the system by: Nikolay Viktorovich Arefyev
