The talk is devoted to the estimators of conditional entropy [1] and of mutual information in the mixed model [2], and to a new result on a feature selection procedure based on the information-theoretic approach. The mixed model is described in [1]. Let $\zeta_n = \{(X^i, Y^i)\}_{i=1}^n$ be a sample of i.i.d. observations, $(X^1, Y^1) \sim (X, Y)$, where $X = (X_1, \dots, X_d)$ is an absolutely continuous random vector in $\mathbb{R}^d$ and $Y$ is a random variable taking values in a finite set $M$. For $u = (u_1, \dots, u_d)$ and $L = \{l_1, \dots, l_m\}$ write $u_L = (u_{l_1}, \dots, u_{l_m})$. A set of indices $S = \{s_1, \dots, s_m\} \subset \{1, \dots, d\}$ ($s_i \ne s_j$ for $i \ne j$) and the corresponding set of factors $X_S$ are called relevant if, for each $y \in M$, the relation $f_{Y|X}(y \mid X) = f_{Y|X_S}(y \mid X_S)$ holds a.s.

Let $Q_m = \{\{l_1, \dots, l_m\} \subset \{1, \dots, d\} \colon l_i \ne l_j,\ i \ne j\}$. For each $L \in Q_m$ define $\zeta_{n, L} = \{(X^i_L, Y^i)\}_{i=1}^n$ and estimate the mutual information $I(X_L; Y)$ from the sample $\zeta_{n, L}$ by the method proposed in [2]. We denote the resulting estimates by $\hat{I}_{n, k, L}$, where $k \in \{1, \dots, n-1\}$ is a parameter of the method. Define $\hat{S}_{n,k} = \operatorname{argmax}_{L \in Q_m} \hat{I}_{n, k, L}$. If the maximum of $\hat{I}_{n, k, L}$ is attained at several sets $L \in Q_m$, then $\hat{S}_{n,k}$ is defined as the first such set in the lexicographical order. The following new result is valid.

\textbf{Theorem}. Let $m$ be known and the relevant set of factors of cardinality $m$ be unique. Assume that the density $f_X$ is strictly positive and that, for each $L \subset \{1, \dots, d\}$ and $y \in M$, the density $f_{X_L, Y}(\cdot, y)$ is $C_0$-constricted ($C_0 > 0$) and $\mathsf{E}|\log f_{X_L}(X_L)|^{2 + \varepsilon} < \infty$ for some $\varepsilon > 0$. Then $\mathsf{P}(\hat{S}_{n,k} = S) \to 1$ as $n \to \infty$, for each $\alpha \in (0,1)$ and $k \propto n^{\alpha}$.
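To illustrate the selection procedure $\hat{S}_{n,k}$, below is a minimal Python sketch. It does not reproduce the estimator of [2]; as a stand-in it uses the kNN mutual-information estimator of Ross (2014) for a continuous $X$ and a discrete $Y$, which is of the same nearest-neighbour flavour. The function names mi_mixed_knn and select_features are illustrative, and the sketch assumes every class in the sample contains more than $k$ observations.

from itertools import combinations

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def mi_mixed_knn(X, y, k):
    """kNN estimate of I(X; Y) for continuous X (n x m array), discrete y.

    Stand-in for the estimator of [2]: the Ross (2014) estimator
    I_hat = psi(n) - <psi(N_y)> + psi(k) - <psi(m_i)>.
    Assumes each class contains more than k points.
    """
    n = len(y)
    radii = np.empty(n)
    label_counts = np.empty(n)
    for label in np.unique(y):
        mask = (y == label)
        X_label = X[mask]
        label_counts[mask] = X_label.shape[0]
        # Distance to the k-th nearest neighbour within the same class
        # (the first returned neighbour is the point itself).
        d, _ = cKDTree(X_label).query(X_label, k=k + 1)
        radii[mask] = d[:, -1]
    # m_i: number of sample points over all classes within that radius,
    # excluding the point itself.
    full_tree = cKDTree(X)
    m_counts = np.array([
        len(full_tree.query_ball_point(x, r)) - 1 for x, r in zip(X, radii)
    ])
    return (digamma(n) - np.mean(digamma(label_counts))
            + digamma(k) - np.mean(digamma(m_counts)))


def select_features(X, y, m, k):
    """Return the index set L of size m maximising the MI estimate.

    itertools.combinations yields the subsets L in lexicographic order and
    the strict '>' keeps the first maximiser encountered, matching the
    tie-breaking rule in the definition of S_hat_{n,k}.
    """
    best_L, best_I = None, -np.inf
    for L in combinations(range(X.shape[1]), m):
        I_hat = mi_mixed_knn(X[:, L], y, k)
        if I_hat > best_I:
            best_L, best_I = L, I_hat
    return best_L

A call such as select_features(X, y, m=2, k=int(len(y) ** 0.5)) mirrors the regime of the theorem with $k \propto n^{\alpha}$ for $\alpha = 1/2$.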