t-SNE (t-Distributed Stochastic Neighbor Embedding)

October 5, 2023 1 minute read

고차원 데이터를 저차원 공간으로 변환하여 데이터의 국소적인 구조와 유사도를 유지하면서 시각화를 가능하게 하는 차원 축소 기법

이 알고리즘은 유사한 데이터 포인트들이 저차원에서도 가까이 위치하도록 하며, 원본 고차원 공간의 데이터 포인트 간 유사도를 t-분포를 사용하여 저차원 공간에 재현함으로써, 데이터의 클러스터 구조를 시각적으로 해석하기 용이하게 만든다.

SNE (Stochastic Neighbor Embedding)

At present dimension state S, 에서의 데이터에 대한 distance 정보를, dimension reduction 된 state S’ 에서도 유지시키는 목적

Data Sample $x_i$ 를 기준으로 다른 샘플들에 대해 Gaussian distribution 를 활용해 weight (distance) 부여

Each samples 마다 다른 sample 과의 distance 에 대한 distribution 은 매우 다르다. ($\sigma_i$ 사용)

원형 데이터에서의 distance 와 reducted dimension 에서의 distance 를, 비슷하게 만들어 주는 define the loss function.

Distance of between distribution’s 를 quantization (정량화) 하는 KL divergence (kullback leibler divergence) 를 사용한다.

Gradient descent method 를 이용해 $y_i$ 를 이동

두 sample 간 weight 여도, 어떤 sample 이 standard sample 인 지에 따라 차이가 있다. ($\sigma$ 가 다르기 때문에)

Weight with symmetric 을 갖기 위해, 다음과 같이 정의한다.

Gaussian distribution 은 change to likelihood value 의 gradiant 가 steep(급경사) 한 특성이 있다.

therefore, 일정 거리 이상으로 부터는 value 가 매우 작아지는 have crowding problem

먼 거리의 샘플에 대한 weight value 가 have a problem with lose to object (목적, 의미).

t-SNE is exchange Gaussian distribution to t distribution, and using them.

Normal distribution(정규분포) 으로부터 define 되었기 때문에, 이와 매우 유사한 형태의 분포를 가지고 있다.

Number of sample 이 적고, standard deviation(표준편차) of population(모집단) 를 알 수 없을 때, average of population 를 assumption 위해 사용한다.

Degree of freedom (자유도) $(n-1)$ 에 따라 shape 가 달라지지만, 적은 degree of freedom 에서는 normal distribution 보다 tail shape 가 더 thickly 하다.

Reducted dimension 에서의 weight 를 degree of freedom, 1’ t distribution 를 사용

t-SNE learning 에서는 hyper-parameter of perplexity 가 필요하다.

설정된 perplexity 를 satisfy 하는 $\sigma_i$ 를 찾아서 algorithm 에 사용한다.

$\textrm{Perplexity} = 2^{\textrm{entropy}}=2^{\sum-p_{j|i}\log p_{j|i}}$

Entropy 는 클수록 increase to uncertainty (uniform distribution), 작을수록 decrease to uncertainty (one-hot distribution)

If set the perplexity to large value, 모든 sample 에 대한 weight 가 비슷해지도록 큰 값의 $\sigma_i$ 를 사용한다.

큰 값의 perplexity 는 dimension reduction 에 관하여 모든 샘플에 대해 increase to influence

In offsite case, nearly sample 에 대한 영향력이 매우 커진다.

In normally, It set to 5 to 50.

Screenshot_2023-03-15_at_12 21 31_PM