DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

October 5, 2023 less than 1 minute read

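A density-based clustering algorithm that forms clusters in high-density regions and treats low-density regions as noise.

DBSCAN forms clusters based on the density of data points in a region of space. If a point has enough neighbors within a user-specified radius ε (typically at least a user-specified minimum number of points, MinPts), it either starts a new cluster or extends an existing one. Repeating this process finds clusters of arbitrary shape and separates out outliers.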

DBSCAN

DBSCAN takes points in high-density regions as centers and grows clusters around them.
For any reference point, if at least min-points samples lie within radius $\epsilon$ of it, those samples are assigned to the same cluster.
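The density check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `region_query` and `is_core_point` are hypothetical, distance is Euclidean, and the count includes the reference point itself and uses "at least min-points" as the threshold (a common convention, but conventions differ).

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core_point(points, i, eps, min_pts):
    """Assumed convention: a core point has at least min_pts samples in its eps-radius."""
    return len(region_query(points, i, eps)) >= min_pts

# Three samples packed together and one isolated sample.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

print(is_core_point(points, 0, eps=0.5, min_pts=3))  # True: 3 samples within eps
print(is_core_point(points, 3, eps=0.5, min_pts=3))  # False: only itself within eps
```

Counting the point itself is a design choice; dropping it simply shifts the effective threshold by one.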

Figure 1

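e.g., with min-points = 3, a point that has at least 3 samples within radius $\epsilon$ forms a cluster with those samples.

Each sample assigned to the cluster is then treated as a core-point candidate for that cluster, and the same check is repeated around it.

Expansion stops (break) when it reaches a border point: a sample that is assigned to the cluster but cannot be a core point, because fewer than min-points samples lie within its radius $\epsilon$.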

Figure 2

This is computed for every data sample, separating cluster points from noise points.
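Putting the steps together, the whole procedure might look like the sketch below. It is a minimal, brute-force illustration rather than a reference implementation: the function name `dbscan`, the `NOISE = -1` label, the Euclidean distance, and the "at least min_pts, counting the point itself" rule are all assumptions made for this example.

```python
import math
from collections import deque

NOISE = -1  # assumed label for noise points

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force search: compare point i against every point.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously noise, now a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
            # otherwise j is a border point and expansion stops here
        cluster_id += 1
    return labels

points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # dense group
    (2.2, 1.0),                              # border point next to (1, 1)
    (10, 10), (10, 11), (11, 10), (11, 11),  # second dense group
    (5, 5),                                  # isolated sample
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy data, `(2.2, 1.0)` joins cluster 0 as a border point without expanding it further, and `(5, 5)` is left as noise.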

Figure 3

Figure 4

Advantage

Can find clusters of varying, arbitrary shape.
Can detect noise points (outliers).

Disadvantage

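The number of clusters does not need to be specified, but $\epsilon$ and min-points must be set.
The computational cost is high, so it takes a long time; with a brute-force neighbor search, every sample is compared against every other sample.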
