DBSCAN Clustering Algorithm in R

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised algorithm used to identify clusters of arbitrary shape in a data set containing noise and outliers.

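The DBSCAN algorithm is based on an intuitive notion of “clusters” and “noise”: clusters are dense regions of points, and noise is whatever lies in the sparse regions between them. It groups together points that are close to each other based on two parameters: eps (the maximum distance at which two points count as neighbors) and a minimum number of points (minPoints).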

Parameters

eps: the maximum distance between two points for them to be considered neighbors. If the distance between two points is lower than or equal to this value (eps), the points are neighbors.

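minPoints: the minimum number of points within eps that a point needs in order to be treated as a core point of a dense region, and so the minimum number of points required to form a cluster. In the dbscan package this argument is called minPts.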
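To make the two parameters concrete, the sketch below counts eps-neighbors by hand in base R and flags core points. The toy points and the names `pts`, `eps`, `min_pts`, `neighbor_count`, and `is_core` are illustrative assumptions, not part of any package. The count includes the point itself, which is the convention the dbscan package documents for minPts.

```r
# Toy data: a tight group of four points plus one far-away point (illustrative values)
pts <- rbind(c(0, 0), c(0, 0.1), c(0.1, 0), c(0.1, 0.1), c(5, 5))

eps <- 0.5      # neighborhood radius (assumed value for this toy example)
min_pts <- 3    # minimum number of neighbors, counting the point itself

# Pairwise Euclidean distances as a full matrix
d <- as.matrix(dist(pts))

# Number of points within eps of each point
# (the diagonal is 0, so every point counts itself)
neighbor_count <- rowSums(d <= eps)

# A point is a core point when its eps-neighborhood holds at least min_pts points
is_core <- neighbor_count >= min_pts

print(data.frame(neighbor_count, is_core))
```

The four grouped points each have enough neighbors to be core points, while the isolated point has only itself in its neighborhood and would end up as noise.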

Implementation

Parameter Estimation
1. Determine minPoints: Generally, minPoints can be derived from the number of dimensions (D) in the dataset, as minPoints >= D + 1. The value should be at least 3, and larger depending on the dataset chosen.

2. Determine the optimum ‘eps’ value: To determine the optimum eps value we use the k-distance plot method. Each point's distance to its k-th nearest neighbor is plotted in sorted order, and a knee in the plot corresponds to a threshold where a sharp change occurs along the k-distance curve.
Function used: kNNdistplot()
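The two estimation steps can be sketched in base R to show roughly what a k-distance plot contains. This is a hand-rolled approximation, not the package's implementation. The names `x`, `k`, and `k_dist` are illustrative, and setting k equal to the minPoints value is an assumption made here for simplicity.

```r
# Feature matrix: the four numeric iris columns (species column dropped)
x <- as.matrix(iris[, -5])

# Step 1: rule of thumb, minPoints >= D + 1 (and at least 3)
min_pts_lower_bound <- max(3, ncol(x) + 1)

# Step 2: distance from every point to its k-th nearest neighbor
k <- 4  # assumed here to match the minPoints used for clustering
d <- as.matrix(dist(x))

# In each sorted row, position 1 is the point itself (distance 0),
# so the k-th nearest neighbor sits at position k + 1
k_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Sorted k-distances: the curve whose knee suggests a value for eps
plot(sort(k_dist), type = "l",
     xlab = "Points sorted by distance",
     ylab = paste0(k, "-NN distance"))
```

Most points sit on the flat part of the curve, and the few points on the steep tail are the likely outliers, which is why the knee is a reasonable place to cut.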

Package used

install.packages("dbscan")

For K-distance Plot

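library(dbscan)
# Drop the Species column (column 5) and keep the four numeric features
iris_mat = as.matrix(iris[,-5])
# Plot the sorted 4-nearest-neighbor distances; the knee suggests eps
kNNdistplot(iris_mat, k=4)
# Mark the candidate eps value at the knee
abline(h=0.4, col='red')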

K-Distance Plot

Apply DBSCAN and plot clusters

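# Cluster with the eps read off the k-distance plot and minPts = 4
db = dbscan(iris_mat, eps=0.4, minPts=4)
# Draw each cluster with its convex hull
hullplot(iris_mat, db$cluster)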
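Beyond the plot, the clustering result can be inspected numerically. The sketch below assumes the dbscan package is installed and repeats the setup so that it runs on its own. The names `cluster_sizes` and `n_noise` are illustrative. In the returned object, cluster label 0 marks noise points.

```r
library(dbscan)

iris_mat <- as.matrix(iris[, -5])
db <- dbscan(iris_mat, eps = 0.4, minPts = 4)

# Size of each cluster; label 0 is the noise group
cluster_sizes <- table(db$cluster)
print(cluster_sizes)

# Number of points flagged as noise
n_noise <- sum(db$cluster == 0)

# Cross-tabulate clusters against the known species, for inspection only:
# DBSCAN itself never sees the labels
print(table(cluster = db$cluster, species = iris$Species))
```

Rerunning this with a slightly larger or smaller eps is a quick way to see how sensitive the number of clusters and the noise count are to that parameter.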

Hull Plot

Advantages of DBSCAN algorithm
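1. It does not need the number of clusters to be specified in advance; it can discover any number of clusters.
2. Clusters of varying shapes and sizes can be obtained using the DBSCAN algorithm.
3. It can detect outliers and label them as noise instead of forcing them into a cluster.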
