# SUBCLU by example
SUBCLU1 is a subspace clustering algorithm based on DBSCAN2 and A-priori3.
Assume the following small exemplary data set with three dimensions. DBSCAN is parameterized with eps=3 and minpts=2 (including the core point).
2 10 2
1 20 1
9 30 9
8 40 9
The algorithm is bottom-up, so let's start with singular dimensions 1 to 3. D(x,y) is the euclidean distance between two points.
Dimension 1 (points: 2, 1, 9, 8): D(2,1) = 1 < eps. Since we have two points, 2 is a core point. Same can be said about 1. So, the first cluster consists of 2 and 1. Same can be applied to 9 and 8. So dimension 1 contains two clusters.
Dimension 2 (points: 10, 20, 30, 40): All points are too far away from each other (distance is at least sqrt(10) between each pair), so no cluster can be found here.
Dimension 3 (points: 2, 1, 9, 9): Analogous to dimension 1. Two clusters.
Next, we generate the k+1 subspace candidates (where k=1, since we just did singular dimensions). Keep A-priori in mind: The k+1 candidates need to have k-1 attributes in common. Not relevant for 2 dimensions, but will be relevant for 3 dimensions later.
So here we have the following combinations of dimensions: 1,2; 1,3; 2,3. We may now prune some of those candidates using the A-priori principle, i.e. if k dimensions of a given candidate do not contain a cluster, that candidate will also not contain a cluster in k+1 dimensions. For proofs, refer to the original paper1. In this case, with k=1, dimension 2 has no clusters and thus both the candidates 1,2 and 2,3 can be pruned.
Next, we apply DBSCAN on subspace 1,3 (points: (2,2), (1,1), (9,9), (8,9)). D((2,2), (1,1)) = D((1,1), (2,2)) = sqrt(2) < eps. That's a cluster with points (2,2) and (1,1) (since we have 2 points and minpts=2). D((9,9), (8,9)) = D((8,9), (9,9)) = 1 < eps. Again, another cluster.
To generate three dimensions, we would need more candidates. For example, with 1,2 and 1,3 one could generate 1,2,3 (since both share the first dimension: 1). However, currently the only 2-dimensional candidate with clusters is 1,3. Thus, we stop here.
Result: The two dimensions 1 and 3 contain two subspace clusters
2 2
1 1
and
9 9
8 9
References: