
Defining clusters, part three

Our definition of cluster informally draws upon our formal definitions of connected components and cliques.

In an undirected graph G=(V,E), a cluster is a subgraph induced by node set S⊆V (i.e., G_S = (S,E_S)) with the following two properties:

  1. The average degree of G_S is "relatively high" (a relaxed adaptation of clique-ness).
  2. There are "relatively few" edges in E that join a node in S to a node not in S (a relaxed adaptation of connected component-ness).
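
Both properties can be measured directly. The following is a minimal sketch in Python, assuming the graph is stored as a dict that maps each node to the set of its neighbors. The function names and the toy graph are illustrative and are not the graph from this post. The definition leaves "relatively high" and "relatively few" unquantified, so the sketch only reports the two quantities and sets no thresholds.

```python
def average_degree(adj, S):
    """Average degree of the subgraph induced by node set S."""
    S = set(S)
    if not S:
        return 0.0
    # Summing degrees counts each internal edge twice, once per endpoint.
    degree_sum = sum(len(adj[u] & S) for u in S)
    return degree_sum / len(S)


def boundary_edges(adj, S):
    """Number of edges joining a node in S to a node not in S."""
    S = set(S)
    return sum(len(adj[u] - S) for u in S)


# Hypothetical toy graph: a triangle {1, 2, 3} with one edge out to node 4,
# and node 4 also joined to node 5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(average_degree(adj, {1, 2, 3}))  # 2.0
print(boundary_edges(adj, {1, 2, 3}))  # 1
```

In this toy graph the set {1,2,3} has the highest average degree possible for three nodes and only one edge leaving it, so it satisfies both properties about as well as a three-node set can.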

Example: In the graph G=(V,E) drawn below, the following subsets of nodes induce subgraphs that can fairly be called clusters:

Subsets of nodes:
  • {1,2,3,4,5,7}
  • {20,21,22,23,24,25}
  • {14,15,16}
Corresponding clusters: 
[Figure: clusters]

Not all connected components are clusters. For example:

  • In the graph above, the subgraph induced by {9,10,11,12,13} is a connected component, but it is arguably not a cluster, because the average degree of that induced subgraph is relatively low.
  • The subgraph induced by {6} is a connected component, but it is definitely not a cluster, because the average degree of the induced subgraph is zero.
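
To see why component-ness alone is not enough, here is a small sketch. It assumes a hypothetical graph in which nodes 9 through 13 form a simple path and node 6 is isolated. The actual edges among those nodes are only visible in the figure, so the path is an assumption. Both node sets are connected components with no outgoing edges, yet their average degrees are low.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from a node that has not been visited yet.
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        components.append(component)
    return components


def average_degree(adj, S):
    """Average degree of the subgraph induced by node set S."""
    S = set(S)
    return sum(len(adj[u] & S) for u in S) / len(S) if S else 0.0


# Assumed edges: a path 9-10-11-12-13, plus the isolated node 6.
adj = {6: set(), 9: {10}, 10: {9, 11}, 11: {10, 12}, 12: {11, 13}, 13: {12}}

for component in connected_components(adj):
    print(sorted(component), average_degree(adj, component))
# [6] 0.0
# [9, 10, 11, 12, 13] 1.6
```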

Not all cliques are clusters, not even relatively large cliques. For example:

  • The subgraph induced by {1,2,3,4} is a clique, but it is not a cluster because every single one of those nodes is adjacent to node 5; there are too many edges joining nodes in {1,2,3,4} to nodes not in {1,2,3,4}.
  • Even the largest clique in the above graph, the subgraph induced by {1,2,3,4,5}, is arguably still not a cluster because node 7 is adjacent to so many nodes in {1,2,3,4,5}.
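
The first of these examples can be checked numerically. The sketch below builds only the part of the graph that the text describes, a clique on {1,2,3,4,5}, and leaves out node 7 and everything else, since those edges are only visible in the figure. The helper names are illustrative.

```python
from itertools import combinations


def is_clique(adj, S):
    """True if every pair of distinct nodes in S is adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))


def boundary_edges(adj, S):
    """Number of edges joining a node in S to a node not in S."""
    S = set(S)
    return sum(len(adj[u] - S) for u in S)


# Complete graph on nodes 1..5, as described in the text.
nodes = [1, 2, 3, 4, 5]
adj = {u: {v for v in nodes if v != u} for u in nodes}

S = {1, 2, 3, 4}
print(is_clique(adj, S))       # True
print(boundary_edges(adj, S))  # 4
```

Even in this reduced graph, {1,2,3,4} has 6 internal edges and 4 edges leaving the set, so 40% of the edges touching it cross its boundary. That is hard to describe as "relatively few".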
