Our informal definition of a cluster draws on our formal definitions of connected components and cliques.
In an undirected graph G=(V,E), a cluster is a subgraph induced by a node set S⊆V (i.e., G_S = (S,E_S)) with the following two properties:
- The average degree of G_S is "relatively high" (a relaxed adaptation of clique-ness).
- There are "relatively few" edges in E that join a node in S to a node not in S (a relaxed adaptation of connected component-ness).
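The two properties can be measured directly for any candidate node set. The following is a minimal sketch, not a definitive implementation: the function name `induced_stats` and the small example graph (two triangles joined by one edge) are assumptions made for illustration, and the thresholds for "relatively high" and "relatively few" are deliberately left to the reader.

```python
def induced_stats(edges, S):
    """Return (average degree of the subgraph induced by S,
    number of edges joining a node in S to a node not in S).

    `edges` is a list of undirected edges given as (u, v) pairs.
    """
    S = set(S)
    # Edges with both endpoints in S belong to the induced subgraph.
    internal = sum(1 for u, v in edges if u in S and v in S)
    # Edges with exactly one endpoint in S leave the candidate cluster.
    boundary = sum(1 for u, v in edges if (u in S) != (v in S))
    # Each internal edge contributes 2 to the total degree.
    avg_degree = 2 * internal / len(S) if S else 0.0
    return avg_degree, boundary


# Hypothetical graph (not from the text): two triangles joined by edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

avg, cut = induced_stats(edges, {"a", "b", "c"})
print(avg, cut)  # 2.0 1
```

Here the triangle {a, b, c} has the highest average degree possible for three nodes and only one edge leaving it, so it would plausibly count as a cluster under the informal definition.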
Example: In the graph G=(V,E) drawn below, the following subsets of nodes induce subgraphs that can fairly be called clusters:
[Figure not available: the graph G, showing the subsets of nodes and the corresponding clusters.]
Not all connected components are clusters. For example:
- In the graph above, the subgraph induced by {9,10,11,12,13} is a connected component, but it is arguably not a cluster, because the average degree of that induced subgraph is relatively low.
- The subgraph induced by {6} is a connected component, but it is definitely not a cluster, because the average degree of the induced subgraph is zero.
Not all cliques are clusters, even relatively large cliques. For example:
- The subgraph induced by {1,2,3,4} is a clique, but it is not a cluster because every single one of those nodes is adjacent to node 5; there are too many edges joining nodes in {1,2,3,4} to nodes not in {1,2,3,4}.
- Even the largest clique in the above graph, the subgraph induced by {1,2,3,4,5}, is arguably still not a cluster because node 7 is adjacent to so many nodes in {1,2,3,4,5}.
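The counterexamples above can be checked numerically. The sketch below uses an assumed edge set: only the facts stated in the text are taken from the source ({1,2,3,4,5} is a clique, node 6 is isolated, {9,10,11,12,13} is a connected component). The exact neighbors of node 7, the edge to node 8, and the path shape of the component on nodes 9 to 13 are hypothetical stand-ins for the original drawing.

```python
from itertools import combinations


def induced_stats(edges, S):
    """Return (average degree of the induced subgraph, boundary edge count)."""
    S = set(S)
    internal = sum(1 for u, v in edges if u in S and v in S)
    boundary = sum(1 for u, v in edges if (u in S) != (v in S))
    return 2 * internal / len(S), boundary


# ASSUMED edge set, chosen only to be consistent with the text.
edges = list(combinations([1, 2, 3, 4, 5], 2))     # clique on {1,...,5}
edges += [(7, 3), (7, 4), (7, 5), (7, 8)]           # hypothetical neighbors of 7
edges += [(9, 10), (10, 11), (11, 12), (12, 13)]    # hypothetical path component
# Node 6 is isolated, so it appears in no edge.

candidates = {
    "component {9..13}": {9, 10, 11, 12, 13},
    "component {6}": {6},
    "clique {1,2,3,4}": {1, 2, 3, 4},
    "clique {1,...,5}": {1, 2, 3, 4, 5},
}
results = {name: induced_stats(edges, S) for name, S in candidates.items()}
for name, (avg, cut) in results.items():
    print(f"{name}: average degree {avg}, boundary edges {cut}")
```

Under these assumptions, both connected components have zero boundary edges but low average degree (1.6 and 0.0), while both cliques have the maximum possible average degree (3.0 and 4.0) but several boundary edges (6 and 3). Neither property alone is enough to make a cluster.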