Jordan Barrett, Ryan DeWolfe, Bogumił Kamiński, Paweł Prałat, Aaron Smith, François Théberge · 2025
ABCD+o² (Outliers + Overlapping Communities)
Method
Overlapping Communities and the Need for ABCD+o²
In the broader landscape of data science, the concept of overlapping communities — often framed as soft clustering, fuzzy clustering, or mixed-membership modeling — acknowledges that real-world entities rarely fit into rigid, mutually exclusive categories. Instead of forcing a data point into a single bucket, these models allow elements to hold fractional or simultaneous membership across multiple groups. In natural language processing, a single news article might heavily relate to both "Technology" and "Finance" topics; in e-commerce, a customer's behaviour might seamlessly place them in both the "Budget Shopper" and "Outdoor Enthusiast" segments. Recognizing these overlapping affiliations is crucial because forcing hard boundaries strips away nuanced information, leading to overly simplistic models that misrepresent the complex, multidimensional nature of actual data.
When translated to network science, overlapping communities occur when individual nodes act as shared members across multiple tightly knit topological clusters. A classic example is a social network where a single person is simultaneously embedded in their family group, their university alumni circle, and their professional workplace network; or a biological network where a single protein plays a critical role in several distinct metabolic pathways.
It is vital to include these overlapping structures in synthetic graph generators because empirical networks are fundamentally interconnected. If community detection algorithms are exclusively benchmarked against synthetic models that use strict partitions, they are being tested in an artificial vacuum. Incorporating overlap challenges these algorithms to disentangle the dense, intersecting structures they will inevitably encounter in the real world, ensuring they are robust and not just optimized for oversimplified scenarios.
Overlapping clusters.
Role of Geometry in Synthetic Network Models
To generate overlapping communities in a natural way, ABCD+o² incorporates geometry. Integrating geometry into synthetic network models fundamentally shifts how we generate connections, moving from purely topological rules to spatially driven logic. In these geometric (often called "latent space") models, every node is assigned coordinates in some d-dimensional vector space. This space rarely represents physical geography; rather, its dimensions represent the underlying attributes or features of the entities the nodes represent. In a social network model, different dimensions might correspond to the age, political leaning, or interests of the associated users. By giving nodes a specific geometric location, the model quantitatively captures the multidimensional identity of a real-world object.
The primary role of this geometry is to govern edge formation through the principle of distance. In real-world networks, connections are almost never random; they are heavily driven by homophily and aversion — the well-documented tendency for similar entities to interact and dissimilar entities to avoid each other. Geometric models capture this by using spatial proximity as a direct proxy for similarity: the probability of an edge forming between two nodes is a decreasing function of the distance between them in the latent space. The closer they are, the more likely they are to link. This simple, distance-based rule naturally yields realistic network topologies, including dense local clustering and high transitivity (the "friend of a friend is my friend" dynamic) that purely random models often fail to reproduce.
Finally, geometry makes synthetic models much more robust when replicating complex, macroscopic structures like overlapping communities or structural outliers. Instead of relying on rigid, pre-defined lists that force nodes into specific architectural buckets, communities emerge organically as dense "clouds" of points within the latent space. If a node is positioned in the transitional space between two distinct clusters, its geometric proximity naturally dictates that it will form connections with both groups, effortlessly acting as an overlapping member. Conversely, a node placed far from any cluster naturally becomes an outlier. This spatial approach lets synthetic generators build nuanced, interwoven networks that serve as far more rigorous and realistic benchmarks for testing community detection algorithms.
The role of geometry.
Creating Overlapping Communities
The primary challenge in extending the ABCD model to ABCD+o² lies in generating realistic overlapping communities. Specifically, the objective is for non-outlier nodes to belong to an average of η communities, governed by the new parameter η ∈ [1, ∞). To facilitate this structure, community membership is partitioned into primary and secondary roles. Each non-outlier node acts as a primary member of exactly one community, with the capacity to be a secondary member of multiple others. In particular, primary memberships induce a familiar partition of non-outliers.
To construct the d-dimensional latent space for the n nodes, each node v is assigned a vector sampled independently and uniformly at random from the d-dimensional unit ball. To establish the baseline partition of primary memberships, we apply a geometric clustering heuristic aimed at simultaneously maximizing inter-cluster distance and minimizing intra-cluster variance. This partition is constructed iteratively, generating clusters of the targeted sizes one at a time. To seed a new cluster, we isolate the unassigned vector furthest from the origin and sequentially absorb its nearest available neighbours until the desired community size is achieved.
Finally, each primary community is independently expanded into its final, overlapping configuration by absorbing its designated secondary members. To achieve this geometrically, we determine the center of mass (centroid) for each baseline primary community within the latent space. We then radially expand a bounding hypersphere outward from this centroid, progressively encompassing the nearest neighbouring vectors to serve as secondary members until the predefined target size for that community is fulfilled.
Leveraging the modular architecture of the ABCD framework, the subsequent stages of network generation proceed similarly to the original model. First, degrees are randomly assigned to nodes, strictly enforcing the constraint that high-degree nodes cannot be allocated to small communities — a rule that applies uniformly to both primary and secondary memberships. Next, for each node v, an expected fraction ξ of its stubs is designated for the background graph, while the remaining internal stubs are partitioned among the specific communities to which v belongs. Finally, the internal community graphs and the global background graph are independently generated, and a targeted rewiring procedure is applied to resolve any emergent self-loops or multi-edges.
Generating overlapping communities.
References
References
- Barrett, J., DeWolfe, R., Kamiński, B., Prałat, P., Smith, A., and Théberge, F. (2025). The Artificial Benchmark for Community Detection with Outliers and Overlapping Communities (ABCD+o²). arXiv preprint arXiv:2506.05486.
- Kamiński, B., Prałat, P., and Théberge, F. (2021). Artificial Benchmark for Community Detection (ABCD) — Fast Random Graph Model with Community Structure. Network Science, 9(2), 153–178.
- Kamiński, B., Prałat, P., and Théberge, F. (2023). Artificial Benchmark for Community Detection with Outliers (ABCD+o). Applied Network Science, 8(1), 25.
Reference: Paper link
