Kamiński, B.; Prałat, P.; Théberge, F. · 2023

Hypergraph Artificial Benchmark for Community Detection (h-ABCD)

Method

Higher-Order Structures and the Need for h-ABCD

While the original ABCD model and its variations focus on pairwise interactions between nodes, many real-world systems are characterized by group interactions that involve multiple entities simultaneously. This necessitates the use of higher-order structures, specifically hypergraphs, where a single edge (a hyperedge) can connect any number of nodes rather than just two.

Hypergraphs offer a more natural and accurate representation for many complex systems because they explicitly capture multi-node relationships without the information loss that occurs when groups are projected onto binary edges. In metabolic networks, a chemical reaction often involves a specific set of multiple enzymes and metabolites that must be treated as a single unit. In communication networks, a single email thread or group chat connects several individuals simultaneously, which is better represented as a hyperedge than a collection of separate pairs. Other examples include recommendation systems, where a single transaction involves a "basket" of multiple items, and social networks, where hyperedges can represent specific committees, clubs, or social gatherings. By preserving these higher-order structures, researchers can better investigate how community boundaries overlap and how the formation of group interactions is influenced by existing network patterns.

Beyond hypergraphs, the next frontier in network modelling involves capturing the internal structure and functional roles within group interactions. While a hyperedge identifies who is involved, it often ignores how they interact. An email is not just a set of participants; it is an interaction where the roles of a sender, CC, and BCC recipients carry distinct informational weights. Similarly, a discussion forum thread is more than a gathering of users; it is a structured tree where the hierarchy of replies determines the flow of communication. The ultimate goal is therefore to move toward even more general frameworks such as directed hypergraphs or attributed simplicial complexes that can model these internal connections.

Higher-order structures.

Higher-order structures.


Adjusting the Original Model

The h-ABCD model extends the ABCD framework into the realm of higher-order structures, transitioning from traditional graphs to hypergraphs where interactions involve arbitrary numbers of nodes. Many of the fundamental building blocks — such as the generation of power-law degree sequences and sequences of community sizes — remain consistent with the original model. Specifically, if k_i is the degree assigned to node i, each node is initially viewed as having k_i "stubs" (half-edges) available for connection. While the original ABCD model uses these stubs to form pairwise edges via a standard configuration model, h-ABCD employs a generalized configuration model. Here, a "bucket" containing the total volume of stubs (Σ k_i) is used to create hyperedges of various sizes.

In a standard ABCD graph, every edge consists of exactly two nodes. In contrast, h-ABCD accommodates hyperedges of varying sizes, governed by a distribution vector q = (q_1, ..., q_L), where Σ q_i = 1. The value q_d represents the fraction of the total volume (total number of stubs) devoted to forming hyperedges of size d. While the model allows for hyperedges of size 1, these carry no information regarding community structure; consequently, they are generated at the outset of the process to isolate them from the multi-node interactions.

As in the original ABCD model, hyperedges of size d ≥ 2 are divided into two categories: community hyperedges and background hyperedges. The balance between these types is determined by the mixing parameter ξ ∈ [0, 1]. More precisely, ξ defines the expected fraction of stubs (excluding those already allocated to hyperedges of size 1) that will be used to form background hyperedges. The remaining 1-ξ fraction is used to form community-specific hyperedges, where a majority of the nodes belong to one of the ground-truth communities.

Background hyperedges are generated independently of the community structure; their members are effectively "sprinkled" across the entire hypergraph without paying attention to node affiliations. In contrast, each community hyperedge is explicitly associated with a single community, under the constraint that a majority (more than 50%) of its nodes belong to that community. A community hyperedge is said to be of type (c, d), where d/2 < c ≤ d, if it has total size d and contains exactly c nodes from its assigned community. The distribution of these edges is governed by parameters w_{c,d}, which specify the number of community hyperedges of each type (c, d) produced by the model. While the framework can accommodate any user-defined family of parameters, h-ABCD provides three default profiles — majority, linear, and strict — to represent varying levels of community homogeneity.


While the underlying generation process involves various technical nuances, the high-level logic remains intuitive. The process begins by determining the specific number of community hyperedges of each type (c, d) required for every community. Following this, the total set of available stubs is partitioned into three functional groups:

  • 𝓑 — stubs reserved for the creation of background hyperedges.
  • 𝓒ⱼ — stubs associated with nodes in the j-th community; these fulfill the c primary memberships for hyperedges assigned to that community.
  • 𝓒 — a general pool of the remaining stubs, used to fill the d − c minority memberships in community hyperedges.

To construct a community hyperedge of type (c, d) for community j, the algorithm randomly selects c stubs from 𝓒ⱼ and the remaining d − c stubs from the pool 𝓒 (each time sampling without replacement). Once all community-specific hyperedges have been formed, the background hyperedges are generated by randomly partitioning the stubs in 𝓑 into sets that match the required background hyperedge sizes.

To maintain the goal of producing simple hypergraphs, the model must address any duplicate hyperedges or multi-sets that may be generated. This is handled via a rewiring procedure analogous to the one used in the original ABCD model. Any hyperedges that violate the simplicity constraint are labelled bad, while all others are labelled good. For each bad hyperedge b, the algorithm randomly selects a good hyperedge g and merges their constituent nodes into a single pool. These nodes are then randomly re-partitioned into two new hyperedges that preserve the original required sizes. This reshuffling is designed to eliminate the problematic connections while maintaining the intended degrees and hyperedge sizes. The procedure is repeated for a fixed number of iterations; although the algorithm includes a fallback for cases where it cannot resolve a conflict, this is a rare event in practice.

For a comprehensive treatment of the implementation details, we refer the reader to the reference below.

Generating h-ABCD.

Generating h-ABCD.


References

References

  1. Kamiński, B., Prałat, P., and Théberge, F. (2023). Hypergraph Artificial Benchmark for Community Detection (h-ABCD). Journal of Complex Networks, 11(4), cnad028.
  2. Kamiński, B., Prałat, P., and Théberge, F. (2021). Artificial Benchmark for Community Detection (ABCD) — Fast Random Graph Model with Community Structure. Network Science, 9(2), 153–178.

Reference: Paper link