Łukasz Kraiński, Michał Czuba, Piotr Bródka, Paweł Prałat, Bogumił Kamiński, François Théberge · 2025

Multilayer ABCD (mABCD)

Method

Multilayer Networks and the Need for mABCD

Traditional network models have profoundly advanced our understanding of complex systems, yet they often rely on a critical simplification: projecting all interactions into a single, homogeneous structure. In reality, the architecture of real-world systems is inherently multifaceted. Multilayer networks were developed to overcome this limitation by introducing a framework where entities can interact simultaneously across distinct "layers" of relationships. Within a social system, the same individuals might be connected via family ties in one layer, professional collaborations in another, and digital communications in a third. By preserving the heterogeneity of these diverse interactions, rather than collapsing them into a single generic graph, multilayer networks offer a significantly more accurate representation of complex environments. This layered approach enables researchers to capture intricate structural dynamics and cross-layer dependencies that single-layer models fundamentally obscure.

The family of ABCD graphs aims to address one of the central challenges in network analysis: community detection — identifying groups of nodes that are densely connected internally but sparsely connected to the rest of the system. When transitioning to multilayer networks, this task becomes significantly more complex. Communities might persist consistently across all layers, vary drastically from one layer to another, or exhibit intricate cross-layer dependencies. While researchers have developed a wide array of sophisticated algorithms to detect these multilayer communities, a critical bottleneck remains: how do we rigorously evaluate their accuracy? Real-world multilayer datasets rarely come with a known ground-truth community structure. This gap highlights the necessity for robust synthetic models. By generating realistic multilayer networks with built-in, verifiable community structures, researchers can systematically test, benchmark, and refine community detection algorithms under controlled conditions before applying them to complex real-world data.

Multilayer networks.

Active Nodes

Due to the inherent complexity of multilayer systems, mABCD adopts several strategic simplifications to remain both flexible and computationally efficient. The model considers a multilayer network composed of ℓ layers and a fixed set of n actors, each of whom can potentially participate in any of the layers.

In a multilayer network, actors are not always present across all layers. This is modelled using an activity vector q = (q_1, q_2, ..., q_ℓ), where q_i defines the probability that a given actor is active in layer i. For each layer, actors are independently selected to be active based on these probabilities; those selected are instantiated as the nodes that will eventually form the graph representing that layer. A practical example occurs in social systems, where an individual may be a prominent node in a professional layer such as LinkedIn but remain entirely inactive in a personal layer like Snapchat. By incorporating this stochastic activity, mABCD provides a realistic simulation of "non-aligned" networks, allowing researchers to evaluate how community detection algorithms perform when actors appear or disappear across different relational contexts.

Correlations Between Nodes Degrees in Various Layers

In multilayer networks, an actor's level of engagement or prominence often varies significantly across different contexts, making it crucial to model the correlations between their node degrees in various layers. Assuming an actor maintains a similar number of connections in every layer fails to capture the complex reality of human and system behaviour. A renowned researcher might be highly collaborative and possess a massive number of connections (a high degree) in an academic co-authorship layer, yet remain almost entirely disconnected (a low degree) in a social media layer such as Instagram.

The degree sequences for all ℓ layers are generated independently, so we may concentrate on a given layer i. As in all members of the ABCD family, the degree sequence satisfies (a) a power law with parameter γ_i, (b) a minimum value of at least δ_i, and (c) a maximum value of at most Δ_i.

To accurately replicate correlations between node degrees across layers, mABCD requires the desired inter-layer degree correlation as an input parameter. Labeling actors with integers {1, 2, ..., n}, we employ the Kendall rank correlation coefficient τ_i ∈ [-1, 1] to quantify this relationship. Here, τ_i provides a nonparametric measure of ordinal association by comparing the ranking of active actors by their assigned degrees in layer i against their integer labels. Three specific values illustrate its meaning:

No correlation (τ_i = 0): degrees are assigned uniformly at random to the active nodes.
Perfect positive correlation (τ_i = 1): the order of active nodes by their labels perfectly matches their degree ranking — the actor with the smallest label receives the largest degree. If actors a₁ and a₂ are active in layer i with 1 ≤ a₁ < a₂ ≤ n, then deg_i(a₁) ≥ deg_i(a₂).
Perfect negative correlation (τ_i = -1): the order is perfectly reversed; actors with larger labels receive the highest degrees, so 1 ≤ a₁ < a₂ ≤ n implies deg_i(a₁) ≤ deg_i(a₂).

Because τ_i can be specified independently for each layer, a single actor might exhibit a remarkably high degree in one layer while remaining largely disconnected in another. This flexibility allows mABCD to generate synthetic networks that faithfully mirror the diverse degree dynamics observed in empirical multilayer data.

To achieve the desired property, each active node independently generates a normally distributed random variable X_a = N(a/n, σ_i), where the variance σ_i is a specific function of τ_i. We sort active nodes in increasing order of their X_a values and assign the degree sequence accordingly. If σ_i = 0, then X_a = a/n deterministically, recovering the perfect correlation between degrees and labels (τ_i = 1). If σ_i → ∞, the order of nodes becomes perfectly random (uniform), recovering the other extreme (τ_i = 0). The function σ_i : [0,1] → [0, ∞) can be empirically approximated so that the variance yields a Kendall correlation close to the desired τ_i ∈ [0,1]. To handle negative correlations τ_i ∈ [-1, 0), we simply "flip" the order generated for |τ_i|.

Correlations between node degrees.

Correlations Between Partitions in Various Layers

In the study of multilayer networks, it is essential to recognize that community structures across different layers are rarely completely independent. In many real-world systems, the partitioning of actors into communities exhibits strong cross-layer correlations: individuals who form a tightly knit group in one context often cluster together in another. In a university setting, a subgroup of researchers in the same academic department — a distinct community in a professional co-authorship layer — is highly likely to form a corresponding community within an informal social layer such as a departmental chat group. If a synthetic model generated communities independently for each layer, it would fail to capture these fundamental structural overlaps and dependencies. To create realistic testbeds for multilayer community detection, it is therefore crucial to explicitly model and control the degree of correlation between partitions across layers.

To quantify and control this overlap, mABCD accepts the desired partition correlation as an input parameter, measured using adjusted mutual information (AMI). AMI is a standard information-theoretic metric for comparing the similarity of two partitions of the same set; here it compares the ground-truth community assignments between any two layers. The metric is scaled such that an AMI of 1 indicates perfectly identical community structures, while an AMI of 0 implies that any observed alignment is no better than random chance. To ensure a mathematically valid comparison, the model evaluates AMI exclusively on the shared set of nodes; any actor inactive in at least one of the two compared layers is ignored.

Community size sequences are generated independently for each layer. For a given layer i, the distribution must satisfy: (a) a power law with parameter β_i, (b) a minimum size s_i, and (c) a maximum size S_i. To induce correlation between community structures across layers, we introduce a latent reference layer that guides the assignment of actors to communities. This auxiliary layer can be conceptualized as a set of intrinsic actor attributes — age, geographic location, or shared beliefs — that consistently shape an actor's affiliations across contexts. This single reference layer serves as the foundation for all ℓ layers. Within it, each actor is assigned a coordinate vector in ℝ^d (default d = 2), sampled independently and uniformly at random from a unit ball centred at the origin.

Now concentrate on a given layer i. The community sizes have already been generated, but no nodes have been assigned. Let R be the set of active nodes. We assign nodes to communities one community at a time, in random order. When a community is formed, we first select the node from R at the largest distance from the origin (in the reference layer). This node, together with its required number of nearest neighbours in R, forms a new community, which we then remove from R. This strategy ensures that proximity in the latent space translates directly into shared community membership — intuitively, like eating a biscuit from the edge toward the center. Just as nodes positioned close together are likely to be "eaten" in the same bite, they are clustered into the same group. By reusing the same reference layer across all ℓ layers (akin to different gourmets enjoying the same biscuit), the model maintains a consistent cross-layer community correlation while accommodating the varying community sizes found in empirical data.

To modulate the strength of cross-layer community correlation, mABCD uses a parameter r_i ∈ [0,1] to introduce noise into the partition. Under this procedure, each active node independently vacates its initial, geometry-based community with probability 1 − r_i. These displaced nodes are then redistributed uniformly at random into the resulting vacancies across the layer. At the extreme r_i = 1, the geometric structure is perfectly preserved, maximizing correlation. When r_i = 0, community assignments become entirely independent of the latent reference layer, effectively eliminating any intended cross-layer correlation.

Correlations between partitions.

Correlations Between Edges in Various Layers

While modelling correlations between node degrees and community partitions is crucial, it is equally important to account for the direct correlation of individual edges across layers. In empirical multilayer networks, the existence of a connection between two actors in one layer often significantly increases the probability of an edge between them in another, independent of their broader community affiliations. Consider two close family members with entirely different professions and interests: they might belong to completely distinct communities in a professional networking layer and a hobbyist layer, yet remain highly likely to maintain direct, persistent connections across multiple platforms. If a generative model only correlates large-scale structures like communities, it fails to capture these resilient, micro-level pairwise interactions. mABCD therefore explicitly incorporates a mechanism to govern the direct overlap of edges between layers.

To quantify edge correlations, we define a correlation matrix whose entries r_{i,j} ∈ [0,1] capture the edge overlap between layers i and j. Let E_i^j denote the set of edges present in layer i that involve actors who are also active in layer j. The correlation coefficient is defined as:

r_{i,j} = |E_i^j ∩ E_j^i| / min{ |E_i^j|, |E_j^i| }.

If min{|E_i^j|, |E_j^i|} = 0 (at least one layer has no edges between the shared active actors), r_{i,j} is left undefined. This metric is naturally normalized: the maximum value of 1 is attained when the edges in one layer form a perfect subset of the edges in the other, while the minimum value of 0 occurs when the two edge sets are completely disjoint.

To align the empirical edge correlations of the generated network with the desired input matrix, mABCD performs a series of edge rewirings in batches. Crucially, this rewiring strictly preserves the degree of every involved node; both the internal community degrees and the background degrees remain constant, so the overall noise level of the partition is entirely unaffected. Before each batch, the empirical correlation matrix is recomputed and compared against the target. A layer pair (i, j) is then selected at random, with probability proportional to the discrepancy between its empirical and desired values. Within the batch, a small fraction of edges in the selected layers is iteratively rewired to bring the empirical correlation closer to the target.

Suppose the empirical correlation between layers i and j is lower than the target. To increase edge overlap, each rewiring attempt proceeds as follows:

Edge Selection. Randomly designate one of the two layers as the source graph and select a random edge uv connecting two actors active in both layers. The objective is to introduce uv into the target graph in the other layer. If uv already exists there, the attempt terminates prematurely.
Intra-Community Rewiring. If u and v belong to the same community in the target graph, we identify random neighbours u' (connected to u) and v' (connected to v) within that same community, if any. Because all four nodes share a community, we can rewire without altering internal degree properties: delete edges uu' and vv', and insert uv and u'v'.
Inter-Community Rewiring. If u and v belong to different communities, the procedure is conceptually identical, but the target neighbours u' and v' are selected so that all four nodes belong to distinct communities. The edges are swapped as before, safely preserving the inter-community (background) degree distributions.

Conversely, if the empirical correlation exceeds the target, an analogous procedure deliberately breaks shared edges to reduce overlap.

The overarching objective is to iteratively converge the empirical correlation matrix toward the target matrix. However, because the layers are interconnected, adjusting one entry can inadvertently alter others, beneficially or adversely. Consequently, monotonic convergence is not guaranteed, and the optimal network might not be the one generated at the very end of a fixed number of batches. To account for these interdependent dynamics, the algorithm continuously tracks the overall discrepancy between the empirical and target matrices at the start of each batch. The final output is then explicitly selected as the network state that achieved the highest overall quality — the minimum discrepancy — throughout the entire optimization process.

Correlations between edges.

References

Kraiński, Ł., Czuba, M., Bródka, P., Prałat, P., Kamiński, B., and Théberge, F. (2025). Multilayer Artificial Benchmark for Community Detection (mABCD). Expert Systems with Applications, p. 130920.
Kamiński, B., Prałat, P., and Théberge, F. (2021). Artificial Benchmark for Community Detection (ABCD) — Fast Random Graph Model with Community Structure. Network Science, 9(2), 153–178.

Reference: Paper link