Kamiński, B.; Olczak, T.; Pankratz, B.; Prałat, P.; Théberge, F. · 2022
ABCDe (Multi-threaded ABCD Generator)
Method
In modern computing, multi-threading is a programming and execution model that allows a single process to manage multiple "threads" of execution concurrently. While a traditional single-threaded algorithm processes tasks sequentially — one after another — a multi-threaded algorithm can decompose a large problem into smaller sub-tasks that are executed in parallel across the multiple cores of a processor and communicate to coordinate their work.
This approach significantly enhances performance and throughput, particularly in computationally intensive applications like graph generation. By sharing the same memory space while executing different instructions, threads allow for efficient data exchange; however, they also require careful synchronization to ensure data integrity and to prevent race conditions (where multiple threads attempt to modify the same data simultaneously) and false sharing (where different threads work on physically close memory areas). In the context of the ABCD model, transitioning to a multi-threaded implementation like ABCDe allows for the rapid generation of massive synthetic networks that would otherwise be bottlenecked by the limits of a single CPU core.
The architectural design of the ABCD family inherently encourages multi-threading because it is structured as a union of many disjoint community graphs and a single global background graph. Since each community sub-graph is generated independently, they represent a collection of granular, isolated tasks that can be distributed across multiple CPU threads simultaneously with minimal overhead. This modularity is the primary driver behind the efficiency of ABCDe.
The implementation also addresses a critical bottleneck: the background graph. In a sequential model, the background graph is generated last to avoid edge collisions, but ABCDe parallelizes this phase as well. This is essential for maintaining high performance — leaving the background graph for sequential processing would create a bottleneck that, according to Amdahl's Law, would significantly limit the overall speedup regardless of how many processor cores are available.
To overcome the serialization bottleneck, ABCDe adopts a two-phased approach to generating the background graph, as full parallelization is hampered by the sequential nature of the configuration model:
- Phase 1. The background graph is generated as a standalone, independent task in parallel with the community graphs. During this stage it is internally simplified by rewiring any self-loops or multi-edges, making this phase functionally identical to the parallel generation of a community sub-graph.
- Phase 2. After all individual components (the background graph and all community graphs) have completed, the graphs are merged into the final ABCDe network. A final, targeted rewiring procedure then resolves any new parallel edges created during the merger between the background and community structures.
This two-batch strategy lets the generator maximize thread utility without the high synchronization costs that would come from trying to parallelize the underlying sampling algorithm itself.
To ensure a well-balanced distribution of work and minimize total processing time, ABCDe employs a strategic task allocation policy. Since the background graph is typically the largest and most time-consuming component, it is prioritized and assigned to the first available thread to begin processing immediately. The remaining community graphs are organized in a random order within a FIFO queue and processed by a pool of threads following a classic producer–consumer pattern, ensuring a uniform workload in expectation. While an exceptionally large background graph could still lead to unintended serialization (as a single thread processes it), this generally only occurs when the mixing parameter ξ is close to one. In practical applications where ξ is typically less than 0.5, this scheduling approach effectively maximizes parallel efficiency.
Generating ABCDe in parallel.
Reproducibility is a fundamental pillar of scientific research, ensuring that experiments can be independently verified and validated. Achieving this in parallel algorithms such as ABCDe presents a unique challenge: because the operating system scheduler determines the order and allocation of tasks to threads at runtime, a naive implementation would produce a different graph each time it is executed, even with identical input parameters.
To resolve this, ABCDe associates every individual task — whether a community sub-graph or the background graph — with its own unique random number generator seed before any generation begins. By isolating the pseudo-random streams in this way, the model ensures that the resulting network remains identical regardless of the number of threads used or the specific order in which tasks are processed, thereby guaranteeing consistent and reproducible results for the scientific community. A detailed analysis of graph generation speed using ABCDe can be found in the reference below.
References
References
- Kamiński, B., Olczak, T., Pankratz, B., Prałat, P., and Théberge, F. (2022). Properties and Performance of the ABCDe Random Graph Model with Community Structure. Big Data Research, 30, 100348.
- Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp. 483–485.
- Kamiński, B., Prałat, P., and Théberge, F. (2021). Artificial Benchmark for Community Detection (ABCD) — Fast Random Graph Model with Community Structure. Network Science, 9(2), 153–178.
Reference: Paper link
