Kamiński, B.; Prałat, P.; Théberge, F. · 2023

ABCD+o (ABCD with Outliers)

Method

The Role of Outliers and the Need for ABCD+o

In the broader realm of data science, an outlier is a data point that stubbornly refuses to fit the established pattern. It is an observation that diverges so significantly from the rest of the dataset that it demands special attention. Outliers are often viewed as a double-edged sword: on one hand, they can be the result of experimental errors, faulty sensors, or random noise that threatens to skew statistical averages and mislead machine learning algorithms. On the other hand, they are sometimes the most valuable pieces of information in the entire dataset, representing a fraudulent credit card transaction, a rare genetic mutation, or a sudden shift in consumer behaviour. Whether they are a nuisance to be filtered out or a vital anomaly to be studied, properly identifying and handling outliers is a fundamental requirement for building robust, reliable analytical models.

When we translate this concept to complex networks, outliers take on a structural role. In a network characterized by modular community structure, an outlier is a node that does not belong to any distinct, tight-knit group. Natural examples are everywhere: a spam bot on a social media platform that randomly interacts with users across unrelated circles, or a highly multidisciplinary researcher whose collaborations are spread thinly across many fields rather than concentrated inside one department.

Historically, many synthetic generators and community detection algorithms forced every single node into a community. This rigid constraint artificially distorts the network's true structure by cramming noisy nodes into established groups and blurring the boundaries of legitimate clusters. ABCD+o addresses this directly by incorporating outlier nodes that connect purely as background noise without a designated community home. This creates a more realistic benchmark for testing algorithms that must both recover communities and remain robust to structural noise.

Outliers.

Adjusting the Original Model

The adjusted model, ABCD+o, introduces an additional parameter $s_0$ that specifies the number of outliers. Because the original ABCD construction is modular, extending it with outliers only requires a small number of targeted changes.

The first change is in the community-size generation step. Since there are $s_0$ outliers, the community sizes are generated exactly as in the original model, except that their total must now sum to $n - s_0$ instead of $n$.

The second change affects how stubs are split between community and background roles. In the original model, each node $i$ partitions its $k_i$ stubs into community stubs and background stubs according to the structural-noise level $\xi$. Non-outlier nodes continue to follow this rule in ABCD+o. Outlier nodes, however, are not assigned to any community, so all of their stubs become background stubs.

After that adjustment, the usual ABCD workflow continues: community graphs are generated for the non-outlier communities, a background graph is generated for the external stubs, and the final graph is assembled from their union.

There is one edge case worth noting when $\xi$ is close to zero. In the extreme case $\xi = 0$, only outliers contribute stubs to the background graph. To ensure that a simple graph with the requested degree sequence can still exist, all outlier degrees must then be strictly smaller than $s_0$. The model is designed to handle this case, but in the large-graph regime that is most relevant in practice, where $n$ is large, $s_0$ is relatively small, and $\xi$ is not zero, there are typically many non-outlier nodes with background degree as well, so this restriction is not active.

References

Kamiński, B., Prałat, P., and Théberge, F. (2023). Artificial Benchmark for Community Detection with Outliers (ABCD+o). Applied Network Science, 8(1), 25.
Kamiński, B., Prałat, P., and Théberge, F. (2021). Artificial Benchmark for Community Detection (ABCD) - Fast Random Graph Model with Community Structure. Network Science, 9(2), 153-178.
Lancichinetti, A., Fortunato, S., and Radicchi, F. (2008). Benchmark Graphs for Testing Community Detection Algorithms. Physical Review E, 78(4), 046110.

Reference: Paper link