Journal of the Royal Statistical Society. Series A: Statistics in Society

Zero-inflated stochastic block modelling of efficiency-security trade-offs in weighted criminal networks

Chaoyi Lu 1
Daniele Durante 2
Nial Friel 1
1 School of Mathematics and Statistics, University College Dublin, South Belfield 4, Dublin
2 Department of Decision Sciences and Bocconi Institute for Data Science and Analytics, Bocconi University, Via Roengten 1, Milan
Publication type: Journal Article
Publication date: 2025-03-19
Scimago: Q1
WoS: Q2
SJR: 0.775
CiteScore: 2.9
Impact factor: 1.5
ISSN: 09641998, 1467985X
Abstract

Criminal networks arise from the attempt to balance the need to establish frequent ties among affiliates to facilitate coordination of illegal activities, with the necessity to sparsify the overall connectivity architecture to hide from law enforcement. This efficiency-security trade-off is also combined with the creation of groups of redundant criminals that exhibit similar connectivity patterns, thus guaranteeing resilient network architectures. State-of-the-art models for such data are not designed to infer these unique structures. In contrast to such solutions, we develop a tractable Bayesian zero-inflated Poisson stochastic block model (ZIP–SBM), which identifies groups of redundant criminals having similar connectivity patterns, and infers both overt and covert block interactions within and across these groups. This is accomplished by modelling the weighted ties (corresponding to counts of interactions among pairs of criminals) via zero-inflated Poisson distributions with block-specific parameters that quantify complex patterns in the excess of zero ties in each block (security) relative to the distribution of the observed weighted ties within that block (efficiency). The performance of ZIP–SBM is illustrated in simulations and in a study of summit co-attendances in a complex Mafia organization, where we unveil efficiency-security structures adopted by the criminal organization that were hidden from previous analyses.
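
To make the modelling idea in the abstract concrete, the following is a minimal sketch of a zero-inflated Poisson likelihood with block-specific parameters. It is not the authors' implementation. It assumes the textbook zero-inflated Poisson parameterization (an excess-zero probability mixed with a Poisson count component), an undirected network with a symmetric weight matrix, and known group labels; the paper's own parameterization and its Bayesian inference of the groups may differ. All names (`zip_log_pmf`, `block_log_likelihood`, `lam`, `pi_zero`) are hypothetical.

```python
import math


def zip_log_pmf(y, lam, pi_zero):
    """Log-probability of count y under a zero-inflated Poisson.

    Assumed (textbook) parameterization, not taken from the paper:
      P(Y = 0) = pi_zero + (1 - pi_zero) * exp(-lam)
      P(Y = y) = (1 - pi_zero) * Poisson(y; lam)   for y > 0
    with lam > 0 and 0 <= pi_zero < 1.
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    if y == 0:
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-lam))
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return math.log(1.0 - pi_zero) + log_poisson


def block_log_likelihood(weights, labels, lam, pi_zero):
    """Log-likelihood of an undirected weighted network given group labels.

    weights : symmetric n x n nested list of non-negative integer tie counts
    labels  : list of n group indices in {0, ..., K-1}
    lam     : K x K nested list of block-specific Poisson rates
    pi_zero : K x K nested list of block-specific excess-zero probabilities
    """
    n = len(labels)
    total = 0.0
    # Each unordered pair (i, j) with i < j contributes exactly once.
    for i in range(n):
        for j in range(i + 1, n):
            g, h = labels[i], labels[j]
            total += zip_log_pmf(weights[i][j], lam[g][h], pi_zero[g][h])
    return total


# Hypothetical toy data: two groups of two nodes each.
toy_weights = [
    [0, 3, 0, 0],
    [3, 0, 0, 1],
    [0, 0, 0, 2],
    [0, 1, 2, 0],
]
toy_labels = [0, 0, 1, 1]
toy_lam = [[3.0, 0.5], [0.5, 2.0]]
toy_pi_zero = [[0.1, 0.8], [0.8, 0.2]]
toy_loglik = block_log_likelihood(toy_weights, toy_labels, toy_lam, toy_pi_zero)
```

In this sketch, a large `pi_zero` entry for a pair of groups corresponds to many excess zero ties between them (the security side of the trade-off), while the matching `lam` entry governs the size of the ties that do occur (the efficiency side).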

