Human behavior plays a critical role in shaping epidemic trajectories. During health crises, people respond in diverse ways in terms of self-protection and adherence to recommended measures, largely reflecting differences in how individuals assess risk. This behavioral variability induces effective heterogeneity into key epidemic parameters, such as infectivity and susceptibility. We introduce a minimal extension of the susceptible-infected-removed (SIR) model, denoted HeSIR, that captures these effects through a simple bimodal scheme, where individuals may have higher- or lower-transmission-related traits. We derive a closed-form expression for the epidemic threshold in terms of the model parameters, and the network's degree distribution and homophily, defined as the tendency of like-risk individuals to preferentially interact. We identify a resurgence regime just beyond the classical threshold, where the number of infected individuals may initially decline before surging into large-scale transmission. Through simulations on homogeneous and heterogeneous network topologies we corroborate the analytical results and highlight how variations in susceptibility and infectivity influence the epidemic dynamics. We further show that, under suitable assumptions, the HeSIR model maps onto a standard SIR process on an appropriately modified contact network, providing a unified interpretation in terms of structural connectivity. Our findings quantify the effect of heterogeneous behavioral responses, especially in the presence of homophily, and caution against underestimating epidemic potential in fragmented populations, which may undermine timely containment efforts. The results also extend to heterogeneity arising from biological or other nonbehavioral sources.
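As an illustration of the bimodal scheme described in the abstract above, the following is a minimal sketch of a discrete-time SIR simulation in which some nodes carry a higher-transmission trait. It is not the authors' HeSIR implementation: the function name `simulate_two_class_sir`, the single multiplier `boost` applied to both infectivity and susceptibility, and the synchronous update rule are all assumptions made for the example.

```python
import random

def simulate_two_class_sir(adj, high_risk, beta, mu, boost, seeds, rng):
    """Discrete-time SIR on a contact network with two risk classes.

    adj:       dict mapping each node to a list of neighbours
    high_risk: set of nodes with the higher-transmission trait (assumption)
    beta:      baseline per-contact transmission probability
    mu:        per-step recovery probability (must be > 0 to terminate)
    boost:     hypothetical multiplier applied to both the infectivity and
               the susceptibility of high-risk nodes
    Returns a list of (S, I, R) counts, one tuple per time step.
    """
    state = {n: "S" for n in adj}
    for s in seeds:
        state[s] = "I"
    history = []
    while True:
        counts = tuple(sum(1 for v in state.values() if v == c) for c in "SIR")
        history.append(counts)
        infected = [n for n, s in state.items() if s == "I"]
        if not infected:
            return history
        newly_infected = set()
        for i in infected:
            infectivity = boost if i in high_risk else 1.0
            for j in adj[i]:
                if state[j] != "S":
                    continue
                susceptibility = boost if j in high_risk else 1.0
                p = min(1.0, beta * infectivity * susceptibility)
                if rng.random() < p:
                    newly_infected.add(j)
        # Recoveries and new infections are applied synchronously.
        for i in infected:
            if rng.random() < mu:
                state[i] = "R"
        for j in newly_infected:
            state[j] = "I"

# Usage: a ring of 20 nodes where every fourth node is high-risk.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
history = simulate_two_class_sir(
    ring, high_risk=set(range(0, 20, 4)), beta=0.3, mu=0.5,
    boost=2.0, seeds=[0], rng=random.Random(0),
)
```

Placing high-risk nodes next to each other, rather than spreading them evenly, is one simple way to experiment with homophily in this sketch.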
Most vaccines require multiple doses, the first to induce recognition and antibody production and subsequent doses to boost the primary response and achieve optimal protection. We show that properly prioritizing the administration of first and second doses can shift the epidemic threshold, separating the disease-free from the endemic state and potentially preventing widespread outbreaks. Assuming homogeneous mixing, we prove that at a low vaccination rate, the best strategy is to give absolute priority to first doses. In contrast, for high vaccination rates, we propose a scheduling that outperforms a first-come first-served approach. We identify the threshold that separates these two scenarios and derive the optimal prioritization scheme and interdose interval. Agent-based simulations on real and synthetic contact networks validate our findings. We provide specific guidelines for effective resource allocation, showing that adjusting the timing between the primer and booster significantly impacts epidemic outcomes and can determine whether the disease persists or disappears.
The Critical Node Detection Problem (CNDP) consists in finding the set of nodes, called critical, whose removal maximally degrades the graph. In this work we focus on finding the set of critical nodes whose removal minimizes the pairwise connectivity of a directed graph (digraph). This problem has been proved to be NP-hard, so efficient heuristics are needed to detect critical nodes in real-world applications. We aim at understanding which heuristic is best for identifying critical nodes in practice, i.e., taking into account time constraints and real-world networks. We present an in-depth analysis of several heuristics that we ran on both real-world and synthetic graphs. We define and evaluate two different strategies for each heuristic: standard and iterative. Our main findings show that an algorithm recently proposed to solve the CNDP, and that can be used as a heuristic for the general case, provides the best results on real-world graphs and is also the fastest. However, there are a few exceptions, which are thoroughly analyzed and discussed. We show that, among the heuristics we analyzed, a few cannot be applied to very large graphs when the iterative strategy is used, due to their time complexity. Finally, we suggest possible directions to further improve the heuristic providing the best results.
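To make the objective in the abstract above concrete, here is a minimal sketch of how pairwise connectivity in a digraph could be computed after removing a set of nodes. It assumes that pairwise connectivity counts the unordered pairs of nodes that are mutually reachable, i.e., that lie in the same strongly connected component; the abstract does not spell out the definition. The function names are hypothetical, and the brute-force `best_single_removal` is only a baseline for illustration, not one of the heuristics evaluated in the paper.

```python
from collections import deque

def reachable(adj, source):
    """Return the set of nodes reachable from `source` via breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def pairwise_connectivity(nodes, edges, removed=frozenset()):
    """Count unordered pairs of mutually reachable nodes after removing `removed`."""
    removed = set(removed)
    fwd, bwd = {}, {}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)
    assigned = set()
    total = 0
    for n in nodes:
        if n in removed or n in assigned:
            continue
        # The strongly connected component of n is the intersection of
        # what n can reach and what can reach n.
        scc = reachable(fwd, n) & reachable(bwd, n)
        assigned |= scc
        k = len(scc)
        total += k * (k - 1) // 2
    return total

def best_single_removal(nodes, edges):
    """Brute-force baseline: the single node whose removal minimizes connectivity."""
    return min(nodes, key=lambda n: pairwise_connectivity(nodes, edges, {n}))

# Usage: a 3-cycle (0, 1, 2) feeding into a 2-cycle (3, 4).
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
baseline = pairwise_connectivity(nodes, edges)
critical = best_single_removal(nodes, edges)
```

This approach recomputes components from scratch for every candidate, so it scales poorly; it is meant to clarify the objective, not to be used on large graphs.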
Models of networks play a major role in explaining and reproducing empirically observed patterns. Suitable models can be used to randomize an observed network while preserving some of its features, or to generate synthetic graphs whose properties may be tuned upon the characteristics of a given population. In the present paper, we introduce the Fitness-Corrected Block Model, an adjustable-density variation of the well-known Degree-Corrected Block Model, and we show that the proposed construction yields a maximum entropy model. When the network is sparse, we derive an analytical expression for the degree distribution of the model that depends on just the constraints and the chosen fitness-distribution. Our model is perfectly suited to define maximum-entropy data-driven spatial social networks, where each block identifies vertices having similar position (e.g., residence) and age, and where the expected block-to-block adjacency matrix can be inferred from the available data. In this case, the sparse-regime approximation coincides with a phenomenological model where the probability of a link binding two individuals is directly proportional to their sociability and to the typical cohesion of their age-groups, whereas it decays as an inverse-power of their geographic distance. We support our analytical findings through simulations of a stylized urban area.
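The phenomenological link probability mentioned in the abstract above can be sketched as follows. This is an illustrative reading, not the paper's exact formula: it assumes the probability is the product of the two individuals' sociabilities, the cohesion of their age groups, and an inverse power of their distance, capped at 1. The parameter names `gamma`, `scale`, and `min_distance` are hypothetical.

```python
def link_probability(soc_i, soc_j, cohesion, distance, gamma,
                     scale=1.0, min_distance=1.0):
    """Probability of a link between two individuals (illustrative sketch).

    soc_i, soc_j: sociability (fitness) of the two individuals
    cohesion:     typical cohesion between their age groups
    distance:     geographic distance between them
    gamma:        exponent of the inverse-power decay (assumed parameter)
    scale:        hypothetical normalization constant controlling density
    min_distance: hypothetical floor that avoids division by zero for
                  individuals at the same location
    """
    d = max(distance, min_distance)
    return min(1.0, scale * soc_i * soc_j * cohesion * d ** (-gamma))

# Usage: doubling one individual's sociability doubles the probability,
# while doubling the distance with gamma = 2 divides it by four.
p_near = link_probability(1.0, 1.0, 0.5, distance=2.0, gamma=2.0)
p_social = link_probability(2.0, 1.0, 0.5, distance=2.0, gamma=2.0)
p_far = link_probability(1.0, 1.0, 0.5, distance=4.0, gamma=2.0)
```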
The geographic distribution of the population in a region is a significant ingredient in shaping the spatial and temporal evolution of an epidemic outbreak. Heterogeneity in the population density directly impacts the local relative risk: the chances that a specific area is reached by the contagion depend on its local density and connectedness to the rest of the region. We consider an SIR epidemic spreading in an urban territory subdivided into tiles (i.e., census blocks) of given population and demographic profile. We use the relative attack rate and the first infection time of a tile to quantify local severity and timing: how much and how fast the outbreak will impact any given area. Assuming that the contact rate of any two individuals depends on the distance between their households, we identify a suitably defined geographical centrality that measures the average connectedness of an area as an efficient indicator of local riskiness. We simulate the epidemic under different assumptions regarding the socio-demographic factors that influence interaction patterns, providing empirical evidence of the effectiveness and soundness of the proposed centrality measure.
SIR
Epidemic
Risk Assessment
Data Driven
Urban System
Geographic Spreading
Tor is open-source software that allows accessing various kinds of resources, known as hidden services, while guaranteeing sender and receiver anonymity. Tor relies on a free, worldwide overlay network, managed by volunteers, that works according to the principles of onion routing, in which messages are encapsulated in layers of encryption, analogous to the layers of an onion. The Tor Web is the set of web resources that exist on the Tor network, and Tor websites are part of the so-called dark web. Recent research works have evaluated Tor security, its evolution over time, and its thematic organization. Nevertheless, limited information is available about the structure of the graph defined by the network of Tor websites, not to be mistaken with the network of nodes that supports the onion routing. The limited number of entry points that can be used to crawl the network makes the study of this graph far from simple. In the present paper we analyze two graph representations of the Tor Web and the relationship between contents and structural features, considering three crawling datasets collected over a five-month time frame. Among other findings, we show that Tor consists of a tiny strongly connected component, in which link directories play a central role, and of a multitude of services that can (only) be reached from there. From this viewpoint, the graph appears inefficient. Nevertheless, if we only consider mutual connections, a more efficient subgraph emerges, which is probably the backbone of social interactions in Tor.
Network-based epidemic models that account for heterogeneous contact patterns are extensively used to predict and control the diffusion of infectious diseases. We use census and survey data to reconstruct a geo-referenced and age-stratified synthetic urban population connected by stable social relations. We consider two kinds of interactions, distinguishing daily (household) contacts from other frequent contacts. Moreover, we allow any pair of individuals to have rare fortuitous interactions. We simulate the epidemic diffusion on a synthetic urban network for a typical medium-sized Italian city and characterize the outbreak speed, pervasiveness, and predictability in terms of the socio-demographic and geographic features of the host population. Introducing age-structured contact patterns results in faster and more pervasive outbreaks, while assuming that the interaction frequency decays with distance has only negligible effects. Preliminary evidence shows the existence of patterns of hierarchical spatial diffusion in urban areas, with two regimes for epidemic spread in low- and high-density regions.
SIR
Epidemic
Social network
Data driven
Urban system
The recent COVID-19 pandemic came alongside an "infodemic", with online social media flooded by often unreliable information associating the medical emergency with popular subjects of disinformation. In Italy, one of the first European countries to suffer a rise in new cases and to deal with a total lockdown, controversial topics such as migrant flows and 5G technology were often associated online with the origin and diffusion of the virus. In this work we analyze COVID-19-related conversations on Italian Facebook, collecting over 1.5M posts shared by nearly 80k public pages and groups over a period of four months starting in January 2020. On the one hand, our findings suggest that well-known unreliable sources had limited exposure, and that discussions of controversial topics did not spark engagement comparable to that of institutional and scientific communication. On the other hand, we find that dis- and counter-information induced a polarization of (clusters of) groups and pages, wherein conversations were characterized by a topical lexicon, by a great diffusion of user-generated content, and by link-sharing patterns that seem ascribable to coordinated propaganda. As revealed by the URL-sharing diffusion network, which shows a "small-world" effect, users were easily exposed to harmful propaganda as well as to verified information on the virus, exalting the role of public figures and mainstream media, as well as of Facebook groups, in shaping public opinion.
Facebook
Infodemic
Disinformation
COVID-19
Online social networks
Defining accurate and flexible models for real-world networks of human beings is instrumental to understand the observed properties of phenomena taking place across those networks and to support computer simulations of dynamic processes of interest for several areas of research - including computational epidemiology, which is recently high on the agenda. In this paper we present a flexible model to generate age-stratified and geo-referenced synthetic social networks on the basis of widely available aggregated demographic data and, possibly, of estimated age-based social mixing patterns. Using the Italian city of Florence as a case study, we characterize our network model under selected configurations and we show its potential as a building block for the simulation of infections' propagation. A fully operational and parametric implementation of our model is released as open-source.
Urban social network
Graph model
Simulator
Epidemic
The COVID-19 pandemic triggered a global research effort to define and assess timely and effective containment policies. Understanding the role that specific venues play in the dynamics of epidemic spread is critical to guide the implementation of fine-grained non-pharmaceutical interventions (NPIs). In this paper, we present a new model of context-dependent interactions that integrates information about the surrounding territory and the social fabric. Building on this model, we developed an open-source data-driven simulator of the usage patterns of specific gathering places that can be easily configured to project and compare multiple scenarios. We focused on the largest park of the City of Florence, Italy, to provide experimental evidence that our simulator produces contact graphs with unique, realistic features, and that gaining control of the mechanisms that govern interactions at the local scale makes it possible to unveil and possibly control non-trivial aspects of the epidemic.
The definition of suitable generative models for synthetic yet realistic social networks is a widely studied problem in the literature. Not being tied to any real data, random graph models cannot capture all the subtleties of real networks and are inadequate for many practical contexts, including areas of research such as computational epidemiology, which are recently high on the agenda. At the same time, the so-called contact networks describe interactions, rather than relationships, and are strongly dependent on the application and on the size and quality of the sample data used to infer them. To fill the gap between these two approaches, we present a data-driven model for urban social networks, implemented and released as open source software. By using just widely available aggregated demographic and social-mixing data, we are able to create, for a territory of interest, an age-stratified and geo-referenced synthetic population whose individuals are connected by "strong ties" of two types: intra-household (e.g., kinship) or friendship. While household links are entirely data-driven, we propose a parametric probabilistic model for friendship, based on the assumption that distances and age differences play a role, and that not all individuals are equally sociable. The demographic and geographic factors governing the structure of the obtained network under different configurations are thoroughly studied through extensive simulations focused on three Italian cities of different sizes.
simulator
open source
data-driven
graph model
urban social network
Operated by the H2020 SOMA Project, the recently established Social Observatory for Disinformation and Social Media Analysis supports researchers, journalists and fact-checkers in their quest for quality information. At the core of the Observatory lies the DisInfoNet Toolbox, designed to help a wide spectrum of users understand the dynamics of (fake) news dissemination in social networks. DisInfoNet combines text mining and classification with graph analysis and visualization to offer a comprehensive and user-friendly suite. To demonstrate the potential of our Toolbox, we consider a Twitter dataset of more than 1.3M tweets focused on the Italian 2016 constitutional referendum and use DisInfoNet to: (i) track relevant news stories and reconstruct their prevalence over time and space; (ii) detect central debating communities and capture their distinctive polarization/narrative; (iii) identify influencers both globally and in specific “disinformation networks”.
Classification
Disinformation
Social network analysis
The daily exposure of social media users to propaganda and disinformation campaigns has reinvigorated the need to investigate the local and global patterns of diffusion of different (mis)information content on social media. Echo chambers and influencers are often deemed responsible for both the polarization of users in online social networks and the success of propaganda and disinformation campaigns. This article adopts a data-driven approach to investigate the structure of communities and propaganda networks on Twitter in order to assess the correctness of these imputations. In particular, the work aims at characterizing networks of propaganda extracted from a Twitter dataset by combining the information gained by three different classification approaches, focused respectively on (i) using tweet content to infer the "polarization" of users around a specific topic, (ii) identifying users having an active role in the diffusion of different propaganda and disinformation items, and (iii) analyzing social ties to identify topological clusters and users playing a "central" role in the network. The work identifies highly partisan community structures along political alignments; furthermore, centrality metrics proved to be very informative for detecting the most active users in the network and for distinguishing users playing different roles; finally, the polarization and clustering structure of the retweet graphs provided useful insights about relevant properties of users' exposure to, interactions with, and participation in different propaganda items.
Dinur and Shamir's cube attack has attracted significant attention in the literature. Nevertheless, the lack of implementations achieving effective results casts doubt on its practical relevance. On the theoretical side, promising results have recently been achieved by leveraging division trails. The present paper follows a more practical approach and aims at giving new impetus to this line of research by means of a cipher-independent flexible framework that is able to carry out the cube attack on GPU/CPU clusters. We address all issues posed by a GPU implementation, providing evidence in support of parallel variants of the attack and identifying viable directions for solving open problems in the future. We report the results of running our GPU-based cube attack against round-reduced versions of three well-known ciphers: Trivium, Grain-128 and SNOW 3G. Our attack against Trivium improves the state of the art, permitting full key recovery for Trivium reduced to (up to) 781 initialization rounds (out of 1152) and finding the first-ever maxterm after 800 rounds. In this paper, we also present the first standard cube attack (i.e., neither dynamic nor tester) to yield maxterms for Grain-128 up to 160 initialization rounds on non-programmable hardware. We include a thorough evaluation of the impact of system parameters and GPU architecture on performance. Moreover, we demonstrate the scalability of our solution on multi-GPU systems. We believe that our extensive set of results can be useful for the cryptographic engineering community at large and can pave the way to further results in the area.
Cube attack
Algebraic attacks
Graphics processing unit
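The core step of the cube attack referred to in the abstract above, summing a cipher's output bit over all assignments of a chosen set of public variables, can be sketched on a toy example. The polynomial `toy_cipher` below is purely hypothetical and unrelated to Trivium, Grain-128, or SNOW 3G, and this CPU-only sketch does not reflect the GPU framework described in the paper.

```python
from itertools import product

def toy_cipher(v, k):
    """Toy output bit over GF(2) with 3 public bits v and 3 key bits k (hypothetical)."""
    v0, v1, v2 = v
    k0, k1, k2 = k
    return (v0 & v1 & (k0 ^ k1)) ^ (v0 & k2) ^ (v1 & v2) ^ (k0 & k1) ^ v2

def cube_sum(f, cube, n_public, key, fixed=0):
    """XOR f over every assignment of the cube variables.

    cube:     indices of the public variables that span the cube
    n_public: total number of public variables
    fixed:    value given to public variables outside the cube
    """
    total = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = [fixed] * n_public
        for idx, b in zip(cube, bits):
            v[idx] = b
        total ^= f(tuple(v), key)
    return total

# Usage: for the cube {v0, v1}, every term not divisible by v0*v1 cancels
# out in the sum, leaving the linear superpoly k0 XOR k1.
superpoly_values = {
    key: cube_sum(toy_cipher, (0, 1), 3, key)
    for key in product((0, 1), repeat=3)
}
```

In a real attack the polynomial is unknown and the cipher is queried as a black box; each linear superpoly found this way yields one linear equation in the key bits.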
Tor hidden services allow offering and accessing various Internet resources while guaranteeing a high degree of provider and user anonymity. So far, most research work on the Tor network has aimed at discovering protocol vulnerabilities to de-anonymize users and services. Other work has aimed at estimating the number of available hidden services and classifying them. What still remains largely unknown is the structure of the graph defined by the network of Tor services. In this paper, we describe the topology of the Tor graph (aggregated at the hidden service level), measuring both global and local properties by means of well-known metrics. We consider three different snapshots obtained by extensively crawling Tor three times over a five-month time frame. We separately study these three graphs and their shared "stable" core. In doing so, besides assessing the renowned volatility of Tor hidden services, we make it possible to distinguish time-dependent and structural aspects of the Tor graph. Our findings show that, among other things, the graph of Tor hidden services presents some of the characteristics of social and surface web graphs, along with a few unique peculiarities, such as a very high percentage of nodes having no outbound links.
The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network demand specific algorithms to explore and analyze it. Tor is an anonymity network that allows offering and accessing various Internet resources while guaranteeing a high degree of provider and user anonymity. So far, the attention of the research community has focused on assessing the security of the Tor infrastructure. Most research work on the Tor network has aimed at discovering protocol vulnerabilities to de-anonymize users and services, while little or no information is available about the topology of the Tor Web graph or the relationship between pages' content and topological structure. With our work we aim to fill this gap. We describe the topology of the Tor Web graph, measuring both global and local properties by means of well-known metrics that, due to the size of the network, require high-performance algorithms. We consider three different snapshots obtained by extensively crawling Tor three times over a five-month time frame. Finally, we present a correlation analysis of pages' semantics and topology, discussing novel insights about the Tor Web organization and its content. Our findings show that the Tor graph presents some of the characteristics of social and surface web graphs, along with a few unique peculiarities.
Topic Modelling (TM) is a widely adopted generative model used to infer the thematic organization of text corpora. When document-level covariate information is available, so-called Structural Topic Modelling (STM) is the state-of-the-art approach to embed this information in the topic mining algorithm. Usually, TM algorithms rely on unigrams as the basic text generation unit, whereas the quality and intelligibility of the identified topics would significantly benefit from the detection and usage of topical phrasemes. Following on from previous research, in this paper we propose the first iterative algorithm to extend STM with n-grams, and we test our solution on textual data collected from four well-known Tor drug marketplaces. Significantly, we employ an STM-guided n-gram selection process, so that topic-specific phrasemes can be identified regardless of their global relevance in the corpus. Our experiments show that enriching the dictionary with selected n-grams improves the usability of STM, allowing the discovery of key information hidden in an apparently "mono-thematic" dataset.
Traffic data, automatically collected en masse every day, can be mined to discover information or patterns to support police investigations. Leveraging domain expertise, in this paper we show how unsupervised clustering techniques can be used to infer trending behaviors of road users and thus classify both routes and vehicles. We describe a tool designed and implemented on top of openly available scientific libraries, and we present a new set of experiments involving three years' worth of data. Our classification results show robustness to noise and have high potential for detecting anomalies possibly connected to criminal activity.
The amount of traffic data collected by automatic number plate reading systems constantly increases. It is therefore important for law enforcement agencies to find convenient techniques and tools to analyze such data. In this paper we propose a scalable and fully automated procedure, leveraging the Apache Accumulo technology, that allows effective import and processing of traffic data. We discuss preliminary results obtained by using our application to analyze a dataset containing real traffic data provided by the Italian National Police. We believe the results described here can pave the way to further interesting research on the matter.
Apache Accumulo
Exploratory Data Analysis
Traffic Data