Social-network analysis

Our efforts on social networks focus on the analysis of graph problems, on the understanding of user behavior, and on the development of tools for online collaborative systems.

We study issues regarding the evolution of collaboration networks, such as Wikipedia, computation and assignment of trust and reputation in social networks, and the importance of authorities in the phenomenon of peer influence.

We are developping a framework for the problems of online team formation, where the objective is to provide tools for the automati creation of teams of experts that are able to execute tasks that arrive online, while satisfying a variety of constraints (minimization of load, unfairness, communication cost, etc.).

We also develop algorithms for computations on social-network graphs such as streaming-based algorithms for computing the clustering coefficient and for counting triangles and other small subgraphs.

Publications

2015

V. Auletta, D. Ferraioli, F. Pasquale, P. Penna, and G. Persiano

Logit Dynamics with Concurrent Updates for Local Interaction Potential Games

Algorithmica, Volume 73, Number 3, 2015

Show abstract

Hide abstract

Logit choice dynamics constitute a family of randomized best response dynamics based on the logit choice function (McFadden in Frontiers in econometrics. Academic Press, New York, 1974) that models players with limited rationality and knowledge. In this paper we study the all-logit dynamics [also known as simultaneous learning (Alós-Ferrer and Netzer in Games Econ Behav 68(2):413--427, 2010)], where at each time step all players concurrently update their strategies according to the logit choice function. In the well studied (one-)logit dynamics (Blume in Games Econ Behav 5(3):387--424, 1993) instead at each step only one randomly chosen player is allowed to update. We study properties of the all-logit dynamics in the context of local interaction potential games, a class of games that has been used to model complex social phenomena (Montanari and Saberi 2009; Peyton in The economy as a complex evolving system. Oxford University Press, Oxford, 2003) and physical systems (Levin et al. in Probab Theory Relat Fields 146(1--2):223--265, 2010; Martinelli in Lectures on probability theory and statistics. Springer, Berlin, 1999). In a local interaction potential game players are the vertices of a social graph whose edges are two-player potential games. Each player picks one strategy to be played for all the games she is involved in and the payoff of the player is the sum of the payoffs from each of the games. We prove that local interaction potential games characterize the class of games for which the all-logit dynamics is reversible. We then compare the stationary behavior of one-logit and all-logit dynamics. Specifically, we look at the expected value of a notable class of observables, that we call decomposable observables. We prove that the difference between the expected values of the observables at stationarity for the two dynamics depends only on the rationality level beta and on the distance of the social graph from a bipartite graph. In particular, if the social graph is bipartite then decomposable observables have the same expected value. Finally, we show that the mixing time of the all-logit dynamics has the same twofold behavior that has been highlighted in the case of the one-logit: for some games it exponentially depends on the rationality level beta, whereas for other games it can be upper bounded by a function independent from beta.

A. Anagnostopoulos, L. Becchetti, B. de Keijzer, and G. Schäfer

Inefficiency of Games with Social Context [pdf]

Theory of Computing Systems, Volume 57, Number 3, 2015

Show abstract

Hide abstract

The study of other-regarding player behavior such as altruism and spite in games has recently received quite some attention in the algorithmic game theory literature. Already for very simple models, it has been shown that altruistic behavior can actually be harmful for society in the sense that the price of anarchy may increase as the players become more altruistic. In this paper, we study the severity of this phenomenon for more realistic settings in which there is a complex underlying social structure, causing the players to direct their altruistic and spiteful behavior in a refined player-specific sense (depending, for example, on friendships that exist among the players). Our findings show that the increase in the price of anarchy is modest for congestion games and minsum scheduling games, whereas it might be drastic for generalized second price auctions.

A. Anagnostopoulos, and M. Sorella

Learning a Macroscopic Model of Cultural Dynamics [pdf]

Proc. Of the 15th International Conference on Data Mining (ICDM 2015), 2015

Show abstract

Hide abstract

A fundamental open question that has been studied by sociologists since the 70s and recently started being addressed by the computer-science community is the understanding of the role that influence and selection play in shaping the evolution of socio-cultural systems. Quantifying these forces in real settings is still a big challenge, especially in the large-scale case in which the entire social network between the users may not be known, and only longitudinal data in terms of masses of cultural groups (e.g., political affiliation, product adoption, market share, cultural tastes) may be available. We propose an influence and selection model encompassing an explicit characterization of the feature space for the different cultural groups in the form of a natural equation-based macroscopic model, following the approach of Kempe et al. [EC 2013]. Our main goal is to estimate edge influence strengths and selection parameters from an observed time series. To do an experimental evaluation on real data, we perform learning on real datasets from Last.FM and Wikipedia.

A. Bessi, F. Petroni, M. D. Vicario, F. Zollo, A. Anagnostopoulos, A. Scala, G. Caldarelli, and W. Quattrociocchi

Viral Misinformation: The Role of Homophily and Polarization [pdf]

Proc. Of the 24th International World Wide Web Conference, Companion Volume (WWW 2015, Web Science Track), 2015

Show abstract

Hide abstract

No abstract available.

A. Anagnostopoulos, D. Ferraioli, and S. Leonardi

Competitive Influence in Social Networks: Convergence, Submodularity, and Competition Effects [pdf]

Proc. 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), 2015

Show abstract

Hide abstract

In the last 10 years, a vast amount of scientific literature has studied the problem of influence maximization. Yet, only very recently have scientists started considering the more realistic case in which competing entities try to expand their market and maximize their share via viral marketing. Goyal and Kearns [STOC 2012] present a model for the diffusion of two competing alternatives in a social network, which consists of two phases: one for the activation, in which nodes choose whether to adopt any of the two alternatives or none of them, and one for the selection, which is for choosing which of the two alternatives to adopt.

In this work we consider this two-phase model, by composing some of the most known dynamics (threshold, voter, and logit models), and we ask the following questions: (1) How is the stationary distribution of the composition of these dynamics related to those of the single composing dynamics? (2) Does the number of adopters of one of the alternatives increase in a monotone and submodular way with respect to the set of initial adopters of that alternative? (3) To what extent does the competition among alternatives affect the total number of agents adopting one of the alternatives?

L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and R. Silvestri

Plurality Consensus in the Gossip Model

Proc. 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), 2015

Show abstract

Hide abstract

We study Plurality Consensus in the GOSSIP Model over a network of n anonymous agents. Each agent supports an initial opinion or color. We assume that at the onset, the number of agents supporting the plurality color exceeds that of the agents supporting any other color by a sufficiently-large bias, though the initial plurality itself might be very far from absolute majority. The goal is to provide a protocol that, with high probability, brings the system into the configuration in which all agents support the (initial) plurality color.

We consider the Undecided-State Dynamics, a well-known protocol which uses just one more state (the undecided one) than those necessary to store colors.

We show that the speed of convergence of this protocol depends on the initial color configuration as a whole, not just on the gap between the plurality and the second largest color community. This dependence is best captured by the monochromatic distance, a novel notion we introduce, measuring the distance of the initial color configuration from the closest monochromatic one. In the complete graph, we prove that, for a wide range of the input parameters, this dynamics converges in a number of rounds that is logarithmic in the number of nodes and linear with respect to the aforementioned monochromatic distance. We prove that this upper bound is almost tight in the strong sense: Starting from any color configuration, the convergence time is linear in the monochromatic. Finally, we adapt the protocol above to obtain a fast, random walk-based protocol for plurality consensus on regular expanders. All our bounds hold with high probability.

2014

L. Becchetti, A. Clementi, E. Natale, F. Pasquale, R. Silvestri, and L. Trevisan

Simple Dynamics for Plurality Consensus

Proc. 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2014), 2014

Show abstract

Hide abstract

We study a Majority Consensus process in which each of n anonymous agents of a communication network supports an initial opinion (a color chosen from a finite set [k]) and, at every time step, he can revise his color according to a random sample of neighbors. It is assumed that the initial color configuration has a sufficiently large bias s towards a fixed majority color, that is, the number of nodes supporting the majority color exceeds the number of nodes supporting any other color by s additional nodes. The goal (of the agents) is to let the process converge to the stable configuration where all nodes support the majority color. We consider a basic model in which the network is a clique and the update rule (called here the 3-majority dynamics) of the process is that each agent looks at the colors of three random neighbors and then applies the majority rule (breaking ties uniformly).

We prove that with high probability the process converges in time logarithmic in the number of nodes and linear in the number of colors. A natural question is whether looking at more (than three) random neighbors can significantly speed up the process. We provide a negative answer to this question: in particular, we show that samples of polylogarithmic size can speed up the process by a polylogarithmic factor only.

S. Lattanzi, and S. Leonardi

Efficient Computation of the Weighted Clustering Coefficient

Proc. 11th Workshop on Algorithms and Models for the Web Graph (WAW 2014), 2014

Show abstract

Hide abstract

The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected is the neighbor around a node and it has been widely adopted for assessing the quality of nodes in a social network. The computation of the clustering coefficient is challenging since it requires to count the number of triangles in the graph. Several recent works proposed efficient sampling, streaming and MapReduce algorithms that allow to overcome this computational bottleneck. As a matter of fact, the intensity of the interaction between nodes, that is usually represented with weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Recently various notions of weighted clustering coefficient have been proposed but all those techniques are hard to implement on large-scale graphs. In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore we also propose a novel graph-theoretic notion of clustering coefficient in weighted networks.

H. Aziz, and B. de Keijzer

Shapley Meets Shapley

Proc. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), 2014

Show abstract

Hide abstract

This paper concerns the analysis of the Shapley value in matching games. Matching games constitute a fundamental class of cooperative games which help understand and model auctions and assignments. In a matching game, the value of a coalition of vertices is the weight of the maximum size matching in the subgraph induced by the coalition. The Shapley value is one of the most important solution concepts in cooperative game theory. After establishing some general insights, we show that the Shapley value of matching games can be computed in polynomial time for some special cases: graphs with maximum degree two, and graphs that have a small modular decomposition into cliques or cocliques (complete k-partite graphs are a notable special case of this). The latter result extends to various other well-known classes of graph-based cooperative games. We continue by showing that computing the Shapley value of unweighted matching games is #P-complete in general. Finally, a fully polynomial-time randomized approximation scheme (FPRAS) is presented. This FPRAS can be considered the best positive result conceivable, in view of the #P-completeness result.

P. Rozenshtein, A. Anagnostopoulos, A. Gionis, and N. Tatti

Event Detection in Activity Networks [pdf]

Proc. 20th International Conference on Knowledge Discovery and Data Mining (KDD 2014), 2014

Show abstract

Hide abstract

With the fast growth of smart devices and social networks, a lot of computing systems collect data that record different types of activities. An important computational challenge is to analyze these data, extract patterns, and understand activity trends. We consider the problem of mining activity networks to identify interesting events, such as a big concert or a demonstration in a city, or a trending keyword in a user community in a social network.

We define an event to be a subset of nodes in the network that are close to each other and have high activity levels. We formalize the problem of event detection using two graph-theoretic formulations. The first one captures the compactness of an event using the sum of distances among all pairs of the event nodes. We show that this formulation can be mapped to the MaxCut problem, and thus, it can be solved by applying standard semidefinite programming techniques. The second formulation captures compactness using a minimum-distance tree. This formulation leads to the prize-collecting Steiner-tree problem, which we solve by adapting existing approximation algorithms. For the two problems we introduce, we also propose efficient and effective greedy approaches and we prove performance guarantees for one of them. We experiment with the proposed algorithms on real datasets from a public bicycling system and a geolocation-enabled social network dataset collected from twitter. The results show that our methods are able to detect meaningful events.

2013

A. Anagnostopoulos, L. Becchetti, B. de Keijzer, and G. Schäfer

Inefficiency of Games with Social Context [pdf]

Proc. 6th International Symposium on Algorithmic Game Theory (SAGT 2013), 2013

Show abstract

Hide abstract

2012

A. Anagnostopoulos, L. Becchetti, C. Castillo, A. Gionis, and S. Leonardi

Online Team Formation in Social Networks [pdf]

Proc. 21st International World Wide Web Conference (WWW 2012), 2012

Show abstract

Hide abstract

We study the problem of online team formation. We consider a setting in which people possess different skills and compatibility among potential team members is modeled by a social network. A sequence of tasks arrives in an online fashion, and each task requires a specific set of skills. The goal is to form a new team upon arrival of each task, so that (i) each team possesses all skills required by the task, (ii) each team has small communication overhead, and (iii) the workload of performing the tasks is balanced among people in the fairest possible way.

We propose efficient algorithms that address all these requirements: our algorithms form teams that always satisfy the required skills, provide approximation guarantees with respect to team communication overhead, and they are online-competitive with respect to load balancing. Experiments performed on collaboration networks among film actors and scientists, confirm that our algorithms are successful at balancing these conflicting requirements.

This is the first paper that simultaneously addresses all these aspects. Previous work has either focused on minimizing coordination for a single task or balancing the workload neglecting coordination costs.

A. Anagnostopoulos, R. Kumar, M. Mahdian, E. Upfal, and F. Vandin

Algorithms on Evolving Graphs [pdf]

Proc. 3rd ACM International Conference on Innovations on Theoretical Computer Science (ITCS 2012), 2012

Show abstract

Hide abstract

Motivated by applications that concern graphs that are evolving and massive in nature, we define a new general framework for computing with such graphs. In our framework, the graph changes over time and an algorithm can only track these changes by explicitly probing the graph. This framework captures the inherent tradeoff between the complexity of maintaining an up-to-date view of the graph and the quality of results computed with the available view. We apply this framework to two classical graph connectivity problems, namely, path connectivity and minimum spanning trees, and obtain efficient algorithms.

L. Becchetti, L. Bergamini, F. Ficarola, and A. Vitaletti

Population Protocols on Real Social Networks [pdf]

Proc. 5th Workshop on Social Network Systems (SNS 2012), 2012

Show abstract

Hide abstract

In this paper we present an experimental analysis to assess the performance of Population Protocols, a well-known fully decentralized computational model, on a real, evolving social network.

2011

A. Anagnostopoulos, G. Brova, and E. Terzi

Peer and Authority Pressure in Information-Propagation Models [pdf]

Proc. 22nd European Conference on Machine Learning and 15th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2011), 2011

Show abstract

Hide abstract

Existing models of information diffusion assume that peer influence is the main reason for the observed propagation patterns. In this paper, we examine the role of authority pressure on the observed information cascades. We model this intuition by characterizing some nodes in the network as "authority" nodes. These are nodes that can influence large number of peers, while themselves cannot be influenced by peers. We propose a model that associates with every item two parameters that quantify the impact of the peer and the authority pressure on the item's propagation. Given a network and the observed diffusion patterns of the item, we learn these parameters from the data and characterize the item as peer- or authority-propagated. We also develop a randomization test that evaluates the statistical significance of our findings and makes our item characterization robust to noise. Our experiments with real data from online media and scientific-collaboration networks indicate that there is a strong signal of authority pressure in these networks.

2010

L. Becchetti, P. Boldi, C. Castillo, and A. Gionis

Efficient Algorithms for Large-Scale Local Triangle Counting [pdf]

ACM Transactions on Knowledge Discovery From Data, Volume 4, Number 3, 2010

Show abstract

Hide abstract

In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈V in the graph. We consider the question both for undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features for content quality assessment in social networks.

For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.

A. Anagnostopoulos, L. Becchetti, C. Castillo, A. Gionis, and S. Leonardi

Power in Unity: Forming Teams in Large-Scale Community Systems [pdf]

Proc. 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), 2010

Show abstract

Hide abstract

The internet has enabled the collaboration of groups at a scale that was unseen before. A key problem for large collaboration groups is to be able to allocate tasks effectively. An effective task assignment method should consider both how fit teams are for each job as well as how fair the assignment is to team members, in terms that no one should be overloaded or unfairly singled out. The assignment has to be done automatically or semi-automatically given that it is difficult and time-consuming to keep track of the skills and the workload of each person. Obviously the method to do this assignment must also be computationally efficient.

In this paper we present a general framework for task-assignment problems. We provide a formal treatment on how to represent teams and tasks. We propose alternative functions for measuring the fitness of a team performing a task and we discuss desirable properties of those functions. Then we focus on one class of task-assignment problems, we characterize the complexity of the problem, and we provide algorithms with provable approximation guarantees, as well as lower bounds. We also present experimental results that show that our methods are useful in practice in several application scenarios.

2008

L. Becchetti, P. Boldi, C. Castillo, and A. Gionis

Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs [pdf]

Proc. 14th International Conference on Knowledge Discovery and Data Mining (KDD 2008), 2008

Show abstract

Hide abstract

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.

For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.

I. Bordino, D. Donato, A. Gionis, and S. Leonardi

Mining Large Networks with Subgraph Counting

Proc. 8th IEEE International Conference on Data Mining (ICDM 2008), 2008

Show abstract

Hide abstract

The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network.

In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs to prove their significance in order to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization.

D. Donato, S. Leonardi, and M. Paniccia

Combining Transitive Trust and Negative Opinions for Better Reputation Management in Social Networks

Proc. 2nd ACM Workshop on Social Network Mining and Analysis (SNA-KDD 2008), 2008

Show abstract

Hide abstract

Reputation management is a crucial task in Peer-to-Peer networks, social networks and other decentralized distributed systems. In this work we investigate the role of users' negative opinions in order to devise fully decentralized reputation management mechanisms. Our study is motivated by the limitations of methods proposed in literature that are based on the idea of propagating positive opinions, most notably EigenTrust [9], a cornerstone method for reputation management in decentralized systems. EigenTrust makes use of a transitive deÞnition of trust: a peer tends to trust those peers who have a high reputation in the opinion of trustworthy peers. While EigenTrust has been shown to be effective against a number of threat attacks from coalitions of malicious players, it does not address properly more sophisticated threat attacks.

In this paper we propose a new approach to the design of fully decentralized reputation mechanisms that combine negative and positive opinions expressed by peers to reach a global consensus on trust and distrust valuess for each peer of the network. We show how these strategies outperform EigenTrust in terms of number of successful transactions against a large set of sophisticated threat attacks posed by coalitions of malicious peers. We also discuss a clustering method that achieves detecting most of the malicious peers with high precision.

2006

A. Capocci, V. Servedio, F. Colaiori, L. Buriol, D. Donato, S. Leonardi, and G. Caldarelli

Preferential Attachment in the Growth of Social Networks: The Internet Encyclopedia Wikipedia

Physical Review E, Volume 74, Number 3, 2006

Show abstract

Hide abstract

We present an analysis of the statistical properties and growth of the free on-line encyclopedia Wikipedia. By describing topics by vertices and hyperlinks between them as edges, we can represent this encyclopedia as a directed graph. The topological properties of this graph are in close analogy with those of the World Wide Web, despite the very different growth mechanism. In particular, we measure a scale-invariant distribution of the in and out degree and we are able to reproduce these features by means of a simple statistical model. As a major consequence, Wikipedia growth can be described by local rules such as the preferential attachment mechanism, though users, who are responsible of its evolution, can act globally on the network.

L. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler

Counting Triangles in Data Streams

Proc. 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2006), 2006

Show abstract

Hide abstract

We present two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and length 2 paths in the graph and expected update time O(log|V|(1+s|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficient, and discovery of frequent patterns in large graphs.

We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we run experiments with parameter s=1,000, 10,000, 100,000, 1,000,000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm was varying significantly and even for s=1,000,000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better and even for s=10,000 we had an average deviation of less than 6% (taken over all but the largest instance for which we could not compute the number of triangles exactly).

Menu

Social-network analysis

Publications

2015

2014

2013

2012

2011

2010

2008

2006