We are doing a research on Restraining Rumors Spreading. Our goal is to propose a new strategy for clarifying the truth so that we can restraining rumors spreading as effectively as possible. But before any strategy, we need to model the social network correctly. Thus, here is the learning note of social network modeling.
Outline of Reading Material[1]
- Background
- Regard a network itself as a random object.
- Social networks anaslysis involves many fields such as math, statistics, computer science, physics and sociology.
- Independence
- theoretically, individuals in SN are assumed independent which is not possible in samples.
- The connection between individuals is at the haert of SN.
- Definitions
- A network is defined as a structure of nodes connected by edges.
- Edges can be directed or undirected.
- Vocabulary
- Nodes = Vertices = Actors
- Edges = Links = Relationships = Ties
- Network = Graphs
Erdos-Renyi(1959) or Bernoulli Model
If a SN has $N$ nodes, there are $D = {N\choose2}$ pairs of nodes, and under an assumption of symmetry, there are $2^D$ possible observed networks.- Erdos-Renyi Model
- Assumes that for every pair of nodes$(i, j)$, with probability $p$ an edge exists between the two nodes.
- all nodes attached to all other nodes: $p^D$
- all nodes disconnected: $(1-p)^D$
- also known as a Bernoulli network
- Erdos-Renyi Model
- Real World Properties of Networks
Real-world-networks tend not to resemble Bernoulli Network because BNs do not capture common characteristics of social networks such as transitivity: clustering, reciprocity or mutualitiy and betweenness. - Exponential Random Graph Models
- Within sociology, a class of models that has become widely used is the exponential random graph models(ERGMs) also know as p* models.
- $Y$ is the random variable, $N\times N$ matrix represents a networks. Thus the general form of these network models is:
$$P{\theta} \left(Y=y\right) = \text{exp}(\theta{1}z{1}( y ) + \theta{2}z{2}( y ) + … + \theta{k}z{k}( y)-\psi(\theta))$$ where the parameter is $\theta=\left(\theta{1},\theta{2},\ldots,\theta{k}\right)$,and the sufficient statistics are$\left(z{1}\left( y \right),z{2}\left( y \right),\ldots,z{k}\left( y \right)\right)$ and $\psi$ is a normalizing constant. For example,$P{\theta}\left(Y=y\right) = \text{exp}\left(\theta{1}z{1}\left( y \right) + \theta{2}z{2}\left( y \right) + \theta{3}z{3}\left( y \right)-\psi\left(\theta \right)\right)$, is an ERGM - $z{1}$, $z{2}$ and $z_{3}$ could be the graph statistics: number of triangles, number of edges and number of “2-stars”
- These models can statistically simulate real-world-relation.
- A Bernoulli network is a special case of an ERGM. Let $z(y)=$number of edges, then $P_{\theta}\left(Y=y\right) = \text{exp}\left(\theta z\left( y \right)\right)$.
- Inference for ERGMs
- In practice in the ERMG literature, only one network is observed, which is a sample size of one.
- For ERGMs, estimating the parameters from one observation of the network, as is done in practice, tends to be done using either a pseudo-likelihood estimation procedure or MCMC methods
- Latent Space Models
- The observed data is an $n \times n$ social network $Y$ with entries $y_{ij}$ denoting relationships between nodes $i$ and $j$ and corresponding covariate information, $X$.
- Unobserved are unknown latent positions, $z_i$, in social space, for all nodes $i=1,\ldots,n$, such that conditional upon these unobserved positions, the edges in the network are independent.
- $z$‘s and $\theta$ are treated as parameters to be estimated in the model $P(Y| z, x, \theta) = \prod_{i \neq j} {P(y_i | z_i, zj, X{i,j}, \theta)},$
- MLE for $Z$, MCMC for $\theta$.