### Identity and Search in Social Networks

Networks,Reading,Science — Zac Townsend @ October 30, 2012 10:48 am

This articles develops a model for explaining how "ordinary" individuals can search through their social networks and send messages to distant people they do not know.

Travers and Milgram (Sociometry, 1969) ran a experiment where they asked people in Omaha, NE to try to get a letter to randomly selected people in Boston. The Nebraskans were to do this by mailing the letter to an acquaintance they thought would be closer to the target. "The average length of the resulting acquaintance chains for the let ters that eventually reached the target (roughly 20%) was about six." These results suggest that short paths exist between many people and that ordinary people can find those paths. This is pretty cool because although most people know there friends, they don't know there friend's friends (imagine a pre-facebook world). The authors of the article I am reading today call the ability to find targets "searchability." They question the accuracy of previous studies who tried to model social network using 1) hubs or highly connected people that you need only reach and then you will reach the target or 2) geometric lattices (which often have some assumption that every node has the same number of edges, see regular networks from yesterday's post) because neither of these network types, however, is a satisfactory model of society. (As an aside, models should not be judged on whether or not their assumptions comport with reality but on whether they make useful and accurate predictions, see Gary King or Milton Friedman on this point).

The authors build a model from "six contentions about social networks":

1. People have identities and social groups are collections of identities.
2. People break their society up in to hierarchies. They use an example of a specialized subgroup with an academic department that in turn is within a university. Formerly they put an upper bound on group size near one hundred, formally $g \approx 100$. They define the similarity between two individuals $i$ and $j$ as $x_{ij}$, where $x_{ij}=1$ if they are in the same group ($=2$ if they are in the same department in the example above).
3. Group membership is the the primary basis for social interaction and thus you're more likely to know someone the closer you are to being in the same group. The chance that two people are acquaintances decreases when the groups they belong to are more dissimilar. To build their social network, they keep making connections between people until the average numbers of friends equals z. They make these links by randomly choosing an individual, then choosing a link distance, $x$, to add to that person with $p(x)=c\cdot e^{\alpha x}$, and then by choosing a random person in that other level that is $x$ levels away and making the link. $\alpha$ is a "tunable" parameter and is a measure of homophily--the tendency of similar people to group together--in the created social network.
4. People can belong to different hierarchical groups (I am a member of the City of Newark workforce, the Brown alumni, the Manhattan democrats, Brooklyn) and it is assumed these groups are independent. This is a standard assumption but you can see in my group membership alone that Brown and Brooklyn likely aren't wholly statistically independent. In this way a person's identity is defined by the position people have in various hierarchies defined in the model, represented by the H-dimensional vector $\vec{v}_i$, where where $v_i^h$ is the position of node $i$ in the $h^{th}$ hierarchy, or dimension. Each node is randomly assigned a location in the various hierarchies and then the links are made as is indicated in point three above (I think it would have made more sense to explain this point than that one).
5. People construct a social distance to perceive the similarity between themselves and others. This is here defined as the minimum distance that exist between nodes across all hierarchies, or $y_{ij}=min_h x^h_{ij}$. It is interesting to note that "social distance violates the triangle inequality ... because individuals i and j can be close in dimension $h_1$, and individuals j and k can be close in dimension $h_2$, yet i and k can be far apart in both dimensions. A simple example might be to use my dad as individual j, me as individual i, and one of my dad's work colleagues at Mt. Sinai as k. Although my dad and I are in the same family and he and his co-worker have the same occupation ($x_{ij}=x_{jk}=1$) the co-worker and I are quite socially distant.
6. Individuals are only aware of their immediate network across all hierarchies, so that if a person forwards a message to a single neighbor that person does not know how much closer, if at all, the neighbor is to the intended recipient.

With this model, "[i]ndividuals therefore have two kinds of partial information: social distance, which can be measured globally but which is not a true distance (and hence can yield misleading estimates); and network paths, which generate true distances but which are known only locally."

With this model setup, they show that a greedy algorithm (the same as the one Milgram suggests) can efficiently direct a message to a receipt in a short number of steps.

They do some analysis that is worth noting. First, they point out (apparently against the previous literature) that the average length of a message chain, defined as the metric $$ has to be short in an absolute sense. The message chain length cannot scale with population n because of at each point there is a failure/termination rate of 25% (i.e. one and four people get the letter and do not forward it at all). Secondly, through some simple math they show that with a termination rate of 25% and a success rate fixed at above 5% that the average message length chain, $$ must be less than 10.4 independent of population size. With this, they can create diagrams of the parameter space in H and $\alpha$ that meets that requirement. That is, they can create a graph of all the possible hierarchy number and homophily parameters that meet that requirement in $$, thus:

Our main result is that searchable networks occupy a broad region of parameter space $(\alpha ,H)$ which [...] corresponds to choices of the model parameters that are the most sociologically plausible. Hence our model suggests that searchability is a generic property of real-world social networks.