Emergence of Scaling in Random Networks

Networks,Reading,Science — Zac Townsend @ October 31, 2012 10:12 am

This article by Barabasi and Albert provides an explanation for why the distribution of links over nodes in many real-world networks follows a power law. Power-law distributions are defined by a concentration of density in a small number of units followed by a long tail.

[Figure: The distribution function of connectivities for various large networks. (A) Actor collaboration graph, N = 212,250 vertices. (B) WWW, N = 325,729. (C) Power grid data, N = 4,941. The dashed lines have slopes (A) 2.3, (B) 2.1, and (C) 4.]

After going through some examples of networks with complex topologies--everything from neurons to the World Wide Web--they note that networks have traditionally been described by the random network model of Erdos and Renyi (the "ER" model). But the ER model (and the Watts-Strogatz, or "WS," model) is not up to the task of explaining the scale-free power-law distributions that they find by "exploring several large databases describing the topology of large networks that span fields as diverse as the WWW or citation patterns in science."

More precisely, the authors find that, independent of the system by which the networks are created, the probability $P(k)$ that a node has $k$ links follows $P(k) \sim k^{- \gamma}$. They explain these distributions by developing a model that includes growth and preferential attachment--in short, 1) networks start small and grow larger, and 2) whether because early nodes (those that exist when the network is young) have more random chances to be linked to new nodes, or because there is "preferential attachment" to nodes that are already popular, there will be some winners with many more links than most other nodes.

Why do the ER and WS models not accurately explain the power-law phenomena? Both models start with a fixed number of nodes and then randomly connect them (ER) or rewire connections among them (WS), whereas this model takes explicit account of the growing number of nodes seen in most real-world networks. Additionally, the two random network models assume that the probability that any two nodes are connected is uniform, whereas this model includes preferential connectivity:

For example, a new actor is most likely to be cast in a supporting role with more established and better-known actors. Consequently, the probability that a new actor will be cast with an established one is much higher than that the new actor will be cast with other less-known actors.

So, how does their model work? It begins with some small number of nodes and then keeps adding new nodes, with the probability that a new node connects to any given existing node weighted by that node's popularity. In their language,

1. To incorporate the growing character of the network, starting with a small number ($m_0$) of nodes, at every time step they add a new node with $m$ ($\le m_0$) links, so the new node connects to $m$ different vertices already present in the system.
2. To incorporate preferential attachment, they assume that the probability $\Pi$ that a new node will be connected to node $i$ depends on the connectivity $k_i$ of that node, so that $\Pi(k_i)=\frac{k_i}{\sum_j k_j}$.
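The two rules above can be sketched in a few lines of Python (my own toy illustration, not the authors' code; function and variable names are mine). The trick is a pool list in which each node appears once per link it has, so a uniform draw from the pool picks node $i$ with probability $k_i / \sum_j k_j$:

```python
import random

def barabasi_albert(n, m, m0=None):
    """Grow a network to n nodes: start from m0 seed nodes, then attach
    each new node to m distinct existing nodes chosen with probability
    proportional to their current degree (preferential attachment).
    Requires m <= m0, matching the paper's m (<= m_0)."""
    if m0 is None:
        m0 = m
    seeds = list(range(m0))
    pool = []          # each node appears here once per link it has
    edges = []
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:
            # Before any links exist, fall back to a uniform pick among seeds.
            chosen.add(random.choice(pool) if pool else random.choice(seeds))
        for target in chosen:
            edges.append((new, target))
            pool.extend([new, target])
    return edges
```

Running this for a few thousand steps yields a heavy-tailed degree distribution: the handful of early, already-popular nodes keep soaking up links.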

They continue by developing slight variants of this model--one that assumes growth but not preferential attachment and one that assumes preferential attachment but not growth--and show that neither is sufficient to explain the emergent scale-free power-law distributions seen in real-world networks.

Because of preferential attachment, early gains are quite important, and older nodes increase their connectivity at the expense of the younger ones. This leads to some nodes "that are highly connected, a “rich-get-richer” phenomenon that can be easily detected in real networks."

Working the Room

Politics,Reading — Zac Townsend @ October 31, 2012 9:09 am

An essay in Lapham's Quarterly about politicians being or not being funny.

The paradox of democracy is that we elect someone on the basis of being just like us and then criticize them for not being better than us. To be elected as a political leader in a democracy is to occupy three positions relative to the other citizens: they must be better than us, for they must lead us; they must be less than us because they err greatly and publicly; and they must be one of us, a citizen among their peers. Comedy can be a way of coping with such conflicting roles; rhetorical humor is a tool to help master them.

Identity and Search in Social Networks

Networks,Reading,Science — Zac Townsend @ October 30, 2012 10:48 am

This article develops a model for explaining how "ordinary" individuals can search through their social networks and send messages to distant people they do not know.

Travers and Milgram (Sociometry, 1969) ran an experiment in which they asked people in Omaha, NE to try to get a letter to randomly selected people in Boston. The Nebraskans were to do this by mailing the letter to an acquaintance they thought would be closer to the target. "The average length of the resulting acquaintance chains for the letters that eventually reached the target (roughly 20%) was about six." These results suggest that short paths exist between many people and that ordinary people can find those paths. This is pretty cool because although most people know their friends, they don't know their friends' friends (imagine a pre-Facebook world). The authors of the article I am reading today call the ability to find targets "searchability." They question the accuracy of previous studies that tried to model social networks using 1) hubs, highly connected people whom you need only reach in order to reach the target, or 2) geometric lattices (which often assume that every node has the same number of edges; see regular networks from yesterday's post), because neither of these network types is a satisfactory model of society. (As an aside, models should not be judged on whether their assumptions comport with reality but on whether they make useful and accurate predictions; see Gary King or Milton Friedman on this point.)

The authors build a model from "six contentions about social networks":

1. People have identities and social groups are collections of identities.
2. People break their society up into hierarchies. They use the example of a specialized subgroup within an academic department that is in turn within a university. Formally, they put an upper bound on group size near one hundred, $g \approx 100$. They define the similarity between two individuals $i$ and $j$ as $x_{ij}$, where $x_{ij}=1$ if they are in the same group ($=2$ if they are in the same department in the example above).
3. Group membership is the primary basis for social interaction, and thus you are more likely to know someone the closer you are to being in the same group. The chance that two people are acquaintances decreases as the groups they belong to become more dissimilar. To build their social network, they keep making connections between people until the average number of friends equals $z$. They make these links by randomly choosing an individual, then choosing a link distance $x$ with probability $p(x)=c\, e^{-\alpha x}$, and then linking that individual to a randomly chosen person at distance $x$. $\alpha$ is a "tunable" parameter and is a measure of homophily--the tendency of similar people to group together--in the created social network.
4. People can belong to different hierarchical groups (I am a member of the City of Newark workforce, the Brown alumni, the Manhattan Democrats, Brooklyn), and these groups are assumed to be independent. This is a standard assumption, but you can see from my group memberships alone that Brown and Brooklyn likely aren't wholly statistically independent. In this way a person's identity is defined by the positions they hold in the various hierarchies of the model, represented by the $H$-dimensional vector $\vec{v}_i$, where $v_i^h$ is the position of node $i$ in the $h^{th}$ hierarchy, or dimension. Each node is randomly assigned a location in the various hierarchies, and then the links are made as indicated in point three above (I think it would have made more sense to explain this point before that one).
5. People construct a social distance to perceive the similarity between themselves and others. This is defined as the minimum distance between two nodes across all hierarchies, or $y_{ij}=\min_h x^h_{ij}$. It is interesting to note that "social distance violates the triangle inequality ... because individuals $i$ and $j$ can be close in dimension $h_1$, and individuals $j$ and $k$ can be close in dimension $h_2$, yet $i$ and $k$ can be far apart in both dimensions." A simple example might be to use my dad as individual $j$, me as individual $i$, and one of my dad's work colleagues at Mt. Sinai as $k$. Although my dad and I are in the same family and he and his co-worker have the same occupation ($x_{ij}=x_{jk}=1$), the co-worker and I are quite socially distant.
6. Individuals are only aware of their immediate network across all hierarchies, so that if a person forwards a message to a single neighbor that person does not know how much closer, if at all, the neighbor is to the intended recipient.
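Contentions 4 and 5 can be made concrete with a toy sketch (my own labels and conventions, not the paper's): positions in each hierarchy are tuples read from the root down, the within-hierarchy distance $x$ is 1 for an identical position and grows by one for each level you must climb to find a common ancestor, and $y_{ij}$ takes the minimum across hierarchies. The family/co-worker example then exhibits the triangle-inequality violation:

```python
def ultrametric_distance(pi, pj):
    """Within-hierarchy distance x: 1 if the positions (tuples read from
    the root down) are identical, plus one per level climbed to reach a
    common ancestor. A toy convention standing in for the paper's trees."""
    l = len(pi)
    common = 0
    for a, b in zip(pi, pj):
        if a != b:
            break
        common += 1
    return l - common + 1

def social_distance(vi, vj):
    """Contention 5: y_ij = min over hierarchies h of x^h_ij."""
    return min(ultrametric_distance(pi, pj) for pi, pj in zip(vi, vj))

# Hypothetical identity vectors: dimension 0 = family, dimension 1 = occupation.
me       = [("us", "east", "townsend"), ("govt", "nj", "newark")]
dad      = [("us", "east", "townsend"), ("medicine", "ny", "sinai")]
coworker = [("us", "west", "cohen"),    ("medicine", "ny", "sinai")]

# y(me, dad) = 1 (same family) and y(dad, coworker) = 1 (same workplace),
# yet y(me, coworker) = 3 > 1 + 1: the triangle inequality fails.
```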

With this model, "[i]ndividuals therefore have two kinds of partial information: social distance, which can be measured globally but which is not a true distance (and hence can yield misleading estimates); and network paths, which generate true distances but which are known only locally."

With this model setup, they show that a greedy algorithm (the same as the one Milgram suggests) can efficiently direct a message to a recipient in a short number of steps.

They do some analysis that is worth noting. First, they point out (apparently against the previous literature) that the average length of a message chain, defined as the metric $\langle L \rangle$, has to be short in an absolute sense. The message chain length cannot scale with population $N$ because at each step there is a failure/termination rate of 25% (i.e., one in four people get the letter and do not forward it at all). Second, through some simple math they show that with a termination rate of 25% and a success rate fixed at above 5%, the average message chain length $\langle L \rangle$ must be less than 10.4, independent of population size. With this, they can create diagrams of the parameter space in $H$ and $\alpha$ that meets that requirement. That is, they can create a graph of all the possible hierarchy numbers and homophily parameters that meet that requirement on $\langle L \rangle$, thus:
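The 10.4 bound is just arithmetic on the attrition rate; a quick check using the paper's numbers:

```python
import math

# If each recipient fails to forward with probability 0.25, a chain of
# length L survives with probability 0.75**L. Requiring at least a 5%
# overall success rate caps the chain length, whatever the population:
p_forward = 0.75   # 1 - termination rate
r_min = 0.05       # minimum acceptable success rate
L_max = math.log(r_min) / math.log(p_forward)
print(round(L_max, 1))  # -> 10.4
```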

Our main result is that searchable networks occupy a broad region of parameter space $(\alpha ,H)$ which [...] corresponds to choices of the model parameters that are the most sociologically plausible. Hence our model suggests that searchability is a generic property of real-world social networks.

Faces, Places, Spaces

Humanities,Reading — Zac Townsend @ October 30, 2012 9:59 am

A great read on geography and ideas. Starts with a review of two new books on geography as a primal force in history, goes on to criticize that argument with counterexamples, continues with a great distillation of the recent historiography around Eastern Europe and the Holocaust, and rounds itself out with a discussion of the U S of A.

Another version of space history is available these days, though. This might be called the cartographic turn, and is characterized by the argument that, while geography matters, it is visible only through the maps that we make of it. Where borders fall is as much a matter of how things are seen as how they really are. We can know the shape of the planet only through maps—maps in the ordinary glove-compartment sense, maps in a broader metaphoric one—and those maps are made by minds attuned to the relations of power. All nations are shaped by belligerence and slaughter.

Security Isn't Free, Either

Economics,Ideas,Reading — Zac Townsend @ October 30, 2012 9:30 am

On the tradeoff between risk and reward. It begins with a simple anecdote:

I once heard a Harvard professor give a talk describing the yumminess vs. safety scale for food regulation. Yummy food (yes, I’m quoting here) is frequently unsafe, and safe food rarely yummy. Head to the Texas border, he said, to see it in action. The same ingredients are used in burritos on either side of the border, but the Mexican version, with its unpasteurized cheese and fresher, unmedicated chicken, bursts with both flavor and, occasionally, Salmonella. The American version is recognizable as a burrito but has little taste in common with its more daring cousin. We’ve sacrificed some deliciousness in favor of safety.

But don't be fooled, this guy has things to say:

Such questions are the stuff of nightmares and of responsible government, and the abilities to humanely understand, resolve and – most of all – explain their nuances are the differences between great leaders and tyrants.

I have been very interested lately in thinking about risks and to what extent government should go to mitigate them. I don't have anything fully formed on the topic yet, but it is pretty obvious that as the probability of any risk is driven toward zero, the cost of mitigation approaches infinity. That is, it gets more and more expensive to reduce smaller and smaller risks, to approach certainty that something will not happen. That might be worth it if the risk is a nuclear attack on New York, but it makes less sense in the case of things the government avoids just because they look bad.

All Hail the Nanny State

New York City,Reading — Zac Townsend @ October 30, 2012 9:25 am

There can be no ambiguity about the progressive nature of Bloomberg’s long, unprecedented war on smoking and obesity: It is aimed squarely at the city’s poor. Uncertainty about this exists only because the agenda has passed in a spread-out package of citywide initiatives over a decade, each one just small enough—just un-ambitious enough—to be mocked as silly and then adjusted to by residents, and later imitated around the country, without smelling of “welfare.” The public face of the agenda, in fact, has been about taking away, in the form of bans and restrictions, rather than handing out. See: the two major smoking bans (bars in 2002, parks in 2011); the 2005 phase-out of trans fats; the 2008 requirement of restaurants to display calorie counts. While these “draconian” restrictions drew attention and protest, other city programs—such as expanded food stamps, diet education, quitting-smoking assistance, even coupons for farmers markets—have thrived in complement with their higher-profile cousins.

The Art of the Art Heist

Humanities,Reading — Zac Townsend @ October 30, 2012 9:03 am

Heist as art criticism.

The Meyer de Haan is a self-portrait by a minor artist most people have never heard of. It is worth only a fraction of what the other paintings are worth. Jop Ubbens, the general director of Christie's in Amsterdam, told The Guardian that the de Haan "might have been stolen by mistake." The Guardian's art critic, Jonathan Jones, thinks "any idea that a tasteful collector commissioned this theft is undermined by the inclusion of Meyer de Haan's “Self-Portrait”. No offence, but this comparatively minor Dutch artist does not really belong in the company of the others whose works have been stolen." True. But suppose for a moment that it wasn't a mistake. Suppose that whoever masterminded this robbery actually did want the Meyer de Haan. Why? What does the painting by Meyer de Haan tell us? What might that self-portrait have to do with all the other, more famous paintings that were stolen? There is a mystery here, perhaps, that only needs the right key for unlocking, the right set of questions. And the first, most obvious question is staring us right in the face.

Who is Meyer de Haan?

Collective Dynamics of ‘Small-world’ Networks

Networks,Reading,Science — Zac Townsend @ October 29, 2012 11:50 am

This article explores the dynamics of a network that is somewhere between regular and random. Regular networks are those where every node has the same number of connections. Random networks, on the other hand, are built by assigning edges between pairs of nodes independently with some probability, $p$ (see the Erdos-Renyi model).

This paper "interpolates" between these two models by starting with a regular network (a ring lattice to be precise) and then rewiring each edge at random with probability p; in this way the graph is "tuned" between regular ($p=0$) and complete disorder / randomness ($p=1$).

To continue, we have to understand two measures that are often used in network theory: $L(p)$ and $C(p)$. $L(p)$ is the average shortest path length between all pairs of nodes. That is, if you find the shortest path between every pair of nodes and then take the average of those path lengths, you get $L(p)$. $C(p)$ is the clustering coefficient: for each node, the number of links that exist among its neighbors divided by the number of possible such links, averaged over all nodes.
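The interpolation and both measures are easy to sketch (a minimal toy version of my own, not the authors' code; the rewiring bookkeeping only approximates the paper's procedure):

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring: each node linked to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k // 2 + 1):
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    return adj

def rewire(adj, p):
    """Watts-Strogatz step: with probability p, move one end of an edge to
    a uniformly random node (skipping self-loops and duplicate edges)."""
    n = len(adj)
    for u in range(n):
        for v in sorted(adj[u]):
            if v > u and random.random() < p:
                w = random.randrange(n)
                if w != u and w not in adj[u]:
                    adj[u].discard(v); adj[v].discard(u)
                    adj[u].add(w); adj[w].add(u)
    return adj

def avg_path_length(adj):
    """L: mean shortest-path length over reachable pairs (BFS per node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """C: mean over nodes of (links among neighbors) / (possible links)."""
    cs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        cs.append(links / (k * (k - 1) / 2))
    return sum(cs) / len(cs)
```

Sweeping `p` from 0 to 1 on, say, `rewire(ring_lattice(1000, 10), p)` and plotting $L(p)$ and $C(p)$ reproduces the paper's key picture: $L$ collapses at small $p$ long before $C$ does.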

For friendship networks, these statistics have intuitive meanings: $L$ is the average number of friendships in the shortest chain connecting two people; $C_v$ reflects the extent to which friends of $v$ are also friends of each other; and thus $C$ measures the cliquishness of a typical friendship circle.

The $p=0$ case is a "highly clustered, large world where $L$ grows linearly with $n$," while the $p=1$ case is a "poorly clustered, small world where $L$ grows only logarithmically with $n$." The key insight is that although it appears from these endpoints that large $C$ would always go with large $L$ (and small $C$ with small $L$), it turns out that there is a "broad interval" of $p$ where the network stays clustered but the average path length $L(p)$ drops quickly. It's a "small world" because as you move even a little bit from a regular network toward a random one, "short cuts" between groups are introduced that shrink the path lengths: "The important implication here is that at the local level (as reflected by C(p)), the transition to a small world is almost undetectable."

The rest of the paper (Nature articles are quite short) shows that three empirical examples fit this "small world" network model--the IMDB film actor network, the power grid, and the neural network of a nematode worm. The paper then finishes by testing the implications of the network structure on a simulated disease outbreak, where they show that 1) the smaller the $L(p)$ of a network, the faster an infection reaches the entire population, 2) a network only needs to be slightly non-regular for $L(p)$ to drop significantly, and 3) in a small world, disease spreads much more quickly than previous models suggest.

We hope that our work will stimulate further studies of small-world networks. Their distinctive combination of high clustering with short characteristic path length cannot be captured by traditional approximations such as those based on regular lattices or random graphs. Although small-world architecture has not received much attention, we suggest that it will probably turn out to be widespread in biological, social and man-made systems, often with important dynamical consequences.

General Failure

Politics,Reading — Zac Townsend @ October 29, 2012 9:58 am

On the history of firing generals, and why we should do it more--or, more specifically, on how the Army doesn't punish or reward its general officers well. From the article:

But the Army continues to do too little to sort the average performers from the outstanding ones. That has long-term consequences for the caliber of military leaders. A recent survey by students at Harvard’s Kennedy School of Government found that good young officers have left the military in large part because of frustration with military bureaucracy and the sense that the armed forces do not have a performance-oriented system for managing personnel.

What Happens in Brooklyn Moves to Vegas

Reading,Social Science — Zac Townsend @ October 28, 2012 9:27 am

On a crazy and very impressive social experiment by Tony Hsieh to save Las Vegas:

The Downtown Project is hoping to draw 10,000 “upwardly mobile, innovative professionals” to the area in the next five years. And according to Hsieh, he and his team receive requests for seed money from dozens of people every week. In return, the Downtown Project asks not just for a stake in the companies but also for these entrepreneurs to live and work in downtown Las Vegas. (They’re also expected to give back to the community and hand over contacts for future recruits.) In expectation of all these newcomers, the project has already set up at least 30 real estate companies, bought more than 15 buildings and broken ground on 16 construction projects.

For those entrepreneurs who live in other parts of the country, and most do, the question often comes down to how eager they are to relocate to a downtown area filled with liquor stores and weekly hotels. Less than a year after the project was officially established, about 15 tech start-ups have signed on. The first tech investment went to Romotive, a company developing smart-phone-controlled personal robots. Money has also gone to Local Motion, a start-up that designs networks for sharing vehicles, and Digital Royalty, a social-media company.