Complexity and Decision-Making

Economics,Social Science — Zac Townsend @ November 6, 2012 9:26 am

The Great Man theory of history is usually considered too limited (see yesterday's post). This argument is perhaps best expressed in War and Peace, where Tolstoy digresses at length on the imagined significance of great men, most obviously Napoleon. Or, as Isaiah Berlin says in "The Hedgehog and the Fox: An Essay on Tolstoy's View of History," Tolstoy perceived a "central tragedy" of human life:

...if only men would learn how little the cleverest and most gifted among them can control, how little they can know of all the multitude of factors the orderly movement of which is the history of the world...

We can think of this in a more limited fashion when it comes to Generals or even CEOs. This comment on Hacker News put it well:

People overrate what people can honestly achieve in highly chaotic environments. 15% of corporate CEOs are replaced every year - notice how companies don't change much from year to year though - I have. However, changing often definitely lets us lionize the lucky ones (see hedge funds, startups, novels, movies, tv shows and any other at scale, highly path dependent, chaotic and random systems).

My question though is whether this is changing. "Big data" and associated analyses may give us the ability to understand large systems in ways that were never before possible.

Just as an example, Friedrich Hayek famously argued that the government cannot run the "commanding heights" of the economy because of information. Simply put, there is no way for the government to amass and understand the information necessary to choose prices and set supply and demand, and prices are the only true reflection of preferences. History might have caught up with Hayek. We are entering an era where massive datasets and computational social science will allow us to understand people's revealed preferences better than any mechanism in history--even prices. Obviously I have set up a straw man here in a sense, but the greater point is that we can begin to understand people's behavior and preferences far better by gathering massive information about them and their surroundings than can be revealed by a thousand theorems in the American Economic Review.

I am particularly interested in what this means for local, state, and federal governments. (Political campaigns already use massive amounts of voter history and consumer data to microtarget potential voters; see this book and this one.) Governments collect large amounts of data on the services provided to individuals and the outcomes of those services. New York City is slowly building the capabilities to cross-reference and understand all of this data and its implications for human behavior. As they work to collect, knit together, and derive meaning from massive administrative datasets, the very nature of what governments can know about citizens and how they can provide services could change.

Cesar Hidalgo comments on the data collected by governments and his hope that they will move to big data:

Governments are much slower, but they're starting to collect data, and they have always been a very information-intensive business. Governments invented taxing, and taxation requires fine-grain data on how much you earn and where you live. Governments, actually "states" a long time ago, invented last names. People in villages didn't need last names. You were able to get around with just a first name. They had to invent last names for taxation, for drafting, so government is a very information-intensive business. In their innovation agenda, in order to do the things that they do better, governments are going to need to embrace big data.

I see, little by little, that there are people inside all of these organizations that are starting to have that battle. They tend to be younger people and were born into this Internet generation. Sometimes it's hard for them to have this fight. As time goes on, there's going to be more and more people that are going to see the value of data that is not only monetization, but also is providing better services, is understanding the world better, is understanding diseases, understanding the way that cities work, mobility, many types of things. Not just targeting people with ads. I think that there's more than that.

The three main problem paradigms of machine learning, data mining, and artificial intelligence--prediction, modeling, and detection--can be used to ask questions about government services. For example, can we predict, in a child welfare context, who is most likely to end up in a juvenile delinquency context? Can we model which individuals receiving housing subsidies are most likely to commit crimes? Can we detect the spread of knowledge of a new government program? What predictions can we make using leading-indicator data? Allow me to focus on New York City as an illustrative example. New York City has a system that allows users to screen families for more than 35 city, state, and federal benefit programs. At the same time, City agencies collect a massive number of variables related to demographics, location, risk-assessment tools, court dates, child welfare contact, police action, and more. If this data could be knitted together, we could begin to understand the life cycle of families' behavior and use of government services, and begin to model needs profiles in a way never before possible.
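As a purely illustrative sketch of the "prediction" paradigm on linked administrative data--the file name, column names, and outcome below are hypothetical, not drawn from any actual city system:

    # Hypothetical sketch: predicting an outcome from linked administrative records.
    # The file, the feature columns, and the outcome variable are invented for illustration only.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Imagine benefits, child-welfare, and housing records already joined on a person ID.
    df = pd.read_csv("linked_admin_records.csv")
    features = ["age", "n_benefit_programs", "n_address_changes", "prior_agency_contacts"]
    X, y = df[features], df["outcome_within_two_years"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

Even a toy model like this makes the point concrete: the predictive value comes almost entirely from joining records that today sit in separate agency systems.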

So, we live in an era where we have a large number of sensors collecting more data, we have the means and methods to analyze the gathered data, and we have the ability to build dynamic, instantly responsive models. With all of this together, we may enter an era where CEOs, Generals, and other leaders are able to understand and respond to chaotic systems.

On Porgy and Bess

Humanities — Zac Townsend @ November 5, 2012 9:37 am

"It is a Russian who has directed it, two Southerns who have written its book, two Jewish boys who have composed its lyrics and music and a stage full of negros who sing and act it to perfection. The result is one of the far famed wonders of the melting pot; the most American opera that has yet been seen or heard." —John Mason Brown; New York Evening Post, 1935

On the Value of Complexity

Science — Zac Townsend @ November 4, 2012 9:23 am

Cesar Hidalgo again:

One example that I like very much to try to make a distinction between those approaches and other approaches is that of an F-22 jet fighter. This is an example from my friend Francisco Claro. The idea is that an F-22 fighter is actually quite an expensive machine. You need to have a lot of money to buy an F-22. An F-22, being a very expensive machine, is also a very complex machine. It has a lot of parts, and there were a lot of people with a lot of different types of expertise that went in to generate that machine.

If you take the price of an F-22 and you divide it by its weight, you get that, per pound, it costs something between silver and gold. It's that expensive! Now, take your F-22 and crash it against a hill, or crash it against the ocean, blow it up into tiny little bits and pieces. How valuable is it now? It's probably way less valuable than silver. It's probably almost worthless after it's broken down. So, where was the value?

The value cannot be in any of the parts or in any of the materials, or in anything other than the complexity of how these things come together. So actually, value is set by the property of organization. It's more of an entropic, or anti-entropic more precisely, idea of value.

What Doctors Don't Know About the Drugs They Prescribe

Science,Social Science — Zac Townsend @ November 3, 2012 9:20 am

When a new drug gets tested, the results of the trials should be published for the rest of the medical world -- except much of the time, negative or inconclusive findings go unreported, leaving doctors and researchers in the dark. In this impassioned talk, Ben Goldacre explains why these unreported instances of negative data are especially misleading and dangerous.

It might seem less disturbing, but the same could be said about our knowledge of education, social science, housing, etc.

On Smart Failure

Social Science — Zac Townsend @ November 2, 2012 9:15 am

This guy is all over the place, but I find very interesting his idea that bureaucracies should prioritize implementations in the following order: old ideas that fail are worst, next are old ideas that succeed, above those are new ideas that fail, and best of all are new ideas that succeed. He calls this a strong culture of "smart failure."

Obama Doesn't Like People

Politics — Zac Townsend @ November 1, 2012 9:13 am

I keep coming back to something that was said about Obama [in this article about the Clintons]:

“People say the reason Obama wouldn’t call Clinton is because he doesn’t like him,” observes Tanden. “The truth is, Obama doesn’t call anyone, and he’s not close to almost anyone. It’s stunning that he’s in politics, because he really doesn’t like people. My analogy is that it’s like becoming Bill Gates without liking computers.”

Emergence of Scaling in Random Networks

Networks,Reading,Science — Zac Townsend @ October 31, 2012 10:12 am

This article by Barabasi and Albert provides an explanation for why the distribution of links over nodes in many real-world networks follows a power law. Power-law distributions are defined by a concentration of the density in a small number of units and then a long tail.

[Figure from the paper: the distribution function of connectivities for various large networks. (A) Actor collaboration graph, N = 212,250 vertices; (B) WWW, N = 325,729; (C) power-grid data, N = 4,941. The dashed lines have slopes \gamma = 2.3 (A), 2.1 (B), and 4 (C).]

After going through some examples of networks with complex topologies--everything from neurons to the World Wide Web--they note that networks have traditionally been described by the random network model of Erdos and Renyi (the "ER" model). But the ER model (and the Watts-Strogatz, or "WS," small-world model) is not up to the task of explaining the scale-free power-law distributions that they find by "exploring several large databases describing the topology of large networks that span fields as diverse as the WWW or citation patterns in science."

More precisely, the authors find that, independent of the system by which the networks are created, the probability P(k) that a node has k links follows P(k) \sim k^{- \gamma}. They explain these distributions by developing a model that includes growth and preferential attachment--in short, 1) networks start small and grow larger, and 2) whether because early nodes (those that exist when the network is young) have more random chances to be linked to new nodes, or because new links attach "preferentially" to nodes that are already popular, there will be some winners with many more links than most other nodes.
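(The dashed-line slopes in the figure above are estimates of \gamma.) As a rough aside--this is my own illustration, not anything from the paper--a crude way to estimate such an exponent from a degree sequence is to fit a line to the log-log degree histogram; maximum-likelihood estimators are more robust, but the idea is the same:

    # Crude sketch: estimate the power-law exponent gamma by fitting a line to the
    # log-log degree histogram (a rough stand-in for more careful estimation).
    import math
    from collections import Counter

    def estimate_gamma(degrees):
        counts = Counter(k for k in degrees if k > 0)   # degree -> number of nodes
        total = len(degrees)
        xs = [math.log(k) for k in counts]
        ys = [math.log(counts[k] / total) for k in counts]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return -slope   # P(k) ~ k^(-gamma), so gamma is minus the log-log slope

    print(estimate_gamma([1, 1, 1, 1, 2, 2, 3, 4, 8, 16]))   # toy degree sequence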

Why do the ER and WS models not accurately explain the power-law phenomena? Both models start with a fixed number of nodes and then randomly connect them (ER) or rewire the connections among them (WS), whereas this model takes explicit account of the growth in the number of nodes that occurs in most real-world networks. Additionally, the two random network models assume that the probability that any two nodes are connected is random and uniform, whereas this model includes preferential connectivity:

For example, a new actor is most likely to be cast in a supporting role with more established and better-known actors. Consequently, the probability that a new actor will be cast with an established one is much higher than that the new actor will be cast with other less-known actors.

So, how does their model work? It begins with a small number of nodes and then keeps adding new ones, with the probability that each new node connects to a given existing node weighted by that node's popularity. In their language (a simulation sketch follows the list):

  1. To incorporate the growing character of the network, starting with a small number (m_0) of nodes, at every time step they add a new node with m (\le m_0) links, so the new node connects to m different vertices already present in the system.
  2. To incorporate preferential attachment, they assume that the probability \Pi that a new node will be connected to node i depends on the connectivity k_i of that node, so that \Pi (k_i)=\frac{k_i}{\sum_jk_j}.
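This process is easy to simulate. The sketch below is my own minimal rendering of the two rules above--only m_0 and m come from the paper; starting from a small complete graph as the seed is an assumption of mine:

    # Minimal sketch of the growth + preferential-attachment process (not the authors' code).
    import random
    from collections import Counter

    def barabasi_albert(n, m0=3, m=3):
        # Seed: a small complete graph on m0 nodes (my assumption; the paper only
        # specifies m0 initial nodes).
        edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
        # 'stubs' lists each node once per incident edge, so sampling uniformly from it
        # samples nodes in proportion to their degree -- preferential attachment.
        stubs = [v for e in edges for v in e]
        for new in range(m0, n):
            targets = set()
            while len(targets) < m:          # m distinct existing vertices
                targets.add(random.choice(stubs))
            for t in targets:
                edges.append((new, t))
                stubs += [new, t]
        return edges

    degree = Counter(v for e in barabasi_albert(10000) for v in e)
    # Frequency of each degree; the tail should fall off roughly as k^-3.
    print(sorted(Counter(degree.values()).items())[:10])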

They continue by developing slight variants of this model--one that assumes growth but not preferential attachment and one that assumes preferential attachment but not growth--and show that neither is sufficient to explain the emergent scale-free power-law distributions seen in real-world networks.

Because of preferential attachment, early gains are quite important, and older nodes increase their connectivity at the expense of the younger ones. This leads to some nodes "that are highly connected, a “rich-get-richer” phenomenon that can be easily detected in real networks."

Working the Room

Politics,Reading — Zac Townsend @ October 31, 2012 9:09 am

An essay in Lapham's Quarterly about politicians being or not being funny.

The paradox of democracy is that we elect someone on the basis of being just like us and then criticize them for not being better than us. To be elected as a political leader in a democracy is to occupy three positions relative to the other citizens: they must be better than us, for they must lead us; they must be less than us because they err greatly and publicly; and they must be one of us, a citizen among their peers. Comedy can be a way of coping with such conflicting roles; rhetorical humor is a tool to help master them.

Identity and Search in Social Networks

Networks,Reading,Science — Zac Townsend @ October 30, 2012 10:48 am

This article develops a model for explaining how "ordinary" individuals can search through their social networks and send messages to distant people they do not know.

Travers and Milgram (Sociometry, 1969) ran an experiment where they asked people in Omaha, NE to try to get a letter to randomly selected people in Boston. The Nebraskans were to do this by mailing the letter to an acquaintance they thought would be closer to the target. "The average length of the resulting acquaintance chains for the letters that eventually reached the target (roughly 20%) was about six." These results suggest that short paths exist between many people and that ordinary people can find those paths. This is pretty cool because although most people know their friends, they don't know their friends' friends (imagine a pre-Facebook world). The authors of the article I am reading today call the ability to find targets "searchability." They question the accuracy of previous studies that tried to model social networks using 1) hubs, or highly connected people whom you need only reach in order to reach the target, or 2) geometric lattices (which often assume that every node has the same number of edges; see regular networks from yesterday's post), because neither of these network types is a satisfactory model of society. (As an aside, models should not be judged on whether or not their assumptions comport with reality but on whether they make useful and accurate predictions--see Gary King or Milton Friedman on this point.)

The authors build a model from "six contentions about social networks":

  1. People have identities and social groups are collections of identities.
  2. People break their society up into hierarchies. They use the example of a specialized subgroup within an academic department that in turn is within a university. They put an upper bound on group size near one hundred, formally g \approx 100. They define the similarity between two individuals i and j as x_{ij}, where x_{ij}=1 if they are in the same group (=2 if they are in the same department in the example above).
  3. Group membership is the primary basis for social interaction, and thus you're more likely to know someone the closer you are to being in the same group. The chance that two people are acquaintances decreases as the groups they belong to become more dissimilar. To build their social network, they keep making connections between people until the average number of friends equals z. They make these links by randomly choosing an individual, then choosing a link distance x with probability p(x)=c\cdot e^{-\alpha x}, and then choosing a random person at that distance and making the link. \alpha is a "tunable" parameter and is a measure of homophily--the tendency of similar people to group together--in the created social network. (A sketch of this construction in code follows the list.)
  4. People can belong to different hierarchical groups (I am a member of the City of Newark workforce, the Brown alumni, the Manhattan Democrats, Brooklyn) and it is assumed these groups are independent. This is a standard assumption, but you can see in my group memberships alone that Brown and Brooklyn likely aren't wholly statistically independent. In this way a person's identity is defined by the positions they hold in the various hierarchies in the model, represented by the H-dimensional vector \vec{v}_i, where v_i^h is the position of node i in the h^{th} hierarchy, or dimension. Each node is randomly assigned a location in the various hierarchies and then the links are made as indicated in point three above (I think it would have made more sense to explain this point before that one).
  5. People construct a social distance to perceive the similarity between themselves and others. This is here defined as the minimum distance that exists between nodes across all hierarchies, or y_{ij}=min_h x^h_{ij}. It is interesting to note that "social distance violates the triangle inequality ... because individuals i and j can be close in dimension h_1, and individuals j and k can be close in dimension h_2, yet i and k can be far apart in both dimensions." A simple example might be to use my dad as individual j, me as individual i, and one of my dad's work colleagues at Mt. Sinai as k. Although my dad and I are in the same family and he and his co-worker have the same occupation (x_{ij}=x_{jk}=1), the co-worker and I are quite socially distant.
  6. Individuals are only aware of their immediate network across all hierarchies, so if a person forwards a message to a single neighbor, the sender does not know how much closer, if at all, that neighbor is to the intended recipient.
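Here is a rough sketch of how a network might be generated under these contentions. It is my own simplified reading: the branching ratio b and depth l of each hierarchy are parameters I am assuming, and random group assignment only gives groups of size g on average, whereas the paper fixes group size exactly.

    # Simplified sketch of the hierarchical-identity network model (my reading, not the authors' code).
    import math
    import random

    H, b, l = 2, 2, 4           # number of hierarchies, branching ratio, depth (assumed values)
    g, alpha, z = 5, 1.0, 6     # group size, homophily parameter, target mean degree
    n_groups = b ** (l - 1)     # leaf groups per hierarchy
    N = g * n_groups            # population size

    # Each individual gets a random leaf group in each of the H hierarchies (their "identity").
    identity = [[random.randrange(n_groups) for _ in range(H)] for _ in range(N)]

    def x_dist(ga, gb):
        # Distance between two leaf groups = height of their lowest common ancestor
        # in a b-ary tree (1 if they are the same group).
        d = 1
        while ga != gb:
            ga, gb = ga // b, gb // b
            d += 1
        return d

    def sample_x():
        # Draw a link distance x with probability proportional to exp(-alpha * x).
        weights = [math.exp(-alpha * x) for x in range(1, l + 1)]
        return random.choices(range(1, l + 1), weights=weights)[0]

    links = set()
    while 2 * len(links) < z * N:            # stop when the mean degree reaches z
        i = random.randrange(N)
        h = random.randrange(H)              # link through a randomly chosen hierarchy
        x = sample_x()
        candidates = [j for j in range(N)
                      if j != i and x_dist(identity[i][h], identity[j][h]) == x]
        if candidates:
            links.add(frozenset((i, random.choice(candidates))))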

With this model, "[i]ndividuals therefore have two kinds of partial information: social distance, which can be measured globally but which is not a true distance (and hence can yield misleading estimates); and network paths, which generate true distances but which are known only locally."

With this model set up, they show that a greedy algorithm (the same as the one Milgram suggests) can efficiently direct a message to a recipient in a small number of steps.
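The greedy rule itself is simple: each holder of the message passes it to whichever of their neighbors looks socially closest to the target under y_{ij}. A sketch, continuing from the variables built in the previous snippet (the 25% drop probability anticipates the attrition analysis discussed next):

    # Greedy forwarding sketch; reuses identity, x_dist, links, N, and H from above.
    import random

    def social_distance(i, j):
        # y_ij: the minimum group distance between i and j across all H hierarchies.
        return min(x_dist(identity[i][h], identity[j][h]) for h in range(H))

    # Adjacency lists from the undirected links built above.
    neighbors = {i: [] for i in range(N)}
    for link in links:
        a, c = tuple(link)
        neighbors[a].append(c)
        neighbors[c].append(a)

    def greedy_route(source, target, p_fail=0.25, max_steps=1000):
        # Each holder forwards to the neighbor that looks closest to the target,
        # and drops the message with probability p_fail at every step.
        current, steps = source, 0
        while current != target:
            if steps >= max_steps or not neighbors[current] or random.random() < p_fail:
                return None                  # chain terminated without reaching the target
            current = min(neighbors[current], key=lambda j: social_distance(j, target))
            steps += 1
        return steps

    attempts = [greedy_route(random.randrange(N), random.randrange(N)) for _ in range(1000)]
    completed = [s for s in attempts if s is not None]
    if completed:
        print("completion rate:", len(completed) / len(attempts),
              "mean chain length <L>:", sum(completed) / len(completed))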

They do some analysis that is worth noting. First, they point out (apparently against the previous literature) that the average length of a message chain, defined as <L>, has to be short in an absolute sense. The message chain length cannot scale with the population size n because at each step there is a failure/termination rate of 25% (i.e., one in four people who get the letter do not forward it at all). Second, through some simple math they show that, with a termination rate of 25% and a required success rate of at least 5%, the average message chain length <L> must be less than about 10.4, independent of population size. With this, they can map out the region of the parameter space in H and \alpha that meets that requirement--that is, all combinations of the number of hierarchies and the homophily parameter that keep <L> below the bound:
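To spell out that arithmetic: if each holder drops the message with probability 0.25, a chain of length L is completed with probability 0.75^L, so requiring 0.75^L \geq 0.05 gives L \leq \ln(0.05)/\ln(0.75) \approx 10.4.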

Our main result is that searchable networks occupy a broad region of parameter space (\alpha ,H) which [...] corresponds to choices of the model parameters that are the most sociologically plausible. Hence our model suggests that searchability is a generic property of real-world social networks.

Faces, Places, Spaces

Humanities,Reading — Zac Townsend @ October 30, 2012 9:59 am

A great read on geography and ideas. Starts with a review of two new books on geography as a primal force in history, goes on to criticize that argument with counterexamples, continues with a great distillation of the recent historiography around Eastern Europe and the Holocaust, and rounds itself out with a discussion of the U S of A.

Another version of space history is available these days, though. This might be called the cartographic turn, and is characterized by the argument that, while geography matters, it is visible only through the maps that we make of it. Where borders fall is as much a matter of how things are seen as how they really are. We can know the shape of the planet only through maps—maps in the ordinary glove-compartment sense, maps in a broader metaphoric one—and those maps are made by minds attuned to the relations of power. All nations are shaped by belligerence and slaughter.
