Complexity and Decision-Making

Economics,Social Science — Zac Townsend @ November 6, 2012 9:26 am

The Great Man theory of history is usually considered too limited (see yesterday's post). This argument is perhaps best expressed in War and Peace, where Tolstoy offers long digressions on the imaginary significance of great men, Napoleon obviously included. Or, as Isaiah Berlin puts it in "The Hedgehog and the Fox: An Essay on Tolstoy's View of History," Tolstoy perceived a "central tragedy" of human life:

...if only men would learn how little the cleverest and most gifted among them can control, how little they can know of all the multitude of factors the orderly movement of which is the history of the world...

We can think of this in a more limited fashion when it comes to Generals or even CEOs. This comment on Hacker News put it well:

People overrate what people can honestly achieve in highly chaotic environments. 15% of corporate CEOs are replaced every year - notice how companies don't change much from year to year though - I have. However, changing often definitely lets us lionize the lucky ones (see hedge funds, startups, novels, movies, tv shows and any other at scale, highly path dependent, chaotic and random systems).

My question though is whether this is changing. "Big data" and associated analyses may give us the ability to understand large systems in ways that were never before possible.

Just as an example, Friedrich Hayek famously argued that the government cannot run the "commanding heights" of the economy because of an information problem. Simply put, there is no way for the government to amass and understand the information necessary to choose prices and set supply and demand; price is the only true reflection of preferences. History might have caught up with Hayek. We are entering an era where massive datasets and computational social science will allow us to understand people's revealed preferences better than any mechanism in history--even prices. Obviously I have set up a straw man here in a sense, but the greater point is that we can begin to understand people's behavior and preferences far better by gathering massive information about them and their surroundings than can be revealed by a thousand theorems in the American Economic Review.

I am particularly interested in what this means for local, state, and federal governments. (Political campaigns already use massive amounts of voter history and consumer data to microtarget potential voters; see this book and this one.) Governments collect large amounts of data on the services provided to individuals and the outcomes of those services. New York City is slowly building the capabilities to cross-reference and understand all of this data and its implications for human behavior. As they work to collect, knit together, and derive meaning from massive administrative datasets, the very nature of what governments can know about citizens and how they can provide services could change.

Cesar Hidalgo comments on the data collected by governments and his hope that they will move to big data:

Governments are much slower, but they're starting to collect data, and they have always been a very information-intensive business. Governments invented taxing, and taxation requires fine-grain data on how much you earn and where you live. Governments, actually "states" a long time ago, invented last names. People in villages didn't need last names. You were able to get around with just a first name. They had to invent last names for taxation, for drafting, so government is a very information-intensive business. In their innovation agenda, in order to do the things that they do better, governments are going to need to embrace big data.

I see, little by little, that there are people inside all of these organizations that are starting to have that battle. They tend to be younger people and were born into this Internet generation. Sometimes it's hard for them to have this fight. As time goes on, there's going to be more and more people that are going to see the value of data that is not only monetization, but also is providing better services, is understanding the world better, is understanding diseases, understanding the way that cities work, mobility, many types of things. Not just targeting people with ads. I think that there's more than that.

The three main problem paradigms (prediction, modeling, and detection) of machine learning, data mining, and artificial intelligence can be used to ask questions about government services. For example, can we predict in a child welfare context who is most likely to end up in a juvenile delinquency context? Can we model which individuals receiving housing subsidies are most likely to commit crime? Can we detect the spread of knowledge of a new government program? What predictions can we make using leading-indicator data?

Allow me to focus on New York City as an illustrative example. New York City has a system that allows users to screen families for more than 35 city, state, and federal benefit programs. At the same time, City agencies collect a massive number of variables related to demographics, location, risk-assessment tools, court dates, child welfare contact, police action, and more. If this data could be knitted together, we could begin to understand the life-cycle of families in their behavior and use of government services, and begin to model needs profiles in a way never before possible.
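As a purely illustrative sketch of the prediction framing (every variable name, the simulated data, and the model choice below are invented for illustration, not anything New York City actually runs), fitting a predictive model on linked administrative records might look roughly like this:

```python
# Hypothetical sketch: predicting a binary outcome (say, later juvenile justice
# contact) from linked administrative records. The columns and the data are
# simulated stand-ins, invented purely for illustration.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
records = pd.DataFrame({
    "n_prior_welfare_contacts": rng.poisson(1.5, n),
    "receives_housing_subsidy": rng.integers(0, 2, n),
    "school_absence_rate": rng.beta(2, 8, n),
    "neighborhood_poverty_rate": rng.beta(3, 6, n),
})
# Simulated outcome: risk rises with prior contacts and absences (a made-up relationship).
logit = -2 + 0.4 * records["n_prior_welfare_contacts"] + 3 * records["school_absence_rate"]
records["later_justice_contact"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = records.drop(columns="later_justice_contact")
y = records["later_justice_contact"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```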

So, we live in an era where we have a large number of sensors collecting more data, we have the means and methods to analyze the gathered data, and we have the ability to build dynamic, instantly responsive models. Taken together, we may enter an era where CEOs, Generals, and other leaders are able to understand and respond to chaotic systems.

What Doctors Don't Know About the Drugs They Prescribe

Science,Social Science — Zac Townsend @ November 3, 2012 9:20 am

When a new drug gets tested, the results of the trials should be published for the rest of the medical world -- except much of the time, negative or inconclusive findings go unreported, leaving doctors and researchers in the dark. In this impassioned talk, Ben Goldacre explains why these unreported instances of negative data are especially misleading and dangerous.

It might seem less disturbing, but the same could be said of our knowledge about education, social science, housing, and so on.

On Smart Failure

Social Science — Zac Townsend @ November 2, 2012 9:15 am

This guy is all over the place, but I find his idea that bureaucracies should prioritize implementations in the following order very interesting: old ideas that fail are worst, next come old ideas that succeed, above that are new ideas that fail, and best of all are new ideas that succeed. He calls this a strong culture of "smart failure."

What Happens in Brooklyn Moves to Vegas

Reading,Social Science — Zac Townsend @ October 28, 2012 9:27 am

On a crazy and very impressive social experiment by Tony Hsieh to save Las Vegas:

The Downtown Project is hoping to draw 10,000 “upwardly mobile, innovative professionals” to the area in the next five years. And according to Hsieh, he and his team receive requests for seed money from dozens of people every week. In return, the Downtown Project asks not just for a stake in the companies but also for these entrepreneurs to live and work in downtown Las Vegas. (They’re also expected to give back to the community and hand over contacts for future recruits.) In expectation of all these newcomers, the project has already set up at least 30 real estate companies, bought more than 15 buildings and broken ground on 16 construction projects.

For those entrepreneurs who live in other parts of the country, and most do, the question often comes down to how eager they are to relocate to a downtown area filled with liquor stores and weekly hotels. Less than a year after the project was officially established, about 15 tech start-ups have signed on. The first tech investment went to Romotive, a company developing smart-phone-controlled personal robots. Money has also gone to Local Motion, a start-up that designs networks for sharing vehicles, and Digital Royalty, a social-media company.

What If We Tested Laws Before Passing Them?

Social Science,Statistics — Zac Townsend @ December 13, 2010 2:16 pm

An interesting article in the Boston Globe today on whether we should use randomized trials to test laws before they are passed.

There are certainly potential problems with this vision. First is the question of effectiveness: In some cases, it may prove too difficult to run an accurate test. The full repercussions of laws often take years to manifest themselves, and small-scale experiments do not always translate well to larger settings. Also at issue is fairness. Americans expect to be treated equally under the law, and this approach, by definition, entails disparate treatment.

“The problem is, we’re dealing with laws that have a huge impact on people’s lives,” says Barry Friedman, a law professor at New York University. “These aren’t casual tests. It’s not, you try Tide or you try laundry detergent X....Here we’re talking about basic benefits and fundamental rights.” Though Friedman is sympathetic to the goal of gaining better empirical knowledge, he says, “My guess is some of it’s doable in some contexts, and a lot of it’s not doable in other contexts.”

But others are more sanguine, and they make the opposite argument: That precisely because the stakes are so high, the laws that we enact on a large-scale, long-term basis must be more rigorously tested. This wave of thinking is part of a broader trend in fields from health care to education: Our practices should be “evidence-based,” rather than deriving from theories and unproven assumptions. The question is whether this kind of scientific approach can successfully take on a project as unruly as our society — and our politics.

From my earlier post, I think it is clear that I fall into the "the stakes are so high, let's test" group.

Learning A New Statistical Method: Bayesian Additive Regression Trees

Social Science,Statistics — Zac Townsend @ December 13, 2010 1:53 pm

I may do some work for Jennifer Hill, an applied statistics professor at NYU's Steinhardt School. Hers is the kind of career I'm very interested in if I go the PhD route: she got her doctorate in statistics, focuses on applications to social science, and works on interesting causal inference problems.

This last weekend I read a paper she sent me on Bayesian Additive Regression Trees (BART), which is quite interesting. The article, "Bayesian Nonparametric Modeling for Causal Inference," is coming out this January in the Journal of Computational and Graphical Statistics. The abstract:

Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online.

(This is perhaps more for me than any reader.) Basically, when using some methods to improve causal inference, such as matching, you're often fitting two models: one for whether or not a unit was treated, and then the more easily (or commonly) understood "response surface," which is the model for the outcome conditional on treatment and all the confounders. BART is a method for estimating the response surface non-parametrically, while being (it appears) as robust as or more robust than other methods.

When trying to figure out how effective a treatment of some kind is, you cannot observe the outcomes for an individual both when they receive the treatment, Y_i(1), and when they do not, Y_i(0). A fancy way of saying that is Y_i = Y_i(1)Z_i + Y_i(0)(1-Z_i), where Z_i is an indicator of whether or not you got the treatment. That equation says that if you got the treatment, the second term on the right side of the equals sign is zero, and in the alternative case, the first term is zero.
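A toy simulation of that bookkeeping (all numbers invented) makes the point that only one of the two potential outcomes is ever recorded:

```python
# Tiny simulation of the potential-outcomes notation above (invented numbers).
import numpy as np

rng = np.random.default_rng(0)
n = 5
y0 = rng.normal(10, 2, n)      # Y_i(0): outcome without treatment
y1 = y0 + 3                    # Y_i(1): outcome with treatment (true effect of 3)
z = rng.integers(0, 2, n)      # Z_i: treatment indicator

y_obs = z * y1 + (1 - z) * y0  # Y_i = Y_i(1) Z_i + Y_i(0) (1 - Z_i)
print(np.column_stack([z, y_obs]))
# In real data only z and y_obs are observed, never both y1 and y0,
# so the individual effect y1 - y0 can never be computed directly.
```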

When doing causal inference, you want to compare two groups, one that received the treatment and one that did not, that are as similar as possible. That is, the only difference between the comparison groups is that one got the treatment and the other didn't. In this way, you can be sure that any observed difference between the groups is due to the treatment. This idea is formalized through the term ignorability. That is, if the two groups cannot be distinguished on all the observable characteristics (they have "balance"), the assignment to the treatment group is ignorable. (More formally, the potential outcomes are independent of treatment assignment given the covariates, or Y(0),Y(1) \perp\!\!\!\perp Z | X, where X are confounders and \perp\!\!\!\perp means conditional independence.) Ignorability also requires overlap, or common support, in the covariates across the two groups.
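Spelled out (my own gloss of the standard identification argument, not anything specific to the paper), ignorability plus overlap is what lets us recover the average treatment effect from observable quantities:

E[Y(1) - Y(0)] = E_X[ E[Y(1)|X] - E[Y(0)|X] ] = E_X[ E[Y|X,Z=1] - E[Y|X,Z=0] ],

where the last step uses Y(0),Y(1) \perp\!\!\!\perp Z | X together with 0 < Pr(Z=1|X) < 1 (overlap).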

So, in the end, with ignorability we're left to estimate E[Y(1)|X] = E[Y|X,Z=1] and E[Y(0)|X] = E[Y|X,Z=0]. Unfortunately, this estimation can be very difficult if the treatment outcomes are not linearly related to the covariates, if the distributions of the covariates differ across the two groups, or, as is often the case in a world with ever more data, if there are tons of confounding covariates or (and this happens all the time) you really don't know which of them are needed to satisfy ignorability. A bunch of methods have been proposed to address this estimation problem (see the paper for a ton of citations), but the BART method, as I mentioned earlier, is different because it "focuses solely on precise estimation of the response surface." Also, part of BART's advantage is that it doesn't require as many researcher choices:

Nonparametric and semiparametric versions of these [other cited] methods are more robust but require a higher level of researcher sophistication to understand and implement (e.g., to specify smoothing parameters such as number of terms in a series estimator or bandwidth for a kernel estimator). This article proposes that the benefits of the BART strategy in terms of simplicity, precision, robustness, and lack of required researcher interference outweigh the potential benefit of having an estimator that is strictly consistent under certain sets of conditions.
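To make the response-surface strategy concrete, here is a minimal sketch using scikit-learn's gradient boosting as a stand-in for BART (a real analysis would use a BART implementation and would also give posterior uncertainty intervals, which this toy version does not): fit one flexible model of the outcome on treatment and confounders, predict each unit under Z=1 and Z=0, and average the difference.

```python
# Response-surface sketch with simulated data and a gradient-boosting stand-in for BART.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                  # confounders
p = 1 / (1 + np.exp(-X[:, 0]))               # treatment probability depends on a confounder
z = rng.binomial(1, p)
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 2.0 * z + rng.normal(0, 1, n)   # true effect = 2

# Fit one flexible model of the response surface E[Y | X, Z].
surface = GradientBoostingRegressor().fit(np.column_stack([X, z]), y)

# Predict each unit's outcome under Z=1 and Z=0, then average the difference.
y1_hat = surface.predict(np.column_stack([X, np.ones(n)]))
y0_hat = surface.predict(np.column_stack([X, np.zeros(n)]))
print("estimated average treatment effect:", (y1_hat - y0_hat).mean())  # should land near 2
```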

I think I'll save a careful description of the trees themselves for a later post, even though that is most of the paper. Basically, though, BART is a sum-of-trees model that uses a set of binary trees to split up the observations on the confounders. What's most fascinating is that BART is defined as a statistical model, with a prior put on the parameters, which is quite different from the other learning/mining models I've encountered. For those happy few who might be interested, BART is described in even greater detail in "BART: Bayesian additive regression trees." Abstract:

We develop a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART’s many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
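For reference, the "sum-of-trees" form from that abstract can be written as Y = g(x; T_1, M_1) + ... + g(x; T_m, M_m) + \epsilon, with \epsilon \sim N(0, \sigma^2), where each g(x; T_j, M_j) is the prediction of a single binary regression tree with structure T_j and leaf parameters M_j, and the regularization prior on each (T_j, M_j) is what keeps every tree a weak learner. (That is my compressed summary of the Chipman, George, and McCulloch setup, not a substitute for the paper.)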

Testing Housing Aid

New York City,Social Science,Statistics — Zac Townsend @ December 12, 2010 5:27 pm

New York City is randomizing the people who get a certain Housing Aid program called Homebase:

It has long been the standard practice in medical testing: Give drug treatment to one group while another, the control group, goes without.

Now, New York City is applying the same methodology to assess one of its programs to prevent homelessness. Half of the test subjects — people who are behind on rent and in danger of being evicted — are being denied assistance from the program for two years, with researchers tracking them to see if they end up homeless.

The city’s Department of Homeless Services said the study was necessary to determine whether the $23 million program, called Homebase, helped the people for whom it was intended. Homebase, begun in 2004, offers job training, counseling services and emergency money to help people stay in their homes.

But some public officials and legal aid groups have denounced the study as unethical and cruel, and have called on the city to stop the study and to grant help to all the test subjects who had been denied assistance.

“They should immediately stop this experiment,” said the Manhattan borough president, Scott M. Stringer. “The city shouldn’t be making guinea pigs out of its most vulnerable.”

On a listserv I'm on, there has been a lot of ethical handwringing about this program, but these people weren't randomly assigned to poverty. They were randomly assigned not to receive a program.

If you agree with Stringer that citizens shouldn't be treated like lab rats, then the conclusion should be that they should receive no treatment. We have no idea whether this program is effective or not. We have no idea whether enrolling people in this program might, in the long term, increase the time they spend homeless. We have no idea if the program leads to more crime or less. We have no idea if the program does anything. So if you're not interested in throwing people into some unproven, untested, possibly ill-designed program at politicians' whims, the only option is to stop the intervention altogether.

Alternatively, perhaps we can test the program. We can see if the program is effective. We can learn whether the program meets its goals. Not necessarily on a cost-benefit basis, but at all. By any standard. To do that we turn to the randomized experiment.

Now, what is experimentation? In the ideal multiverse we could take the exact same people and give them the intervention in one case and not give it to them in the other. Then we could observe the difference and know that it was due to the Homebase program.

Absent that, we have only one tool at our disposal that gets at causal inference with almost no exceptions, and that is the well-designed randomized experiment. (Note all the caveats, but basically the randomized experiment is the gold standard, and there are SO many statistical and design tools for turning quasi-experiments and correlational studies into something approaching the ideal that NYC is implementing.)

To do this you find two groups as alike as possible and you compare them. You give one of them the intervention, and you don't give it to the other group. You can't just give the program to as many people as apply and then pick some other group of people as a comparison. Applying is, itself, a factor you want to be equal across the groups. That's why in randomized experiments you tend to look for twice as many people as you can enroll, randomly enroll half of them, and then collect data on both groups.

A large number of families (1,500) are denied due to lack of funding. Another way to think of the study is that there are 1,700 people who would be rejected, and we found money to serve 200 of them. What is the best way to pick those people? The answer, to me, is a lottery. So 200 of those 1,700 families are assigned the intervention, and we randomly pick another 200 of them to study. These two groups--all people who applied to the program--we can assume are basically similar (they have something called "balance") across all observable and unobservable characteristics (we can measure the first and assume the second).

Now I'm masking a bunch of statistics showing that random assignment leads to balance on average, but whatever. The point is that we're creating a counterfactual: a group of people who applied to the program but didn't get the intervention, alongside the people who applied and did. The selection was done by lottery--not by some other method such as who you're best friends with, or whether your name sounds right, or whatever. Doesn't that seem like a just way to assign spots in a program?
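To illustrate the "balance on average" point with made-up numbers (the 1,700 applicants and two groups of 200 from above, but an invented covariate), here is a minimal sketch of the lottery:

```python
# Lottery sketch: 1,700 applicants, 200 served, 200 followed as a comparison group.
import numpy as np

rng = np.random.default_rng(2)
n_applicants = 1700
# One invented covariate: household income for each applicant family.
household_income = rng.lognormal(mean=9.5, sigma=0.5, size=n_applicants)

# The lottery: shuffle applicants, serve the first 200, follow the next 200 as the comparison group.
order = rng.permutation(n_applicants)
treated, comparison = order[:200], order[200:400]

print("treated mean income:   ", round(household_income[treated].mean()))
print("comparison mean income:", round(household_income[comparison].mean()))
# Over repeated lotteries these means coincide on average; any single draw differs only by chance.
```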