Journalism in the Age of Data

December 15, 2010

Spencer sent me a very cool site yesterday called Journalism in the Age of Data. The main content is a 54-minute video report on "data visualization as a storytelling medium." It has a great interface that contains a lot of extra content. There is a really interesting thread in the video about how hard it is to make good visualizations, and about the proliferation of bad and confusing ones. The "key points":

The explosion of data has brought a complementary need for tools to analyze it
Researchers in visualization are helping by building tools for non-experts
Journalists are finding ways to adapt to the challenge of telling stories with data
With experience in charting data, infographics designers are well suited to bring data vis to journalism, but they debate how effective it is at explaining concepts
In a wired world, data is increasingly becoming a medium of personal expression
Data will increasingly arrive in real time, challenging our ability to absorb, analyze and display it
Technologies for creating online visualizations are in transition, but there are new tools coming out that will make the process easier
Data analysis is at least as important as visually displaying it; there are tools that help with this process

Some cool visualizations I saw in the video:
Budget Forecasts, Compared With Reality
The Crisis of Credit Visualized
San Francisco Crimespotting

The video also references a very cool paper, Narrative Visualization: Telling Stories with Data, and a very cool JS library, Protovis.

How New York's Racial Makeup Has Changed Since 2000

December 14, 2010

The Times has visualized the change in the ethnic breakdown of the City by census tract. Here is the map for Black New Yorkers:

Black New Yorkers Movement (map and map key)

The map text from the Times:

Canarsie, Brooklyn, had one of the greatest increases in its share of black residents in 2009 (to 81% from 67%), while recently gentrified neighborhoods like Prospect Heights, Clinton Hill and Fort Greene saw double-digit decreases.

See the rest of the maps here.

There is also an accompanying article Region Is Reshaped as Minorities Go to Suburbs:

Metropolitan New York is being rapidly reshaped as blacks, Latinos, Asians and immigrants surge into the suburbs, while gentrification by whites is widening the income gap in neighborhoods in Manhattan and Brooklyn, according to new census figures released on Tuesday.

Gawker Passwords

December 14, 2010

This weekend the Gawker network of blogs was hacked, and a bunch of user passwords were compromised. For an interesting analysis of the hack, see this post at Coding Horror. What I found most interesting, though, was a Wall Street Journal article on the passwords exposed in the hack:

On Sunday night, hackers posted online a trove of data from Gawker Media’s servers, including the usernames, email addresses and passwords of more than one million registered users. The passwords were originally encrypted, but 188,279 of them were decoded and made public as part of the hack.

Then, using that dataset, the WSJ found the 50 most-popular Gawker Media passwords and made this interesting graph:
The Top 50 Gawker Media Passwords
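
Counting the most popular passwords in a list like this is a small job. Here is a minimal Python sketch of the idea, not the WSJ's actual method; the function name `top_passwords` and the sample list are made up for illustration and are not data from the leak.

```python
from collections import Counter

def top_passwords(passwords, n=50):
    """Return the n most common passwords as (password, count) pairs."""
    return Counter(passwords).most_common(n)

# Made-up sample for illustration only, not data from the actual leak.
sample = ["123456", "password", "123456", "qwerty", "password", "123456"]

print(top_passwords(sample, n=2))
# → [('123456', 3), ('password', 2)]
```

Feeding the resulting pairs into any bar chart tool would give a graph of the same general shape as the one above.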

Visualizing Friendships

December 14, 2010

An intern at Facebook has created a world map that visualizes the connections in the social graph:

Facebook World Map of Relationships

What's fascinating, though, is how he did it:

I began by taking a sample of about ten million pairs of friends from Apache Hive, our data warehouse. I combined that data with each user's current city and summed the number of friends between each pair of cities. Then I merged the data with the longitude and latitude of each city.

At that point, I began exploring it in R, an open-source statistics environment. As a sanity check, I plotted points at some of the latitude and longitude coordinates. To my relief, what I saw was roughly an outline of the world. Next I erased the dots and plotted lines between the points. After a few minutes of rendering, a big white blob appeared in the center of the map. Some of the outer edges of the blob vaguely resembled the continents, but it was clear that I had too much data to get interesting results just by drawing lines. I thought that making the lines semi-transparent would do the trick, but I quickly realized that my graphing environment couldn't handle enough shades of color for it to work the way I wanted.

Instead I found a way to simulate the effect I wanted. I defined weights for each pair of cities as a function of the Euclidean distance between them and the number of friends between them. Then I plotted lines between the pairs by weight, so that pairs of cities with the most friendships between them were drawn on top of the others. I used a color ramp from black to blue to white, with each line's color depending on its weight. I also transformed some of the lines to wrap around the image, rather than spanning more than halfway around the world.
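
To make the steps above concrete, here is a rough Python sketch of the pipeline as I read it: count friendships per city pair, weight each pair, sort so the heaviest lines are drawn last, and color each line on a black to blue to white ramp. The original work was done in R and the post does not give the weight formula, so `pair_weight` below is purely my assumption, and all of the function names and the tiny test data are hypothetical.

```python
import math
from collections import Counter

def count_city_pairs(friend_pairs, user_city):
    """Sum friendships between each unordered pair of distinct cities."""
    counts = Counter()
    for a, b in friend_pairs:
        ca, cb = user_city[a], user_city[b]
        if ca != cb:
            counts[tuple(sorted((ca, cb)))] += 1
    return counts

def pair_weight(friends, distance):
    # ASSUMPTION: the post only says the weight is a function of the
    # Euclidean distance and the friend count; this formula is a guess.
    return math.log1p(friends) / (1.0 + distance)

def ramp(t):
    """Map t in [0, 1] to an RGB tuple on a black -> blue -> white ramp."""
    t = min(max(t, 0.0), 1.0)
    if t < 0.5:
        return (0.0, 0.0, 2.0 * t)   # black to blue
    u = 2.0 * (t - 0.5)
    return (u, u, 1.0)               # blue to white

def spans_more_than_half(lon1, lon2):
    """True if a straight line between two longitudes covers more than
    half the globe, so it should wrap around the image edge instead."""
    return abs(lon1 - lon2) > 180.0

def draw_order(counts, coords):
    """Return (city_a, city_b, color) tuples sorted by ascending weight,
    so the heaviest pairs come last and end up drawn on top."""
    weighted = []
    for (a, b), n in counts.items():
        (x1, y1), (x2, y2) = coords[a], coords[b]
        d = math.hypot(x2 - x1, y2 - y1)  # Euclidean distance on (lon, lat)
        weighted.append((pair_weight(n, d), a, b))
    weighted.sort()
    top = weighted[-1][0] if weighted else 1.0
    return [(a, b, ramp(w / top)) for w, a, b in weighted]

# Hypothetical toy data: four users in three cities.
user_city = {"u1": "A", "u2": "B", "u3": "C", "u4": "A"}
coords = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 1.0)}
friend_pairs = [("u1", "u2"), ("u1", "u3"), ("u4", "u3"), ("u1", "u4")]

counts = count_city_pairs(friend_pairs, user_city)
for a, b, color in draw_order(counts, coords):
    print(a, b, color)
```

Sorting by weight stands in for the transparency he couldn't get: instead of blending overlapping lines, the strongest connections simply get painted last and brightest.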