The Problem with Data
Troy Anderson, DataPlace
Data doesn't kill people, people kill people. And yet, more lives are affected by data than by guns: data determines how many members of Congress represent you (unless you live in DC) and often how much money your state gets; data is one of the primary things (some say the only thing) that determines your mortgage interest rate; and data is often the last refuge of a specious argument.
With data so important these days, you'd better have some or you'll get left out, competed away, or find yourself unable to prove anything to anyone. Miss providing data and you'll miss out on money or opportunities for you, your organization, or your community.
The problem with data, as it currently exists in federal agencies and web sites, is that it's very difficult to use despite being very relevant, down to a neighborhood level. Interviewing people who try to make use of this data is sobering: "We used to spend a thousand hours a year processing HMDA data for our local community." "We have to pay through the nose to get good neighborhood level reports on data that's otherwise 'free'." "Why can't free data be free?"
As part of the Fannie Mae Foundation, we used to get many grant requests for data analysis equipment or services and for thousand-dollar neighborhood market reports. Yet, often, what grantees really needed was free use of free data. Enter DataPlace. DataPlace makes understanding community statistics easy with tools such as rankings, charts, histograms, and maps that present free federal statistics through an easy-to-use interface.
The DataPlace initiative allows people in the field to get an objective understanding of their community – free of charge – and use that information in ways DataPlace could never have imagined. Liz Curry, of the Champlain Housing Trust, said, “DataPlace enabled me to demonstrate – for the first time – the need for affordable homeownership opportunities among households with lower incomes. It revealed the fact that the homeownership rate among the general population masks the income make up of who owns property. This data has been previously inaccessible to my organization in a usable format (for example, could not find it on census website; could not find policy research & analysis consultants or staff that could find this information for me).”
But ease of use and democratization of data are not the only problems faced by people who need data. The other problem with data, and with wanting data, is that data usually comes with a bias. People who gather and use data often already have an answer in mind: “Let’s disprove this hypothesis.” “Let’s see what’s well correlated with default risk.” “What’s the income of this neighborhood?” The answers to their questions often depend on what can be measured, how often, when, and where. The usual answer to “What data can we get on this?” is “This is the data that’s available.” Anyone familiar with the story of the drunk looking for his keys under the lamppost because that’s where the best light was should see the problems here.
Even with good data, people can still arrive at bad conclusions. For instance, check out the very high correlation between loan denials and aggregate income by county across the US – a whopping 87.9% correlation! This data can be made to say that the more loan denials a county has, the more aggregate income that county will have. If I weren’t a good data analyst, I might conclude that opening bank branches across the country and denying loans would help increase aggregate income! Just because the rooster doesn’t crow doesn’t mean the sun won’t come up.
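One plausible explanation for a correlation like that, though it is an assumption here and not something the county data itself tells us, is sheer size: a county with more people tends to have more of everything, loan denials and total income included. The sketch below uses entirely made-up counties (the populations, denial rates, and incomes are invented for illustration, not HMDA or census figures). Each county's denial rate and income per person are drawn independently, so by construction they have nothing to do with each other, and yet the county totals still come out strongly correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_counties(n_counties=2000, seed=0):
    """Build hypothetical counties: (population, denials per 1,000, income per person).

    All three values are drawn independently of one another.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_counties):
        population = rng.lognormvariate(10, 1.2)      # a few big counties, many small
        denials_per_1000 = rng.uniform(1.0, 3.0)      # invented range
        income_per_person = rng.uniform(20_000, 40_000)  # invented range
        rows.append((population, denials_per_1000, income_per_person))
    return rows


counties = simulate_counties()

# County totals: both are scaled by population.
total_denials = [pop * d / 1000 for pop, d, _ in counties]
aggregate_income = [pop * inc for pop, _, inc in counties]

# Per-person figures: population is taken out of the picture.
denial_rate = [d for _, d, _ in counties]
income_per_person = [inc for _, _, inc in counties]

r_totals = pearson(total_denials, aggregate_income)
r_rates = pearson(denial_rate, income_per_person)

print(f"county totals:      r = {r_totals:.2f}")
print(f"per-person figures: r = {r_rates:.2f}")
```

In this toy setup the totals correlate strongly while the per-person figures barely correlate at all. Before reading meaning into a correlation between two county totals, it is worth dividing both by population and looking again.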
Like most anything, you can acquire various levels of expertise in understanding and using data. DataPlace provides good basic statistics that are apples-to-apples sets of data, drilled down to a very narrow level. The National Neighborhood Indicators Partnership, Social Compact, MetroEdge, and the Community Indicators Consortium are other nonprofit organizations that can help give you a richer understanding of how data is useful.
While you might be tempted to try all this data out for yourself, don’t forget that data is power, and power can corrupt. Be careful! Don’t look for your keys under the lamppost.







