Ethics of Making Graphs
Over the past several days, a few political and data-visualization blogs have hosted a kerfuffle over this bar chart that the Wall Street Journal published. The gist of the chart is that the bulk of the taxable income in this country is earned by households in the $100,000-$200,000 range, and the argument made is that increasing taxes on the richest Americans won’t raise enough money to eliminate the budget deficit.
The liberal-leaning magazine Mother Jones responded with this graph, with the objection that the WSJ’s graph was drawn to imply that the rich weren’t really all that rich.
I don’t like either graph. Neither is particularly useful. But neither one is “wrong.”
Both graphs seem to have been created with the intention of making a political statement. That is OK for political blogs, because we all know that information presented by pundits has the potential to be biased. But what about graphs that are supposed to be objective, such as the ones we see on the news or the ones we send in reports to our bosses? Is there a standard of ethics for making graphs?
Google is surprisingly silent on the issue, at least for the several searches I tried.
Here are a few items I thought of to help us improve the quality of the graphs in our lives.
Why did you categorize your data this way? Are divorced people similar enough to widowed people to be represented by the same slice of a pie chart? Are 21-22 year olds more similar to 19-20 year olds or to 23-24 year olds? Did you choose the breakpoints that do the most to strengthen your hypothesis (replacing “strengthen your hypothesis” with “generate hits to your website” or “improve your chances of getting a bonus,” etc., as needed)? Can you back up your choices with research or other examples in your industry?
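To see how much the choice of breakpoints can matter, here is a minimal Python sketch. The incomes are made-up numbers (in thousands of dollars), not IRS data, and bin_totals is a hypothetical helper I wrote for illustration. The only difference between the two runs is whether the top category is split at 300.

```python
from bisect import bisect_right

def bin_totals(values, edges):
    """Sum values into half-open bins [edges[i], edges[i+1]).

    The last bin is open-ended: it collects everything >= edges[-1].
    Assumes edges is sorted and every value is >= edges[0].
    """
    totals = [0] * len(edges)
    for v in values:
        if v < edges[0]:
            raise ValueError("value below the first breakpoint")
        # bisect_right finds the first edge strictly greater than v,
        # so the bin that contains v is one position to the left.
        totals[bisect_right(edges, v) - 1] += v
    return totals

# Hypothetical incomes in thousands of dollars (NOT real data).
incomes = [20, 30, 40, 60, 70, 90, 120, 150, 180, 250, 400]

coarse_edges = [0, 50, 100, 200]       # top category: 200 and up
fine_edges = [0, 50, 100, 200, 300]    # top category split at 300

coarse = bin_totals(incomes, coarse_edges)
fine = bin_totals(incomes, fine_edges)

# Index of the category with the largest total income in each scheme.
coarse_peak = coarse.index(max(coarse))
fine_peak = fine.index(max(fine))

print(coarse, "peak bin starts at", coarse_edges[coarse_peak])
print(fine, "peak bin starts at", fine_edges[fine_peak])
```

With these invented numbers, the category holding the most income is "200 and up" under the coarse breakpoints but "100 to 200" once the top category is split in two. The data are identical; only the breakpoints changed.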
If you change the scaling of your graphs from the defaults, why? Are you perhaps trying to soften a bad trend or exaggerate a good trend? Will you use the same scaling in your next report even if trends change?
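As a rough illustration of what rescaling does to a bar chart, the sketch below compares how tall two bars are drawn relative to each other when the axis starts at zero versus somewhere higher. The values 95 and 100 are invented, and drawn_height_ratio is a hypothetical helper, not part of any plotting library.

```python
def drawn_height_ratio(smaller, larger, axis_min=0.0):
    """Ratio of the drawn bar heights when the axis starts at axis_min.

    A bar for value v is drawn with height (v - axis_min), so raising
    axis_min stretches the apparent gap between the two bars.
    """
    if axis_min >= smaller:
        raise ValueError("axis_min must be below the smaller value")
    return (larger - axis_min) / (smaller - axis_min)

# Hypothetical values: a 5-unit difference on a base of 95.
honest = drawn_height_ratio(95, 100, axis_min=0)      # axis starts at zero
truncated = drawn_height_ratio(95, 100, axis_min=90)  # axis starts at 90

print(f"axis from 0:  taller bar is {honest:.2f}x the shorter one")
print(f"axis from 90: taller bar is {truncated:.2f}x the shorter one")
```

With the axis starting at zero, the taller bar is drawn about 5% taller than the shorter one, which matches the data. Start the axis at 90 and the same bar is drawn twice as tall. Neither picture is "wrong," but they leave very different impressions.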
Ask questions when you suspect that someone is using data visualization for a deceptive purpose. Ask them to present the information in several different ways or ask for the raw data set. If you’re still reading this paragraph, I assume you have a certain level of expertise with or interest in graphs (not to mention you’re probably very sophisticated and good-looking). Use your above-average knowledge as a public service to help keep others from getting duped.
I feel obliged to make a graph from the raw IRS data set that was used by both the Wall Street Journal and Mother Jones. I used the same groupings as the raw data set, so part of the graph looks like the Wall Street Journal bar graph, but I also included the number of returns filed in each income category in 2008.
We can see that the number of returns filed peaks in the $50,000 to $75,000 income range, and the total taxable income per income category peaks in the $100,000 to $200,000 range, at least according to the way that the IRS chose to break down the data for this particular table. There are many more returns filed in the lowest income groups than the highest income groups, and there is much more total taxable income in the highest income groups than the lowest income groups.
What “should” be done with tax policy after seeing this graph? I hope I’ve left you to make your own decisions on that.