Data 304
Originally published in 1983; revised edition in 2001
My main goal is to introduce you to both the ideas and the methods of data visualization in a sensible, comprehensible, reproducible way.
When teaching people how to make graphics with data, however, I have repeatedly found the need for an introduction that motivates and explains why you are doing something but that does not skip the necessary details of how to produce the images you see on the page. And so this book has two main aims. First, I want you to get to the point where you can reproduce almost every figure in the text for yourself. Second, I want you to understand why the code is written the way it is, such that when you look at data of your own you can feel confident about your ability to get from a rough picture in your head to a high-quality graphic on your screen or page.
This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot.
Graphical excellence is the well-designed presentation of interesting data: a matter of substance, of statistics, and of design.
Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
Graphical excellence is nearly always multivariate.
And graphical excellence requires telling the truth about the data.
(Tufte 2001, 51, italics original, bold mine)
Good design has two key elements:
Graphical elegance is often found in simplicity of design and complexity of data.
Visually attractive graphics also gather power from content and interpretations beyond the immediate display of some numbers. The best graphics are about the useful and important, about life and death, about the universe. Beautiful graphics do not traffic with the trivial.
Learn more about its creator, Charles Minard, in this National Geographic article.
See how to recreate this in R here.
On rare occasions graphical architecture combines with the data content to yield a uniquely spectacular graphic. Such performances can be described and admired but there are no easy compositional principles on how to create that one wonderful graphic in millions.
What can be suggested, though, are some guides for enhancing the visual quality of routine, workaday designs.
Most of our graphics do not look like Minard’s graphic.
We need principles and guidelines to help us effectively build and customize “workaday graphics”.
Healy: Layer, Highlight, Repeat
The grammar of graphics sets us up to use these elements flexibly.
We also need to use them effectively and honestly.
Attractive displays of statistical information
within
Audience matters.
Healy (2019) enumerates three categories of badness.
Bad taste
Bad data
Bad perception
Tufte’s advice has often been summarized as a desire to increase the data-to-ink ratio.
This is practical advice. It is not hard to jettison tasteless junk, and if we look a little harder we may find that the chart can do without other visual scaffolding as well. We can often clean up the typeface, remove extraneous colors and backgrounds, and simplify, mute, or delete gridlines, superfluous axis marks, or needless keys and legends. Given all that, we might think that a solid rule of “simplify, simplify” is almost all of what we need to make sure that our charts remain junk-free, and thus effective. But …
Somewhat annoyingly, there is evidence that highly embellished charts like Nigel Holmes’s “Monstrous Costs” are often more easily recalled than their plainer alternatives (Bateman et al., 2010). Viewers do not find them more easily interpretable, but they do remember them more easily and also seem to find them more enjoyable to look at. They also associate them more directly with value judgments, as opposed to just trying to get information across. Borkin et al. (2013) also found that visually unique, “Infographic” style graphs were more memorable than more standard statistical visualizations.
(“It appears that novel and unexpected visualizations can be better remembered than the visualizations with limited variability that we are exposed to since elementary school”, they remark.)
Even worse, it may be the case that graphics that really do maximize the data-to-ink ratio are harder to interpret than those that are a little more relaxed about it.
Cues like labels and gridlines, together with some strictly superfluous embellishment of data points or other design elements, may often be an aid rather than an impediment to interpretation.
From Healy (2019, sec. 1.2.2):
The survey question asked respondents to rate the importance of living in a democracy on a ten-point scale, with 1 being “Not at all Important” and 10 being “Absolutely Important”. The graph showed the difference across ages of people who had given a score of “10” only.
The same data, this time showing average scores.
Our eyes and brains are designed to view the real world, not data graphics.
This mismatch can cause us to misinterpret data graphics that don’t take human perception into account.
Perception is not a simple matter of direct visual inputs producing straightforward mental representations of their content. Rather, our visual system is tuned to accomplish some tasks very well, and this comes at a cost in other ways. (Healy 2019, 1.3.1)
With a partner,
Look at the gallery and identify good and bad features of the examples there.
Design a better graphic.
If time permits, create that graphic.