The Grammar of Graphics Recap

Data 304 – Spring 2025

Grammar of Graphics Components

  • Varset: a variable viewed in reverse: each (multi-)value \(\to\) the ordered list of observations that have that value

    • \((CS, 10) \to \langle 2,5,8 \rangle\)
  • Algebra: 3 algebraic operations (cross, nest, blend) combine variables and prepare data for faceting, etc.

  • Scales:

    • transformations: e.g., identity, log, normalization
    • units: e.g., convert temperature from °F to °C
    • 4 kinds: nominal, ordinal, interval, ratio
  • Statistics: Binning, numerical summaries (e.g., mean, 5-number summary), model values, etc.

  • Geometry: Varset \(\to \mathbb{R}^n\) (\(n = 2\), typically)

  • Coordinates: Cartesian, polar, interchange axes, etc.

  • Aesthetics: add visual properties
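
A minimal Python sketch of the varset idea and the three algebra operators, under stated assumptions: the (major, credits) rows are made up so that the \((CS, 10)\) example above comes out as \(\langle 2,5,8 \rangle\), and the helper names `varset`, `cross_frame`, `nest_frame`, and `blend` are hypothetical, not from any library. It simplifies Wilkinson's definitions and is only meant to show the direction of the mapping and roughly how the operators differ.

```python
from collections import defaultdict
from itertools import product

# Hypothetical observations (major, credits); row numbers start at 1.
rows = [
    ("Math", 12),  # 1
    ("CS", 10),    # 2
    ("Math", 12),  # 3
    ("Stat", 8),   # 4
    ("CS", 10),    # 5
    ("Stat", 8),   # 6
    ("Math", 9),   # 7
    ("CS", 10),    # 8
]


def varset(observations):
    """Map each distinct value to the ordered list of row numbers that hold it."""
    index = defaultdict(list)
    for i, value in enumerate(observations, start=1):
        index[value].append(i)
    return dict(index)


def cross_frame(a, b):
    """Cross (simplified): every combination of a-values and b-values."""
    return sorted(product(set(a), set(b)))


def nest_frame(a, b):
    """Nest (simplified): only the combinations that actually occur together."""
    return sorted(set(zip(a, b)))


def blend(a, b):
    """Blend (simplified): put the values of two variables on one dimension."""
    return list(a) + list(b)


vs = varset(rows)
print(vs[("CS", 10)])  # [2, 5, 8]

majors = [m for m, _ in rows]
credits = [c for _, c in rows]
print(len(cross_frame(majors, credits)))  # 12 = 3 majors x 4 credit values
print(len(nest_frame(majors, credits)))   # 4 combinations are observed
print(blend([10, 12], [8, 9]))            # [10, 12, 8, 9]
```

The gap between the 12 crossed combinations and the 4 nested ones is what matters for faceting: crossing leaves empty panels for combinations that never occur, while nesting keeps only the observed ones.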

Note: slightly different use of scales and aesthetics compared to, say, ggplot2 and most other graphics software.

Details in Wilkinson et al. (2013).
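
As a rough illustration of the Scales and Statistics components listed above, the following Python sketch applies a unit conversion, a log transformation, and a normalization, then computes two simple statistics (binning and a 5-number summary). The function names and sample values are hypothetical, and min-max rescaling is only one assumed reading of "normalization".

```python
import math
import statistics
from collections import Counter


def f_to_c(temp_f):
    """Unit conversion: degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def normalize(values):
    """Min-max normalization: rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_counts(values, width):
    """Binning: count values per bin, keyed by each bin's lower edge."""
    return dict(Counter((v // width) * width for v in values))


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


print(f_to_c(212))                          # 100.0
print(math.log10(1000))                     # log transformation
print(normalize([2, 4, 6]))                 # [0.0, 0.5, 1.0]
print(bin_counts([61, 64, 70, 78, 79], 10)) # {60: 2, 70: 3}
print(five_number_summary([1, 2, 3, 4, 5])) # (1, 2.0, 3.0, 4.0, 5)
```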

Eye Training

  1. What variables form the frame?
  2. What glyphs (marks/geoms) are used?
  3. What are the properties (aesthetics, channels) for those glyphs?
  4. Which variables are mapped to which properties?
  5. Which variables, if any, are used for faceting (small multiples)?
  6. Which scales have a corresponding guide?
  7. What would a row of the glyph-ready data used to create this plot look like? (Include variable names and (estimated) values.)
  8. Do you think the raw data had a different format? If so, what would a row of the raw data look like?
  9. What are the biggest strengths/weaknesses of this plot?
  10. Are there any elements of this plot that don’t fit into the “grammar of graphics”?
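
Questions 7 and 8 contrast glyph-ready data with raw data. A minimal Python sketch of that difference, assuming a made-up table of city temperatures and a hypothetical helper `to_glyph_ready`: the raw data has one row per city with one column per month, while a plot that draws one point per city and month needs one row per point.

```python
# Hypothetical raw data: one row per city, one column per month.
raw = [
    {"city": "A", "jan": 30, "jul": 75},
    {"city": "B", "jan": 45, "jul": 90},
]


def to_glyph_ready(rows, id_key, value_keys, var_name, value_name):
    """Reshape wide rows into long rows, one per glyph to be drawn."""
    return [
        {id_key: row[id_key], var_name: key, value_name: row[key]}
        for row in rows
        for key in value_keys
    ]


glyph_ready = to_glyph_ready(raw, "city", ["jan", "jul"], "month", "temp_f")
for row in glyph_ready:
    print(row)
# {'city': 'A', 'month': 'jan', 'temp_f': 30}
# {'city': 'A', 'month': 'jul', 'temp_f': 75}
# {'city': 'B', 'month': 'jan', 'temp_f': 45}
# {'city': 'B', 'month': 'jul', 'temp_f': 90}
```

Each glyph-ready row now carries exactly the variables that would be mapped to properties of one point, for example month to x, temp_f to y, and city to color.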

Example

References

Wilkinson, L., D. Wills, D. Rope, A. Norton, and R. Dubbs. 2013. The Grammar of Graphics. Statistics and Computing. Springer New York. https://books.google.com/books?id=ZiwLCAAAQBAJ.