The Grammar of Graphics

Data 304 – Spring 2025

Acknowledgements

  • These slides are based on a similar set of slides presented by Danny Kaplan at the 2018 Computation and Visualization Consortium.

  • They were modified by Randall Pruim for use in courses at Calvin University.

The Grammar of Graphics

A little history

  • Leland Wilkinson: The Grammar of Graphics

    • first edition in 1999
    • psychologist, primary author of 1999 APA guidelines for statistical methods in psychology journals
    • died December 2021

Glyphs and Data

In its original sense, in archaeology, a glyph is a carved symbol.

[Images: hieroglyph; Mayan glyph]

Data Glyph

A data glyph is also a mark, for example:

  • Some are very simple, e.g. a dot:
  • One glyph or two? a pointrange:
    • some systems combine simpler glyphs to form 1 compound glyph
    • some systems require you to specify multiple glyphs
  • Names: glyph, geometry (geom), symbol, mark

Features of a data glyph can encode the values of variables.

Data Glyph Properties

Each data glyph has a set of visual properties.

  • Properties for points: location (x and y), shape, color (stroke and fill), size, transparency, etc.

  • Names: property, aesthetic, channel

Why “Aesthetic”?

Some Graphics Components

glyph [mark, geom, symbol]

The basic graphical unit that represents one case. Other terms used include mark, geom, symbol.

property [channel, aesthetic]

A visual property of a glyph, such as position, size, shape, color, etc.

  • may be mapped based on data values: color is determined by sex
  • may be set to particular non-data-related values: color is black

scale

A mapping that translates data values into properties.

  • example: male -> blue; female -> pink

guide

An indication for the human viewer of the scale. This allows the viewer to translate properties back into data values.

  • examples: x- and y-axes, various sorts of legends

frame

The position scale describing how data are mapped to x and y.

Scales

Scale: Data value \(\to\) property value

Examples

  • The conversion from SBP to position is a scale.

    • Systolic Blood Pressure (SBP) has units of mmHg (millimeters of mercury).
    • Position on the x-axis measured in distance on paper/screen, e.g. inches/pixels.
  • The conversion from Smoker (variable) to color (aesthetic) is a scale.

    • never \(\to\) red; former \(\to\) green; current \(\to\) blue
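
A scale can be sketched as an ordinary function from data values to property values. The snippet below is a minimal illustration in Python, not how any particular graphics system implements scales. The SBP domain (80 to 200 mmHg), the 400-pixel axis width, and the function names are assumptions made for the example.

```python
def linear_scale(domain, range_):
    """Return a function mapping values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        fraction = (value - d0) / (d1 - d0)
        return r0 + fraction * (r1 - r0)

    return scale


def ordinal_scale(mapping):
    """Return a function that looks up the property value for a category."""
    def scale(value):
        return mapping[value]

    return scale


# Assumed domain (mmHg) and range (pixels), chosen only for illustration.
sbp_to_x = linear_scale(domain=(80, 200), range_=(0, 400))
smoker_to_color = ordinal_scale(
    {"never": "red", "former": "green", "current": "blue"}
)

x_pixels = sbp_to_x(110)           # position for an SBP of 110 mmHg
color = smoker_to_color("former")  # color for a former smoker
```

Changing the domain or range changes where glyphs land without changing the data, which is why axis limits and color choices are scale decisions rather than data decisions.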

Guides

Guide: an indication to a human viewer of what the scale is.

  • Axis ticks and numbers

  • Legends

  • Labels on faceted graphics
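
Since a guide lets the viewer run a scale backwards, it can be generated from the same information that defines the scale. The sketch below is one possible illustration in Python; the domain, pixel range, number of ticks, and helper names are assumptions, not part of any library.

```python
def axis_ticks(domain, range_, n_ticks):
    """Return (pixel position, label) pairs for evenly spaced data values."""
    d0, d1 = domain
    r0, r1 = range_
    ticks = []
    for i in range(n_ticks):
        value = d0 + i * (d1 - d0) / (n_ticks - 1)
        position = r0 + (value - d0) / (d1 - d0) * (r1 - r0)
        ticks.append((position, f"{value:g}"))
    return ticks


def legend_entries(color_scale):
    """Return (swatch color, label) pairs, one per category."""
    return [(color, category) for category, color in color_scale.items()]


# Assumed SBP domain (mmHg) and axis width (pixels), for illustration only.
ticks = axis_ticks(domain=(80, 200), range_=(0, 400), n_ticks=4)
legend = legend_entries({"never": "red", "former": "green", "current": "blue"})
```

Each tick pairs a screen position with the data value it stands for, and each legend entry pairs a color with its category, so the viewer can translate properties back into data values.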

Facets – using x and y twice

  • x is determined by sbp and sex
  • basically a separate frame for each sex

Related terms: small multiples, subplots
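
Faceting can be thought of as splitting the rows by the facet variable and drawing the same kind of plot in a separate frame for each group. The following Python sketch shows only the splitting step; the four rows are the example data used in these slides, and the `facet` helper is a hypothetical name.

```python
from collections import defaultdict

rows = [
    {"sbp": 112, "dbp": 55, "sex": "male", "smoker": "former"},
    {"sbp": 144, "dbp": 84, "sex": "male", "smoker": "never"},
    {"sbp": 143, "dbp": 84, "sex": "female", "smoker": "never"},
    {"sbp": 110, "dbp": 62, "sex": "female", "smoker": "never"},
]


def facet(rows, by):
    """Group rows into panels keyed by the value of the facet variable."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


# One panel (frame) per sex; each panel reuses the same x and y mapping.
panels = facet(rows, by="sex")
for sex, panel_rows in panels.items():
    print(sex, [(r["sbp"], r["dbp"]) for r in panel_rows])
```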

Designing Graphics

Graphics are designed by the human expert (you!) in order to reveal information that’s latent in the data.

Most graphics are designed to make some sort of comparison.

  • Comparing multiple groups to each other
  • Comparing data to some benchmark
  • Comparing data to a model
  • etc.

A good graphic is one that allows the viewer to make the intended comparison easily and accurately.

Design choices

  • What kind of glyph, e.g. scatter, density, bar, … many others
  • What variables constitute the frame, plus some details:
    • axis limits
    • logarithmic axes, etc.
  • What variables should be mapped to other properties of the glyph.
  • Whether to facet and with what variable.
  • What/how to label.
  • Which fonts, colors, etc. to use.

Good and Bad Graphics

Remember: A good graphic is one that allows the viewer to make the intended comparisons easily and accurately.

  • Good graphics make it easy for people to perceive things that are similar and things that are different.

  • Need to know something about how people perceive.

  • Your choices depend on what information you want to reveal and convey.

  • Learn by reading graphics and determining which ways of arranging things are better or worse.

Perception and Comparison

In roughly descending order of human ability to compare nearby objects:

  1. Position
  2. Length/distance
  3. Area (easier if shapes are the same)
  4. Angle
  5. Shape (but only a very few different shapes)
  6. Color (depends a bit on how color is used)

Notes on color

Color can be the most difficult, because it is a 3-dimensional quantity (for example, hue, saturation, and lightness).

Count the ways this graphic is bad

Better?

  • What comparisons are easier to make now?

  • How else might we modify the plot? (For what purposes?)

Glyph-Ready Data

Glyph-ready data has this form:

  • There is (usually) one row for each glyph to be drawn.
  • The variables in that row are mapped to properties of the glyph (including position)

Glyph-ready data

  sbp dbp    sex smoker
1 112  55   male former
2 144  84   male  never
3 143  84 female  never
4 110  62 female  never

Mapping of data to properties

   sbp -> x      
   dbp -> y     
smoker -> color
   sex -> shape

Scales determine details of data -> aesthetic translation

You can see the data used by ggplot layers using ggplot2::layer_data().
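
The mapping above can be written down as a small lookup table and applied row by row, producing one glyph description per row. This is a minimal Python sketch of the idea rather than what any plotting library does internally; `glyph_for` is a hypothetical helper name.

```python
rows = [
    {"sbp": 112, "dbp": 55, "sex": "male", "smoker": "former"},
    {"sbp": 144, "dbp": 84, "sex": "male", "smoker": "never"},
    {"sbp": 143, "dbp": 84, "sex": "female", "smoker": "never"},
    {"sbp": 110, "dbp": 62, "sex": "female", "smoker": "never"},
]

# property -> variable, matching the mapping shown above
mapping = {"x": "sbp", "y": "dbp", "color": "smoker", "shape": "sex"}


def glyph_for(row, mapping):
    """Build one glyph description: each property gets its variable's value."""
    return {prop: row[variable] for prop, variable in mapping.items()}


# One glyph per row of glyph-ready data.
glyphs = [glyph_for(row, mapping) for row in rows]
```

The values in `glyphs` are still data values (112 mmHg, "former"); turning them into pixels and actual colors is the job of the scales.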

Layers – building up complex plots

Each layer may have its own data, glyphs, aesthetic mapping, etc.

  • one layer has points
  • another layer has the curves
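
In Vega-Lite, layers are expressed with a `layer` array in which each entry has its own mark and may have its own data, transforms, and encodings. The sketch below builds such a specification as a Python dictionary and serializes it to JSON. It assumes the curves are loess smoothers of dbp on sbp, which is a guess about the plot, and it reuses the four example rows from these slides.

```python
import json

rows = [
    {"sbp": 112, "dbp": 55, "sex": "male", "smoker": "former"},
    {"sbp": 144, "dbp": 84, "sex": "male", "smoker": "never"},
    {"sbp": 143, "dbp": 84, "sex": "female", "smoker": "never"},
    {"sbp": 110, "dbp": 62, "sex": "female", "smoker": "never"},
]

layered_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    # x and y are shared by both layers (they define the frame)
    "encoding": {
        "x": {"field": "sbp", "type": "quantitative"},
        "y": {"field": "dbp", "type": "quantitative"},
    },
    "layer": [
        # Layer 1: one point glyph per row, colored by smoking status
        {
            "mark": "point",
            "encoding": {"color": {"field": "smoker", "type": "nominal"}},
        },
        # Layer 2: a smooth curve computed from the same data (assumed loess)
        {
            "mark": "line",
            "transform": [{"loess": "dbp", "on": "sbp"}],
        },
    ],
}

layered_json = json.dumps(layered_spec, indent=2)
```

With only four rows the smoother is not meaningful; the point is the structure, in which the line layer carries its own transform while sharing the frame with the point layer.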

Stats: Data Transformations

  • What are the glyphs, properties, etc. for this plot?
  • How is the glyph-ready data for this plot related to the “raw” data?
  sbp dbp    sex smoker
1 112  55   male former
2 144  84   male  never
3 143  84 female  never
4 110  62 female  never
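
The plot itself is not reproduced here, so as an illustration only, suppose it were a bar chart of the number of people in each smoking category. The glyph-ready data would then have one row per bar rather than one row per person, and a stat (here, counting) produces it from the raw data. A minimal Python sketch under that assumption:

```python
from collections import Counter

raw = [
    {"sbp": 112, "dbp": 55, "sex": "male", "smoker": "former"},
    {"sbp": 144, "dbp": 84, "sex": "male", "smoker": "never"},
    {"sbp": 143, "dbp": 84, "sex": "female", "smoker": "never"},
    {"sbp": 110, "dbp": 62, "sex": "female", "smoker": "never"},
]

# The "stat": count the rows in each smoking category.
counts = Counter(row["smoker"] for row in raw)

# Glyph-ready data: one row per bar, with the variables the bar needs.
glyph_ready = [{"smoker": level, "count": n} for level, n in counts.items()]
```

Here `smoker` would be mapped to x and `count` to the bar height; the blood pressure variables do not appear in the glyph-ready data at all.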

What’s Next

  1. Eye-training

    • recognize and describe glyphs, properties, scales, etc.
    • identify data required for a plot
      • think about data transformations potentially involved
    • identify good and bad features of a plot
    • start building a repertoire of “plot ideas”

  2. Design

    • learn to make good decisions about how to use our “palette of visual properties” to convey a message

  3. Data wrangling

    • may need to modify the data before beginning the process of creating a graphic

  4. Graphics construction

    • convert design into (code for) graphics
    • many different systems for doing this

Some software options

  • R: base graphics, lattice, ggplot2, ggformula, plotly, (ggvis), altair, vegabrite, …
  • Python: matplotlib, plotly, seaborn.objects, altair, …
  • JavaScript: D3, plotly, Observable, Vega-Lite, …
  • Non/low-coding options: Tableau, PowerBI, Looker, Flourish, …

We will focus on Vega-Lite

  • mid-level
    • good balance between ease of use and control
  • closely tied to grammar of graphics approach
  • can be used in R, Python, JavaScript, …
    • native format is JSON
  • provides a grammar of interactive graphics
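
To give a flavor of the JSON format, here is a sketch of a Vega-Lite specification for the blood pressure example, built as a Python dictionary and serialized with the standard `json` module. The four data rows and the color choices come from earlier slides; the constant size of 100 is an arbitrary value chosen to show a property that is set rather than mapped.

```python
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "values": [
            {"sbp": 112, "dbp": 55, "sex": "male", "smoker": "former"},
            {"sbp": 144, "dbp": 84, "sex": "male", "smoker": "never"},
            {"sbp": 143, "dbp": 84, "sex": "female", "smoker": "never"},
            {"sbp": 110, "dbp": 62, "sex": "female", "smoker": "never"},
        ]
    },
    "mark": "point",  # the glyph
    "encoding": {     # properties (channels)
        # mapped: property values are determined by data through a scale
        "x": {"field": "sbp", "type": "quantitative"},
        "y": {"field": "dbp", "type": "quantitative"},
        "shape": {"field": "sex", "type": "nominal"},
        "color": {
            "field": "smoker",
            "type": "nominal",
            # explicit scale: never -> red; former -> green; current -> blue
            "scale": {
                "domain": ["never", "former", "current"],
                "range": ["red", "green", "blue"],
            },
        },
        # set: the same constant for every glyph (arbitrary illustrative value)
        "size": {"value": 100},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Each grammar term has a visible home in the specification: `mark` is the glyph, `encoding` lists the properties, `field` versus `value` distinguishes mapping from setting, and `scale` overrides the default translation from data to property values. Guides (axes and a legend) are generated by default.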