Vega-Lite: Scales
(part 2)

Data 304

Scale Types

Vega-Lite supports the following scale types:

  1. Continuous: continuous domain \(\to\) continuous range

    • “linear”, “pow”, “sqrt”, “symlog”, “log”, “time”, “utc”
  2. Discrete: discrete domain \(\to\)

    • discrete range: “ordinal”, or

    • continuous range: “band”, “point”

  3. Discretizing: continuous domain \(\to\) discrete range

    • “bin-ordinal”, “quantile”, “quantize”, “threshold”

Scale Types

domain       range        scale type
continuous   continuous   continuous (linear, pow, sqrt, symlog, log, time, utc)
continuous   discrete     discretizing (bin-ordinal, quantile, quantize, threshold)
discrete     continuous   discrete (point, band)
discrete     discrete     discrete (ordinal)

Default scale types

The default scale type depends on the data type and the encoding channel.
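
As a small illustration of overriding a default, here is a sketch using the Seattle weather data. A temporal field on x gets a time scale and a quantitative field on y would normally get a linear scale; the choice of "sqrt" for y here is only an illustrative assumption, not a recommendation.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the sqrt scale on y is an illustrative assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "precipitation", "type": "quantitative",
          "scale": {"type": "sqrt"}}
  }
}
```

Removing the "scale" entry from y returns the chart to the default linear scale.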

Binning in Vega-Lite

Binning

Binning is a transformation that puts quantitative values into “bins”.

  • This is familiar from histograms.
  • Binning can be used for other properties as well.

Creating bins (2 ways)

There are two ways to create bins in Vega-Lite:

  1. transform
{
  ...
  "transform": [
    {"bin": ..., "field": ..., "as": ...} // Bin Transform
     ...
  ],
  ...
}
  2. shortcut in encoding
"size": {"field": ..., "type": "quantitative", "bin": ...}
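
To make the transform route concrete, here is a minimal sketch of a histogram built with a bin transform. The field name "precip_bin" is a hypothetical choice; when "as" is a single string, Vega-Lite writes the bin start to that field and the bin end to the same name with "_end" appended. The "binned": true setting in the x encoding tells Vega-Lite that the values have already been binned by the transform.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: precip_bin is a hypothetical field name.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "transform": [
    {"bin": true, "field": "precipitation", "as": "precip_bin"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "precip_bin", "type": "quantitative",
          "bin": {"binned": true}, "title": "precipitation"},
    "x2": {"field": "precip_bin_end"},
    "y": {"aggregate": "count", "type": "quantitative"}
  }
}
```

The encoding shortcut does the same binning in one step, which is why it is usually the more convenient choice.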

Binned size

library(vegawidget)  # provides as_vegaspec()
'{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},  
  "width": 800, "height": 250,
  "title": "High temperatures in Seattle",
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative", 
          "scale": {"domain": [0,35]}}, 
    "size": {"field": "precipitation", "type": "quantitative", "bin": true},
    "opacity": {"value": 0.7}
  }
}' |> as_vegaspec()

Controlling the bins

To get default bins, use "bin": true.

To customize the bins, use a BinParams object in place of true.

Examples

  "bin": {"step": 5, "anchor": 0}

  "bin": {"maxbins": 15}

  "bin": {"steps": [1, 5, 10]}

  "bin": {"extent": [0, 10], "step": 2.5}
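
For context, here is a sketch of a BinParams object inside a full specification, reusing the Seattle weather plot. The value 5 for "maxbins" is an arbitrary illustrative choice. It is an upper limit on the number of bins, and Vega-Lite still picks the bin boundaries itself.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: maxbins = 5 is an arbitrary illustrative value.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {"field": "precipitation", "type": "quantitative",
             "bin": {"maxbins": 5}},
    "opacity": {"value": 0.7}
  }
}
```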

Give it a try

Exercise 1  

  • Create this graphic using bin defaults.
  • Then experiment with some of the bin options.

Binning color

Q. How do we create this plot?

A.

"color": {"field": ..., "type": "quantitative", "bin": true}

Let’s make it better

Exercise 2  

  • Choose a color scheme that makes the bins easier to see.

  • See if using circles (or fill) is better than using points.

  • What happens if you bin shape instead of color?

Comparing color schemes

Modifying the scale type

Q. How do you think we tell Vega-Lite to do this for size?

"size": {..., "scale": { "type": "threshold", 
                         "domain": [0.2, 0.5, 1], 
                         "range": [25, 50, 100, 200]}}
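
Placed in a complete specification, the threshold scale might look like the sketch below. The three domain values split precipitation into four intervals (below 0.2, 0.2 to 0.5, 0.5 to 1, and 1 or more), so the range needs four sizes. The rest of the spec is assumed to match the earlier Seattle weather plot.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: surrounding spec assumed to match the earlier Seattle weather example.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {"field": "precipitation", "type": "quantitative",
             "scale": {"type": "threshold",
                       "domain": [0.2, 0.5, 1],
                       "range": [25, 50, 100, 200]}},
    "opacity": {"value": 0.7}
  }
}
```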

Continuous Scales

Continuous scales

The default (linear) scale is most common, but other choices are sometimes better.

"scale":{"type": "symlog", "constant": ...}
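
One case where symlog can help is a skewed variable that includes zeros, such as daily precipitation: a log scale cannot show zero, while symlog can. The sketch below assumes the Seattle weather data and uses "constant": 1 purely as an illustrative value. The constant controls how the scale behaves near zero.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: constant = 1 is an illustrative value.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "precipitation", "type": "quantitative",
          "scale": {"type": "symlog", "constant": 1}},
    "opacity": {"value": 0.7}
  }
}
```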

Exercise 3  

Ordinal scales

Ordinal scales have a discrete domain and a discrete range.

  • essentially serve as a look-up table mapping domain (data values) to range (visual values)

  • default for color and shape for ordinal/nominal data

  • main options: domain, range, scheme
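
The look-up-table idea can be sketched by listing domain and range explicitly. The example assumes the "weather" field of the Seattle weather data, and the particular colors are an illustrative choice. The first domain value maps to the first range value, the second to the second, and so on.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the colors chosen for each weather type are illustrative.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "color": {"field": "weather", "type": "nominal",
              "scale": {"domain": ["sun", "fog", "drizzle", "rain", "snow"],
                        "range": ["#e7ba52", "#a7a7a7", "#aec7e8", "#1f77b4", "#9467bd"]}}
  }
}
```

Replacing "range" with a "scheme" name lets Vega-Lite supply the colors instead.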

But we can also use a continuous range with a discrete domain…

Band scales

  • default for nominal and ordinal fields on position channels (x and y) of bar or rect marks.

Point scales

  • point scale = band scale with bandwidth = 0

  • default for position channels of other marks and for size and opacity

Monarchs data

monarchs.json

[
{"name":"Elizabeth","start":1565,"end":1603,"index":0},
{"name":"James I","start":1603,"end":1625,"index":1},
{"name":"Charles I","start":1625,"end":1649,"index":2},
{"name":"Cromwell","start":1649,"end":1660,"commonwealth":true,"index":3},
{"name":"Charles II","start":1660,"end":1685,"index":4},
{"name":"James II","start":1685,"end":1689,"index":5},
{"name":"W&M","start":1689,"end":1702,"index":6},
{"name":"Anne","start":1702,"end":1714,"index":7},
{"name":"George I","start":1714,"end":1727,"index":8},
{"name":"George II","start":1727,"end":1760,"index":9},
{"name":"George III","start":1760,"end":1820,"index":10},
{"name":"George IV","start":1820,"end":1820,"index":11}
]
name        start   end  index  commonwealth
Elizabeth    1565  1603      0  NA
James I      1603  1625      1  NA
Charles I    1625  1649      2  NA
Cromwell     1649  1660      3  TRUE
Charles II   1660  1685      4  NA
James II     1685  1689      5  NA
W&M          1689  1702      6  NA
Anne         1702  1714      7  NA
George I     1714  1727      8  NA
George II    1727  1760      9  NA
George III   1760  1820     10  NA
George IV    1820  1820     11  NA

Setting padding for bars
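
One possible version of such a chart is sketched below. It is not necessarily the graphic from the original slide. It assumes monarchs.json sits next to the spec (the relative URL is an assumption) and draws one bar per monarch from start to end. Because the mark is a bar and name is nominal, y gets a band scale, and "paddingInner" sets the fraction of each band left empty between bars. The value 0.4 is arbitrary.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the relative data URL and paddingInner = 0.4 are assumptions.",
  "data": {"url": "monarchs.json"},
  "mark": "bar",
  "encoding": {
    "y": {"field": "name", "type": "nominal", "sort": null,
          "scale": {"paddingInner": 0.4}},
    "x": {"field": "start", "type": "quantitative", "title": "year",
          "scale": {"zero": false}, "axis": {"format": "d"}},
    "x2": {"field": "end"}
  }
}
```

Setting "sort": null keeps the monarchs in data order rather than alphabetical order.
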

Exercise 4 (Pencil and paper mostly)  

  • What stories can you tell with these data? [monarchs.json]

  • How can you modify the graphic to tell the various stories (still using bars)?

  • Are there alternatives to bars that you should consider?

Sorting the scale domain

  "encoding": {
    "x": {"field": "name", "type": "nominal",
          "sort": {"field": "reign", "order": "descending"}},
    ...
  }
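
The monarchs data as listed has no "reign" field, so a sort like this one needs it to be created first. The sketch below assumes reign is the length of the reign, computed as end minus start in a calculate transform. The relative data URL is also an assumption.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: reign is assumed to be end - start; the data URL is an assumption.",
  "data": {"url": "monarchs.json"},
  "transform": [
    {"calculate": "datum.end - datum.start", "as": "reign"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "name", "type": "nominal",
          "sort": {"field": "reign", "order": "descending"}},
    "y": {"field": "reign", "type": "quantitative",
          "title": "length of reign (years)"}
  }
}
```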


Exercise 5 What other sorting might be interesting here (and better than alphabetical)?

Another use of sort

Exercise 6 What happens to this graphic if we remove the sort from the y-encoding?

Jitter

Sometimes a little imprecision is better than being exact…

Q. How is the jitter created?

A. Manually calculate a new field using random().

 "transform": [
    {"calculate": "datum.Cylinders + 0.5 * random() - 0.25", 
     "as": "jCylinders"}
  ]
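
In a complete specification the jittered field simply replaces the original one on x. The sketch below assumes cars.json is available from the same CDN path pattern used for the Seattle data, and the choice of Horsepower for y is illustrative. The calculation shifts each Cylinders value by a random amount between -0.25 and 0.25.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the data URL and the y field are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/cars.json"},
  "transform": [
    {"calculate": "datum.Cylinders + 0.5 * random() - 0.25",
     "as": "jCylinders"}
  ],
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "jCylinders", "type": "quantitative",
          "title": "Cylinders (jittered)", "scale": {"zero": false}},
    "y": {"field": "Horsepower", "type": "quantitative"},
    "opacity": {"value": 0.7}
  }
}
```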

Jitter with temporal scales

Q. How do we create this plot? [This uses cars.json.]

A. Use timeOffset() in a calculate transform.

"transform": [
    {"calculate": "datetime(datum.Year)", "as": "Year"},
    {"calculate": "timeOffset('day', datum.Year, 200 * random() - 100 )",
     "as": "jYear"}
  ]
  • offsets Year (a datetime) by a random amount of up to \(\pm\) 100 days.
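
Here is a sketch of the whole specification around that transform. The data URL follows the CDN pattern used earlier and is an assumption, as is the choice of Miles_per_Gallon for y.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the data URL and the y field are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/cars.json"},
  "transform": [
    {"calculate": "datetime(datum.Year)", "as": "Year"},
    {"calculate": "timeOffset('day', datum.Year, 200 * random() - 100)",
     "as": "jYear"}
  ],
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "jYear", "type": "temporal", "title": "Year (jittered)"},
    "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
    "opacity": {"value": 0.7}
  }
}
```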

Jitter with nominal scales

Q. Why is jittering with nominal scales different?

A. If we jitter in the domain, we have to jitter all the way to the next category!

  • So we need to jitter in the range.

xOffset and yOffset

{ ..., 
  "transform": [{"calculate": "random()", "as": "r_offset"}],
  ...,
  "encoding": { ..., 
    "y": {"field": "Cylinders", "type": "nominal"},
    "yOffset": {"field": "r_offset"}
  }
}

Controlling the jitter in yOffset

Q. How can we control how much jitter there is?

A1. (fail): This does nothing! (Why?)

  "transform": [{"calculate": "0.3 * random()", "as": "r_offset"}],

A2. (awkward) Adjust the range of the offset scale.

    "yOffset": {"field": "r_offset", "scale": {"range": [10, 30]}}

A3. Use a band scale and adjust paddingInner.

    "y": {
      "field": "Origin", "type": "nominal",
      "scale": {"type": "band", "paddingInner": {"expr": "pad"}}
    },
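
The expression "pad" refers to a parameter that must be defined elsewhere in the spec. A sketch of how the pieces might fit together follows. The parameter's starting value, the slider binding, the data URL, and the x field are all assumptions. Larger padding leaves narrower bands, so the jittered points spread over less vertical space.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the pad parameter's value and slider, the data URL, and the x field are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/cars.json"},
  "params": [
    {"name": "pad", "value": 0.5,
     "bind": {"input": "range", "min": 0, "max": 1, "step": 0.05}}
  ],
  "transform": [{"calculate": "random()", "as": "r_offset"}],
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {
      "field": "Origin", "type": "nominal",
      "scale": {"type": "band", "paddingInner": {"expr": "pad"}}
    },
    "yOffset": {"field": "r_offset", "type": "quantitative"}
  }
}
```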

Turning off the scale

Q1. When might we not want to have a scale at all?

Q2. How do we achieve that?

A2.

  "scale": null

A1. When data values are also range values.

  • Example: colors – you might have literal colors as a column in your data
  • Example: random – you might generate random range values with calculate.
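
The literal-colors case can be sketched with a few inline rows. The data values and the field name "col" are made up for illustration. With "scale": null, each value in col is used directly as the mark's color, and no legend is generated for it.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the inline data and the field name col are made up for illustration.",
  "data": {"values": [
    {"x": 1, "y": 2, "col": "red"},
    {"x": 2, "y": 3, "col": "#1f77b4"},
    {"x": 3, "y": 1, "col": "green"}
  ]},
  "mark": {"type": "point", "filled": true, "size": 200},
  "encoding": {
    "x": {"field": "x", "type": "quantitative"},
    "y": {"field": "y", "type": "quantitative"},
    "color": {"field": "col", "type": "nominal", "scale": null}
  }
}
```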