Vega-Lite: Scales
(part 2)

Data 304

Scale Types

Vega-Lite supports the following scale types:

  1. Continuous: continuous domain \(\to\) continuous range

    • “linear”, “pow”, “sqrt”, “symlog”, “log”, “time”, “utc”
  2. Discrete: discrete domain \(\to\)

    • discrete range: “ordinal”, or

    • continuous range: “band”, “point”

  3. Discretizing: continuous domain \(\to\) discrete range

    • “bin-ordinal”, “quantile”, “quantize”, “threshold”

Scale Types

domain      range       scale type
continuous  continuous  continuous (linear, pow, sqrt, symlog, log, time, utc)
continuous  discrete    discretizing (bin-ordinal, quantile, quantize, threshold)
discrete    continuous  discrete (point, band)
discrete    discrete    discrete (ordinal)

Default scale types

The default scale type depends on the data type and the encoding channel.

Binning

Binning is a transformation that puts quantitative values into “bins”.

  • This is familiar from histograms.
  • Binning can be used for other properties as well.

Creating bins (2 ways)

There are two ways to create bins in Vega-Lite:

  1. transform
{
  ...
  "transform": [
    {"bin": ..., "field": ..., "as": ...} // Bin Transform
     ...
  ],
  ...
}
  2. shortcut in encoding
"size": {"field": ..., "type": "quantitative", "bin": ...}

Binned size

'{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},  
  "width": 800, "height": 250,
  "title": "High temperatures in Seattle",
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative", 
          "scale": {"domain": [0,30]}}, 
    "size": {"field": "precipitation", "type": "quantitative", "bin": true},
    "opacity": {"value": 0.7}
  }
}' |> as_vegaspec()
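
The same plot can be sketched with the transform approach instead of the encoding shortcut. This is a minimal sketch, not the original slide code: the output field name precip_bin is hypothetical, and treating the bin start as an ordinal field is one possible choice among several.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: bin transform feeding a size encoding. precip_bin is a hypothetical field name.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "transform": [
    {"bin": true, "field": "precipitation", "as": "precip_bin"}
  ],
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {"field": "precip_bin", "type": "ordinal"},
    "opacity": {"value": 0.7}
  }
}
```

With a single name in "as", the new field holds the start of each bin, so the size legend lists bin starts rather than ranges.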

Controlling the bins

To get default bins, use "bin": true.

To customize the bins, use a BinParams object in place of true.
Set "binned": true only when the data have already been binned; it tells Vega-Lite not to bin again.

Examples

  "bin": {"step": 5, "anchor": 0}

  "bin": {"maxbins": 15}

  "bin": {"steps": [1, 5, 10]}

  "bin": {"extent": [0, 10], "step": 2.5}

Give it a try

  • Create this graphic using bin defaults.
  • Then experiment with some of the bin options.
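
As one possible starting point for experimenting, here is a sketch of the binned-size plot with a BinParams object in place of true. The limit of 5 bins is an illustrative value, not one taken from the slides.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: binned size with at most 5 bins. The maxbins value is illustrative.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {"field": "precipitation", "type": "quantitative", "bin": {"maxbins": 5}},
    "opacity": {"value": 0.7}
  }
}
```

Note that maxbins is an upper limit; Vega-Lite may choose fewer bins to keep the boundaries at round numbers.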

Binning color

Q. How do we create this plot?

"color": {"field": ..., "type": "quantitative", "bin": true}

Your turn

  • Choose a color scheme that makes the bins easier to see.

  • See if using circles (or fill) is better than using points.

  • What happens if you bin shape instead of color?

Comparing color schemes
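
A minimal sketch for comparing schemes: bin a field on color and swap the scheme name between runs. The field (precipitation), the mark (circle), and the scheme (viridis) are illustrative assumptions and may differ from the original figure.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: binned color with a named scheme. Field and scheme are illustrative.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "circle"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "color": {
      "field": "precipitation", "type": "quantitative", "bin": true,
      "scale": {"scheme": "viridis"}
    }
  }
}
```

Other named schemes, such as blues or magma, can be substituted for "viridis" to compare how clearly the bins separate.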

Modifying the scale type

Q. How do you think we tell Vega-Lite to do this for size?

"size": {..., "scale": { "type": "threshold", 
                         "domain": [0.2, 0.5, 1], 
                         "range": [25, 50, 100, 200]}}

Continuous scales

"scale":{"type": "symlog", "constant": ...}

Your turn

  • Try some other scale types: “log”, “pow”, “sqrt”.
  • Why can’t we bind the scale type to a select input?
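
For reference, a sketch of a complete spec that uses a continuous scale type on size. The constant of 1 is an assumption made for illustration; the slides leave that value unspecified.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: symlog scale on size. The constant value is an illustrative assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {
      "field": "precipitation", "type": "quantitative",
      "scale": {"type": "symlog", "constant": 1}
    },
    "opacity": {"value": 0.7}
  }
}
```

Unlike log, symlog is defined at zero, which matters here because many days have no precipitation.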

Ordinal scales

Ordinal scales have a discrete domain and a discrete range.

  • essentially serve as a look-up table mapping domain (data values) to range (visual values)

  • default for color and shape for ordinal/nominal data

  • main options: domain, range, scheme
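
The look-up-table idea can be sketched by pairing an explicit domain with an explicit range. The example assumes the weather column of the Seattle data with the five categories listed; the colors are illustrative choices.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: ordinal color scale as a look-up table. Colors are illustrative.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "width": 800, "height": 250,
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "color": {
      "field": "weather", "type": "nominal",
      "scale": {
        "domain": ["sun", "fog", "drizzle", "rain", "snow"],
        "range": ["#e7ba52", "#c7c7c7", "#aec7e8", "#1f77b4", "#9467bd"]
      }
    }
  }
}
```

The i-th domain value maps to the i-th range value. Replacing "range" with "scheme" lets a named palette supply the colors instead.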

But we can also use a continuous range with a discrete domain…

Band scales

  • default for nominal and ordinal fields on position channels (x and y) of bar or rect marks.

Point scales

  • point scale = band scale with bandwidth = 0

  • default for nominal and ordinal fields on position channels of other marks, and for size and opacity
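
A sketch of a point scale on a position channel, using the weather column of the Seattle data as the nominal field. The type is spelled out only for clarity (it should already be the default for a point mark), and the padding value is an illustrative assumption.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: point scale on x for a nominal field. Padding value is illustrative.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/seattle-weather.csv"},
  "mark": {"type": "point"},
  "encoding": {
    "x": {
      "field": "weather", "type": "nominal",
      "scale": {"type": "point", "padding": 1}
    },
    "y": {"field": "temp_max", "type": "quantitative"}
  }
}
```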

Monarchs data

monarchs.json

[
{"name":"Elizabeth","start":1565,"end":1603,"index":0},
{"name":"James I","start":1603,"end":1625,"index":1},
{"name":"Charles I","start":1625,"end":1649,"index":2},
{"name":"Cromwell","start":1649,"end":1660,"commonwealth":true,"index":3},
{"name":"Charles II","start":1660,"end":1685,"index":4},
{"name":"James II","start":1685,"end":1689,"index":5},
{"name":"W&M","start":1689,"end":1702,"index":6},
{"name":"Anne","start":1702,"end":1714,"index":7},
{"name":"George I","start":1714,"end":1727,"index":8},
{"name":"George II","start":1727,"end":1760,"index":9},
{"name":"George III","start":1760,"end":1820,"index":10},
{"name":"George IV","start":1820,"end":1820,"index":11}
]
name        start  end   index  commonwealth
Elizabeth   1565   1603  0      NA
James I     1603   1625  1      NA
Charles I   1625   1649  2      NA
Cromwell    1649   1660  3      TRUE
Charles II  1660   1685  4      NA
James II    1685   1689  5      NA
W&M         1689   1702  6      NA
Anne        1702   1714  7      NA
George I    1714   1727  8      NA
George II   1727   1760  9      NA
George III  1760   1820  10     NA
George IV   1820   1820  11     NA

Setting padding for bars
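
Padding is a property of the band scale, so it is set inside "scale" on the position channel. The sketch below inlines four rows of monarchs.json to stay self-contained. The calculated field reign (end minus start) and the padding values are assumptions made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: band-scale padding for bars. reign and the padding values are illustrative.",
  "data": {"values": [
    {"name": "Elizabeth", "start": 1565, "end": 1603},
    {"name": "James I", "start": 1603, "end": 1625},
    {"name": "Charles I", "start": 1625, "end": 1649},
    {"name": "Cromwell", "start": 1649, "end": 1660}
  ]},
  "transform": [
    {"calculate": "datum.end - datum.start", "as": "reign"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "field": "name", "type": "nominal",
      "scale": {"paddingInner": 0.4, "paddingOuter": 0.2}
    },
    "y": {"field": "reign", "type": "quantitative"}
  }
}
```

Both paddings are fractions of the step between bars, in the interval [0, 1]: paddingInner controls the gap between bars and paddingOuter the space before the first and after the last bar.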

Your turn

  • What stories can you tell with these data? [monarchs.json]

  • How can you modify the graphic to tell the various stories (still using bars)?

  • Are there alternatives to bars that you should consider?

Sorting the scale range

  "encoding": {
    "x": {"field": "name", "type": "nominal",
          "sort": {"field": "reign", "order": "descending"}},
    ...
  }

Your turn

What other sorting might be interesting here (and better than alphabetical)?
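
For reference, a sketch of a complete spec that sorts the bars by length of reign. It assumes reign is a calculated field (end minus start), which the slides do not define, and it inlines a few rows of monarchs.json to stay self-contained.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: sorting a nominal x scale by another field. reign is an assumed calculated field.",
  "data": {"values": [
    {"name": "Elizabeth", "start": 1565, "end": 1603, "index": 0},
    {"name": "James I", "start": 1603, "end": 1625, "index": 1},
    {"name": "Charles I", "start": 1625, "end": 1649, "index": 2},
    {"name": "Cromwell", "start": 1649, "end": 1660, "index": 3},
    {"name": "Charles II", "start": 1660, "end": 1685, "index": 4}
  ]},
  "transform": [
    {"calculate": "datum.end - datum.start", "as": "reign"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {
      "field": "name", "type": "nominal",
      "sort": {"field": "reign", "order": "descending"}
    },
    "y": {"field": "reign", "type": "quantitative"}
  }
}
```

Changing the sort field to "index" or "start" would order the bars chronologically instead.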

Another use of sort

Jitter

Sometimes a little imprecision is better than being exact…

Q. How do we create this kind of plot? [This uses cars.json.]

Jitter with quantitative scales

Manually calculate a new field using random().

{ ...,
  "transform": [
    {"calculate": "datum.Cylinders + 0.5 * random() - 0.25", 
    "as": "jCylinders"}],
  ...
}
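
Filled in as a complete spec, the calculation above might be used as follows. This is a sketch: the data URL follows the CDN pattern used earlier in these slides and is an assumption, as are the choice of Horsepower for x and the axis title.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: jitter on a quantitative scale. Data URL and x field are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/cars.json"},
  "transform": [
    {"calculate": "datum.Cylinders + 0.5 * random() - 0.25", "as": "jCylinders"}
  ],
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "jCylinders", "type": "quantitative", "title": "Cylinders"}
  }
}
```

Because random() returns a value in [0, 1), each point moves by at most 0.25 in either direction. The multiplier 0.5 sets the total width of the jitter, and subtracting half of it keeps the jitter centered on the true value.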

Jitter with nominal scales

With nominal scales, we can use xOffset or yOffset encodings.

{ ...,
  "transform": [{"calculate": "random()", "as": "random"}],
  ...,
  "encoding": { ...,
    "y": {"field": "Cylinders", "type": "nominal"},
    "yOffset": {"field": "random", "type": "quantitative"}
  }
}
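
A sketch of the same idea as a complete spec. The data URL follows the CDN pattern used earlier in these slides and is an assumption, as are the choice of Horsepower for x and the step height of 50.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: jitter within nominal bands using yOffset. Data URL, x field, and step are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/cars.json"},
  "transform": [
    {"calculate": "random()", "as": "random"}
  ],
  "height": {"step": 50},
  "mark": {"type": "point"},
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "nominal"},
    "yOffset": {"field": "random", "type": "quantitative"}
  }
}
```

Declaring the offset as quantitative makes Vega-Lite treat the random values as positions within each band rather than as separate categories.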

Q. How can we control how much jitter there is when using xOffset or yOffset?

Turning off the scale

Q1. When might we not want to have a scale at all?

Q2. How do we achieve that?

A2.

  "scale": null

A1. When the data values are already valid range (visual) values.

  • Example: colors – you might have literal colors as a column in your data.
  • Example: random – you might generate random range values with calculate.
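
A sketch of the colors case. The inline data are hypothetical and exist only to show a column that already holds color values.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: literal colors in the data, used directly by disabling the color scale. Data are hypothetical.",
  "data": {"values": [
    {"label": "a", "amount": 3, "color": "steelblue"},
    {"label": "b", "amount": 5, "color": "firebrick"},
    {"label": "c", "amount": 2, "color": "#2ca02c"}
  ]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "label", "type": "nominal"},
    "y": {"field": "amount", "type": "quantitative"},
    "color": {"field": "color", "type": "nominal", "scale": null}
  }
}
```

With "scale": null, each bar is drawn in the color named in its own row, and no color legend is generated.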