Vega-Lite: Scales

Data 304

Documentation

https://vega.github.io/vega-lite/docs/

Two big decisions

When designing a data graphic, the two biggest decisions are

  1. What visual properties to use for which variables.

    • Handled by encoding in Vega-Lite.
  2. What scales to use to map data values to visual values.

    • Defaults are often, but not always, sufficient.

Today’s topic: How to adjust scales in Vega-Lite.

Introduction to scales

Scales

Scales are functions that transform a domain of data values (numbers, dates, strings, etc.) to a range of visual values (pixels, colors, sizes).

scale: data value \(\to\) visual value

So three parts we can adjust:

  • domain: data values
  • range: visual values
  • arrow (type): how data values are connected to visual values
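
As a rough illustration of the three parts, consider a linear scale, which is only one of several scale types. The symbols \(d_0, d_1\) (domain endpoints) and \(r_0, r_1\) (range endpoints) are notation assumed here, not taken from the Vega-Lite documentation.

\[ s(x) = r_0 + \frac{x - d_0}{d_1 - d_0}\,(r_1 - r_0) \]

With domain \([0, 100]\) and range \([0, 200]\), the data value 25 maps to \(s(25) = 0 + \frac{25}{100}\cdot 200 = 50\). Changing the domain or the range changes the endpoints; changing the type replaces the formula itself.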

Guides

Scales should usually be invertible to avoid ambiguity.

Guides help humans reverse the scale

(guide): visual value \(\to\) data value

The guide is a visual aid for performing the inverse scale function.

Seattle weather data

The primary data set used here is seattle-weather.csv from the vega-datasets collection.
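
A minimal starting spec for experimenting with the scale options that follow might look like the sketch below. The relative data URL is an assumption (it resolves in the Vega online editor; elsewhere a full URL to the file would be needed), and the choice of a line mark with date and temp_max is only for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the relative data URL is an assumption that resolves in the Vega online editor.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"}
  }
}
```

Every `"scale": {...}` fragment in the following slides goes inside one of the channel definitions under `"encoding"`.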

Modifying the domain

How depends on the type of scale

Continuous

Default domain includes 0 and all the data values

  • "scale": { "domain": [0, 100] }
  • "scale": { "domainMin": -3, "domainMax": 5, "domainMid": 0 }
  • "scale": { "domain": { "unionWith": [0, 100] } } (domain expanded to also include the data)
  • "scale": { "zero": false} (don’t force 0 to be in domain)
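
To see where these fragments sit in a full specification, here is a hedged sketch that uses two of them at once. The field choices (wind, temp_max) and the numbers in unionWith are assumptions picked for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: data URL, fields, and domain values are illustrative assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {
      "field": "wind", "type": "quantitative",
      "scale": {"zero": false}
    },
    "y": {
      "field": "temp_max", "type": "quantitative",
      "scale": {"domain": {"unionWith": [0, 40]}}
    }
  }
}
```

Here the x domain is no longer forced to include 0, while the y domain covers at least 0 to 40 and grows if the data extend beyond that.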

Ordinal/Nominal

  • "scale": { "domain": ["A", "B", "E"] }

Temporal

  • "scale": { "domain": [{"hours": 0}, {"hours": 24}] }
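
Temporal domains can also be given as calendar dates using the same date-time object form. The sketch below restricts the x-axis to one year; the year chosen and the use of clip to hide data outside the domain are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: the data URL and the 2013 date window are illustrative assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": {"type": "line", "clip": true},
  "encoding": {
    "x": {
      "field": "date", "type": "temporal",
      "scale": {
        "domain": [
          {"year": 2013, "month": 1, "date": 1},
          {"year": 2013, "month": 12, "date": 31}
        ]
      }
    },
    "y": {"field": "temp_max", "type": "quantitative"}
  }
}
```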

A potential problem with domains

"y": {..., "scale": {"domain": [0,25]}}

Q: How can we fix this if it isn’t what we want? (3 ways)

Three fixes

  1. Use unionWith to expand domain to encompass data.

  2. Filter the data so that only data in the domain are used.

  3. Set the “clip” property of the mark.

"mark": {"type": "line", "clip": true}
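
The second and third fixes can be sketched in one spec. Either one alone would keep marks from being drawn outside the frame; both appear here only to show where each goes. The data URL and the field names are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: fix 2 (filter transform) and fix 3 (clip) shown together; either alone would do.",
  "data": {"url": "data/seattle-weather.csv"},
  "transform": [
    {"filter": {"field": "temp_max", "range": [0, 25]}}
  ],
  "mark": {"type": "line", "clip": true},
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {
      "field": "temp_max", "type": "quantitative",
      "scale": {"domain": [0, 25]}
    }
  }
}
```

For the first fix, the y scale would instead be written as `"scale": {"domain": {"unionWith": [0, 25]}}`, with no filter or clip needed.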

Modifying the range

Modifying the range for x and y

Q. What will this do to the previous graphic?

  • "y": {..., "scale": {"range": [100, 150]}}

Usually we want our data to fill the frame, so modifying the range for the x and y channels isn’t so interesting.

Modifying the range for size

More interesting for other channels, like size.

"mark": "point",
"size": {"field": "precipitation", "type": "quantitative",
         "scale": {"range": [25, 150]}},
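
In a complete spec, the size channel sits under "encoding" next to x and y. A minimal sketch, assuming date and temp_max for the positions:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: the data URL and the x/y field choices are illustrative assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {
      "field": "precipitation", "type": "quantitative",
      "scale": {"range": [25, 150]}
    }
  }
}
```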

Your turn:

  • Modify the range for size and see how it affects the graphic.
  • Try using “rangeMin” and/or “rangeMax” instead of “range”.

Modifying the range for color

For a color scale, the range determines the colors that are used.

"color": {
  "field": "weather", "type": "nominal",
  "scale": {"range": ["red", "green", "blue", "skyblue", "orange"]}
}
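
When only the range is given, colors are assigned in the order of the scale's domain. Listing the domain explicitly makes the pairing between category and color visible. The sketch below assumes the five weather values in the data set (sun, fog, drizzle, rain, snow); the particular colors are arbitrary choices for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: explicit domain/range pairing; the color choices are arbitrary.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "color": {
      "field": "weather", "type": "nominal",
      "scale": {
        "domain": ["sun", "fog", "drizzle", "rain", "snow"],
        "range": ["gold", "gray", "skyblue", "blue", "purple"]
      }
    }
  }
}
```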

But we will often use a different approach for picking colors.

Color schemes

Rather than picking specific colors for a color scale, we can choose a named color scheme from the list of Vega color schemes:

"scale": {"scheme": "tableau10"}
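
Schemes also apply to quantitative color scales. As a hedged sketch, the spec below colors points by temp_max using the sequential "viridis" scheme; the field choices are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: a sequential scheme on a quantitative color channel; fields are illustrative.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "color": {
      "field": "temp_max", "type": "quantitative",
      "scale": {"scheme": "viridis"}
    }
  }
}
```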

Your turn

  • Try some of the other color schemes.

  • What happens if you also change the domain?

Facets

Two ways to think about faceting

  1. Facet operator [small multiples]

    • each facet is basically a subplot
    • faceting filters data for each subplot
    • plotting in each subplot proceeds as for single view
    • subplots are arranged in rows/columns/grid
  2. Facets as encoding [Macro + micro scales]

    • x and y position determined by combining scales from macro and micro variables
    • macro scale(s) used to pick the facet
    • micro scale(s) refine to a position within the facet

Vega-Lite uses the operator approach, but provides the encoding approach as a short-cut.

  • Encoding specifications are translated into operator specifications behind the scenes.
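
For orientation, the operator form looks roughly like the sketch below: a top-level "facet" names the faceting variable, and "spec" holds the single-view plot that is repeated in each facet. The field choices and the facet title are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch of the facet operator form; fields and title are illustrative assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "facet": {
    "column": {"field": "weather", "type": "nominal", "title": "Weather type"}
  },
  "spec": {
    "mark": "point",
    "encoding": {
      "x": {"field": "temp_min", "type": "quantitative"},
      "y": {"field": "temp_max", "type": "quantitative"}
    }
  }
}
```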

Facets and scales

Facets let us use the x (and/or y) positional attribute twice.

  • Faceting variable(s) \(\to\) macro location (which facet)
  • Other variable(s) \(\to\) micro location within the facet.
visual position = macro + micro

Example

  • (discrete) macro range: [0, 100, 200] \(\to\) 100
  • (continuous) micro range: [0, 100] \(\to\) 35
  • combo: 100 + 35 = 135

facet, row, and column encodings

For the short-cut method, faceting is specified just like any other encoding property:

     "x": {"field": ..., "type": ..., "title": ...}
"column": {"field": ..., "type": ..., "title": ...}  // column facets
     "y": {"field": ..., "type": ..., "title": ...}
   "row": {"field": ..., "type": ..., "title": ...}  // row facets
 "facet": {"field": ..., "type": ..., "title": ..., "columns": ...}

Since facets are labeled with variable values, it can be useful to add a title for the facets if the variable values are not sufficient.
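
A minimal sketch of the short-cut method, using the wrapping "facet" channel with a title and a column count. The fields, the title text, and the choice of three columns are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch of faceting as an encoding; fields, title, and columns are illustrative.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "temp_min", "type": "quantitative"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "facet": {
      "field": "weather", "type": "nominal",
      "title": "Weather type", "columns": 3
    }
  }
}
```

Replacing "facet" with "column" or "row" (and dropping "columns") lays the facets out in a single row or a single column instead.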

Your turn

Create this graphic.

  • We’ll return to faceting later to talk about options. Examples:
    • same scales for each facet or different?
    • same size for each facet or different?

Dates and times

Preparing data for temporal scales

Vega-Lite has nice features for handling temporal (date/time) data, but sometimes we need to help out by modifying the data so that it looks temporal to Vega-Lite.

We can use transform to specify data transformations (e.g., to compute new variables).

"transform": [
  { "calculate": "datetime(datum.year, 0)",         // 0 = January
    "as": "year_as_date"                            // name of new variable
  }
  }
]

This can be used to calculate other variables as well. (Similar to dplyr::mutate() in R).
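
To show the transform in context, here is a small self-contained sketch. The inline data values and the field name "value" are hypothetical; only the calculate step mirrors the fragment above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch with hypothetical inline data; only the calculate transform comes from the text.",
  "data": {
    "values": [
      {"year": 2012, "value": 10},
      {"year": 2013, "value": 14},
      {"year": 2014, "value": 9}
    ]
  },
  "transform": [
    {"calculate": "datetime(datum.year, 0)", "as": "year_as_date"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "year_as_date", "type": "temporal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```

The new field year_as_date exists only inside the spec; the original data are left unchanged.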

Datetime functions in Vega-Lite

  • functions to create datetime objects

    • datetime(year, month[, day, hour, min, sec, millisec])
  • functions to extract parts of dates (year, month, day of week, day of year, etc.)

  • date/time functions documentation

Your Turn

Improve this graphic.

  • Convert the year to a datetime and use a temporal scale.
  • Fix the scale for the y-axis as well.

Number and date formatting

Vega-Lite uses D3’s number and date formatting specification system to format numbers and dates that appear on graphics.

This provides another solution to our year formatting:

"x": {"field": "year", "type": "quantitative", 
       "scale": {"zero": false}, "axis": {"format": "d"}},
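
The same "format" property accepts D3 time-format strings on temporal axes. A hedged sketch, assuming the Seattle weather data and a "%b %Y" pattern (abbreviated month name plus four-digit year) chosen for illustration:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: a D3 time-format string on a temporal axis; the pattern is an illustrative choice.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "line",
  "encoding": {
    "x": {
      "field": "date", "type": "temporal",
      "axis": {"format": "%b %Y"}
    },
    "y": {"field": "temp_max", "type": "quantitative"}
  }
}
```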

Review

Let’s review: Scales

scale: data value \(\to\) visual value

Default scales are chosen based on the type of data used.

Three parts we can adjust:

  • domain: data values
  • range: visual values
  • arrow (type): how data values are connected to visual values

For color we can also choose a color scheme with

"scale": {"scheme": ...}

Let’s review: Facets

The facet operator can be invoked by treating facets like another encoding channel.

     "x": {"field": ..., "type": ..., "title": ...}
"column": {"field": ..., "type": ..., "title": ...}  // column facets
     "y": {"field": ..., "type": ..., "title": ...}
   "row": {"field": ..., "type": ..., "title": ...}  // row facets
 "facet": {"field": ..., "type": ..., "title": ..., "columns": ...}

Full control over facets requires using the facet operator, which we will cover in the context of other multi-view graphics.

Let’s review: The Calculate transform

"transform": [{"calculate": "expression", "as": ...}]
  • "expression" can include a subset of JavaScript

  • learn more in the Vega expression language documentation

  • Example use case: create datetime data for use with temporal scales.

Practice

  1. Improve this graphic in at least three ways. For each improvement decide whether you are fixing something that is “ugly”, “bad”, or “wrong” (according to Claus Wilke).
  2. Claus Wilke would probably consider this graphic to be “wrong”. What’s wrong with it? What can we do to fix the problem?

Solutions