Data 304
When designing a data graphic, big decisions include
What sort of mark to use.
What visual properties to use for which variables (encoding in Vega-Lite).
What scales to use to map data values to visual values.
Today’s topic: How to adjust scales in Vega-Lite.
Scales are functions that transform a domain of data values (numbers, dates, strings, etc.) to a range of visual values (pixels, colors, sizes).
So there are three parts we can adjust: the domain, the range, and the function that maps one to the other.
Scales should usually be invertible to avoid ambiguity.
Guides help humans reverse the scale
The guide is a visual aid for performing the inverse scale function.
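As a worked illustration, here is a sketch of a scale and its inverse, assuming a linear scale with domain \([d_0, d_1]\) and range \([r_0, r_1]\). These symbols are introduced here for illustration and are not Vega-Lite names.

```latex
\[
s(d) = r_0 + \frac{d - d_0}{d_1 - d_0}\,(r_1 - r_0),
\qquad
s^{-1}(r) = d_0 + \frac{r - r_0}{r_1 - r_0}\,(d_1 - d_0)
\]
```

For example, with domain \([0, 100]\) and a range of \([0, 200]\) pixels, \(s(35) = 70\) and \(s^{-1}(70) = 35\). An axis lets the reader carry out \(s^{-1}\) by eye.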
The primary data set used here is seattle-weather.csv from the vega datasets.
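A minimal baseline spec for this data, with every scale left at its default, might look like the sketch below. The relative data path is an assumption (it resolves in the online Vega editor; otherwise point it at your own copy), and the field names `date` and `temp_max` are assumed to match the columns of the CSV.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Baseline sketch with default scales; data path and field names are assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"}
  }
}
```

The later examples modify this spec by adding a "scale" property to one of the encoding channels.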
Default domain includes 0 and all the data values
"scale": { "domain": [0, 100] }
"scale": { "domainMin": -3, "domainMax": 5, "domainMid": 0 }
"scale": { "domain": { "unionWith": [0, 100] } } (expanded to include data)
"scale": { "zero": false } (don’t force 0 to be in domain)
"scale": { "domain": ["A", "B", "E"] }
"scale": { "domain": [{"hours": 0}, {"hours": 24}] }
"y": {..., "scale": {"domain": [0, 25]}}
Q: How can we fix this if it isn’t what we want? (3 ways)
Use unionWith to expand domain to encompass data.
Filter the data so that only data in the domain are used.
Set the “clip” property of the mark.
"mark": {"type": "line", "clip": true}
Q. What will this do to the previous graphic?
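The filtering option from the list above is the only one without code, so here is a sketch of it. It assumes the same seattle-weather data and a y domain of [0, 25]; the data path and field names are assumptions.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: keep only rows whose temp_max lies inside the chosen domain.",
  "data": {"url": "data/seattle-weather.csv"},
  "transform": [
    {"filter": {"field": "temp_max", "range": [0, 25]}}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {
      "field": "temp_max",
      "type": "quantitative",
      "scale": {"domain": [0, 25]}
    }
  }
}
```

With a line mark, filtering removes points, so the line joins whatever neighbors remain. Clipping keeps all the points and only hides what falls outside the frame, so the two approaches need not draw the same picture.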
"y": {..., "scale": {"range": [100, 150]}}
Usually we want our data to fill the frame, so modifying the range for the x and y channels isn’t so interesting.
More interesting for other channels, like size.
"mark": "point",
"size": {"field": "precipitation", "type": "quantitative",
"scale": {"range": [25, 150]}},
Exercise 1 Modify the size scale of this graphic.
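As a starting point, a complete spec built around the size fragment above might look like this sketch. The choice of x and y fields is an assumption, since the original graphic is not reproduced here.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: point size encodes precipitation; x and y fields are assumptions.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
    "size": {
      "field": "precipitation",
      "type": "quantitative",
      "scale": {"range": [25, 150]}
    }
  }
}
```

For point marks the size value is an area in square pixels, so the range [25, 150] runs from small dots to moderately large ones.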
For a color scale, the range determines the colors that are used.
"color": {
"field": "weather", "type": "nominal",
"scale": {"range": ["red", "green", "blue", "skyblue", "orange"]}
}
But we will often use a different approach for picking colors.
Rather than picking specific colors for a color scale, we can choose a color scheme from this list of color schemes
"scale": {"scheme": "tableau10"}}
Exercise 2
Try some of the other color schemes.
What happens if you also change the domain?
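A complete spec to experiment with might look like the sketch below. The stacked-bar layout is an assumption chosen only to make the colors easy to see; swap "tableau10" for any other scheme name from the list.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: count of days per month, colored by weather using a named scheme.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"timeUnit": "month", "field": "date", "type": "ordinal"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "color": {
      "field": "weather",
      "type": "nominal",
      "scale": {"scheme": "tableau10"}
    }
  }
}
```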
Facet operator [small multiples]
Facets as encoding [Macro + micro scales]
Vega-Lite uses the operator approach, but provides the encoding approach as a short-cut.
Facets let us use the x (and/or y) positional attribute twice.
Example
[0, 100, 200] \(\to\) 100
[0, 100] \(\to\) 35
For the short-cut method, faceting is specified just like any other encoding property:
"x": {"field": ..., "type": ..., "title": ...}
"column": {"field": ..., "type": ..., "title": ...} // column facets
"y": {"field": ..., "type": ..., "title": ...}
"row": {"field": ..., "type": ..., "title": ...} // row facets
"facet": {"field": ..., "type": ..., "title": ..., "columns": ...}
Since facets are labeled with variable values, it can be useful to add a title for the facets if the variable values are not sufficient.
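A filled-in version of the column short-cut might look like this sketch. It assumes the seattle-weather data, and the choice of fields and titles is illustrative only.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one panel per weather type via the column encoding channel.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "temp_min", "type": "quantitative", "title": "Daily low"},
    "y": {"field": "temp_max", "type": "quantitative", "title": "Daily high"},
    "column": {"field": "weather", "type": "nominal", "title": "Weather"}
  }
}
```

Replacing the "column" channel with "facet" and adding "columns": 3 would wrap the panels into rows of three instead of one long row.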
Exercise 3 Create this graphic.
Hint: Start with the code for the graphic on the previous slide.
We’ll return to faceting later to talk about options. Examples:
Vega-Lite has nice features for handling temporal (date/time) data, but sometimes we need to help out by modifying the data so that it looks temporal to Vega-Lite.
We can use transform to specify data transformations (e.g., to compute new variables).
"transform": [
{ "calculate": "datetime(datum.year, 0)", // 0 = January
"as": "year_as_date" // name of new variable
}
]
This can be used to calculate other variables as well. (Similar to dplyr::mutate() in R).
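Placed in a full spec, the transform might be used as in this sketch. The inline data and the field name `count` are hypothetical, made up purely so the example is self-contained.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: turn a numeric year into a date so a temporal scale can be used. Data values are hypothetical.",
  "data": {
    "values": [
      {"year": 2018, "count": 12},
      {"year": 2019, "count": 17},
      {"year": 2020, "count": 9},
      {"year": 2021, "count": 14}
    ]
  },
  "transform": [
    {"calculate": "datetime(datum.year, 0)", "as": "year_as_date"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "year_as_date", "type": "temporal", "title": "Year"},
    "y": {"field": "count", "type": "quantitative"}
  }
}
```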
functions to create datetime objects
datetime(year, month[, day, hour, min, sec, millisec])
functions to extract parts of dates (year, month, day of week, day of year, etc.)
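As a sketch of the extraction functions, the spec below uses month() to pull the month out of each date. It assumes the seattle-weather data with a `date` column, parsed explicitly as a date so the expression receives a date value; the new field name `month_index` is made up for this example.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: extract the month from a date with month(); field names are assumptions.",
  "data": {
    "url": "data/seattle-weather.csv",
    "format": {"type": "csv", "parse": {"date": "date"}}
  },
  "transform": [
    {"calculate": "month(datum.date)", "as": "month_index"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "month_index", "type": "ordinal", "title": "Month (0 = January)"},
    "y": {"aggregate": "mean", "field": "temp_max", "type": "quantitative"}
  }
}
```

As with datetime(), months are numbered from 0, so January is 0 and December is 11.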
Exercise 4 Improve this graphic.
Vega-Lite uses D3’s number and date formatting specification system to format numbers and dates that appear on graphics.
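Format strings go in the "format" property of an axis (or legend). The sketch below assumes the seattle-weather data; the particular format strings are illustrative choices, not ones taken from the slides.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: D3 format strings on a temporal axis and a quantitative axis.",
  "data": {"url": "data/seattle-weather.csv"},
  "mark": "line",
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal",
      "axis": {"format": "%b %Y"}
    },
    "y": {
      "field": "temp_max",
      "type": "quantitative",
      "axis": {"format": ".1f"}
    }
  }
}
```

Here "%b %Y" is a D3 time format (abbreviated month name and four-digit year) and ".1f" is a D3 number format (fixed point with one decimal place).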
This provides another solution to our year formatting:
"x": {"field": "year", "type": "quantitative",
"scale": {"zero": false}, "axis": {"format": "d"}},
Default scales are chosen based on the type of data used.
Three parts we can adjust: the domain, the range, and the function that maps one to the other.
For color we can also choose a color scheme with
"scale": {"scheme": ...}}
The facet operator can be invoked by treating facets like another encoding channel.
"x": {"field": ..., "type": ..., "title": ...}
"column": {"field": ..., "type": ..., "title": ...} // column facets
"y": {"field": ..., "type": ..., "title": ...}
"row": {"field": ..., "type": ..., "title": ...} // row facets
"facet": {"field": ..., "type": ..., "title": ..., "columns": ...}
Full control over facets requires using the facet operator, which we will cover in the context of other multi-view graphics.
"transform": [{"calculate": "expression", "as": ...}]
"expression" can include a subset of JavaScript
learn about the Vega-Lite expression language here
Example use case: create datetime data for use with temporal scales.
Exercise 5 Improve this graphic in at least three ways. For each improvement decide whether you are fixing something that is “ugly”, “bad”, or “wrong” (according to Claus Wilke).
Exercise 6 Claus Wilke would probably consider this graphic to be “wrong”. What’s wrong with it? What can we do to fix the problem?
Feel free to make other improvements as well.
Exercise 7 Improve this graphic.
Start by fixing obvious things (like the x encoding, which has several problems). As you improve it, you will likely discover other things that should be improved. Keep iterating until you are satisfied with your graphic (or run into things we don’t yet know how to do).
Exercise 8 Experiment with different color scales or schemes in the previous examples.