Data 304
Aggregation = “groupwise” calculations
Example aggregation operations include count, sum, mean, median, min, and max.
The Vega-Lite documentation includes a full list of the available aggregation operations.
If at least one field in the specified encoding channels contains an aggregate, the resulting visualization will show aggregate data. In this case, all fields without a specified aggregation function are treated as group-by fields in the aggregation process.
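As a minimal sketch of this group-by rule, the spec below aggregates temp_max with mean and leaves weather unaggregated, so weather acts as the group-by field and one point is drawn per weather type. The data URL (the Seattle weather file from vega-datasets) and the point mark are assumptions chosen for illustration, not taken from these notes.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: mean temp_max per weather type. The data URL is an assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/seattle-weather.csv"},
  "mark": "point",
  "encoding": {
    "x": {"field": "weather", "type": "nominal"},
    "y": {"aggregate": "mean", "field": "temp_max", "type": "quantitative"}
  }
}
```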
If our data are not already aggregated, we can use aggregation to create a traditional bar chart.
"mark": "bar",
"encoding": {
  "x": {"field": "weather", "type": "nominal"},
  "y": {"aggregate": "count", "field": "temp_max"}
}
When creating bar charts, we should give some thought to the order of the bars. A Pareto diagram orders the bars by their lengths, from longest to shortest.
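One way to get this ordering is to sort the x channel by the y encoding. The sketch below uses "sort": "-y" to put the longest bar first; the data URL is again an assumption (the Seattle weather file from vega-datasets).

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: bars ordered by descending count. The data URL is an assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/seattle-weather.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "weather", "type": "nominal", "sort": "-y"},
    "y": {"aggregate": "count", "type": "quantitative"}
  }
}
```

Using "sort": "y" instead would order the bars from shortest to longest.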
By default, filling the bars creates stacked bars.
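A small sketch of the default stacking, assuming the cars data set from vega-datasets (the URL and the Origin and Cylinders fields are illustrative choices, not from these notes). Each bar counts the cars from one origin, and the fill splits that bar into stacked segments by number of cylinders.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: stacked bars via the fill channel. Data set and fields are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/cars.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "Origin", "type": "nominal"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "fill": {"field": "Cylinders", "type": "nominal"}
  }
}
```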
We can create dodged bars using “xOffset”.
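A sketch of dodged bars, again assuming the cars data set from vega-datasets (URL and field names are illustrative). Mapping the fill field to xOffset as well places the segments side by side within each Origin group instead of stacking them. The xOffset channel requires a reasonably recent Vega-Lite 5 release.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: dodged bars via xOffset. Data set and fields are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/cars.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "Origin", "type": "nominal"},
    "xOffset": {"field": "Cylinders", "type": "nominal"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "fill": {"field": "Cylinders", "type": "nominal"}
  }
}
```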
Be sure you understand your data, lest you end up with unintended stacking!
{
  ...,
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "country"},
    "y": {"field": "life_expect", "type": "quantitative"}
  }
}
Adding some color makes it more obvious what is happening, but summing life expectancy over multiple years is a nonsensical operation.
{
  ...,
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "country"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "fill": {"field": "year", "type": "nominal"}
  }
}
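One possible way to avoid the unintended stacking is to keep a single row per country before drawing the bars. The sketch below filters to one year; the choice of 2000 is an arbitrary assumption for illustration. Replacing the filter with "aggregate": "mean" on life_expect would be another option, with a different meaning.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one bar per country for a single year. The year 2000 is an arbitrary choice.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "transform": [{"filter": "datum.year == 2000"}],
  "mark": "bar",
  "encoding": {
    "x": {"field": "country", "type": "nominal"},
    "y": {"field": "life_expect", "type": "quantitative"}
  }
}
```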
Q. How can we make a histogram?
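One possible answer, as a sketch: bin a quantitative field on x and count the rows in each bin on y. The field temp_max and the data URL (the Seattle weather file from vega-datasets) are assumptions chosen for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: histogram of temp_max using bin + count. The data URL is an assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/seattle-weather.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "temp_max", "bin": true, "type": "quantitative"},
    "y": {"aggregate": "count", "type": "quantitative"}
  }
}
```

Replacing "bin": true with an object such as {"maxbins": 30} gives some control over the number of bins.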
The detail encoding channel puts rows of the data into groups without otherwise adding a visual element. This can be used for
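One use of the detail channel, sketched under assumptions: splitting a line mark into one line per group without adding a color or a legend. The stocks data set from vega-datasets and its symbol, date, and price fields are illustrative choices, not taken from these notes.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one line per stock symbol via detail. Data set and fields are assumptions.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/stocks.csv"},
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "price", "type": "quantitative"},
    "detail": {"field": "symbol", "type": "nominal"}
  }
}
```

Without the detail encoding, all rows would be connected into a single zigzagging line.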
Bug in the documentation or implementation of order?
Use https://calvin-data304.netlify.app/data/weather-with-dates.csv. This is weather.csv from the Vega-Lite data sets, with some additional date columns included to make your life easier. You may find this documentation for date-time functions helpful.
You may find it handy to sketch a graph before trying to code it.
Exercise 1
Create a graphic that shows the high temperature in Seattle each day.
Now modify this so that the temperatures for the same day of the year are overlaid on top of each other for the several years in the data set.
Exercise 2
Create a graphic that shows the mean temperature for each month. How many “months” should you be displaying? (There is more than one answer to this; perhaps try doing it more than one way.)
Exercise 3
Create a graphic that shows how the different types of weather (rain, fog, etc.) are distributed by month in Seattle. When is it rainiest in Seattle? Sunniest?