Vega-Lite: Aggregation
(in encodings)

Data 304

Aggregation

Aggregation = “groupwise” calculations

Example aggregation operations:

  • count
  • sum, product
  • mean, median, variance, stdev, q1, q3
  • min, max, argmin, argmax
  • missing, valid, values, distinct

Vega-Lite documentation includes a full list of aggregation operations that are available.

What are the groups?

If at least one field in the specified encoding channels contains an aggregate, the resulting visualization will show aggregate data. In this case, all fields without an aggregation function specified are treated as group-by fields in the aggregation process.
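
A minimal sketch of this grouping rule, assuming the Seattle weather file named in the exercises at the end of these slides (with fields weather and temp_max). Here y carries an aggregate, so weather, which has none, becomes the group-by field and the chart shows one mean per weather type. The $schema line and the description text are additions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: mean temp_max grouped by weather. Field names are assumed from the other examples in these slides.",
  "data": {"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "weather", "type": "nominal"},
    "y": {"aggregate": "mean", "field": "temp_max", "type": "quantitative"}
  }
}
```

Adding a second non-aggregated field (for example on color) would make the groups finer: one mean for each combination of the two fields.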

Bar charts

If our data are not already aggregated, we can use aggregation to create a traditional bar chart.

"mark": "bar",
"encoding": {
  "x": {"field": "weather", "type": "nominal"},
  "y": {"aggregate": "count", "field": "temp_max"}
}

Horizontal bars

  "encoding": {
    "x": {"aggregate": "count", "field": "temp_max"},
    "y": {"field": "weather", "type": "nominal"}
  }

Ordering the bars

When creating bar charts, we should give some thought to the order of the bars. A Pareto chart orders the bars by their lengths, from longest to shortest.

  "encoding": {
    "x": {"aggregate": "count", "field": "temp_max"}, 
    "y": { "field": "weather", "type": "nominal",
      "sort": {"field": "weather", "op": "count", "order": "descending"}
    }
  }
  • Note the use of “op” here to indicate the aggregating operation.
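
The same ordering can probably be written more compactly. As a sketch, assuming the same weather data as above: Vega-Lite also accepts a channel name prefixed with a minus sign as the sort value, so "-x" orders the bars by the x encoding (here the count) in descending order. The data URL is the one given in the exercises; the description text is an addition.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: bars sorted by their count using the '-x' shorthand. Field names are assumed from the other examples in these slides.",
  "data": {"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"aggregate": "count", "type": "quantitative"},
    "y": {"field": "weather", "type": "nominal", "sort": "-x"}
  }
}
```

The longer form with "field", "op", and "order" is still needed when the sort should use a quantity that is not shown on any channel.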

Stacked bars

By default, mapping a field to “fill” creates stacked bars.

  "encoding": {
    "y": {"aggregate": "count"},
    "x": {"field": "Cylinders", "type": "nominal"},
    "fill": {"field": "Origin"}
  }

Dodged bars

We can create dodged bars using “xOffset”.

  "encoding": {
    "y": {"aggregate": "count"},
    "x": {"field": "Cylinders", "type": "nominal"},
    "fill": {"field": "Origin"},
    "xOffset": {"field": "Origin"}
  }

Caution about unintended stacking

Be sure you understand your data, lest you end up with unintended stacking!

{
  ..., 
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "country"},
    "y": {"field": "life_expect", "type": "quantitative"}
  }
}

Caution about unintended stacking

Adding some color makes it more obvious what is happening, but adding life expectancy over multiple years is a nonsensical operation.

{
  ..., 
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "country"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "fill": {"field": "year", "type": "nominal"}
  }
}
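
One possible remedy, sketched here under the assumption that a single summary per country is what is wanted: give y an aggregate, so that country becomes the group-by field and each bar shows one mean instead of a sum across years. The choice of mean is an illustration only; whether an average over all years is meaningful depends on the question being asked, and filtering to a single year would be another option.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one bar per country showing mean life expectancy, instead of a stack of yearly values.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/gapminder.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "country", "type": "nominal"},
    "y": {"aggregate": "mean", "field": "life_expect", "type": "quantitative"}
  }
}
```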

Caution about interpreting stacking

Another (synthetic) example

Histograms

Q. How can we make a histogram?

A. Combine binning and aggregation!

  "mark": "bar",
  "encoding": {
    "y": {"aggregate": "count", "field": "temp_max"},
    "x": {
      "field": "temp_max",
      "type": "quantitative",
      "bin": true
    }
  }
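
The bin property also accepts an object in place of true. A small sketch, assuming the same weather data as in the bar chart examples: maxbins sets an upper limit on the number of bins, and the count is computed within each bin because the binned field acts as the group-by field. The value 20 is an arbitrary choice for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: histogram of temp_max with at most 20 bins. The maxbins value is arbitrary.",
  "data": {"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "temp_max", "type": "quantitative", "bin": {"maxbins": 20}},
    "y": {"aggregate": "count", "type": "quantitative"}
  }
}
```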

Details

The detail encoding channel puts rows of the data into groups without otherwise adding a visual element. This can be used for

  • creating multiple lines (of the same color)
  • aggregating by groups (without adding a visual element)
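
A sketch of the second use, assuming the cars data set from vega-datasets. The URL below is an assumption modeled on the gapminder URL used earlier; Origin and Miles_per_Gallon are the field names that appear in other examples in these slides. Because detail has no aggregate, Origin becomes a group-by field, so the chart shows one point per origin, yet Origin is not mapped to color, shape, or position.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch: one point per Origin (mean mpg vs. number of cars), grouped by detail only. The data URL is an assumption.",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Miles_per_Gallon", "type": "quantitative"},
    "y": {"aggregate": "count", "type": "quantitative"},
    "detail": {"field": "Origin"}
  }
}
```

Removing the detail channel from this sketch would collapse everything into a single point, since no group-by field would remain.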

Detail: Example

  "encoding": {
    "x": {"field": "yeart", "type": "temporal", "title": "year"},
    "color": {"field": "cluster"},
    "y": {"field": "fertility", "type": "quantitative"},
    "detail": {"field": "country"},
    "facet": {"field": "cluster", "columns": 3}
  }

Using text as a mark

  "mark": {"type": "text", "size": 20, "opacity": 0.8},
  "encoding": {
    "y": {"field": "Miles_per_Gallon", "aggregate": "mean"},
    "color": {"field": "Origin"},
    "x": {"field": "Year", "type": "temporal"},
    "text": {"field": "Cylinders"}
  }

Pie chart = bar chart with arc marks

{ ...,
  "mark": "arc",
  "encoding": {
    "x": {
      "field": "weather", "aggregate": "count"
    },
    "color": {"field": "weather",
              "sort": {"field": "weather", "op": "count"}},
    "order": {"field": "weather", "op": "count"}
  }
}

{ ...,
  "mark": "arc",
  "encoding": {
    "theta": {
      "field": "weather", "aggregate": "count"
    },
    "color": {"field": "weather", "sort": {"field": "weather", "op": "count"}},
    "order": {"field": "weather", "op": "count"}
  }
}

Bug in the documentation or implementation of order?

Seattle Weather Exercises

Use https://calvin-data304.netlify.app/data/weather-with-dates.csv. This is the weather.csv file from the Vega-Lite data sets, with some additional date columns included to make your life easier. You may find this documentation for date-time functions helpful.

You may find it handy to sketch a graph before trying to code it.

Exercise 1  

  1. Create a graphic that shows the high temperature in Seattle each day.

  2. Now modify this so that the temperatures for the same day of the year are overlaid on top of each other for the several years in the data set.

Exercise 2 Create a graphic that shows the mean temperature for each month. How many “months” should you be displaying? (There is more than one answer to this – perhaps try doing it more than one way.)

Exercise 3 Create a graphic that shows how the different types of weather (rain, fog, etc.) are distributed by month in Seattle. When is it rainiest in Seattle? Sunniest?