Simple Vega-Lite specifications

JSON = JavaScript Object Notation

JSON has become a data standard

  • for many purposes
  • across many languages.

Vega-Lite specifications are JSON objects.

JSON: values

A value can be

  • a number (no distinction between integer or floating point)
  • a string (quoted with "")
  • true, false, or null
  • an object (stay tuned)
  • an array (stay tuned)
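
As a rough illustration, the sketch below parses one of each kind of value. Python's standard `json` module is used here only as a convenient parser; it is an assumption of this example, not part of the Vega-Lite toolchain.

```python
import json

# One JSON array holding each kind of value listed above.
values = json.loads('[73.5, 205, "John Calvin", true, false, null, {"a": 1}, [1, 2]]')

# Show which Python type each JSON value was turned into.
for v in values:
    print(type(v).__name__, repr(v))
```

Note that Python separates `int` from `float` even though JSON itself has only one number type.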

JSON: objects

An object is an (unordered) list of key-value pairs

{
  "key1": value1,
  "key2": value2,
  ...
}
  • surrounded by curly braces
  • keys are quoted strings (quoted with "")
  • white space is ignored
{
  "name": "John Calvin",
  "height": 73.5,
  "weight": 205,
  "reformed": true
}

JSON: arrays

An array is an ordered list of values

  • surrounded by square brackets
  • values may be of the same type or different types
  • white space ignored
[ "one", 1, "wonderful" ]

A complex JSON object

{
  "name": "John Calvin",
  "height": 73.5, "weight": 205,
  "reformed": true,
  "bp": {"systolic": 120, "diastolic": 80},
  "friends": [{"first": "Martin", "last": "Luther"},
              {"first": "Thomas", "last": "Hobbes"}]
}
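
A minimal sketch of how such a nested object is navigated once parsed, again assuming Python's standard `json` module purely for illustration: objects are indexed by key, arrays by position.

```python
import json

text = """
{
  "name": "John Calvin",
  "height": 73.5, "weight": 205,
  "reformed": true,
  "bp": {"systolic": 120, "diastolic": 80},
  "friends": [{"first": "Martin", "last": "Luther"},
              {"first": "Thomas", "last": "Hobbes"}]
}
"""
person = json.loads(text)

systolic = person["bp"]["systolic"]            # object inside an object
first_friend = person["friends"][0]["first"]   # object inside an array
last_names = [friend["last"] for friend in person["friends"]]
print(systolic, first_friend, last_names)
```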

CSV \(\to\) JSON

The standard way to convert CSV to JSON is as an array of objects where each object represents one row of the data.

  • Note: R and Python data frames are column-oriented, JSON is row-oriented.
[ 
    { "name": "John Calvin", "height": 73.5, "weight": 205, },
    { "name": "Thomas Hobbes", "height": 71.5, "weight": 185, },
]

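Note: the dangling (trailing) commas above are not valid in strict JSON. Some lenient parsers accept them, which makes editing slightly easier, but remove them if a tool rejects the file.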
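
The conversion can be sketched with Python's standard `csv` and `json` modules. The CSV text is a tiny made-up sample matching the rows above, and the column types are assumed rather than detected.

```python
import csv
import io
import json

# A tiny made-up CSV, matching the example rows above.
csv_text = "name,height,weight\nJohn Calvin,73.5,205\nThomas Hobbes,71.5,185\n"

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # csv delivers every field as a string; convert the numeric columns.
    rows.append({
        "name": record["name"],
        "height": float(record["height"]),
        "weight": int(record["weight"]),
    })

# Row-oriented: an array of objects, one object per CSV row.
json_text = json.dumps(rows, indent=2)
print(json_text)

# Column-oriented view of the same data, closer to a data frame.
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns)
```

`json.dumps` writes strict JSON, so its output contains no trailing commas.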

Data for Vega-Lite

Data can be provided in several ways, including:

  • Included as JSON within the Vega-Lite specification

  • Imported as JSON or CSV file (from local file or from URL)

  • Python and R wrappers handle converting from data frames to something Vega-Lite can deal with.

Note: The “raw” data for a Vega-Lite graphic are sent to the browser.

  • Could be a data security concern
  • Could be a performance concern (see Vega Fusion)
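
A minimal sketch of the first option, data included inside the specification through the `values` property of `data`. The rows are hypothetical, and Python is used only to assemble and print the JSON.

```python
import json

# Hypothetical rows, used only for illustration.
rows = [
    {"name": "John Calvin", "height": 73.5, "weight": 205},
    {"name": "Thomas Hobbes", "height": 71.5, "weight": 185},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},   # inline data instead of {"url": ...}
    "mark": "point",
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed specification contains every raw row, which is exactly why the data security and performance notes above apply.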

Vega datasets

The Vega team has assembled some data sets for testing and examples.

Some data sets are in JSON format, some are in CSV format.

You can find out more about some of the data sets and where they came from here.

us-state-capital.json

gapminder.json

Hans Rosling’s 200 countries, 200 years, in 4 minutes

Vega-Lite specification = JSON object

A Vega-Lite specification is a JSON object that describes what sort of graphic should be rendered.

Vega-Lite \(\to\) Vega \(\to\) HTML + JavaScript (or PNG or SVG)

Vega Editor

The Vega editor provides an online editor to create and render Vega and Vega-Lite graphics.

Vega-Lite Views

Complex Vega-Lite graphics are created by composing views.

We’ll start with the simplest case,

  • standalone single view graphic
  • using glyph-ready data

Later we will learn about

  • multiple views (layers, facets, repeats, and concatenations)
  • data transformations (inside Vega-Lite and before Vega-Lite)
  • interactive graphics
  • more customization
  • etc, etc.

Our first graphic

We are required to include “$schema” and at least one of “mark”, “layer”, “facet”, “hconcat”, “vconcat”, “concat”, or “repeat”.

All but “mark” are used for complex graphics, so let’s make our minimal example by specifying a mark.
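
The rule above can be written as a small check. `missing_requirements` is a hypothetical helper invented for this sketch; it only mirrors the sentence above and is no substitute for validating against the real Vega-Lite schema.

```python
# Keys that can define a view, as listed above.
VIEW_KEYS = {"mark", "layer", "facet", "hconcat", "vconcat", "concat", "repeat"}

def missing_requirements(spec):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if "$schema" not in spec:
        problems.append('missing "$schema"')
    if not VIEW_KEYS.intersection(spec):
        problems.append("needs one of: " + ", ".join(sorted(VIEW_KEYS)))
    return problems

minimal = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "mark": "point",
}
print(missing_requirements(minimal))
print(missing_requirements({"mark": "point"}))
```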

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": "point"
}
'{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 500, "height": 300,
  "mark": "point",
  "background": "skyblue"
}' |> vegawidget::as_vegaspec()

I added a background color ("background": "skyblue") so you can see that a graphic is being made. It just doesn’t have anything on it yet.

Let’s add some data

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"},
  "mark": "point",
  "background": "skyblue", "width": 100, "height": 100
}
'{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"},
  "mark": "point",
  "background": "skyblue", "width": 100, "height": 100
}' |> as_vegaspec()

Your turn

  1. Why do you think the plot looks the way it does?

  2. Try some of the other marks and see what you get.

  3. What happens if you delete the width and height?

Adding an encoding

The encoding specifies how graphical properties are mapped and/or set.

{ 
  ...,
  "mark": "point",
  "encoding": {
    "x": {"field": "fertility", "type":  "quantitative"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "color": {"value": "maroon"}
  }
}
'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"},
  "height": 150, "width": 400,
  "mark": "point",
  "encoding": {
    "x": {"field": "fertility", "type":  "quantitative"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "color": {"value": "maroon"}
  }
}' |> as_vegaspec()
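
One way to read "mapped and/or set": a channel with `"field"` is mapped to a column of the data, while a channel with `"value"` is set to a constant. The sketch below only inspects the encoding as a Python dict; `describe` is a hypothetical helper, and channel definitions can take other forms that it does not cover.

```python
encoding = {
    "x": {"field": "fertility", "type": "quantitative"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "color": {"value": "maroon"},
}

def describe(channel_def):
    """Classify a channel definition as mapped (field) or set (constant value)."""
    if "field" in channel_def:
        return "mapped to field " + channel_def["field"]
    if "value" in channel_def:
        return "set to constant " + repr(channel_def["value"])
    return "other"

summary = {channel: describe(d) for channel, d in encoding.items()}
for channel, text in summary.items():
    print(channel, "->", text)
```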

Your Turn!

  1. Change the color of the dots to some other color you like.

  2. Encode fill instead of (or in addition to) color.

  3. Make the dots (a little) larger using the size property.

  4. Set fillOpacity to a number between 0 and 1. Experiment with some different values.

  5. Map the dot size to pop (the population of the country).

  6. What happens if you change the mark to something else? Try it and find out.

Filtering the data

These data cover years from 1955 to 2005. Let’s look at just one year.

{
  ...,
  "transform": [{"filter": "datum.year == 1955"}],
  ...
}
'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 250, "width": 700,
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"
  },
  "mark": "point",
  "transform": [{"filter": "datum.year == 1955"}],
  "encoding": {
    "x": {"field": "fertility", "type": "quantitative"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "size": {"field": "pop", "type": "quantitative"},
    "fill": {"value": "maroon"},
    "fillOpacity": {"value": 0.6}
  }
}' |> as_vegaspec()
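
In the filter expression, `datum` stands for one row of the data, and only rows for which the test is true are kept. A rough Python equivalent, using made-up rows in the shape of the gapminder data (the numbers are not real):

```python
# Made-up rows in the shape of the gapminder data (values are not real).
rows = [
    {"country": "A", "year": 1955, "fertility": 6.0},
    {"country": "A", "year": 1960, "fertility": 5.5},
    {"country": "B", "year": 1955, "fertility": 3.0},
]

# Rough equivalent of {"filter": "datum.year == 1955"}.
kept = [datum for datum in rows if datum["year"] == 1955]
print(kept)
```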

Adding Interaction

{
  "params": [{
    "name": "year",
    "value": 1955,
    "bind": {"input": "range", "min": 1955, "max": 2005, "step": 5}
  }],
  "transform": [{"filter": "datum.year == year"}],
  ...
}
'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 250,
  "width": 700,
  "params": [
    {
      "name": "year",
      "value": 1955,
      "bind": {"input": "range", "min": 1955, "max": 2005, "step": 5}
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"
  },
  "mark": "point",
  "transform": [{"filter": "datum.year == year"}],
  "encoding": {
    "x": {"field": "fertility", "type": "quantitative"},
    "y": {"field": "life_expect", "type": "quantitative"},
    "size": {"field": "pop", "type": "quantitative"},
    "fill": {"value": "maroon"},
    "fillOpacity": {"value": 0.6}
  }
}' |> as_vegaspec()
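
The parameter replaces the constant 1955 in the filter, so each slider position selects a different slice of the data. A small sketch of the positions this slider offers and of the filter applied for one of them; `filter_for` is a hypothetical helper and the rows are made up.

```python
# Positions the slider can take: min 1955, max 2005, step 5.
slider_years = list(range(1955, 2005 + 1, 5))
print(slider_years)

def filter_for(year, rows):
    """Rough equivalent of {"filter": "datum.year == year"} for one slider position."""
    return [datum for datum in rows if datum["year"] == year]

# Made-up rows (values are not real).
rows = [
    {"country": "A", "year": 1955},
    {"country": "A", "year": 2005},
    {"country": "B", "year": 2005},
]
print(filter_for(2005, rows))
```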

Keeping the axes & legend fixed

'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 250,
  "width": 700,
  "params": [
    {
      "name": "year",
      "value": 1955,
      "bind": {"input": "range", "min": 1955, "max": 2005, "step": 5}
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"
  },
  "mark": "point",
  "transform": [{"filter": "datum.year == year"}],
  "encoding": {
    "x": {
      "field": "fertility",
      "type": "quantitative",
      "scale": {"domain": [0, 9]}
    },
    "y": {
      "field": "life_expect",
      "type": "quantitative",
      "scale": {"domain": [0, 100]}
    },
    "size": {"field": "pop", "type": "quantitative",
      "scale": { "domain": [0, 1.5E9] }},
    "fill": {"value": "maroon"},
    "fillOpacity": {"value": 0.6}
  }
}' |> as_vegaspec()

Your turn!

  1. What component do we need to change?

  2. Guess how that change might be coded.

Keeping the axes & legend fixed

Here’s how to modify the x-scale.

 "encoding": {
    "x": {
      "field": "fertility",
      "type": "quantitative",
      "scale": {"domain": [0, 9]}
    }
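
Why fix the domain at all? Without an explicit domain, the scale is derived (roughly) from the rows that survive the filter, so the axis can shift whenever the year changes. The sketch below uses made-up rows and a hypothetical `extent` helper to compare per-year extents with the extent over all rows.

```python
# Made-up rows in the shape of the gapminder data (values are not real).
rows = [
    {"year": 1955, "fertility": 6.8},
    {"year": 1955, "fertility": 2.4},
    {"year": 2005, "fertility": 4.1},
    {"year": 2005, "fertility": 1.3},
]

def extent(values):
    """Smallest and largest value, the raw material for a scale domain."""
    values = list(values)
    return [min(values), max(values)]

# The extent of the filtered rows changes with the selected year ...
per_year = {
    year: extent(r["fertility"] for r in rows if r["year"] == year)
    for year in (1955, 2005)
}
# ... while the extent over all rows stays the same.
overall = extent(r["fertility"] for r in rows)
print(per_year)
print(overall)
```

A hand-picked domain such as [0, 9] covers every year, so the axis stays put as the slider moves.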

Your turn!

  1. Do a similar thing for the other scales.

More modifications

Using a single year (or slider for year) …

  1. Map cluster to the fill of the circles. (What "type" will you use? Options: “quantitative”, “temporal”, “ordinal”, “nominal”.)

Still more modifications

Filter the data to look at just one country and then

  1. Create a scatter plot that shows the country’s fertility and life expectancy each year.

  2. Connect the dots so you can see the “trail” for this country. (We haven’t talked about layers yet, so you will have the trail only, no dots.)

  3. Replace your filter with a selector widget that lets you interactively pick a country from a list of a few countries you are interested in. (Hint: {"input": "select", "options": [...]})

Example: Using a trail

'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 450,
  "width": 700,
  "params": [
    {
      "name": "country",
      "value": "United States",
      "bind": {"input": "select", "options": ["United States", "Canada", "Mexico", "China", "Nigeria", "Egypt", "South Korea"]}
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"
  },
  "mark": "trail",
  "transform": [{"filter": "datum.country == country"}],
  "encoding": {
    "x": {
      "field": "fertility",
      "type": "quantitative",
      "scale": {"domain": [0,10]}
    },
    "y": {
      "field": "life_expect",
      "type": "quantitative",
      "scale": {"domain": [30,100]}
    },
    "opacity": {"value": 0.6},
    "size": {"field": "year"}
  }
}' |> as_vegaspec()

More Examples

'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 450,
  "width": 700,
  "params": [
    {
      "name": "year",
      "value": 1955,
      "bind": {"input": "range", "min": 1955, "max": 2005, "step": 5}
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.8.0/data/gapminder.json"
  },
  "mark": "point",
  "transform": [{"filter": "datum.year == year"}],
  "encoding": {
    "x": {
      "field": "fertility",
      "type": "quantitative",
      "scale": {"domain": [0, 9]}
    },
    "y": {
      "field": "life_expect",
      "type": "quantitative",
      "scale": {"domain": [0, 100]}
    },
    "size": {"field": "pop", "type": "quantitative", "scale": {"domain": [0, 1500000000]}},
    "fill": {"field": "cluster", "type": "nominal"},
    "stroke": {"field": "cluster", "type": "nominal"},
    "fillOpacity": {"value": 0.6}
  }
}' |> as_vegaspec()

Homework

Be sure to scroll to see the entire assignment.

  1. Read Chapters 1 and 2 of Claus Wilke’s Fundamentals of Data Visualization.

  2. Create at least two data graphics and submit the Vega-Lite specifications and some additional information using this form. For each plot,

    1. Use one of the Vega data sets that is not the Gapminder data. (You may use the same data or different data for different plots.)

    2. Create a single view graphic that treats the data as glyph-ready and uses one of the following primitive mark types: area, bar, line, point, rect, text, trail.

    3. You may use the filter transformation if you like, but you shouldn’t use any other data transformations.

    4. Use a different mark type for each graphic (for some variety).

    5. Map at least one property that is not positional.

    6. Identify at least one place where you used something you read in Wilke’s book as you designed your graphic (or would have if you knew how). Do your best to be sure your plot would not be considered bad, ugly, or wrong by Wilke, at least to the extent that this is possible given what we know so far.

    7. Bonus: Bind a slider or selector that lets you change some feature of the graphic.