Data 304: Visualizing Data and Models
Why build Vega-Lite specs from R or Python?
- You probably already know the language and have a workflow.
- Many more tools are available for data wrangling, modeling, etc.
- Syntax advantages make it easier to reuse code.
- Some parts of the JSON creation can be automated (less typing).
- You are probably going to be using R or Python for other parts of your work anyway.
With vegawidget::as_vegaspec(), there are two ways to create the Vega-Lite specification:
- Write the JSON as a string.
- Write a “list-of-lists” version of the JSON spec.
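The two routes describe the same object. As a sketch in Python (standing in for R here, with no vegawidget involved), a spec written as a JSON string and the same spec written as nested dicts and lists (Python's analogue of R's list-of-lists) compare equal once the string is parsed. The tiny bar-chart spec and its data are made up for illustration.

```python
import json

# Route 1: the Vega-Lite spec as a JSON string
spec_string = """
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
"""

# Route 2: the same spec as nested dicts and lists
spec_nested = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

# Parsing the string yields exactly the nested structure
print(json.loads(spec_string) == spec_nested)  # True
```

The nested form is the one that is easy to build with code (variables, loops, functions), which is what the higher-level packages take advantage of.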
Two packages that build the spec for you: vegabrite and altair.
From the vegabrite GitHub site:
The goal of vegabrite is to provide an R api for building up vega-lite specs… This package is still experimental but has a mostly complete interface for building out Vega-Lite specs and charts. There is still lots of room for improvement in terms of better error handling and warnings when making invalid specs… Much of the public API is auto-generated…
- vl_ functions create or modify parts of the Vega-Lite specification.
- Groups of functions with similar 3-part names add or modify components of the spec:
  - vl_mark_<marktype>()
  - vl_encode_<channel>()
  - vl_sort_<channel>_by_encoding(), vl_sort_<channel>_by_field()
  - vl_scale_<channel>(), vl_legend_<channel>()
  - vl_axis_<x|y>(), vl_remove_axis_<x|y>()
  - vl_facet(), vl_facet_<row|column>()
  - vl_repeat_<layer|col|row|wrap>()
  - vl_config_<element to configure>()
  - etc.
- Transform functions don’t use the word “transform”: vl_calculate(), vl_fold(), vl_lookup(), vl_aggregate_<channel>(), etc.
- Layering: +, vl_layer()
- Concatenation: |, &, vl_hconcat(), vl_vconcat(), vl_concat()
- Facets: vl_facet(), vl_facet_row(), vl_facet_column()
- Repeat: vl_repeat_layer(), vl_repeat_row(), vl_repeat_col(), vl_repeat_wrap()
- Config: vl_config_<thing to configure>() (lots of these)
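The naming scheme maps fairly directly onto the spec: a mark function fills in `mark`, an encode function fills in one entry of `encoding`, and composition wraps whole specs under a `layer`, `hconcat`, or `vconcat` key. The Python sketch below imitates that pattern with hypothetical helpers (`mark`, `encode`, `layer`, `hconcat`, `vconcat` are made up for illustration; they are not vegabrite or Altair functions). It also expands the `"field:T"` type shorthand that appears in both the vegabrite and Altair code in these notes.

```python
import copy

# Vega-Lite type codes used in the "field:code" shorthand
TYPE_CODES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}


def parse_shorthand(shorthand):
    """Expand 'field:T' into a field definition; a bare 'field' gets no type."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        return {"field": code}
    return {"field": field, "type": TYPE_CODES[code]}


def mark(spec, mark_type):
    """Hypothetical analogue of vl_mark_<marktype>(): set the mark on a copy."""
    new = copy.deepcopy(spec)
    new["mark"] = mark_type
    return new


def encode(spec, channel, shorthand, **props):
    """Hypothetical analogue of vl_encode_<channel>(): add one channel on a copy."""
    new = copy.deepcopy(spec)
    new.setdefault("encoding", {})[channel] = {**parse_shorthand(shorthand), **props}
    return new


def layer(*specs):
    return {"layer": list(specs)}    # cf. vl_layer()


def hconcat(*specs):
    return {"hconcat": list(specs)}  # cf. vl_hconcat()


def vconcat(*specs):
    return {"vconcat": list(specs)}  # cf. vl_vconcat()


base = mark({}, "point")
base = encode(base, "x", "Horsepower:Q")
base = encode(base, "y", "Miles_per_Gallon:Q")
colored = encode(base, "color", "Origin:N")

print(colored["encoding"]["x"])     # {'field': 'Horsepower', 'type': 'quantitative'}
print(list(layer(base, colored)))   # ['layer']
print("color" in base["encoding"])  # False: encode() returned a copy
```

Returning a modified copy from each step is what makes the functions safe to chain; in R, the `|>` pipe lets such calls read top to bottom instead of being reassigned line by line.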
From the documentation:
Vega-Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.
It offers a powerful and concise grammar that enables you to quickly build a wide range of statistical visualizations.
From me:
It is a Pythonification of the Vega-Lite JSON specification.
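Start with alt.Chart() and use method chaining (.) to add elements to the specification.
import altair as alt
from vega_datasets import data

cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x = 'Horsepower',
    y = 'Miles_per_Gallon',
    color = 'Origin',
).interactive()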
interactive([name, bind_x, bind_y]): Make chart axes scales interactive.
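We can inspect the JSON (for example, with the chart's to_json() method) and see that it inserts a selection parameter bound to the scales ("bind": "scales").
We can get the same effect in vegabrite by coding up this binding ourselves.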
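What `interactive()` adds is an interval selection parameter whose result is bound to the axis scales, which turns on panning and zooming. Below is a sketch of coding that binding by hand, written as a plain Python dict rather than as vegabrite calls. The helper `make_interactive` and the parameter name `zoom` are hypothetical choices; Altair generates its own parameter name.

```python
import copy
import json


def make_interactive(spec, name="zoom", encodings=("x", "y")):
    """Hypothetical helper: return a copy of a single-view spec with pan/zoom.

    Adds an interval selection bound to the scales, the same kind of
    parameter that Altair's .interactive() inserts.
    """
    new = copy.deepcopy(spec)
    new.setdefault("params", []).append({
        "name": name,
        "select": {"type": "interval", "encodings": list(encodings)},
        "bind": "scales",
    })
    return new


chart = {
    "mark": "point",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "color": {"field": "Origin", "type": "nominal"},
    },
}

print(json.dumps(make_interactive(chart)["params"], indent=2))
```

Restricting `encodings` to `("x",)` would allow panning and zooming along the x-axis only, roughly what `bind_y=False` does in Altair.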
The Python style guide (PEP 8) recommends using implicit line continuation. An implicit line continuation happens whenever Python gets to the end of a line of code and sees that there’s more to come because a parenthesis ((), square bracket ([), or curly brace ({) has been left open.
This is sometimes clunky for altair code (and for method chaining in general), because the natural place to break lines is before each method, not inside an open bracket.
One trick: Enclose the whole thing in parens; break lines at methods.
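A small illustration of the trick with a toy class. `Spec` and its methods are hypothetical stand-ins for `alt.Chart` and its methods: each method returns a new object, so the calls can be chained, and wrapping the whole expression in parentheses lets each `.method()` start its own line without backslashes.

```python
class Spec:
    """Toy chainable spec builder (hypothetical; mimics alt.Chart's style)."""

    def __init__(self, data=None, **props):
        self.spec = {"data": data, **props}

    def _with(self, **updates):
        # Return a new Spec so that chaining never mutates the original
        new = Spec()
        new.spec = {**self.spec, **updates}
        return new

    def mark_area(self):
        return self._with(mark="area")

    def encode(self, **channels):
        return self._with(encoding={**self.spec.get("encoding", {}), **channels})


# Without the outer parentheses, these line breaks would not parse.
chart = (Spec("Weather", width=800, height=55)
    .mark_area()
    .encode(x="date:T", y="high_temp:Q")
    .encode(y2="low_temp:Q")
)

print(chart.spec["mark"], sorted(chart.spec["encoding"]))  # area ['x', 'y', 'y2']
```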
altair (R package): This package uses reticulate to provide an interface to the Altair Python package, and the vegawidget package to render charts as htmlwidgets.
In other words: An R wrapper around the Python package.
In R, alt is a reference to Python's alt; replace . with $ (e.g., alt.Chart() becomes alt$Chart()).
vegabrite vs. Altair/altair
Altair (Python) seems to be actively supported. altair (R) is derivative, so it needs less maintenance, but its support also seems to lag a bit (e.g., the CRAN version uses an outdated version of Altair).
library(dplyr)
Weather <- mosaicData::Weather |>
  mutate(
    year = lubridate::year(date),
    month = lubridate::month(date),
    day = lubridate::day(date)
  )
Weather |> head(3) |> pander::pander()
city | date | year | month | day | high_temp | avg_temp | low_temp |
---|---|---|---|---|---|---|---|
Auckland | 2016-01-01 | 2016 | 1 | 1 | 68 | 65 | 62 |
Auckland | 2016-01-02 | 2016 | 1 | 2 | 68 | 66 | 64 |
Auckland | 2016-01-03 | 2016 | 1 | 3 | 77 | 72 | 66 |
high_dewpt | avg_dewpt | low_dewpt | high_humidity | avg_humidity |
---|---|---|---|---|
64 | 60 | 55 | 100 | 82 |
64 | 63 | 61 | 100 | 94 |
70 | 67 | 64 | 100 | 91 |
low_humidity | high_hg | avg_hg | low_hg | high_vis | avg_vis | low_vis |
---|---|---|---|---|---|---|
68 | 30.15 | 30.09 | 30.01 | 6 | 6 | 4 |
88 | 30.04 | 29.9 | 29.8 | 6 | 5 | 1 |
74 | 29.8 | 29.73 | 29.68 | 6 | 6 | 1 |
high_wind | avg_wind | low_wind | precip | events |
---|---|---|---|---|
21 | 15 | 28 | 0 | Rain |
33 | 21 | 46 | 0 | Rain |
18 | 12 | NA | 0 | Rain |
Viewed from Python (as r.Weather, via reticulate), the same three rows look like this:
city date year month ... avg_wind low_wind precip events
0 Auckland 2016-01-01 2016.0 1.0 ... 15.0 28.0 0 Rain
1 Auckland 2016-01-02 2016.0 1.0 ... 21.0 46.0 0 Rain
2 Auckland 2016-01-03 2016.0 1.0 ... 12.0 NaN 0 Rain
[3 rows x 25 columns]
It seems that vegabrite uses a different method to pass dates along to JSON.
In vegabrite, we can use Weather$date as is.
In altair, we need to remove that column (after extracting the year, month, and day) and then use a calculate transform to recreate the date from those parts. If we don’t, we get errors about JSON serialization of date/datetime objects.
“Altair is designed to work best with pandas timeseries.”
See https://altair-viz.github.io/user_guide/times_and_dates.html for more info about dates in Vega-Altair.
Working with dates, times, and timezones is often one of the more challenging aspects of data analysis. In Altair, the difficulties are compounded by the fact that users are writing Python code, which outputs JSON-serialized timestamps, which are interpreted by Javascript, and then rendered by your browser. At each of these steps, there are things that can go wrong, but Altair and Vega-Lite do their best to ensure that dates are interpreted and visualized in a consistent way.
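The serialization error is easy to reproduce with Python's standard `json` module, independently of Altair. The sketch below uses one made-up record in the shape of the Weather data and shows the failure plus two workarounds: sending the date as an ISO-8601 string, or sending year, month, and day and rebuilding the date with a Vega-Lite `calculate` transform. Note that the Vega expression function `datetime()` takes a 0-based month (as JavaScript's `Date` does), so a 1-based month has to be shifted by one.

```python
import json
from datetime import date

# One made-up row in the shape of the Weather data
row = {"city": "Auckland", "date": date(2016, 1, 1), "high_temp": 68}

# The standard json module cannot serialize datetime.date objects
try:
    json.dumps(row)
    date_ok = True
except TypeError as err:
    date_ok = False
    print("TypeError:", err)

# Workaround 1: send the date as an ISO-8601 string
# (date-only strings are parsed by the browser as UTC midnight)
row_iso = {**row, "date": row["date"].isoformat()}
print(json.dumps(row_iso))  # {"city": "Auckland", "date": "2016-01-01", "high_temp": 68}

# Workaround 2: drop the date, keep its parts, rebuild it in Vega-Lite
d = row["date"]
row_parts = {k: v for k, v in row.items() if k != "date"}
row_parts.update(year=d.year, month=d.month, day=d.day)
rebuild = {"calculate": "datetime(datum.year, datum.month - 1, datum.day)", "as": "date"}
print(json.dumps({"data": {"values": [row_parts]}, "transform": [rebuild]}))
```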
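import altair as alt

# r.Weather is the R data frame seen from Python via reticulate;
# drop the date column, which fails to serialize to JSON
Weather = r.Weather.drop('date', axis = 1)

(alt.Chart(Weather, width = 800, height = 55)
    .mark_area()
    # Vega's datetime() expects a 0-based month, so shift the 1-12 month down by 1
    .transform_calculate(date = "datetime(datum.year, datum.month - 1, datum.day)")
    .encode(
        x = alt.X("date:T", title = ""),
        y = alt.Y("high_temp:Q", title = "temperature"),
        y2 = "low_temp:Q",
        row = "city:N")
)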
library(vegabrite)

vl_chart(width = 800, height = 55) |>
  vl_mark_area() |>
  vl_encode_x("date:T", title = "") |>
  vl_encode_y("high_temp:Q", title = "temperature") |>
  vl_encode_y2("low_temp:Q") |>
  vl_facet_row("city:N", title = "") |>
  vl_add_data(Weather) |>
  vl_add_properties(title = "High and low temperatures in several cities")
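For comparison, here is a hand-written Vega-Lite spec with the same intent, built as a Python dict from the first three Auckland rows shown above. This is a sketch, not a dump of what vegabrite actually emits. In particular, it expresses the faceting with the `row` encoding channel (as the Altair version does), while `vl_facet_row()` may produce a different but equivalent structure. `y2` carries no `type` because it is a secondary channel that shares the scale of `y`.

```python
import json

# First three Auckland rows from the table above; dates as ISO strings
rows = [
    {"city": "Auckland", "date": "2016-01-01", "high_temp": 68, "low_temp": 62},
    {"city": "Auckland", "date": "2016-01-02", "high_temp": 68, "low_temp": 64},
    {"city": "Auckland", "date": "2016-01-03", "high_temp": 77, "low_temp": 66},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "title": "High and low temperatures in several cities",
    "width": 800,   # width and height apply to each facet cell
    "height": 55,
    "data": {"values": rows},
    "mark": "area",
    "encoding": {
        "x": {"field": "date", "type": "temporal", "title": ""},
        "y": {"field": "high_temp", "type": "quantitative", "title": "temperature"},
        "y2": {"field": "low_temp"},
        "row": {"field": "city", "type": "nominal", "title": ""},
    },
}

print(json.dumps(spec, indent=2))
```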