Data and Graphics Challenge

Data 304: Visualizing Data and Models

Prelude

Setup

R/vegabrite
Python/Altair

library(tidyverse)
library(vegabrite)
vega_data <- altair::import_vega_data()

import pandas as pd
import altair as alt

Kinds of Challenges

Design challenges: What graphic do I want to create?
Technical challenges: How do I make that graphic?
Ethical challenges: Is this a good idea?

Technical challenges

Some common data operations

Many graphics rely on obtaining, tidying, cleaning, and wrangling data

inspect data (what data types? what values? strange/surprising values?)
data cleaning (recoding values, missing data, etc.)
wide \(\leftrightarrow\) long
compute new variables from existing variables
rename variables
merge data from multiple sources
aggregation

Most can be done either in Vega-Lite or before (Python, R, etc.)
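As a small illustration of two of the operations listed above (wide to long, then aggregation), here is a sketch in plain Python. The table and its column names are made up for the example; in practice pandas (`melt`, `groupby`) or a Vega-Lite `fold` transform would typically do this work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide data: one row per quarter, one column per region.
wide = [
    {"quarter": "Q1", "north": 10, "south": 20},
    {"quarter": "Q2", "north": 30, "south": 40},
]

def wide_to_long(rows, id_field, key_name="key", value_name="value"):
    """Return one row per (id, column) pair, similar to a fold/melt."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_field:
                long_rows.append(
                    {id_field: row[id_field], key_name: column, value_name: value}
                )
    return long_rows

tidy = wide_to_long(wide, "quarter", key_name="region", value_name="sales")

# Aggregation: mean sales per region, computed from the long form.
by_region = defaultdict(list)
for row in tidy:
    by_region[row["region"]].append(row["sales"])
mean_sales = {region: mean(values) for region, values in by_region.items()}

print(tidy)
print(mean_sales)
```

The long form has one row per quarter and region, which is usually the shape that maps most directly onto encoding channels such as color.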

Getting help

It is hard (impossible?) to remember every option of every component of a graphic. So we need to learn how to use the help that is available to us.

code completion and help as you type
documentation for Vega-Lite, Altair/altair, vegabrite
example galleries
course slides (use search feature on website)

Debugging

The Vega Editor often gives more, and more useful, debugging information than the R and Python packages do. So if you can’t figure something out, try exporting the JSON and pasting it into the Vega Editor.
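To make the export step concrete: a Vega-Lite chart is just a JSON object, so anything that prints that object gives you text to paste into the Vega Editor. The sketch below builds a small spec by hand using only the standard library; the inline data values are invented, and the `$schema` URL assumes Vega-Lite version 5. In Altair, `chart.to_json()` produces the corresponding text for a real chart.

```python
import json

# A hand-written Vega-Lite spec (hypothetical data, for illustration only).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 3, "y": 3}]},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

# Pretty-printed text that can be pasted into the Vega Editor.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Reading the JSON directly is also a quick way to check whether a field name, type, or transform ended up where you expected.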

Design challenges

Knaflic’s 6 step process

Step 1: Understand the context

Step 2: Choose an appropriate visual

What comparison am I trying to make?
Do a data and graphics inventory to come up with ideas.
Try multiple ideas and get feedback.

Step 3: Eliminate clutter

Elements of graphics can be turned off (or made transparent or muted).

Step 4: Draw attention where you want it

Leverage the Gestalt Principles.

Step 4: Draw attention – Many ways

Figures 4.3f, 4.3j, 4.3m, and 4.3p from Knaflic (2020)…

Step 4: Draw attention

Connect to story

Figures 4.2b and 4.2h from Knaflic (2020).

Step 5: Think like a designer

Determine the Why before the What.
First make the graphic correct and useful, then polish it.

Step 6: Tell a story

A graphic should tell (a part of) a story
Titles and annotation text can be used to help tell the story
Consistency across multiple graphics in a story helps

Manuel Lima’s Information Visualization Manifesto (summary)

Form follows function
Start with a question
Interactivity is key
The power of narrative
Do not glorify aesthetics
Look for relevancy
Embrace time
Aspire for knowledge
Avoid gratuitous visualizations

Aside: 2 Manuel Lima books

Ethical challenges

3 ways to think about ethics

Deontological (arrow): Duty; does the action follow the rules/guidelines?
Consequentialist (outcome): What are the consequences of this action?
Virtue (doer): What kind of person would do this? What kind of person will I become if I do this?

3 ways to think about ethics

Suppose it is obvious that someone in need should be helped.

A utilitarian will point to the fact that the consequences of doing so will maximize well-being,

a deontologist to the fact that, in doing so the agent will be acting in accordance with a moral rule such as “Do unto others as you would be done by”, and

a virtue ethicist to the fact that helping the person would be charitable or benevolent.

Source: https://plato.stanford.edu/entries/ethics-virtue/; bullets and emphasis mine.

Hippocratic Oath for Visualization

Jason Moore of the US Air Force Research Laboratory proposed:

I shall not use visualization to intentionally hide or confuse the truth which it is intended to portray. I will respect the great power visualization has in garnering wisdom and misleading the uninformed. I accept this responsibility wilfully and without reservation, and promise to defend this oath against all enemies, both domestic and foreign.

5/7/10 principles for ethical visualization

You can find variations on these from numerous sources.

Accuracy and honesty: Data visualizations should correctly represent the underlying data and not deliberately mislead or deceive the audience.

Clarity and simplicity: Visualizations should be designed to make the data easier to understand, avoiding unnecessary complexity or clutter. Striking a balance between aesthetics and functionality is key to ensuring that the message is clear.

Fairness and objectivity: Data visualizers should strive to present data objectively, without introducing personal bias or promoting stereotypes.

Privacy and trust: We should be mindful of potential privacy concerns and adhere to relevant laws, regulations, and ethical guidelines to protect sensitive information.

Inclusiveness and accessibility: This includes using color schemes readable by individuals with color vision deficiencies. It also can mean providing alternative text descriptions (alt text) for visually impaired users. It also means considering cultural sensitivities when designing visuals.

ARIA in Vega-Lite

Accessibility properties are used to determine ARIA (Accessible Rich Internet Applications) attributes when using Vega to render SVG output.

Use description = "..." to add text descriptions of a graphic or an element of a graphic (axis, legend, etc.)
Use aria = FALSE to set the “aria-hidden” attribute and remove the element from the ARIA tree.
Vega-Lite accessibility review
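In the underlying JSON, these options appear as ordinary properties. The following sketch shows where they sit in a spec, assuming the Seattle weather fields `date` and `temp_max` that appear in these slides; the description texts and the choice to hide the x-axis are only for illustration, and the data is left out to keep the sketch short.

```python
import json

spec = {
    # Chart-level description, used for ARIA attributes in SVG output
    "description": "Scatter plot of daily high temperature vs date.",
    "mark": "point",
    "encoding": {
        "x": {
            "field": "date",
            "type": "temporal",
            # aria = false removes this axis from the ARIA tree
            "axis": {"aria": False},
        },
        "y": {
            "field": "temp_max",
            "type": "quantitative",
            # Elements such as axes can carry their own description
            "axis": {"description": "Daily high temperature in degrees Celsius"},
        },
    },
}

# Python's False is written as JSON false in the exported spec.
print(json.dumps(spec, indent=2))
```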

Writing alt text

Some tips for writing alt text

weather <- read_csv(vega_data$seattle_weather$url)

seattle_weather_graphic <-
  vl_chart() |>
  vl_add_data_url(vega_data$seattle_weather$url) |>
  vl_mark_point() |>
  vl_encode_x("date:T", title = NA) |>
  vl_encode_y("temp_max:Q", title = "High Temperature (C)") |>
  vl_add_properties(
    description = "A scatter plot of high temperatures (in degrees Celsius)
    vs date for Seattle, WA, from 2012 through 2015. Temperatures 
    rarely go above 35 or below 0. Strong annual periodicity.",
    width = 800, height = 300)
 
seattle_weather_graphic

Checking your colors for accessibility

For websites:

https://www.toptal.com/designers/colorfilter

For images:

In R:

Advice:

Use palettes and color schemes rather than picking your own colors.
When possible, don’t use only hue.
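One way to act on the second piece of advice is to check that two colors also differ in lightness, not just hue. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2 guidelines; the function names are my own, and the two palette colors in the usage lines are arbitrary examples.

```python
def relative_luminance(hex_color):
    """WCAG 2 relative luminance of an sRGB color such as '#1f77b4'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Convert each gamma-encoded channel to linear light.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color1, color2):
    """WCAG 2 contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lum1, lum2 = relative_luminance(color1), relative_luminance(color2)
    lighter, darker = max(lum1, lum2), min(lum1, lum2)
    return (lighter + 0.05) / (darker + 0.05)

# Two colors with different hues can still be close in lightness.
print(round(contrast_ratio("#000000", "#ffffff"), 1))  # black vs white
print(round(contrast_ratio("#1f77b4", "#ff7f0e"), 2))  # a blue vs an orange
```

A ratio near 1 means the colors are hard to tell apart by lightness alone, which suggests adding a second cue such as shape, line style, or direct labels.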

Some questions to ask

What story is the visualization telling you?
What is the motivation for the visualization (and the story)?
What has been left out?
Who/what is impacted? Who are the stakeholders?

Some practical advice

Have paper and pencil ready

draw sketches of graphics before trying to code them
- avoids wasting time making graphics you don’t want
- helps clarify the process/design
draw “sketches” of data
- ideally, what would one row look like?
- what does one row actually look like?
- what adjustments do you need to make?
- do you need to combine multiple data sources?

Ask what before how

What do you want the computer to do?
- be specific (“make a plot” is not specific enough)
What does the computer need to know to do that?
- data, encoding channels, variables (fields), etc.
How do you get the computer to do that? (the code)
- Don’t write code until you have clear answers to the first two questions.

The sketches you make will help with this.

Know where you are on the graphics spectrum:

exploratory \(\leftrightarrow\) polished for presentation

A good graphics system should allow

rapid prototyping
- easily create (multiple) reasonable graphics to explore data
- audience: creator of graphic or others on “the team”
fine tuning
- customization for presentation/publication/story telling
- audience: usually some other person or group

template <-
  vl_chart(...) |>
  vl_mark_point(...) |>   # or some other mark
  vl_encode_x(...) |>
  vl_encode_y(...) |>
  vl_add_data(...)

template = 
  alt.Chart(data).mark_marktype(...).encode(
    x = ..., # or alt.X(...)
    y = ..., # or alt.Y(...)
    ...
  )

Add complexity as you go along

transforms
composition (layers, concatenations, repeats, facets)
scale adjustments
etc.

Use good programming habits

Assign graphic components to variables and compose them

R/vegabrite
Python/Altair

layer1 <- vl_chart(...) |> ...
layer2 <- vl_chart(...) |> ...
layer1 + layer2
layer1 & layer2
vl_layer(layer1, layer2)

layer1 = alt.Chart()...
layer2 = alt.Chart()...
layer1 + layer2
layer1 & layer2
alt.layer(layer1, layer2, data = ...)

Style matters
- space after commas!
- space around = ?
Name things well
Use comments as needed; but don’t put analysis discussion in comments.
Consider writing wrapper functions to avoid repetitive code
etc.
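For example, two layers that differ only in the field plotted and the legend label can come from one small function. This sketch builds Vega-Lite layer specs as plain dictionaries; `rate_layer` is a hypothetical helper, and the field names are borrowed from the email marketing data in Challenge #1. The same idea applies to functions that return Altair or vegabrite chart objects.

```python
import json

def rate_layer(field, label):
    """Return a line layer for one rate column, labeled in the color legend."""
    return {
        "mark": {"type": "line", "point": True},
        "encoding": {
            "x": {"field": "Date", "type": "ordinal"},
            "y": {"field": field, "type": "quantitative"},
            "color": {"datum": label, "type": "nominal"},
        },
    }

# One call per layer instead of two nearly identical blocks of code.
# (Data omitted here; add a "data" entry before rendering.)
spec = {
    "layer": [
        rate_layer("Response Rate", "Response"),
        rate_layer("Completion Rate", "Completion"),
    ],
    "width": 800,
    "height": 200,
}

print(json.dumps(spec, indent=2))
```

Adding a third rate later is then a one-line change, and any styling fix made inside the function applies to every layer.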

The Challenges

Challenge #1: Email marketing

Context: Response and completion rates for an email marketing campaign where email recipients were asked to complete a survey.

Data Source: JSON, Excel, [HW 4, (Knaflic 2020, p 96)]

Challenge: Identify issues with this graphic and make something better. (See next slide first.)

Partial solution?

Many of you made graphics something like this one for HW 4.

Challenge: Identify issues with this graphic and make something better.

R/vegabrite
Python/Altair

mailing <- jsonlite::fromJSON("../data/swd-lets-practice-ex-2-13.json")
mailing |> slice_head(n = 2)

     Date Completion Rate Response Rate
1 Q1-2017            0.91         0.023
2 Q2-2017            0.93         0.018

base <- 
  vl_chart(mailing) |>
  vl_mark_line(point = TRUE) |>
  vl_encode_x("Date:O") 

completion <-
  base |>
  vl_encode_y("Completion Rate:Q") |>
  vl_encode_color(datum = "Completion", type = "nominal")
  
response <- 
  base |>
  vl_encode_y("Response Rate:Q") |>
  vl_encode_color(datum = "Response", type = "nominal")

(response + completion) |> vl_add_properties(width = 800, height = 200)

mailing = pd.read_json('../data/swd-lets-practice-ex-2-13.json')
mailing.head(2)

      Date  Completion Rate  Response Rate
0  Q1-2017             0.91          0.023
1  Q2-2017             0.93          0.018

base = alt.Chart(mailing).mark_line(point = True).encode(
  x = "Date:O"
)

# each layer uses a constant (datum) color so the legend labels the two lines
completion = base.encode(
  y = "Completion Rate:Q", 
  color = alt.Color(datum = "Completion", type = "nominal"))

response = base.encode(
  y = "Response Rate:Q", 
  color = alt.Color(datum = "Response", type = "nominal"))

chart = response + completion

chart_json = chart.to_json()  # for pasting into Vega Editor or inspection

chart.properties(width = 800, height = 200)

Challenge #2: Customer Touchpoints

Context: Customer touchpoints (phone, email, or chat) over time.

Data Source: CSV, Excel, (Knaflic 2020, p 206)

Challenge: Identify issues with this graphic and make something better.

Challenge #3: Net Promoter Score

Context: Net promoter score over time for a company and its competitors.

Data Source: Excel (multiple sheets!), (Knaflic 2020, p 342)

Challenge: Investigate

How our business compares to other businesses (over time and in February 2020)
How the components of the net promoter score for our business have changed over time

Challenge #4: Tickets

Here’s a page from a monthly report on ticket volume and related metrics. (Click on the image to see a larger version.)

Data: Excel file

Challenges

Write a sentence describing a key takeaway for each graph shown in the report.
Imagine you need to tell a story with this data: which parts of the report would you focus on and which (if any) would you omit? It may be important to look at all of these things as we are exploring the data, but not all of the data is necessarily equally interesting when it comes to communicating it to our audience.
Use the data to create a webpage or slide deck to tell a visual story with the elements you selected in Step 2. Imagine this will be read by someone who is somewhat familiar with the data but doesn’t deal with it on a daily basis. Be sure to include enough text/context to walk them through your story.

Challenge #5: Data Visualization Survey

The Data Visualization Society did a survey of its members and posted the data online so folks could visualize it (seems fitting).

Challenge: Create an interesting and informative graphic about some aspect(s) of the survey. (The survey has up to 73 items, so you will have to focus on just a portion of the survey responses.)

Info about the data: https://www.datavisualizationsociety.org/survey-history

includes links to some of the winning visualizations from the contest.

Links to data sets:

Dataset 1: Job Titles • Dataset 2: Characteristics • Dataset 3: Employment
Dataset 4: Visualization • Dataset 5: Challenges

References

Knaflic, C. N. 2020. Storytelling with Data: Let’s Practice! Wiley. https://github.com/Saurav6789/Books-/blob/master/Storytelling%20with%20Data%20Let%E2%80%99s%20Practice%20by%20Cole%20Nussbaumer%20Knaflic%20(z-lib.org).pdf.