Data 304: Visualizing Data and Models
Design challenges: What graphic do I want to create?
Technical challenges: How do I make that graphic?
Ethical challenges: Is this a good idea?
Many graphics rely on obtaining, tidying, cleaning, and wrangling data
Most of this can be done either in Vega-Lite or beforehand (in Python, R, etc.)
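To make "in Vega-Lite or before" concrete, here is a minimal sketch that uses plain Python and no plotting library. The same filter is done once in Python before the data reach Vega-Lite, and once inside the spec as a Vega-Lite `filter` transform. The rows and the cutoff are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical rows standing in for a real data set.
rows = [
    {"date": "2012-01-01", "temp_max": 12.8},
    {"date": "2012-07-01", "temp_max": 36.1},
    {"date": "2012-12-01", "temp_max": -1.1},
]

encoding = {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "temp_max", "type": "quantitative"},
}

# Option 1: wrangle before Vega-Lite (plain Python here; pandas or dplyr work similarly).
warm_rows = [r for r in rows if r["temp_max"] > 0]
spec_before = {"data": {"values": warm_rows}, "mark": "point", "encoding": encoding}

# Option 2: hand Vega-Lite the raw rows and let a filter transform do the work.
spec_in_vegalite = {
    "data": {"values": rows},
    "transform": [{"filter": "datum.temp_max > 0"}],
    "mark": "point",
    "encoding": encoding,
}

print(json.dumps(spec_in_vegalite, indent=2))
```

Doing the work beforehand keeps the spec small; doing it in Vega-Lite keeps the whole pipeline in one portable JSON document.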
It is hard (impossible?) to remember every option of every component of a graphic. So we need to learn how to use the help that is available to us.
The Vega Editor often provides more (and more useful) debugging help than the R and Python packages do. So if you can’t figure something out, try exporting the JSON spec and pasting it into the Vega Editor.
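Whatever tool builds the chart, what gets pasted into the Vega Editor is just a JSON document. Here is a minimal sketch that builds a small, hypothetical bar-chart spec as a Python dict and serializes it. Altair's `Chart.to_json()` returns an equivalent string for Altair charts; the `$schema` line tells the editor which Vega-Lite version to validate against.

```python
import json

# A deliberately tiny spec with hypothetical data values.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

# This string is what you would paste into the Vega Editor.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```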
Figures 4.3f, 4.3j, 4.3m, and 4.3p from Knaflic (2020)…
Figures 4.2b and 4.2h from Knaflic (2020).
Suppose it is obvious that someone in need should be helped.
- A utilitarian will point to the fact that the consequences of doing so will maximize well-being,
- a deontologist to the fact that, in doing so the agent will be acting in accordance with a moral rule such as “Do unto others as you would be done by”, and
- a virtue ethicist to the fact that helping the person would be charitable or benevolent.
Source: https://plato.stanford.edu/entries/ethics-virtue/; bullets and emphasis mine.
Jason Moore of the US Air Force Research Laboratory proposed:
I shall not use visualization to intentionally hide or confuse the truth which it is intended to portray. I will respect the great power visualization has in garnering wisdom and misleading the uninformed. I accept this responsibility wilfully and without reservation, and promise to defend this oath against all enemies, both domestic and foreign.
Variations on these principles can be found in numerous sources.
Accuracy and honesty: Data visualizations should correctly represent the underlying data and not deliberately mislead or deceive the audience.
Clarity and simplicity: Visualizations should be designed to make the data easier to understand, avoiding unnecessary complexity or clutter. Striking a balance between aesthetics and functionality is key to ensuring that the message is clear.
Fairness and objectivity: Data visualizers should strive to present data objectively, without introducing personal bias or promoting stereotypes.
Privacy and trust: We should be mindful of potential privacy concerns and adhere to relevant laws, regulations, and ethical guidelines to protect sensitive information.
Inclusiveness and accessibility: This includes using color schemes readable by individuals with color vision deficiencies. It can also mean providing alternative text descriptions (alt text) for visually impaired users and considering cultural sensitivities when designing visuals.
Accessibility properties are used to determine ARIA (Accessible Rich Internet Applications) attributes when using Vega to render SVG output.
Use description = "..." to add text descriptions of a graphic or an element of a graphic (axis, legend, etc.)
Use aria = FALSE to set the “aria-hidden” attribute and remove the element from the ARIA tree.
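For reference, here is roughly how these properties appear in a raw Vega-Lite spec, written as a Python dict. This is a sketch: the chart, data values, and description text are hypothetical. R's `FALSE` and Python's `False` both serialize to JSON `false`.

```python
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    # Top-level description: Vega uses it when generating ARIA attributes for SVG output.
    "description": "Bar chart of two hypothetical categories; B is about twice A.",
    "data": {"values": [{"category": "A", "count": 28}, {"category": "B", "count": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "category",
            "type": "nominal",
            # A description attached to one element of the graphic (the x axis).
            "axis": {"description": "Categories A and B"},
        },
        "y": {
            "field": "count",
            "type": "quantitative",
            # aria: False marks this axis aria-hidden, removing it from the ARIA tree.
            "axis": {"aria": False},
        },
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```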
Some tips for writing alt text
# Read the data into R for inspection (the chart itself loads the data from the URL)
weather <- read_csv(vega_data$seattle_weather$url)

seattle_weather_graphic <-
  vl_chart() |>
  vl_add_data_url(vega_data$seattle_weather$url) |>
  vl_mark_point() |>
  vl_encode_x("date:T", title = NA) |>
  vl_encode_y("temp_max:Q", title = "High Temperature (C)") |>
  vl_add_properties(
    description = "A scatter plot of high temperatures (in degrees Celsius)
      vs date for Seattle, WA, from 2012 through 2015. Temperatures
      rarely go above 35 or below 0. Strong annual periodicity.",
    width = 800, height = 300)

seattle_weather_graphic
For websites:
For images:
In R:
Advice:
What do you want the computer to do?
What does the computer need to know to do that?
How do you get the computer to do that? (the code)
The sketches you make will help with this.
rapid prototyping
easily create (multiple) reasonable graphics to explore data
audience: creator of graphic or others on “the team”
fine tuning
customization for presentation/publication/story telling
audience: usually some other person or group
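The two modes often work on the same spec: prototype with the bare minimum, then layer on presentation details. Here is a minimal sketch of that workflow using plain dicts. The data path `data/seattle-weather.csv` is an assumption (substitute wherever your copy of the Seattle weather data lives), and the title text is illustrative.

```python
import copy
import json

# Rapid prototyping: just enough to see the data.
prototype = {
    "data": {"url": "data/seattle-weather.csv"},  # assumed path
    "mark": "point",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "temp_max", "type": "quantitative"},
    },
}

# Fine tuning: copy the prototype (so it stays untouched) and add presentation details.
final = copy.deepcopy(prototype)
final["width"] = 800
final["height"] = 300
final["title"] = "Seattle daily high temperatures, 2012-2015"
final["encoding"]["x"]["title"] = None  # serializes to null, which removes the axis title
final["encoding"]["y"]["title"] = "High Temperature (C)"

print(json.dumps(final, indent=2))
```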
Style matters
Name things well
Use comments as needed, but don’t put analysis discussion in comments.
Consider writing wrapper functions to avoid repetitive code
etc.
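As an illustration of a wrapper function, here is a minimal sketch. `scatter_spec` is a hypothetical helper (not part of any library) that returns a basic Vega-Lite scatter plot spec, so that charts differing in only one field don't repeat the same boilerplate. The data path is again an assumption.

```python
import json


def scatter_spec(data_url, x, y, x_type="temporal", y_type="quantitative", **props):
    """Hypothetical wrapper: build a basic Vega-Lite scatter plot spec.

    Extra keyword arguments (width, height, description, ...) become
    top-level properties of the spec.
    """
    spec = {
        "data": {"url": data_url},
        "mark": "point",
        "encoding": {
            "x": {"field": x, "type": x_type},
            "y": {"field": y, "type": y_type},
        },
    }
    spec.update(props)
    return spec


url = "data/seattle-weather.csv"  # assumed path
highs = scatter_spec(url, "date", "temp_max", width=800, height=300)
lows = scatter_spec(url, "date", "temp_min", width=800, height=300)
print(json.dumps(highs, indent=2))
```

Keeping the defaults in one place also means a style change (say, a different mark) only has to be made once.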
Context: Response and completion rates for an email marketing campaign where email recipients were asked to complete a survey.
Data Source: JSON, Excel, [HW 4, (Knaflic 2020, p 96)]
Challenge: Identify issues with this graphic and make something better. (See next slide first.)
Many of you made graphics something like this for HW 4.
Challenge: Identify issues with this graphic and make something better.
Date Completion Rate Response Rate
1 Q1-2017 0.91 0.023
2 Q2-2017 0.93 0.018
base <-
  vl_chart(mailing) |>
  vl_mark_line(point = TRUE) |>
  vl_encode_x("Date:O")

completion <-
  base |>
  vl_encode_y("Completion Rate:Q") |>
  vl_encode_color(datum = "Completion", type = "nominal")

response <-
  base |>
  vl_encode_y("Response Rate:Q") |>
  vl_encode_color(datum = "Response", type = "nominal")

(response + completion) |> vl_add_properties(width = 800, height = 200)
Date Completion Rate Response Rate
0 Q1-2017 0.91 0.023
1 Q2-2017 0.93 0.018
import altair as alt

# `mailing` is a pandas DataFrame with the columns shown above
base = alt.Chart(mailing).mark_line(point = True).encode(
    x = "Date:O",
    color = alt.Color(datum = "Completion", type = "nominal")
)
completion = base.encode(
    y = "Completion Rate:Q",
    color = alt.Color(datum = "Completion"))
response = base.encode(
    y = "Response Rate:Q",
    color = alt.Color(datum = "Response"))
chart = response + completion
spec_json = chart.to_json()  # for pasting into the Vega Editor or for inspection
chart.properties(width = 800, height = 200)
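One possible direction, offered as a hedged sketch rather than the intended answer to the challenge: the layered chart puts a rate near 0.9 and a rate near 0.02 on a shared y axis. Reshaping with a Vega-Lite `fold` transform and faceting by metric with independent y scales lets each rate use its own range. The spec is written as a Python dict; only the two rows printed above are included, and the panel sizes are arbitrary.

```python
import json

# The two rows printed above; the full mailing data set has more quarters.
mailing = [
    {"Date": "Q1-2017", "Completion Rate": 0.91, "Response Rate": 0.023},
    {"Date": "Q2-2017", "Completion Rate": 0.93, "Response Rate": 0.018},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": mailing},
    # Reshape wide -> long: one row per (Date, Metric) pair.
    "transform": [
        {"fold": ["Completion Rate", "Response Rate"], "as": ["Metric", "Rate"]}
    ],
    # One row of panels per metric.
    "facet": {"row": {"field": "Metric", "type": "nominal"}},
    "spec": {
        "width": 800,
        "height": 150,
        "mark": {"type": "line", "point": True},
        "encoding": {
            "x": {"field": "Date", "type": "ordinal"},
            "y": {"field": "Rate", "type": "quantitative"},
        },
    },
    # Independent y scales so the small response rate is not flattened.
    "resolve": {"scale": {"y": "independent"}},
}

print(json.dumps(spec, indent=2))
```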
Context: Customer touchpoints (e.g., email or chat) over time.
Data Source: CSV, Excel (Knaflic 2020, p 206)
Challenge: Identify issues with this graphic and make something better.
Context: Net promoter score over time for a company and its competitors.
Data Source: Excel (multiple sheets!), (Knaflic 2020, p 342)
Challenge: Investigate
Here’s a page from a monthly report on ticket volume and related metrics.
Data: Excel file
Challenges
Write a sentence describing a key takeaway for each graph shown in the report.
Imagine you need to tell a story with this data: which parts of the report would you focus on and which (if any) would you omit? It may be important to look at all of these things as we are exploring the data, but not all of the data is necessarily equally interesting when it comes to communicating it to our audience.
Use the data to create a webpage or slide deck to tell a visual story with the elements you selected to include in Step 2. Imagine this will be read by someone who is somewhat familiar with the data but doesn’t deal with it on a daily basis. Be sure to include enough text and context to walk them through your story.
The Data Visualization Society did a survey of its members and posted the data online so folks could visualize it (seems fitting).
Challenge: Create an interesting and informative graphic about some aspect(s) of the survey. (The survey has up to 73 items, so you will have to focus on just a portion of the survey responses.)
Info about the data: https://www.datavisualizationsociety.org/survey-history
Links to data sets:
Dataset 1: Job Titles • Dataset 2: Characteristics • Dataset 3: Employment
Dataset 4: Visualization • Dataset 5: Challenges