Final Portfolio
Introduction
Your final task for the semester is to create a small portfolio of your work. You will have an opportunity to revise some of your previous work and to create some new graphics.
Due Date: Thursday, March 19 at noon.
Some general instructions about the portfolio website
Be sure to include all your code.
Feel free to use #| code-fold: true for some (or all) chunks. This will hide your code until the user clicks to open it.
Any data sets you use should be available via a URL or in a standard R package so I have access to the data. You can create a data directory and put CSV, JSON, or Excel data sets there if they are not already available via a URL elsewhere. Then use [text](url) to include a link to the data in your document.
Also add code-tools: true to your YAML header. This will let me see the source document if I need to.
One of the goals for this project is to learn how to learn more. Another is to use a variety of graphical elements. So you may want to read Exercise 5 and Exercise 6 before starting the rest of the assignment.
Grading
Here are some guiding rubrics for grading your portfolio.
For graphics
A: Excellent design and use of vegalite features. No extraneous, unnecessary elements, but also not missing elements that would improve the graphic.
B: All tasks complete and good design practices generally followed, but missed opportunities to improve the design or to take advantage of additional vegalite features. Competent, but improvable.
C: Didn’t complete all tasks; visualization misrepresents the data or violates good visualization practices, etc.
For discussion
A: Demonstrates quality reflection on the graphics design process. Well written, clear and concise, but thoughtful. Makes interesting connections or cites interesting examples.
B: Minimally answered all questions, but without demonstrating quality reflection.
C: Incomplete, superficial, overly brief.
Components of the portfolio
Exercise 1 (HW 4 revision) We’ve learned a lot about graphics since HW 4, so here’s your chance to improve upon what you did in HW 4 (and in our subsequent in-class discussions). We are going to focus on the genetics kit data. See HW 4 for a reminder about the data and the task you had to do then.
Here is a reminder of what genetic share means: 23 and Me calls this “ancestry composition” and describes it like this:
Your Ancestry Composition report shows the percentage of your DNA that comes from each of 47 populations. We calculate your Ancestry Composition by comparing your genome to those of over 14,000 people with known ancestry. When a segment of your DNA closely matches the DNA from one of the 47 populations, we assign that ancestry to the corresponding segment of your DNA. We calculate the ancestry for individual segments of your genome separately, then add them together to compute your overall ancestry composition. Read more.
Scroll through the HW 4 Gallery to see the plots we created at that time. Find an example that has something you like about it and explain what you like.
Find an example that has something you don’t like, and explain what you don’t like about it.
Now create two graphics, one that helps compare the kits and one that helps compare twins.
- Give your graphics good titles,
- Use other principles of good visualization,
- Do not restrict your plot to just a small subset of the data.
For each graphic, include a paragraph that tells the story of your graphic.
Exercise 2 (Data and graphics challenge) Complete one of the data and graphics challenges that we didn’t get to in class. (You will have to wait until we have done these to know what they are and which ones we didn’t already do.)
Exercise 3 (A new challenge) Challenge is probably the wrong word because the data set is quite small and fairly simple. But we want to be able to make good graphics for simple data too!
The data come from the Demographic and Health Surveys for Tanzania. You can obtain data like these (with many more items, and down to individual and household level detail) for many countries and years at dhsprogram.com.
Here is a very small summary of some data from Tanzania:
Here are some definitions used by DHS:
- Total fertility rate
- The average number of children a woman would have by the end of her childbearing years if she bore children at the current age-specific fertility rates. Age-specific fertility rates are calculated for the 3 years before the survey, based on detailed pregnancy histories provided by women.
- Unmet need for family planning
- Percentage of women who: (1) are not pregnant and not postpartum amenorrhoeic and are considered fecund and want to postpone their next birth for 2 or more years or stop childbearing altogether but are not using a contraceptive method, or (2) have a mistimed or unwanted current pregnancy, or (3) are postpartum amenorrhoeic and their most recent birth in the past 2 years was mistimed or unwanted.
Enter the data into Excel, a CSV, or JSON file.
You will have some decisions to make about things like variable names, etc. Be sure to include a link to the data set you create on your portfolio website. (See instructions at the top of the page.)
Notice that some of the surveys were conducted all in one year and some spanned two calendar years. How will you deal with that?
Use these data to create a visualization that tells a story.
Write a few sentences explaining the story told.
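One possible way to handle surveys that span two calendar years is to record the survey period as text in your data file and derive numeric columns from it. The sketch below is only an illustration: the function name parse_survey_period and the label formats ("1996", "2004-05") are assumptions, not values taken from the table, and using the midpoint is just one of several defensible choices.

```python
import re


def parse_survey_period(label):
    """Return (start_year, end_year, midpoint) for a survey period label.

    Accepts a single year ("1996") or a two-year span ("2004-05", "2004-2005").
    The label formats are assumed; adapt the pattern to how you typed your data.
    """
    match = re.fullmatch(r"\s*(\d{4})(?:\s*[-/–]\s*(\d{4}|\d{2}))?\s*", label)
    if match is None:
        raise ValueError(f"Unrecognized survey period: {label!r}")

    start = int(match.group(1))
    end_text = match.group(2)

    if end_text is None:
        # Survey conducted within a single calendar year.
        end = start
    elif len(end_text) == 4:
        end = int(end_text)
    else:
        # Two-digit end year: borrow the century from the start year,
        # rolling over if the survey crosses a century (e.g. "1999-00").
        end = (start // 100) * 100 + int(end_text)
        if end < start:
            end += 100

    if end < start:
        raise ValueError(f"End year precedes start year: {label!r}")

    # Midpoint of the two calendar years; equals the year itself for
    # single-year surveys.
    return start, end, (start + end) / 2


if __name__ == "__main__":
    for label in ["1996", "2004-05"]:
        print(label, parse_survey_period(label))
```

Keeping the start and end years as separate columns leaves other options open, such as plotting at the start year or drawing each survey as a short horizontal span instead of a single point.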
Exercise 4 (Your masterpiece) OK. It doesn’t have to compete with the graphics cited by Tufte (2001), but this is your chance to impress. It can also be a chance to try some things you’ve been wanting to try or to include some elements that you need for Exercise 5.
Using a data set of your choosing, create a graphic that demonstrates your abilities to design and create a graphic that tells a compelling story.
Be sure to pick a data set that is rich enough to make the graphics task interesting. I recommend that you pick data related to something you are interested in.
Explain the choices you made when designing your graphic and relate them to principles of good graphics that we have learned or seen in this class. Mention alternatives to your graphic that you considered but did not opt to submit. (You don’t have to include your alternatives, but you may if that makes it easier to explain.)
Be sure to include information about where you got your data from. This could be a link to a website, a proper citation of an article, a description of a research project you have been working on, etc.
You may use examples you find online for inspiration and coding suggestions, but your graphic should not be a direct copy of an existing example.
Exercise 5 (Using your palette) The grammar of graphics gives us a palette of graphical elements with which to “paint” our graphic. The palette includes various marks, channels, composition, etc. One of the goals for your portfolio is that you demonstrate the ability to use a variety of these features and use them effectively. Look over your graphics, and identify a place where you used
- an encoding channel other than x or y.
- layers
- facets
- concatenation or repeat
- non-default settings for a channel’s scale or guide
- tooltips
- another kind of interaction (panning/zooming, brushing, sliders, etc.)
The intention here is that you use each of these at least once in your portfolio to demonstrate your ability to use a wide range of features from the grammar of graphics. Keep that in mind as you go through the exercises.
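As a rough checklist, here is a sketch of how several of these features look in a native Vega-Lite specification, written as plain Python dictionaries so it needs only the standard library. The records, field names (year, group, value), titles, and the parameter name pan_zoom are all hypothetical placeholders. The same ideas carry over to vegabrite or Altair, though the syntax there differs.

```python
import json

SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"

# Hypothetical placeholder records; substitute your own data set.
records = [
    {"year": 2000, "group": "A", "value": 3.1},
    {"year": 2010, "group": "A", "value": 2.4},
    {"year": 2000, "group": "B", "value": 4.0},
    {"year": 2010, "group": "B", "value": 3.5},
]

# Layers + a color channel + non-default scale/guide + tooltips + pan/zoom.
layered = {
    "$schema": SCHEMA,
    "title": "Value by year and group (placeholder data)",
    "data": {"values": records},
    "encoding": {
        "x": {
            "field": "year",
            "type": "quantitative",
            "scale": {"zero": False},                  # non-default scale
            "axis": {"format": "d", "title": "Year"},  # non-default guide
        },
        "y": {"field": "value", "type": "quantitative"},
        # An encoding channel other than x or y.
        "color": {
            "field": "group",
            "type": "nominal",
            "legend": {"title": "Group", "orient": "bottom"},
        },
    },
    "layer": [
        {"mark": "line"},
        {
            # tooltip=True shows the encoded fields on hover.
            "mark": {"type": "point", "filled": True, "tooltip": True},
            # An interval selection bound to the scales gives panning/zooming.
            "params": [
                {"name": "pan_zoom", "select": "interval", "bind": "scales"}
            ],
        },
    ],
}

# Facets: one panel per group, using the facet encoding channel.
faceted = {
    "$schema": SCHEMA,
    "data": {"values": records},
    "mark": "bar",
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"field": "value", "type": "quantitative"},
        "facet": {"field": "group", "type": "nominal", "columns": 2},
    },
}

if __name__ == "__main__":
    # Paste the printed JSON into the online Vega editor to view the chart.
    print(json.dumps(layered, indent=2))
```

The sketch does not cover concatenation or repeat, and it is not meant as a design to copy; it only shows where each feature lives in a specification.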
Exercise 6 (Keep learning) It isn’t possible to learn everything about data visualization in such a short course, so you will need to keep learning.
Cite 2 or 3 specific examples in the graphics in your portfolio where you used a feature of Vega-Lite/vegabrite/Altair/altair that we did not learn in class. This might be using a new kind of mark or transform, or a way to customize a feature of the graphic, or a way to use interaction, or…
Here are some resources for finding/exploring new features:
vegabrite website. The Reference and Example Gallery sections are very helpful. You may also find the Design section interesting/helpful.
Vega-Lite documentation and Example gallery. You can usually convert things over to vegabrite or Altair/altair pretty easily once you see how things work in the native Vega-Lite.
The Altair documentation. The User Guide and Examples sections are very helpful.
The Vega-Lite API Examples might be a source of inspiration. This uses the javascript API for Vega-Lite.
The Visualization Curriculum developed at the University of Washington by Jeffrey Heer, Dominik Moritz, Jake VanderPlas, and Brock Craft (using Python/Altair).
There are other sites out there that could serve as inspiration as well. If you find a good one, be sure to let me know about it.
Note: Alternative for part a.
A different option here: Learn how to make graphics using mosaic. Mosaic is a new system, still under development, which promises to be able to handle much larger data sets because it uses a different data model. Many of the core ideas will look familiar to you from Vegalite.
See also Heer (2024). You can find a pdf of this paper at https://idl.uw.edu/papers/mosaic.
You will also want to keep learning about principles of good graphics design. Cite 2 or 3 specific examples where you followed the advice in one of the resources below. Provide specific page (for print or pdf) or section (for HTML) references. Include direct links, if possible. (HTML books often make it easy to link directly to a section.)
Wilke (2019) contains lots of good information about creating good graphics. The chapter titles make it easy to find advice on particular issues. This will let you scan through chapters that address issues related to the graphics in your portfolio.
Healy (2019) uses ggplot2, but the design principles should transfer over to any software.
Tufte (2001) has lots of advice about designing “visual displays of quantitative information”.
Knaflic (2015) is the book that preceded Knaflic (2020). It has more details about graphics principles and fewer exercises. (The links to these resources are no longer available, so you would need to find these books somewhere else.)
A resource of your own choosing – but clear it with me first.