HW 3: First Vega-Lite Gallery

Graphic 1 (bar)

I made sure to use year as the x axis the way that Wilke uses it in all of his examples for time series. I would've liked to be able to change the axis titles to make it more readable, or would have done a clustered bar chart instead of a stacked bar chart.

I wish I could've added a slider but I was really unsure of how to incorporate count as a histogram because we hadn't learned that yet.
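As a rough sketch of the two changes wished for above (readable axis titles and a clustered rather than stacked bar chart), something like the following might work in Vega-Lite. The inline data and the field names "year", "group", and "value" are made up for illustration, since the original dataset is not shown; "title" sets the axis text and "xOffset" places the bars side by side.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: inline data and field names are hypothetical.",
  "data": {
    "values": [
      {"year": "2019", "group": "A", "value": 12},
      {"year": "2019", "group": "B", "value": 7},
      {"year": "2020", "group": "A", "value": 15},
      {"year": "2020", "group": "B", "value": 9}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "year", "type": "ordinal", "title": "Year"},
    "xOffset": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative", "title": "Value (made-up units)"},
    "color": {"field": "group", "type": "nominal", "title": "Group"}
  }
}
```

Removing the "xOffset" line should bring back the stacked layout, so the two versions are easy to compare.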

Graphic 2 (bar)

If I knew how, I would try to change the colors, because I think the current ones would fall under Wilke's category of "ugly."

I would like to switch to better looking colors.
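One possible way to change the colors, shown here as a minimal sketch with made-up inline data and hypothetical field names ("category", "amount"), is to give the color scale an explicit "range". The two hex colors are only an example choice.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: data, field names, and colors are hypothetical.",
  "data": {
    "values": [
      {"category": "first", "amount": 4},
      {"category": "second", "amount": 9}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "category", "type": "nominal", "title": "Category"},
    "y": {"field": "amount", "type": "quantitative", "title": "Amount"},
    "color": {
      "field": "category",
      "type": "nominal",
      "scale": {
        "domain": ["first", "second"],
        "range": ["#4c78a8", "#f58518"]
      }
    }
  }
}
```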

Graphic 3 (bar)

I think the most important thing is the distinction between ugly, bad, and wrong.

Some of my values are very small and some are very large, which makes my y axis "bad."
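When values span several orders of magnitude, one option (not necessarily the best one for this graphic) is a logarithmic y scale. The sketch below uses made-up inline data and hypothetical field names, and it switches to points because bars on a log scale have no natural zero baseline.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: data and field names are hypothetical; all values must be positive for a log scale.",
  "data": {
    "values": [
      {"item": "a", "amount": 3},
      {"item": "b", "amount": 40},
      {"item": "c", "amount": 500},
      {"item": "d", "amount": 60000}
    ]
  },
  "mark": {"type": "point", "filled": true, "size": 80},
  "encoding": {
    "x": {"field": "item", "type": "nominal", "title": "Item"},
    "y": {
      "field": "amount",
      "type": "quantitative",
      "title": "Amount (log scale)",
      "scale": {"type": "log"}
    }
  }
}
```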

Graphic 4 (bar)

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 450,
  "width": 700,
  "params": [
    {
      "name": "Industry",
      "value": "Construction",
      "bind": {
        "input": "select",
        "options": [
          "Government", "Mining and Extraction", "Construction", "Manufacturing",
          "Wholesale and Retail Trade", "Transportation and Utilities", "Information",
          "Finance", "Business services", "Education and Health",
          "Leisure and hospitality", "Other", "Agriculture", "Self-employed"
        ]
      }
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/unemployment-across-industries.json"
  },
  "mark": "bar",
  "transform": [{"filter": "datum.series == Industry"}],
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {
      "field": "rate",
      "type": "quantitative",
      "title": "unemployment rate (%)",
      "scale": {"domain": [0, 28]}
    },
    "color": {
      "field": "count",
      "type": "quantitative",
      "scale": {"scheme": "yelloworangered"}
    }
  }
}

Originally, I had the unemployment rate mapped to both color and the y position. However, chapter two of the book said that a graphic can become ambiguous if the scale is not one-to-one. So I adjusted the graphic such that count is mapped to color and unemployment rate is mapped to the y position.

I would have liked to make a line graph with this data similar to figure 2.3, with each line representing a different industry. However, because the dataset includes 14 different industries, the lines overlapped and looked messy when put together.
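One way the line-graph idea might be made less messy, offered only as a sketch, is to draw every industry as a light gray line and highlight the one chosen in the selector. It reuses the dataset URL and the "series", "date", and "rate" fields from the spec above; the shortened options list and the two colors are arbitrary choices.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: options list is shortened and colors are arbitrary.",
  "height": 450,
  "width": 700,
  "params": [
    {
      "name": "Industry",
      "value": "Construction",
      "bind": {
        "input": "select",
        "options": ["Government", "Construction", "Manufacturing", "Agriculture"]
      }
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/unemployment-across-industries.json"
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "date", "type": "temporal", "title": "Date"},
    "y": {"field": "rate", "type": "quantitative", "title": "unemployment rate (%)"},
    "detail": {"field": "series", "type": "nominal"},
    "color": {
      "condition": {"test": "datum.series == Industry", "value": "firebrick"},
      "value": "lightgray"
    }
  }
}
```

The "detail" channel keeps one line per industry without adding 14 legend colors, which is the part that tends to look cluttered.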

Graphic 5 (bar)

For this graph, I used the bind parameter, which lets the user interact with the graph and see specific jobs and how they evolve over time. I used two position scales and one color scale. It would be fun to play with other scales such as size and shape, but I didn't know what other variables could be mapped to size and shape without making my graph redundant. Wilke's book inspires me to make better graphs, especially after seeing figure 2.3. I had never thought of using squares in visualizations, and it works really well. The book reminds me to use colors that are visible, so I chose blue and pink for a white background. I try to avoid colors such as red and green if I only need 2 colors because it might affect people with color blindness. I also implemented the one-to-one rule, where I map one data value to one aesthetic value.

I am interested in seeing other ways people would map the values in this dataset.

Graphic 6 (bar)

I thought it was important to add color to the graph, so that each income group was easily identifiable. It also made the visual more aesthetically pleasing.

Graphic 7 (bar)

At first, I used the variable "pct" on the y-axis, but had to change it to "total," as it still shows the income per state. This follows what Wilke explains when talking about scales.

Graphic 8 (line)

To make sure my graphic would not be considered "bad" by Wilke, that is, that it had no perceptual problems making it confusing or unclear, I converted the year variable to a date format using D3's number and date formatting specification system. I also set a fixed scale for the number of deaths on the y axis, because with the selector the y-axis scale was changing with each type of disaster. Wilke would have deemed this "bad" because it made my graphic confusing and hard to understand.

If I knew how, I would change the variable name "Entity" because I do not think it is very clear. I would also maybe limit the number of years labeled on the x-axis, or decrease the font size, so that the labels are not squished together at the ends. I think this data is pretty interesting: deaths due to these different natural disasters seem to be decreasing over time.
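The following is a sketch of how these pieces might fit together, not the original spec. It assumes the vega-datasets file disasters.csv with columns "Entity", "Year", and "Deaths"; the option names, the y-domain upper bound, and the tick settings are placeholders to adjust. The "name" inside "bind" relabels the selector, "parse" converts the year with a D3 date format, and "tickCount" with "labelOverlap" thins out the x-axis labels.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: dataset columns, option names, and the y domain are assumptions.",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/disasters.csv",
    "format": {"type": "csv", "parse": {"Year": "date:'%Y'"}}
  },
  "params": [
    {
      "name": "Disaster",
      "value": "Earthquake",
      "bind": {
        "input": "select",
        "options": ["Earthquake", "Flood", "Drought"],
        "name": "Type of disaster: "
      }
    }
  ],
  "transform": [{"filter": "datum.Entity == Disaster"}],
  "mark": "line",
  "encoding": {
    "x": {
      "field": "Year",
      "type": "temporal",
      "title": "Year",
      "axis": {"tickCount": 10, "labelFontSize": 10, "labelOverlap": true}
    },
    "y": {
      "field": "Deaths",
      "type": "quantitative",
      "title": "Deaths",
      "scale": {"domain": [0, 4000000]}
    }
  }
}
```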

Graphic 9 (line)

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/iowa-electricity.csv"
  },
  "title": {"text": "Iowa's Power Generation"},
  "mark": {"type": "line"},
  "height": 300,
  "width": 600,
  "encoding": {
    "x": {"field": "year", "type": "temporal", "title": "Year"},
    "y": {"field": "net_generation", "type": "quantitative", "title": "Power Generated"},
    "color": {
      "field": "source",
      "type": "nominal",
      "scale": {
        "domain": ["Fossil Fuels", "Renewables", "Nuclear Energy"],
        "range": ["black", "darkgreen", "lightblue"]
      },
      "title": "Source"
    }
  }
}

The book had a reminder to think about color contrast. If I knew how, there are little things I would change, like adding a small buffer on the left of the x axis so that 2001 isn't crammed against 2002. The font is also small.

I don't know the units. I learned a lot from creating and discarding other graphics in the process of making this, and I got more comfortable with JSON and the basics.
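For the buffer and the small font, a possible approach (a sketch under the assumption that the scale "padding" property behaves here as documented for continuous scales) is to pad the x scale by a few pixels and raise the axis and legend font sizes. The specific numbers are arbitrary starting points.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: padding and font sizes are arbitrary starting values.",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/iowa-electricity.csv"
  },
  "title": {"text": "Iowa's Power Generation"},
  "mark": {"type": "line"},
  "height": 300,
  "width": 600,
  "encoding": {
    "x": {
      "field": "year",
      "type": "temporal",
      "title": "Year",
      "scale": {"padding": 15},
      "axis": {"labelFontSize": 12, "titleFontSize": 14}
    },
    "y": {
      "field": "net_generation",
      "type": "quantitative",
      "title": "Power Generated",
      "axis": {"labelFontSize": 12, "titleFontSize": 14}
    },
    "color": {
      "field": "source",
      "type": "nominal",
      "title": "Source",
      "legend": {"labelFontSize": 12, "titleFontSize": 14}
    }
  }
}
```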

Graphic 10 (point)

I took Wilke's advice on how not to make a plot ugly by scaling miles per gallon by fill instead of by color.

It would be nice if there was a way to better differentiate between the fills.

Graphic 11 (point)

I thought about the data types and the aesthetics, and about how to make the graphic show the data correctly and more clearly.

I hope I can add another filter, something like a search. For example, we could choose the job type and the graphic would show the situation for that job, and we could use the slider to check each year.

Graphic 12 (point)

I manually changed the domain of the x axis and y axis so that there was not too much extraneous space, in order to make the graph easier to read. I also faceted by sex. The issue that I ran into is that there are a couple of penguins with nulls or nothing for Sex, and this caused the facet to have two extra subplots that are less than desirable, but I didn't know what to do with them because they are still important to include. I also added fill and fill opacity so that the colors are easier to see and stand out against the background, while the fill opacity of 0.7 still lets us see where a large number of penguins overlap.

I wish I was able to filter or combine the null and the blank Sex penguins to rename them as "unknown" or something easier to understand. I didn't want to filter it to not include the nulls, and if I knew how to also change it, I would change the facet so that the male and female row would be at the top and the unknown would be at the bottom.
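A possible way to do both things, sketched under assumptions: a "calculate" transform relabels every Sex value that is not "MALE" or "FEMALE" as "unknown", and a "sort" array on the facet puts that row last. The dataset URL, the value spellings, the new field name "sex_clean", and the choice of x and y fields are assumptions, since the original spec is not shown.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: URL, Sex value spellings, sex_clean, and axis fields are assumptions.",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/penguins.json"
  },
  "transform": [
    {
      "calculate": "(datum.Sex == 'MALE' || datum.Sex == 'FEMALE') ? datum.Sex : 'unknown'",
      "as": "sex_clean"
    }
  ],
  "mark": {"type": "point", "filled": true, "fillOpacity": 0.7},
  "encoding": {
    "row": {
      "field": "sex_clean",
      "type": "nominal",
      "title": "Sex",
      "sort": ["MALE", "FEMALE", "unknown"]
    },
    "x": {
      "field": "Flipper Length (mm)",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "y": {
      "field": "Body Mass (g)",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "color": {"field": "sex_clean", "type": "nominal", "title": "Sex"}
  }
}
```

This keeps the penguins with missing Sex in the graphic while merging the two extra subplots into one.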

Graphic 13 (point)

In chapter one, Wilke's book mentioned the importance of having a consistent scale (fig 1.1). Originally, the scale on the y axis was inconsistent because the maximum of each industry's unemployment rate is different. I defined the scale so that it is consistent, and I also changed the titles of the guides so that the graphic is more readable. Also, I originally mapped counts to both the color and the size of the dots. Chapter two of the book said that scales must be one-to-one, so I mapped counts only to size and kept the color consistent.

Graphic 14 (point)

I used a scatterplot because I am using two position scales for continuous values. I also map my scale one-to-one. I faceted by movie genre while also mapping a certain color to each genre. This ensures that for each specific data value, there is exactly one aesthetic value and vice versa.

I wish I knew how to keep empty graphs in place. Now, when I use the slider to change the minimum production budget, some genres disappear because none of their movies use as much budget. But if I wasn't carefully looking at the scales that show which movie genre I am looking at and using a smaller screen, it looks like the points magically disappear and completely change places.

Graphic 15 (point)

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "height": 400,
  "width": 600,
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/barley.json"
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "site", "type": "ordinal"},
    "y": {"field": "variety", "type": "nominal"},
    "size": {"field": "yield", "type": "quantitative"},
    "color": {"scale": {"scheme": "bluegreen-4"}}
  }
}

I think I used a one-to-one scale, which makes the graphic less ambiguous. I also used points with a range of sizes so there was more distinction between the data points.

It would be interesting to do something like pair programming so that if something is unclear, the other person can call you out in order to correct it. I feel that too often I leave the graph as is because I know the context in which the graph was made, but that is not necessarily the case for the average person reading the graph.

Graphic 16 (point)

I decided to scale the axis so that the data points took up most of the space on the graph. I don't think Wilke specifically wrote about this in the chapters, but it is a way to scale the graph to make it easier to understand and more aesthetically pleasing.

Graphic 17 (point)

This graphic is informative, looks appealing, and could be printed as is. If the homework was not specific about creating a single view graphic, I would most likely facet based on years and put the number of cylinders on the x-axis.

Graphic 18 (point)

I was reminded to experiment with shapes other than dots. Triangles look like volcanoes, and I had also never noticed that having a point makes them easier to tell apart when they are next to each other. I want to label the top 2 points specifically somehow. This graphic isn't all that interesting, but I wanted to use it because I had fun looking up what happened in 1902 and 1985.

This data and a lot of the sample ones are inherently spiky, which does not look good as a line unless altered or used carefully. For this one specifically I realized you can't really show a continuous line when the events are spontaneous.
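Labeling only the two highest points might be done with a second layer that ranks the rows and keeps the top two, as in the sketch below. It assumes the vega-datasets file disasters.csv with columns "Entity", "Year", and "Deaths", and that the volcano rows are labeled "Volcanic activity"; both are guesses about the original graphic.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: dataset, column names, and the Entity value are assumptions.",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/disasters.csv"
  },
  "transform": [{"filter": "datum.Entity == 'Volcanic activity'"}],
  "width": 600,
  "height": 300,
  "encoding": {
    "x": {
      "field": "Year",
      "type": "quantitative",
      "title": "Year",
      "scale": {"zero": false},
      "axis": {"format": "d"}
    },
    "y": {"field": "Deaths", "type": "quantitative", "title": "Deaths"}
  },
  "layer": [
    {
      "mark": {"type": "point", "shape": "triangle-up", "filled": true, "size": 80}
    },
    {
      "transform": [
        {
          "window": [{"op": "rank", "as": "rank"}],
          "sort": [{"field": "Deaths", "order": "descending"}]
        },
        {"filter": "datum.rank <= 2"}
      ],
      "mark": {"type": "text", "dy": -12},
      "encoding": {"text": {"field": "Year", "type": "nominal"}}
    }
  ]
}
```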

Graphic 19 (point)

Wilke says that "good visual presentations tend to enhance the message of the visualization." He says that "jarring colors, imbalanced visual elements, or other features that distract" make it harder for the viewer to interpret the graphic correctly. Therefore, I tried to make sure my graphic was easy to understand and free of distractions by setting a scale on the x-axis so that the empty space before the lowest body mass value was removed and the data points could spread out more, overlapping less. I also filtered the values for "Sex" to include only the penguins whose sex was known, either male or female, and not the null and blank values. These null values added no helpful information and would only confuse the viewer with more categories.

I was a little wary of changing the range of values for body mass on the x-axis, because starting at 2,500 grams instead of 0 is probably not what a viewer would expect. However, penguins are never going to weigh below a certain amount, and the data points are easier to see this way. If I knew how, I would probably change the units from grams to pounds because pounds are more familiar to me.
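The unit change could probably be handled with a "calculate" transform that divides grams by 453.592 (grams per pound). In this sketch the dataset URL, the field names, the new field name "body_mass_lb", and the y field are assumptions, and "zero": false stands in for the manually chosen x domain.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: URL, field names, body_mass_lb, and the y field are assumptions.",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/penguins.json"
  },
  "transform": [
    {"filter": "datum.Sex == 'MALE' || datum.Sex == 'FEMALE'"},
    {"calculate": "datum['Body Mass (g)'] / 453.592", "as": "body_mass_lb"}
  ],
  "mark": "point",
  "encoding": {
    "x": {
      "field": "body_mass_lb",
      "type": "quantitative",
      "title": "Body mass (lb)",
      "scale": {"zero": false}
    },
    "y": {
      "field": "Flipper Length (mm)",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "color": {"field": "Sex", "type": "nominal"}
  }
}
```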

Graphic 20 (rect)

The temperature graphic and the table of variables and appropriate scales in the reading influenced this design. I wanted to have a little more space between the rows (years).
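Extra space between rows might be possible by adding inner padding to the y band scale. This is a sketch with made-up inline data and hypothetical field names ("year", "month", "temp"), since the original spec is not shown; the 0.2 padding is an arbitrary value.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: data, field names, and padding value are hypothetical.",
  "data": {
    "values": [
      {"year": "2012", "month": "Jan", "temp": 4},
      {"year": "2012", "month": "Feb", "temp": 6},
      {"year": "2012", "month": "Mar", "temp": 9},
      {"year": "2013", "month": "Jan", "temp": 3},
      {"year": "2013", "month": "Feb", "temp": 7},
      {"year": "2013", "month": "Mar", "temp": 10}
    ]
  },
  "mark": "rect",
  "encoding": {
    "x": {
      "field": "month",
      "type": "ordinal",
      "title": "Month",
      "sort": ["Jan", "Feb", "Mar"]
    },
    "y": {
      "field": "year",
      "type": "ordinal",
      "title": "Year",
      "scale": {"paddingInner": 0.2}
    },
    "color": {"field": "temp", "type": "quantitative", "title": "Temperature"}
  }
}
```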

Graphic 21 (rect)

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@2.11.0/data/jobs.json"
  },
  "height": 400,
  "width": 800,
  "mark": "rect",
  "encoding": {
    "x": {"field": "year", "type": "ordinal", "title": "Year"},
    "xOffset": {"field": "sex", "type": "nominal"},
    "y": {"field": "perc", "type": "quantitative", "title": "Percentage of Workforce"},
    "color": {"field": "sex", "type": "nominal", "legend": {"title": "Gender"}},
    "size": {"field": "count", "type": "quantitative", "title": "Number of Workers"}
  }
}

Wilke's book made me think about which marks I should use to convey an interesting story that might be appealing. It also made me think about one-to-one scales and how using them helps the graph look less ambiguous.

I like how the offset by gender makes comparisons easier. It would be interesting to transform the data to show the differences between STEM and non-STEM careers, and perhaps even add a third category like trades. I could facet these into three separate graphs, but with so many fields, it might be a bit difficult to group all the fields into these three categories without going through all the data and sorting. Perhaps this will be plausible in the future with more knowledge of Vega-Lite.

Graphic 22 (trail)

I have redone this graph a couple of times. One thing that I revised is that I had originally made this graph a stacked bar graph. I decided it was really hard to read and just "bad" because I couldn't point out as many meaningful differences as with a line graph. I intentionally made the colors for men and women blue and pink respectively so that there would be (1) high contrast and (2) an easy way to tell male from female. I also figured out that adding padding to the sides kept the x and y axes from being cut off. I was originally having issues with the frame cutting off the count.

I added a selector so that we can see different counts for different types of jobs over the years, but the JSON file was massive so I only added a couple. If this was possible (or if it was only possible in SQL or something) I would've liked to query all of the distinct jobs so that I could make an alphabetized list for my selector and allow my graphic to show all of the jobs in the JSON file.