Week 6: Visualizing Information

EMSE 4572: Exploratory Data Analysis

John Paul Helveston

October 05, 2022


From here

"Having word processing software
doesn't make us great writers."

— Stephen Few


We don't write paragraphs like this

Image from Few (2012, pg. 227)

So don't make
graphs like this

Image from excelcharts.com


Week 6: Visualizing Information

1. The Human Visual-Memory System

2. The Psychology of Data Viz

BREAK

3. 10 Data Viz Best Practices

4. Making a (good) ggplot

1. The Human Visual-Memory System

Good visualizations optimize for
the human visual-memory system


A (very) simplified model of the visual-memory system


Two objectives of effective charts:

1. Grab & direct attention (iconic memory)

2. Reduce processing demands (working memory)


The power of pre-attentive processing

Count all the "5"s


Pre-attentive attributes



Numerical (ratio) data

Categorical (ordinal) data


Not all pre-attentive attributes are equal


Where is the red dot?


For categorical data:

1. Hue (color) > shape

2. Less is more (stay in working memory!)


2. The Psychology of Data Viz

Much of the content in this section is from
John Rauser's talk on YouTube

(Always cite your sources)


Cleveland, W. S., & McGill, R. (1985). Graphical perception and graphical methods for analyzing scientific data. Science, New Series, 229(4716), 828-833.


Cleveland's operations of pattern perception:

1. Estimation

2. Assembly

3. Detection

Estimation:

  • Discrimination (X equal to Y?)

  • Ranking (X greater than Y?)

  • Ratioing (X double Y?)
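
As a rough illustration, the three estimation tasks can be written as comparisons between two values. The numbers are the life expectancies of China and Afghanistan from the Asia example in these slides; the variable names are made up for this sketch.

```r
# Two values a reader might compare on a chart
# (life expectancy in years, from the Asia example in these slides)
x <- 72.961  # China
y <- 43.828  # Afghanistan

# Discrimination: is X equal to Y?
is_equal <- isTRUE(all.equal(x, y))

# Ranking: is X greater than Y?
is_greater <- x > y

# Ratioing: how many times larger is X than Y?
ratio <- x / y

is_equal         # FALSE
is_greater       # TRUE
round(ratio, 2)  # 1.66
```

A chart that supports ratioing lets the reader judge that last number by eye, which is the hardest of the three tasks.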


Estimation: Hierarchy for numerical data

More Accurate

Less Accurate


Example: Life expectancy in countries in Asia

#> country lifeExp
#> 1 Afghanistan 43.828
#> 2 Iraq 59.545
#> 3 Cambodia 59.723
#> 4 Myanmar 62.069
#> 5 Yemen, Rep. 62.698
#> 6 Nepal 63.785
#> 7 Bangladesh 64.062
#> 8 India 64.698
#> 9 Pakistan 65.483
#> 10 Mongolia 66.803
#> 11 Korea, Dem. Rep. 67.297
#> 12 Thailand 70.616
#> 13 Indonesia 70.650
#> 14 Iran 70.964
#> 15 Philippines 71.688
#> 16 Lebanon 71.993
#> 17 Jordan 72.535
#> 18 Saudi Arabia 72.777
#> 19 China 72.961
#> 20 West Bank and Gaza 73.422
  1. Position on a common scale
  2. Position on non-aligned scales
  3. Length
  4. Angle
  5. Area
  6. Color saturation
  7. Color hue


  • / Discriminate
  • / Rank
  • Ratio

Sorting helps a bit...

  • / Discriminate
  • / Rank
  • Ratio


  • / Discriminate
  • / Rank
  • Ratio

Align to 0 scale:

  • / Discriminate
  • / Rank
  • / Ratio


  • / Discriminate
  • / Rank
  • / Ratio

Area works okay for "bubble" charts



  • / Discriminate
  • Rank
  • / Ratio


  • / Discriminate
  • / Rank
  • Ratio


  • / Discriminate
  • / Rank
  • Ratio


  • Discriminate
  • Rank
  • Ratio

No need to scale to 0:

  • Lowers resolution
  • Isn't needed for accurate ratioing
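
A minimal sketch of such a dot plot, assuming the ggplot2 package is installed. It uses a handful of rows copied from the life expectancy table in these slides; the object names `life_exp` and `dot_plot` are made up for illustration.

```r
library(ggplot2)

# A few rows copied from the life expectancy table in these slides
life_exp <- data.frame(
    country = c("Afghanistan", "Iraq", "India", "Thailand", "China"),
    lifeExp = c(43.828, 59.545, 64.698, 70.616, 72.961)
)

# Dot plot: position on a common scale, sorted by value.
# The x axis is left to start near the smallest value rather than at 0.
dot_plot <- ggplot(life_exp) +
    geom_point(
        aes(x = lifeExp, y = reorder(country, lifeExp)),
        size = 2.5) +
    labs(
        x = "Life expectancy (years)",
        y = NULL)

dot_plot
```

Because the values are encoded by position rather than by bar length, the axis does not have to include 0, and the points can use the full width of the plot.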

Sorting still matters!

Assembly: The grouping of graphical elements


Assembly: Gestalt Psychology


The whole has a reality that is entirely separate from the parts


Reification



Emergence


Law of Closure

Our minds fill in the missing information


Prägnanz


We strongly prefer to interpret stimuli as regular, simple, and orderly


Prägnanz

This should cause you cognitive pain

It's the graphical equivalent of this:


Prägnanz

This makes our brains happy


Law of Continuity

We will group together objects that follow an established direction


Law of Similarity

We see elements that are physically similar as part of the same object


Law of Proximity

We tend to see elements that are physically near each other as part of the same object


Cleveland's operations of pattern perception:

1. Estimation

2. Assembly

3. Detection


Estimation: Hierarchy for numerical data

More Accurate

Less Accurate


Assembly: Gestalt Psychology

Law of Closure: Fill in the missing information
Prägnanz: We like regular, simple, and orderly
Law of Continuity: Group together objects with established direction
Law of Similarity: Physically similar = same object
Law of Proximity: Physically near = same object

Detection: Recognizing that a geometric object encodes a physical value

Norman door (n.):

  1. A door where the design tells you to do the opposite of what you're actually supposed to do.

  2. A door that gives the wrong signal and needs a sign to correct it.


Norman door


Non-Norman door


The white circles you see at the intersections are called the "Hermann Grid illusion"


Break!

Stand up, Move around, Stretch!


10 Data Viz Best Practices

  1. Remove chart junk
  2. Don't make 3D plots*
  3. Don't lie
  4. Don't use pie charts for proportions*
  5. Don't stack bars*
  6. Rotate and sort categorical axes*
  7. Eliminate legends & directly label geoms*
  8. Don't use pattern fills
  9. Don't use red & green together
  10. Consider tables for small data sets

*most of the time

1. Remove chart junk

"Erase non-data ink."

— Ed Tufte


Figure 1.6: 'Monstrous Costs' by Nigel Holmes, in Healy, 2018

Figure 24.1: From Data Looks Better Naked by Darkhorse Analytics

2. Don't make 3D plots*

Humans aren't good at distinguishing 3D space

Penrose Stairs, made famous by
M.C. Escher (1898-1972)


Ink proportions !=
true proportions


Occlusion: geoms are obscured


Multiple interpretations


The third dimension distracts from the data

(this is what Tufte calls "chart junk")

3. Don't lie

$$\text{"Lie Factor"} = \frac{\text{Size of effect in graphic}}{\text{Size of effect in data}} = \frac{\frac{5.3 - 0.6}{0.6}}{\frac{27.5 - 18}{18}} = \frac{7.83}{0.53} = 14.8$$

Edward Tufte (2001) "The Visual Display of Quantitative Information", 2nd Edition, pg. 57-58.
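
The arithmetic behind this Lie Factor can be checked with a few lines of R. This is a sketch that treats each "size of effect" as a relative change, (second - first) / first, which is how the numbers on the slide work out; the helper name `effect_size` is made up for illustration.

```r
# Size of an effect: relative change from the first value to the second
effect_size <- function(first, second) {
    (second - first) / first
}

# Values from Tufte's example as quoted on this slide
effect_graphic <- effect_size(0.6, 5.3)  # change shown in the graphic
effect_data    <- effect_size(18, 27.5)  # change in the underlying data

lie_factor <- effect_graphic / effect_data

round(effect_graphic, 2)  # 7.83
round(effect_data, 2)     # 0.53
round(lie_factor, 1)      # 14.8
```

A Lie Factor near 1 means the graphic shows the effect at about its true size; here the graphic exaggerates the change by a factor of almost 15.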


Bar charts should always start at 0


Don't cherry-pick your data


Make sure your chart makes sense

4. Don't use pie charts for proportions*

Exceptions:

- Small data

- Simple fractions

- If sum of parts matters


Best pie chart of all time

5. Don't stack bars*

Stacked bars are rarely a good idea


"Parallel coordinates" plot usually works better


Exception:
When you care about the total more than the categories

6. Rotate and sort categorical axes*

Rotate axes if you can't read them


Default order is almost always wrong

Ordered by alphabet (default)
Ordered by count
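
The difference between the two orderings comes down to factor levels. A small base R sketch, using the operator counts from the wildlife impacts summary shown later in these slides (the data frame name `operators` is made up for illustration):

```r
# Counts per operator, copied from the summary shown later in these slides
operators <- data.frame(
    operator = c("AMERICAN AIRLINES", "DELTA AIR LINES",
                 "SOUTHWEST AIRLINES", "UNITED AIRLINES"),
    n = c(14887, 9005, 17970, 15116)
)

# Default: character values become a factor with alphabetical levels
alphabetical <- levels(factor(operators$operator))

# reorder() sorts the levels by another variable (here, the count)
by_count <- levels(reorder(operators$operator, operators$n))

alphabetical
by_count
```

Here `by_count` runs from the smallest count to the largest. On a vertical (y) axis, ggplot2 draws the first level at the bottom, so the largest category ends up at the top.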

Exception: Ordinal variables

7. Eliminate legends & directly label geoms*

Directly label geoms
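
One way to label lines directly in ggplot2 is to place a text label at the last point of each series and turn the legend off. This is a sketch only, assuming ggplot2 is installed; the `sales` data are hypothetical and exist purely to make the example runnable.

```r
library(ggplot2)

# Hypothetical data for illustration only: two series over five years
sales <- data.frame(
    year = rep(2018:2022, times = 2),
    product = rep(c("Product A", "Product B"), each = 5),
    units = c(10, 12, 15, 19, 24, 8, 9, 9, 11, 12)
)

# Keep only the last point of each series to position the labels
last_points <- sales[sales$year == max(sales$year), ]

labeled_plot <- ggplot(sales, aes(x = year, y = units, color = product)) +
    geom_line() +
    # Write each series name next to the end of its line
    geom_text(
        data = last_points,
        aes(label = product),
        hjust = 0, nudge_x = 0.1) +
    # Leave room on the right for the labels
    scale_x_continuous(expand = expansion(mult = c(0.05, 0.3))) +
    # The labels replace the legend
    theme(legend.position = "none")

labeled_plot
```

With the labels next to the lines, the reader no longer has to look back and forth between the data and a legend.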


Exception: When you have repeated categories

8. Don't use pattern fills

9. Don't use red & green together

10% of males and 1% of females are color blind


Facets can be used to avoid color altogether
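
A minimal sketch of the idea, assuming ggplot2 is installed: each group gets its own panel via `facet_wrap()`, so nothing depends on telling colors apart. The `scores` data are hypothetical and only there to make the example runnable.

```r
library(ggplot2)

# Hypothetical data for illustration only
scores <- data.frame(
    group = rep(c("Group A", "Group B", "Group C"), each = 4),
    quarter = rep(1:4, times = 3),
    value = c(3, 4, 6, 7, 5, 5, 4, 6, 2, 3, 3, 5)
)

# One panel per group, so no color is needed to tell the groups apart
facet_plot <- ggplot(scores, aes(x = quarter, y = value)) +
    geom_line() +
    geom_point() +
    facet_wrap(vars(group))

facet_plot
```

The panels share the same axes by default, so values can still be compared across groups.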


10. Consider tables for small data sets

Who do you think did a better job in tonight’s debate?

                    Clinton  Trump
Among Democrats         99%     1%
Among Republicans       53%    47%

Your turn - go here

For your "bad" visualization:

1) Identify where the graphic falls on Cleveland's pattern recognition hierarchy

2) Identify any design rules that are broken

3) Suggest at least two improvements


Making a (good) ggplot

Before:

After:


Making a (good) ggplot

  1. Format data frame
  2. Add geoms
  3. Flip coordinates?
  4. Reorder factors?
  5. Adjust scales
  6. Adjust theme
  7. Annotate

1) Format data frame

# Load packages (wildlife_impacts is assumed to be loaded already)
library(tidyverse)

# Format the data frame
wildlife_impacts %>%
    count(operator)
#> # A tibble: 4 × 2
#>   operator               n
#>   <chr>              <int>
#> 1 AMERICAN AIRLINES  14887
#> 2 DELTA AIR LINES     9005
#> 3 SOUTHWEST AIRLINES 17970
#> 4 UNITED AIRLINES    15116

2) Add geoms

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = operator, y = n),
        width = 0.7, alpha = 0.8)

3) Flip coordinates - can you read the labels?

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = operator, y = n),
        width = 0.7, alpha = 0.8) +
    # Flip coordinates
    coord_flip()

3) Flip coordinates - can you read the labels?

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        # Alternative to coord_flip(): map n to x and operator to y
        aes(x = n, y = operator),
        width = 0.7, alpha = 0.8)

4) Reorder factors with reorder()

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8)

5) Adjust scales

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8) +
    # Adjust x axis scale
    scale_x_continuous(
        expand = expansion(mult = c(0, 0.05)))

5) Adjust scales - customize break points (if you want)

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8) +
    # Adjust x axis scale
    scale_x_continuous(
        expand = expansion(mult = c(0, 0.05)),
        breaks = c(0, 10000, 20000),
        limits = c(0, 20000))

6) Adjust theme

Four cowplot themes you should know


6) Adjust theme

For horizontal bars, add only vertical grid

# theme_minimal_vgrid() comes from the cowplot package
library(cowplot)

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8) +
    # Adjust x axis scale
    scale_x_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    # Adjust theme
    theme_minimal_vgrid()

7) Annotate

# Format the data frame
wildlife_impacts %>%
    count(operator) %>%
    mutate(operator = str_to_title(operator)) %>%
    # Add geoms
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8) +
    # Adjust x axis scale
    scale_x_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    # Adjust theme
    theme_minimal_vgrid() +
    # Annotate
    labs(
        x = 'Count',
        y = NULL)

Finished product

wildlife_impacts %>%
    count(operator) %>%
    mutate(operator = str_to_title(operator)) %>%
    ggplot() +
    geom_col(
        aes(x = n, y = reorder(operator, n)),
        width = 0.7, alpha = 0.8) +
    scale_x_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    theme_minimal_vgrid() +
    labs(
        x = 'Count',
        y = NULL)

Your turn

Use the gapminder.csv data to create the following plot, following these steps:

  1. Format data frame
  2. Add geoms
  3. Flip coordinates?
  4. Reorder factors?
  5. Adjust scales
  6. Adjust theme
  7. Annotate
