<div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;">
 These slides are best viewed in Chrome and may occasionally need to be refreshed if elements do not load properly. See <a href=/>here for a PDF</a>.
</div>

.white[Press the **right arrow** to progress to the next slide!]

---

background-image: url(images/bg1.jpg)
background-size: cover
class: hide-slide-number split-70 title-slide
count: false

.column.shade_black[.content[

# .monash-blue.outline-text[ETC1010: Introduction to Data Analysis]

## Week 4, part B

<h2 style="font-weight:900!important;">Advanced topics in data visualisation</h2>

.bottom_abs.width100[

Lecturer: *Nicholas Tierney*

Department of Econometrics and Business Statistics

ETC1010.Clayton-x@monash.edu

April 2020

]

]]

---
class: transition
# While the song is playing...

Draw a mental model / concept map of last lecture's content on joins.

---
class: refresher
# Recap

- Joins

---

# Joins with a person and a coat, by [Leigh Tami](https://twitter.com/leigh_tami18/status/1021471889309487105/photo/1)

---
# Upcoming Due Dates

- Assignment 1: Due April 8 at 5pm (Today!)

---

# Exploring life expectancy and income

We want to plot life expectancy vs income, but there's a problem:

.pull-left[

```r
gap_life_au
## # A tibble: 9 x 3
##   country    year life_expectancy
##   <chr>     <dbl>           <dbl>
## 1 Australia  2012            82.5
## 2 Australia  2013            82.6
## 3 Australia  2014            82.5
## 4 Australia  2015            82.5
## 5 Australia  2016            82.5
## 6 Australia  2017            82.4
## 7 Australia  2018            82.5
## 8 Australia  2019            82.7
## 9 Australia  2020            82.8
```
]

.pull-right[

```r
gap_income_au
## # A tibble: 9 x 3
##   country    year   gdp
##   <chr>     <dbl> <dbl>
## 1 Australia  2012 42800
## 2 Australia  2013 43200
## 3 Australia  2014 43700
## 4 Australia  2015 44100
## 5 Australia  2016 44600
## 6 Australia  2017 44900
## 7 Australia  2018 45400
## 8 Australia  2019 45500
## 9 Australia  2020 45800
```
]

---

# We need them in the same dataframe!

We could try `bind_cols()` to bind the columns of the two dataframes together

```r
bind_cols(gap_life_au,
          gap_income_au)
## # A tibble: 9 x 6
##   country    year life_expectancy country1  year1   gdp
##   <chr>     <dbl>           <dbl> <chr>     <dbl> <dbl>
## 1 Australia  2012            82.5 Australia  2012 42800
## 2 Australia  2013            82.6 Australia  2013 43200
## 3 Australia  2014            82.5 Australia  2014 43700
## 4 Australia  2015            82.5 Australia  2015 44100
## 5 Australia  2016            82.5 Australia  2016 44600
## 6 Australia  2017            82.4 Australia  2017 44900
## 7 Australia  2018            82.5 Australia  2018 45400
## 8 Australia  2019            82.7 Australia  2019 45500
## 9 Australia  2020            82.8 Australia  2020 45800
```

---

# But this has problems:

1. It produces messy output: the key columns are duplicated as `country1` and `year1`
2. It doesn't work if the dataframes don't have the same number of rows

```
## # A tibble: 9 x 6
##   country    year life_expectancy country1  year1   gdp
##   <chr>     <dbl>           <dbl> <chr>     <dbl> <dbl>
## 1 Australia  2012            82.5 Australia  2012 42800
## 2 Australia  2013            82.6 Australia  2013 43200
## 3 Australia  2014            82.5 Australia  2014 43700
## 4 Australia  2015            82.5 Australia  2015 44100
## 5 Australia  2016            82.5 Australia  2016 44600
## 6 Australia  2017            82.4 Australia  2017 44900
## 7 Australia  2018            82.5 Australia  2018 45400
## 8 Australia  2019            82.7 Australia  2019 45500
## 9 Australia  2020            82.8 Australia  2020 45800
```

---

# How to bind data?

For example, how do we add this co2 data to the income or life expectancy data?

```r
gap_co2_au
## # A tibble: 3 x 3
##   country    year   co2
##   <chr>     <dbl> <dbl>
## 1 Australia  2012  17  
## 2 Australia  2013  16.1
## 3 Australia  2014  15.4
```

---

# How to bind data?

We can't use `bind_cols()`

```r
bind_cols(gap_co2_au,
          gap_income_au)
```

```
Error: Argument 2 must be length 3, not 9
```

We could try a more complex approach using `filter()`, and so on...

But surely this is a common problem in data analysis?

Someone must have solved it before?

They did! **Joins**!

---

# Joins!

We can use `left_join()` to combine the income and life expectancy data

```r
left_join(x = gap_income_au,
          y = gap_life_au,
          by = c("country", "year"))
## # A tibble: 9 x 4
##   country    year   gdp life_expectancy
##   <chr>     <dbl> <dbl>           <dbl>
## 1 Australia  2012 42800            82.5
## 2 Australia  2013 43200            82.6
## 3 Australia  2014 43700            82.5
## 4 Australia  2015 44100            82.5
## 5 Australia  2016 44600            82.5
## 6 Australia  2017 44900            82.4
## 7 Australia  2018 45400            82.5
## 8 Australia  2019 45500            82.7
## 9 Australia  2020 45800            82.8
```

---

# Add co2 data with another join:

We get missing values (`NA`) for co2, because we don't have co2 values for 2015 onwards.

```r
left_join(x = gap_income_au,
          y = gap_life_au,
          by = c("country", "year")) %>% 
  left_join(gap_co2_au,
            by = c("country", "year"))
## # A tibble: 9 x 5
##   country    year   gdp life_expectancy   co2
##   <chr>     <dbl> <dbl>           <dbl> <dbl>
## 1 Australia  2012 42800            82.5  17  
## 2 Australia  2013 43200            82.6  16.1
## 3 Australia  2014 43700            82.5  15.4
## 4 Australia  2015 44100            82.5  NA  
## 5 Australia  2016 44600            82.5  NA  
## 6 Australia  2017 44900            82.4  NA  
## 7 Australia  2018 45400            82.5  NA  
## 8 Australia  2019 45500            82.7  NA  
## 9 Australia  2020 45800            82.8  NA  
```
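A self-contained sketch of the same two joins. The three Australian tables are rebuilt by hand from the values printed on these slides (the original objects are created in code that is not shown, and `year` is stored as integer here rather than double). It checks that `left_join()` keeps all nine income rows and fills the six years without a co2 match with `NA`.

```r
library(dplyr)
library(tibble)

# Tables rebuilt from the printed slide output (assumption: same values)
gap_income_au <- tibble(
  country = "Australia",
  year = 2012:2020,
  gdp = c(42800, 43200, 43700, 44100, 44600, 44900, 45400, 45500, 45800)
)
gap_life_au <- tibble(
  country = "Australia",
  year = 2012:2020,
  life_expectancy = c(82.5, 82.6, 82.5, 82.5, 82.5, 82.4, 82.5, 82.7, 82.8)
)
gap_co2_au <- tibble(
  country = "Australia",
  year = 2012:2014,
  co2 = c(17, 16.1, 15.4)
)

# Chain two left joins on the shared keys
gap_au <- gap_income_au %>%
  left_join(gap_life_au, by = c("country", "year")) %>%
  left_join(gap_co2_au, by = c("country", "year"))

# Every row of the left table survives; unmatched co2 years become NA
nrow(gap_au)
sum(is.na(gap_au$co2))
```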

---

# So now we can combine everything like so:

```r
gap_au <- left_join(x = gap_income_au,
                    y = gap_life_au,
                    by = c("country", "year")) %>% 
  left_join(gap_co2_au,
            by = c("country", "year"))

gap_au
## # A tibble: 9 x 5
##   country    year   gdp life_expectancy   co2
##   <chr>     <dbl> <dbl>           <dbl> <dbl>
## 1 Australia  2012 42800            82.5  17  
## 2 Australia  2013 43200            82.6  16.1
## 3 Australia  2014 43700            82.5  15.4
## 4 Australia  2015 44100            82.5  NA  
## 5 Australia  2016 44600            82.5  NA  
## 6 Australia  2017 44900            82.4  NA  
## 7 Australia  2018 45400            82.5  NA  
## 8 Australia  2019 45500            82.7  NA  
## 9 Australia  2020 45800            82.8  NA  
```

---

# Now we can make a plot!

```r
ggplot(gap_au,
       aes(x = gdp,
           y = life_expectancy)) + 
  geom_point()
```

---

# Your Turn: go to the exercises on rstudio.cloud

Open "joins.Rmd".

Discuss with your partner: why do these two joins produce different results?

.pull-left[

```r
left_join(gap_co2_au,
          gap_life_au)
## # A tibble: 3 x 4
##   country    year   co2 life_expectancy
##   <chr>     <dbl> <dbl>           <dbl>
## 1 Australia  2012  17              82.5
## 2 Australia  2013  16.1            82.6
## 3 Australia  2014  15.4            82.5
```
]

.pull-right[

```r
left_join(gap_life_au,
          gap_co2_au)
## # A tibble: 9 x 4
##   country    year life_expectancy   co2
##   <chr>     <dbl>           <dbl> <dbl>
## 1 Australia  2012            82.5  17  
## 2 Australia  2013            82.6  16.1
## 3 Australia  2014            82.5  15.4
## 4 Australia  2015            82.5  NA  
## 5 Australia  2016            82.5  NA  
## 6 Australia  2017            82.4  NA  
## 7 Australia  2018            82.5  NA  
## 8 Australia  2019            82.7  NA  
## 9 Australia  2020            82.8  NA  
```
]
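A minimal sketch for checking your answer, again rebuilding the two tables by hand from the printed values (an assumption; the originals are created elsewhere). `left_join(x, y)` returns every row of `x`, adding the matching columns from `y` and `NA` where `y` has no match, so the row count follows whichever table comes first. Unlike the slide, it passes `by` explicitly; otherwise dplyr guesses the shared columns and prints a message saying which ones it used.

```r
library(dplyr)
library(tibble)

gap_life_au <- tibble(
  country = "Australia",
  year = 2012:2020,
  life_expectancy = c(82.5, 82.6, 82.5, 82.5, 82.5, 82.4, 82.5, 82.7, 82.8)
)
gap_co2_au <- tibble(
  country = "Australia",
  year = 2012:2014,
  co2 = c(17, 16.1, 15.4)
)

# co2 on the left: only its 3 years are kept
co2_first <- left_join(gap_co2_au, gap_life_au, by = c("country", "year"))

# life expectancy on the left: all 9 years are kept, co2 is NA after 2014
life_first <- left_join(gap_life_au, gap_co2_au, by = c("country", "year"))

nrow(co2_first)
nrow(life_first)
```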

---
class: transition
# Your Turn:

What happens when we add data from New Zealand into the mix?

How can you join that data together?

---
class: transition

# Making effective data plots

1. Principles / science of data visualisation
2. Features of graphics

---
# Principles / science of data visualisation

- Palettes and colour blindness
- Change blindness
- Using proximity
- Hierarchy of mappings

---
# Features of graphics

- Layering statistical summaries
- Themes
- Adding interactivity

---
# Palettes and colour blindness

There are three main types of colour palette:

- Qualitative: categorical variables
- Sequential: low to high numeric values
- Diverging: negative to positive values
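One way to see the three types side by side is the ColorBrewer palette table shipped with the `RColorBrewer` package, which labels each palette by type. The three palettes picked here (`Set2`, `Blues`, `PRGn`) are illustrative choices; `PRGn` is the diverging palette used in the map example later on.

```r
library(RColorBrewer)

# brewer.pal.info lists every ColorBrewer palette; its `category` column
# records the type: "qual" (qualitative), "seq" (sequential), "div" (diverging)
palette_types <- brewer.pal.info[c("Set2", "Blues", "PRGn"), "category"]
as.character(palette_types)

# Hex colours from one palette of each type
brewer.pal(3, "Set2")  # qualitative: distinct hues with no implied order
brewer.pal(5, "Blues") # sequential: light (low) to dark (high)
brewer.pal(5, "PRGn")  # diverging: purple and green meeting at a light middle
```

In ggplot2, `scale_fill_brewer()` applies these palettes to discrete variables and `scale_fill_distiller()` to continuous ones.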

---
# Qualitative: categorical variables

---
# Sequential: low to high numeric values

---
# Diverging: negative to positive values

---
# Example: TB data

```
## # A tibble: 157,820 x 5
##    country      year count gender age  
##    <chr>       <dbl> <dbl> <chr>  <chr>
##  1 Afghanistan  1980    NA m      04   
##  2 Afghanistan  1981    NA m      04   
##  3 Afghanistan  1982    NA m      04   
##  4 Afghanistan  1983    NA m      04   
##  5 Afghanistan  1984    NA m      04   
##  6 Afghanistan  1985    NA m      04   
##  7 Afghanistan  1986    NA m      04   
##  8 Afghanistan  1987    NA m      04   
##  9 Afghanistan  1988    NA m      04   
## 10 Afghanistan  1989    NA m      04   
## # … with 157,810 more rows
```

---
# Example: TB data: adding relative change

```
## # A tibble: 219 x 4
##    country             `2002` `2012`  reldif
##    <chr>                <dbl>  <dbl>   <dbl>
##  1 Afghanistan           6509  13907  1.14  
##  2 Albania                225    185 -0.178 
##  3 Algeria               8246   7510 -0.0893
##  4 American Samoa           1      0 -1     
##  5 Andorra                  2      2  0     
##  6 Angola               17988  22106  0.229 
##  7 Anguilla                 0      0  0     
##  8 Antigua and Barbuda      4      1 -0.75  
##  9 Argentina             5383   4787 -0.111 
## 10 Armenia                511    316 -0.382 
## # … with 209 more rows
```
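A sketch of how `reldif` can be computed, assuming it is the relative change (2012 count minus 2002 count) divided by the 2002 count. This formula reproduces the printed values for the rows below, which are copied from the table. For Anguilla, where both counts are 0, it gives `NaN` (0/0) while the table shows 0, so the original code must handle zero counts in a way that is not shown here.

```r
library(dplyr)
library(tibble)

# A few rows copied from the table above
tb_wide <- tibble(
  country = c("Afghanistan", "Albania", "American Samoa", "Antigua and Barbuda"),
  `2002` = c(6509, 225, 1, 4),
  `2012` = c(13907, 185, 0, 1)
)

# Relative change from 2002 to 2012: (new - old) / old
tb_wide <- tb_wide %>%
  mutate(reldif = (`2012` - `2002`) / `2002`)

round(tb_wide$reldif, 2)

# Anguilla (0 cases in both years): 0 / 0 is NaN in R
is.nan((0 - 0) / 0)
```

Because `reldif` is negative for a decrease and positive for an increase, with 0 meaning no change, it suits a diverging palette centred on 0.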

---
# Example: Sequential colour with default palette

```r
library(ggthemes) # provides theme_map()
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map()
```

---
# Example: (improved) sequential colour with viridis palette

```r
library(viridis)
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() + scale_fill_viridis(na.value = "white")
```

---
# Example: Diverging colour with a better palette

```r
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() +
  scale_fill_distiller(palette = "PRGn", na.value = "white", limits = c(-7, 7))
```

---
# Summary on colour palettes

- Different ways to map colour to values:
  - Qualitative: categorical variables
  - Sequential: low to high numeric values
  - Diverging: negative to positive values

---
# Colour blindness

- About 8% of men (about 1 in 12) and 0.5% of women (about 1 in 200) have difficulty distinguishing between red and green.
- Several palettes have been tested for colour blindness: `RColorBrewer` has an associated website, [colorbrewer2.org](http://colorbrewer2.org), where colourblind-safe palettes are labelled. See also `viridis` and `scico`.
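A small sketch of simulating colour blindness, assuming the `dichromat` package (the slide titles mention a "dichromat mode", but the code behind those plots is not shown). Its `dichromat()` function converts colours to an approximation of how they appear under a given type of colour blindness; `type = "deutan"` simulates red-green deuteranopia. Pure red and pure green are assumed colours chosen for illustration.

```r
library(dichromat)

# Assumed colours for illustration: pure red and pure green
original <- c("#FF0000", "#00FF00")

# Approximate appearance of these colours with deuteranopia
simulated <- dichromat(original, type = "deutan")
simulated
```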

---
# Plot of two coloured points: Normal Mode

---
# Plot of two coloured points: dichromat mode

---
# Showing all types of colourblindness

---
# Impact of colourblind-safe palette

```r
p2 <- p + scale_colour_brewer(palette = "Dark2")
p2
```

---
# Impact of colourblind-safe palette
<img src="lecture_4b_files/figure-html/cb-grid-1.png" width="100%" style="display: block; margin: auto;" />

---
# Impact of colourblind-safe palette

```r
p3 <- p + scale_colour_viridis_d()
p3
```

---
# Impact of colourblind-safe palette
<img src="lecture_4b_files/figure-html/cb-grid-viridis-1.png" width="100%" style="display: block; margin: auto;" />

---
# Summary colour blindness

- Apply colourblind-friendly colourscales
  - `+ scale_colour_viridis()`
  - `+ scale_colour_brewer(palette = "Dark2")`
  - `scico` R package
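The plot `p` used in the colourblind slides is built in code that is not shown. A minimal sketch, assuming a hypothetical two-group scatterplot (`two_groups` and its simulated values are made up for illustration), of adding either colourblind-friendly scale to an existing plot:

```r
library(ggplot2)

# Hypothetical stand-in for `p`: simulated points in two colour groups
set.seed(1)
two_groups <- data.frame(
  x = rnorm(40),
  y = rnorm(40),
  group = rep(c("a", "b"), each = 20)
)
p <- ggplot(two_groups, aes(x = x, y = y, colour = group)) +
  geom_point(size = 3)

# Colourblind-friendly alternatives for a discrete colour mapping
p2 <- p + scale_colour_brewer(palette = "Dark2") # ColorBrewer qualitative palette
p3 <- p + scale_colour_viridis_d()               # viridis, discrete version
```

For a continuous colour mapping, `scale_colour_viridis_c()` plays the same role as `scale_colour_viridis_d()`.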

---
# Pre-attentiveness: Find the odd one out?

---
# Pre-attentiveness: Find the odd one out?

---
class: idea
# Using proximity in your plots

Basic rule: place the groups that you want to compare close to each other
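A sketch of the rule using the built-in `mtcars` data, swapped in because the TB counts behind the next plots are not shown. The same counts are arranged two ways: dodged bars put automatic and manual cars side by side within each cylinder group, while facets put the cylinder groups side by side within each transmission type.

```r
library(ggplot2)

# Treat cylinders and transmission as categories (am: 0 = automatic, 1 = manual)
cars <- transform(mtcars,
                  cyl = factor(cyl),
                  am = factor(am, labels = c("automatic", "manual")))

# Arrangement 1: transmissions next to each other, so comparing
# automatic vs manual within a cylinder group is easy
p_am_close <- ggplot(cars, aes(x = cyl, fill = am)) +
  geom_bar(position = "dodge")

# Arrangement 2: cylinder groups next to each other within one panel per
# transmission, so comparing cylinder groups is easy
p_cyl_close <- ggplot(cars, aes(x = cyl)) +
  geom_bar() +
  facet_wrap(~am)
```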

---
# Which plot answers which question?

- "Is the incidence similar for males and females in 2012 across age groups?"
- "Is the incidence similar for age groups in 2012, across gender?"

---
# Incidence similar for (M and F) or (age, across gender)?

???

Here are two different arrangements of the tb data. To answer the question "Is the incidence similar for males and females in 2012 across age groups?" the first arrangement is better. It puts males and females right beside each other, so the relative heights of the bars can be seen quickly. The answer to the question would be "No, the numbers were similar in youth, but males are more affected with increasing age."

The second arrangement puts the focus on age groups, and is better to answer the question "Is the incidence similar for age groups in 2012, across gender?" To which the answer would be "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."

---
# "Incidence similar for M & F in 2012 across age?"

- Males & females next to each other: relative heights of bars are seen quickly. 
- Answer to the question: "No, the numbers were similar in youth, but males are more affected with increasing age."

---
# "Incidence similar for age in 2012, across gender?"

- Puts the focus on age groups 
- Answer to the question: "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."

---
# Proximity wrap up

- Faceting of plots and proximity are related to change blindness, an area of study in cognitive psychology. 
- Daniel Simons' lab has made a series of fabulous videos illustrating how a visual break affects what the mind notices. 
- Here's one example:  
[The door study](https://www.youtube.com/watch?v=FWSxSQsspiQ)

---
# Layering

- *Statistical summaries:* It is common to layer plots, particularly by adding statistical summaries, like a model fit, or means and standard deviations. The purpose is to show the **trend** in relation to the **variation**. 
- *Maps:* Maps commonly provide the framework for spatially collected data: one layer for the map, and another for the data.
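The data frame `df` used in the following slides is not defined in them. A minimal sketch, assuming simulated data: `y1` has a linear trend and `y2` a wiggly, non-linear one (both the formulas and the noise levels are made up for illustration). Each `+` adds a layer: raw points for the variation, then a fitted line for the trend.

```r
library(ggplot2)

# Hypothetical data standing in for `df`
set.seed(2020)
n <- 100
df <- data.frame(x = runif(n))
df$y1 <- 2 * df$x + rnorm(n, sd = 0.3)
df$y2 <- sin(6 * pi * df$x) + rnorm(n, sd = 0.5)

# Layer 1: the points (variation); layer 2: a linear fit with its band (trend)
p_layers <- ggplot(df, aes(x = x, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm")

length(p_layers$layers)
```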

---
# `geom_point()`

```r
ggplot(df, aes(x = x, y = y1)) + geom_point()
```

---
# `geom_smooth(method = "lm", se = FALSE)`

```r
ggplot(df, aes(x = x, y = y1)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

---
# `geom_smooth(method = "lm")`

```r
ggplot(df, aes(x = x, y = y1)) + geom_point() +
  geom_smooth(method = "lm")
```

---
# `geom_point()`

```r
ggplot(df, aes(x = x, y = y2)) + geom_point()
```

---
# `geom_smooth(method = "lm", se = FALSE)`

```r
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

---
# `geom_smooth(se = FALSE)`

```r
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE)
```

---
# `geom_smooth(se = FALSE, span = 0.05)`

```r
ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE, span = 0.05)
```

---
# `geom_smooth(se = FALSE, span = 0.2)`

```r
p1 <- ggplot(df, aes(x = x, y = y2)) + geom_point() +
  geom_smooth(se = FALSE, span = 0.2)
p1
```
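For fewer than 1,000 points, `geom_smooth()` fits a loess curve by default, and `span` is the fraction of the data used for each local fit (loess defaults to 0.75). A sketch with `stats::loess()` directly, on simulated wiggly data standing in for `df$y2` (an assumption, since `df` is not defined on the slides): a smaller span follows the data more closely, so the residuals shrink, at the cost of a bumpier curve.

```r
# Hypothetical wiggly data standing in for df$y2
set.seed(2020)
x <- runif(100)
y2 <- sin(6 * pi * x) + rnorm(100, sd = 0.5)

# Same local regression with the default span and a narrower one
fit_default <- loess(y2 ~ x, span = 0.75)
fit_narrow  <- loess(y2 ~ x, span = 0.2)

# Residual sum of squares for each fit
rss_default <- sum(residuals(fit_default)^2)
rss_narrow  <- sum(residuals(fit_narrow)^2)

# The narrow span tracks the wiggles, so it leaves smaller residuals here
rss_narrow < rss_default
```

Too small a span (like 0.05 on 100 points) mostly chases the noise, so the choice of span trades smoothness against following the data.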

---
# Interactivity with magic plotly

```r
library(plotly)
ggplotly(p1)
```

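A self-contained sketch of the same idea on the built-in `mtcars` data, swapped in because `df` is not defined on these slides. `ggplotly()` converts a ggplot object into an interactive htmlwidget; its `tooltip` argument chooses which aesthetics are shown when hovering over a point.

```r
library(ggplot2)
library(plotly)

p_cars <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Interactive version: hover to see the x (wt) and y (mpg) values
widget <- ggplotly(p_cars, tooltip = c("x", "y"))
class(widget)
```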

---
# Themes: Add some style to your plot

.left-code[

```r
p <- ggplot(mtcars) +
  geom_point(aes(x = wt,
                 y = mpg,
                 colour = factor(gear))) +
  facet_wrap(~am)
p
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# Theme: theme_minimal

.left-code[

```r
p + 
  theme_minimal()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-minimal-out-1.png" width="100%" style="display: block; margin: auto;" />
]
---
# Theme: ggthemes `theme_few()`

.left-code[

```r
library(ggthemes) # provides theme_few() and scale_colour_few()
p + 
  theme_few() + 
  scale_colour_few()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-theme-few-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# Theme: ggthemes `theme_excel()` 😷

.left-code[

```r
library(ggthemes) # provides theme_excel() and scale_colour_excel()
p + 
  theme_excel() + 
  scale_colour_excel()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-theme-excel-out-1.png" width="100%" style="display: block; margin: auto;" />
]
---
# Theme: for fun

.left-code[

```r
library(wesanderson)
p + 
  scale_colour_manual(
    values = wes_palette("Royal1")
    )

```
]

.right-plot[
<img src="lecture_4b_files/figure-html/theme-wes-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
# Summary: themes

- The `ggthemes` package has many different styles for plots. 
- Other packages, such as `xkcd`, `skittles`, `wesanderson`, `beyonce` and `ochre`, provide further themes and palettes.
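Beyond swapping whole themes, you can tweak individual parts. A sketch using ggplot2's `theme()` function, which overrides single elements on top of a complete theme such as `theme_minimal()`. It rebuilds the faceted `mtcars` plot `p` from earlier so it runs on its own; the particular settings (legend at the bottom, bold facet labels, legend title) are arbitrary examples.

```r
library(ggplot2)

# The faceted mtcars plot from the themes slides
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(gear))) +
  facet_wrap(~am)

# Complete theme first, then theme() overrides individual elements
p_styled <- p +
  theme_minimal() +
  theme(legend.position = "bottom",
        strip.text = element_text(face = "bold")) +
  labs(colour = "gears")
```

Print `p_styled` to see the result; the order matters, because adding a complete theme after `theme()` would reset those tweaks.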

---
background-image: url(images/munzer-hierarchy.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

---
# Hierarchy of mappings

1. Position - common scale (BEST): axis system
2. Position - nonaligned scale: boxes in a side-by-side boxplot
3. Length, direction, angle: pie charts, regression lines, wind maps
4. Area: bubble charts
5. Volume, curvature: 3D plots
6. Shading, colour (WORST): maps, points coloured by numeric variable

- [Di's crowd-sourcing experiment](http://visiphilia.org/2016/08/03/CM-hierarchy)
- Nice explanation by [Peter Aldhous](http://paldhous.github.io/ucb/2016/dataviz/week2.html)
- [General plotting advice and a book from Naomi Robbins](https://www.forbes.com/sites/naomirobbins/#2b1e20082a6a)
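A quick way to feel the gap between the top and bottom of the hierarchy, sketched with the built-in `mtcars` data (an illustrative choice, not from the slides): the same `mpg` values are encoded first by position on a common scale, then only by colour shading. Reading off and comparing individual values should be much easier in the first plot.

```r
library(ggplot2)

# Top of the hierarchy: mpg as position on a common (y) axis
p_position <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()

# Bottom of the hierarchy: the same mpg values shown only as colour;
# points are jittered within each cylinder group so they do not overlap
p_shading <- ggplot(mtcars, aes(x = factor(cyl), y = 0, colour = mpg)) +
  geom_jitter(width = 0.2, height = 0.2)
```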

---
# Your Turn:

- The lab quiz is open (it requires answering questions from the lab exercise)
- Go to rstudio.cloud and check out exercise 4-B
- If you want to use R / RStudio on your laptop:
  - Install R and RStudio (see [Stuart Lee's instructions](https://github.com/sa-lee/installr))
  - Open R
  - Type the following:
  ```r
  # install.packages("usethis")
  library(usethis)
  use_course("https://ida.numbat.space/exercises/4b/ida-exercise-4b.zip")
  ```
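
Before running the exercise locally, it can help to check which of the lecture's packages are already installed. A minimal sketch; the package list is an assumption based on the functions used in these slides, so adjust it as needed.

```r
# Packages used in this lecture (assumed list; adjust as needed)
pkgs <- c("ggplot2", "ggthemes", "viridis", "wesanderson", "plotly", "usethis")

# requireNamespace() returns FALSE, rather than erroring, when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
}
```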

---

# Resources

- Kieran Healy [Data Visualization](http://socviz.co/index.html)
- Winston Chang (2012) [Cookbook for R](http://www.cookbook-r.com/Graphs/)
- Antony Unwin (2014) [Graphical Data Analysis](http://www.gradaanwr.net)
- Naomi Robbins (2013) [Creating More Effective Charts](http://www.nbr-graphs.com)


ETC1010: Introduction to Data Analysis

Week 4, part B

Advanced topics in data visualisation

Lecturer: Nicholas Tierney

Department of Econometrics and Business Statistics

ETC1010.Clayton-x@monash.edu

April 2020

