<div class="shade_black"  style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;">
<i class="fas fa-exclamation-circle"></i> These slides are best viewed in Chrome and occasionally need to be refreshed if elements do not load properly. See <a href=/>here for PDF <i class="fas fa-file-pdf"></i></a>.
</div>

<br>

.white[Press the **right arrow** to progress to the next slide!]

---

background-image: url(images/bg1.jpg)
background-size: cover
class: hide-slide-number split-70 title-slide
count: false

.column.shade_black[.content[

<br>

# .monash-blue.outline-text[ETC1010: Introduction to Data Analysis]

<br>

<h2 style="font-weight:900!important;">Week of introduction</h2>

.bottom_abs.width100[

Lecturer: *Nicholas Tierney*

Department of Econometrics and Business Statistics

<span><i class="fas  fa-envelope faa-float animated "></i></span>  ETC1010.Clayton-x@monash.edu

April 2020

<br>
]

]]

---
class: transition
# While the song is playing...

Draw a mental model / concept map of last lecture's content on Missing Data.

---

background-image: url(https://www.kdnuggets.com/images/cartoon-turkey-data-science.jpg)
background-size: contain
background-position: 50% 50%

---
# Overview

- Different file formats
    - Audio / binary
- Web data
    - responsible scraping
    - scraping
    - JSON

---
class: transition
# Recap on some tricky topics

- pipes `%>%` ("then")
- assignment `<-` ("gets")

---
# The pipe operator: `%>%`

- Code that tells the story of Little Bunny Foo Foo (borrowed from https://r4ds.had.co.nz/pipes.html).
- Each verb in the story is a function: `hop()`, `scoop()`, `bop()`.

> Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head

---
# Approach: Intermediate steps

```r
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
```

- Main downside: forces you to name each intermediate element. 
- Sometimes the steps have natural names. If so, go ahead.
- **But often there are no natural names.**
- Adding number suffixes to make the names unique leads to problems.

---
# Approach: Intermediate steps

```r
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
```
- Code is cluttered with unimportant names
- Suffix has to be carefully incremented on each line.
- I've done this! 
- 99% of the time I miss a number somewhere, and there goes my evening ... debugging my code.

---
# Another Approach: Overwrite the original

```r
foo_foo <- hop(foo_foo, through = forest)
foo_foo <- scoop(foo_foo, up = field_mice)
foo_foo <- bop(foo_foo, on = head)
```

- Overwrite originals instead of creating intermediate objects 
- Less typing (and less thinking). Less likely to make mistakes?
- **Painful debugging**: need to re-run the code from the top.
- Repetition of the object name (`foo_foo` written 6 times!) obscures what changes.

---
# (Yet) Another approach: function composition

```r
bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ), 
  on = head
)
```

- You need to read inside-out, and right-to-left.
- Arguments are spread far apart
- Harder to read

---
# Pipe `%>%` can help!

.pull-left[
`f(x)`

`g(f(x))`

`h(g(f(x)))`
]

.pull-right[
`x %>% f()`

`x %>% f() %>% g()`

`x %>% f() %>% g() %>% h()`
]
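
A minimal sketch of the same equivalence with real functions. The vector `x` and the choice of `sqrt()`, `mean()`, and `round()` are assumptions made for illustration:

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read left to right, "x, then sqrt, then mean, then round"
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both forms compute the same value
identical(nested, piped)
```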

---
# Solution: Use the pipe - `%>%`

```r
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
```

- Focuses on verbs, not nouns.
- Can be read as a series of actions, composed one after another.

> Foo Foo hops, then scoops, then bops.

- read more at: https://r4ds.had.co.nz/pipes.html
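
`hop()`, `scoop()`, and `bop()` are not real functions, so the code on these slides will not run as written. A runnable sketch, assuming toy stand-ins that simply record each action as text:

```r
library(magrittr)

# Hypothetical stand-ins: each verb appends a description of its action
hop   <- function(x, through) c(x, paste("hops through the", through))
scoop <- function(x, up)      c(x, paste("scoops up the", up))
bop   <- function(x, on)      c(x, paste("bops them on the", on))

foo_foo <- "Foo Foo"

# Function composition: read inside-out
story_nested <- bop(
  scoop(
    hop(foo_foo, through = "forest"),
    up = "field mice"
  ),
  on = "head"
)

# Pipe: read top to bottom
story_piped <- foo_foo %>%
  hop(through = "forest") %>%
  scoop(up = "field mice") %>%
  bop(on = "head")

# The two approaches give the same result
identical(story_nested, story_piped)
```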

---
class: transition

# Assignment `<-`

"gets"

---
# Assignment

We can perform calculations in R:

```r
1 + 1
read_csv("data.csv")
```

---
# Assignment

But what if we want to use that information later?

```r
1 + 1
read_csv("data.csv")
```

---
# Assignment

We can assign these things to an object using `<-`

This reads as "gets".

```r
x <- 1 + 1
my_data <- read_csv("data.csv")
```

- x 'gets' 1+1
- my_data 'gets' the output of read_csv...

---
# Assignment

Then we can use those things in other calculations

```r
x <- 1 + 1
my_data <- read_csv("data.csv")

x * x

my_data %>% 
  select(age, height, weight) %>% 
  mutate(bmi = weight / height^2)
```

---
class: transition
# Take 3 minutes to discuss these two concepts with your breakout room

- What are pipes? `%>%`
- What is assignment? `<-`

---
class: transition
# The many shapes and sizes of data

---
# Data as an audio file

```
## # A tibble: 50,001 x 3
##        t  left right
##    <int> <int> <int>
##  1     1    28    29
##  2     2    27    28
##  3     3    26    24
##  4     4    24    27
##  5     5    22    18
##  6     6    15    19
##  7     7    15    13
##  8     8    12    13
##  9     9    15    16
## 10    10    18    16
## # … with 49,991 more rows
```

---
# Plotting audio data?

---
# Compare left and right channels

???

Oh, same sound is on both channels! A tad drab.

---
# Compute statistics

```r
df_wavs %>%
  filter(channel == "left") %>%
  group_by(word) %>%
  summarise(
    m = mean(value),
    s = sd(value),
    mx = max(value),
    mn = min(value)
  ) %>%
  as_table()
```

---
# Di's music

```
##                             artist      type        lvar        lave  lmax
## Dancing Queen                 Abba      Rock  17600755.6 -90.0068667 29921
## Knowing Me                    Abba      Rock   9543020.9 -75.7667187 27626
## Take a Chance                 Abba      Rock   9049481.5 -98.0629244 26372
## Mamma Mia                     Abba      Rock   7557437.3 -90.4710616 28898
## Lay All You                   Abba      Rock   6282285.6 -88.9526309 27940
## Super Trouper                 Abba      Rock   4665866.7 -69.0208450 25531
## I Have A Dream                Abba      Rock   3369670.3 -71.6828788 14699
## The Winner                    Abba      Rock   1135862.0 -67.8190467  8928
## Money                         Abba      Rock   6146942.6 -76.2807500 22962
## SOS                           Abba      Rock   3482881.8 -74.1300038 15517
## V1                         Vivaldi Classical   3677296.2  66.6530970 24229
## V2                         Vivaldi Classical    771491.8  21.6560941  6936
## V3                         Vivaldi Classical   5227573.2  88.6465556 17721
## V4                         Vivaldi Classical    334719.1  13.8318847  4123
## V5                         Vivaldi Classical    836904.9  34.5677377 17306
## V6                         Vivaldi Classical  13936436.7 216.2317586 30355
## V7                         Vivaldi Classical   3636324.3   9.8366363 21450
## V8                         Vivaldi Classical    295397.4   2.8143227  2985
## V9                         Vivaldi Classical   4335879.0  10.9015767 22271
## V10                        Vivaldi Classical    472630.0   3.8890862  8194
## M1                          Mozart Classical   2819795.0  -5.8667602 14939
## M2                          Mozart Classical   2836957.9  -5.6074580 13382
## M3                          Mozart Classical   9089372.1  -5.9719205 25265
## M4                          Mozart Classical   4056229.7  -6.2272904 21328
## M5                          Mozart Classical   1568925.6  -6.1993790 15839
## M6                          Mozart Classical   7758409.1  -5.7700183 22496
## All in a Days Work            Eels      Rock  40275677.4 -10.6893416 32759
## Saturday Morning              Eels      Rock 129472199.3  50.2115773 32759
## The Good Old Days             Eels      Rock  18838849.0  -0.0222764 30386
## Love of the Loveless          Eels      Rock  43201194.3   1.8926897 32759
## Girl                          Eels      Rock  88547131.0   0.3358761 32744
## Agony                         Eels      Rock  16285811.4  -0.1405876 30106
## Rock Hard Times               Eels      Rock  54651552.8   1.9848514 32759
## Restraining                   Eels      Rock  12322434.4   1.0259790 23221
## Lone Wolf                     Eels      Rock  63878899.8   3.3592505 32759
## Wrong About Bobby             Eels      Rock  43668186.9  -2.0125667 32759
## Love Me Do                 Beatles      Rock  28806811.2  -5.7452189 24159
## I Want to Hold Your Hand   Beatles      Rock  61257693.6  -6.0340682 28502
## Cant Buy Me Love           Beatles      Rock  76729438.0  -5.9583971 30102
## I Feel Fine                Beatles      Rock  52497242.7  -5.7314591 29911
## Ticket to Ride             Beatles      Rock  68104547.0  -6.1449114 30415
## Help                       Beatles      Rock  52569372.5  -5.7166301 32318
## Yesterday                  Beatles      Rock  23080907.6  -6.0298337 28169
## Yellow Submarine           Beatles      Rock  39908667.5  -6.2616915 29061
## Eleanor Rigby              Beatles      Rock  18819753.2  -6.1265193 21680
## Penny Lane                 Beatles      Rock  58614798.7  -5.9971242 31131
## B1                       Beethoven Classical   8368952.9  -0.9538330 26645
## B2                       Beethoven Classical    293608.3  -0.1247094  4554
## B3                       Beethoven Classical   8051764.6  -0.3316964 24194
## B4                       Beethoven Classical  23493873.6  -0.9411538 32766
## B5                       Beethoven Classical   1640232.8   1.3899979 20877
## B6                       Beethoven Classical    343973.1  -2.4748955  9225
## B7                       Beethoven Classical   3644784.2  -1.0426907 24633
## B8                       Beethoven Classical  15030950.3  -1.4394652 26066
## The Memory of Trees           Enya  New wave   1135493.1 -10.6183398  9994
## Anywhere Is                   Enya  New wave  12230252.2 -17.8372700 24968
## Pax Deorum                    Enya  New wave   1723627.9  -6.8327065 13227
## Waterloo                      Abba      Rock  24898675.9 -93.9961871 29830
## V11                        Vivaldi Classical   1879989.2  12.7213373  8601
## V12                        Vivaldi Classical    737349.6   5.7190022  7089
## V13                        Vivaldi Classical   2865979.9  21.4467629 17282
## Hey Jude                   Beatles      Rock   8651854.1  -6.1322408 18509
##                             lfener     lfreq
## Dancing Queen            105.92095  59.57379
## Knowing Me               102.83616  58.48031
## Take a Chance            102.32488 124.59397
## Mamma Mia                101.61648  48.76513
## Lay All You              100.30076  74.02039
## Super Trouper            100.24848  81.40140
## I Have A Dream           104.59686 305.18689
## The Winner               104.34921 277.66056
## Money                    102.24066 165.15799
## SOS                      104.36243 146.73700
## V1                        99.25243 329.53792
## V2                       104.36737 843.83240
## V3                       104.61255 165.76781
## V4                       104.35005 293.99972
## V5                        88.82821 198.38305
## V6                       104.82354 198.46716
## V7                       100.52727 877.77243
## V8                       108.28717  58.41722
## V9                       101.32881 176.53441
## V10                       98.83731 526.04942
## M1                       102.25358 342.26017
## M2                       102.04646 511.85517
## M3                       102.23796 429.27618
## M4                        97.59887 343.75319
## M5                        99.47580 288.44819
## M6                       101.66942 459.24182
## All in a Days Work       106.07617  65.48281
## Saturday Morning         114.00229  41.40515
## The Good Old Days        105.45611 165.24210
## Love of the Loveless     108.37835 174.64185
## Girl                     112.00916 392.28702
## Agony                    103.94171 312.06322
## Rock Hard Times          108.87503 312.37864
## Restraining              103.29906 185.59771
## Lone Wolf                109.53291  66.13469
## Wrong About Bobby        108.35559  98.83404
## Love Me Do               109.95808 126.50757
## I Want to Hold Your Hand 111.81813 294.67263
## Cant Buy Me Love         112.71792 100.20089
## I Feel Fine              110.66333 110.39972
## Ticket to Ride           111.59957 107.30853
## Help                     110.34618 137.10594
## Yesterday                107.30541 173.50631
## Yellow Submarine         109.47442  91.49508
## Eleanor Rigby            108.65894 164.92667
## Penny Lane               111.18901  92.46240
## B1                       101.74095 354.12025
## B2                       103.26160 316.03761
## B3                       101.20132 397.18666
## B4                       106.18220 529.26679
## B5                        94.59029 445.00551
## B6                        93.38874 331.68283
## B7                        97.86394 181.01349
## B8                       106.39731 249.08280
## The Memory of Trees      102.16132 155.06430
## Anywhere Is              105.75748  79.31957
## Pax Deorum               101.86845  49.01748
## Waterloo                 107.73299 146.04306
## V11                      105.81750  58.83780
## V12                      102.92123 175.94562
## V13                      102.11314  61.44533
## Hey Jude                  83.88195 219.53773
```

---
# Plot Di's music

---
# Plot Di's music

Abba is just different from everyone else!

---
# Question time:

- How does `data` appear different from `statistics` in the time series?
- What format is the data in an audio file?
- How is Abba different from the other music clips?

---
# Why look at audio data?

- Data comes in many shapes and sizes
- Audio data can be transformed ("rectangled") into a data.frame
- Another type of data is data on the web.
- Extracting data from websites is called "web scraping".
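
A rough sketch of "rectangling" audio. The signal is simulated here (two sine waves, an assumption standing in for a real recording) so that the code runs without an audio file; in practice a package such as `tuneR` can read the samples from a WAV file.

```r
library(tibble)
library(tidyr)

# Simulate one second of stereo audio (assumed values, not the lecture's recording)
sample_rate <- 8000
t <- seq_len(sample_rate)
left  <- as.integer(round(1000 * sin(2 * pi * 440 * t / sample_rate)))
right <- as.integer(round(1000 * sin(2 * pi * 444 * t / sample_rate)))

# Wide form: one row per time point, one column per channel
audio_wide <- tibble(t = t, left = left, right = right)

# Long (tidy) form: one row per time point and channel
audio_long <- pivot_longer(
  audio_wide,
  cols = c(left, right),
  names_to = "channel",
  values_to = "value"
)

audio_long
```

The long form is convenient for steps such as `filter(channel == "left")` or for plotting both channels in one `ggplot()` call.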

---
# Scraping the web: what? why?

- An increasing amount of data is available on the web.
- These data are provided in an unstructured format: you can always copy and paste, but it's time-consuming and prone to errors.
- Web scraping is the process of extracting this information automatically and transforming it into a structured dataset.

---
# Scraping the web: what? why?

1. Screen scraping: extract data from the source code of a website, with an HTML parser (easy) or regular expression matching (less easy).
2. Web APIs (application programming interfaces): the website offers a set of structured HTTP requests that return JSON or XML files.
- Why R? It includes all the tools necessary to do web scraping, you are already familiar with it, and you can analyse the data directly. But Python, Perl, and Java are also efficient tools.
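
The two approaches can be sketched without touching the network. The HTML snippet and the JSON string below are made up for illustration; `html_elements()` and `html_text2()` assume rvest 1.0.0 or later.

```r
library(rvest)
library(jsonlite)

# 1. Screen scraping: parse the HTML source, then select elements with a CSS selector
page <- read_html(
  '<html><body>
     <h1>Staff</h1>
     <p class="name">Ada</p>
     <p class="name">Grace</p>
   </body></html>'
)

staff_names <- page %>%
  html_elements("p.name") %>%
  html_text2()

# 2. Web API: the server returns structured JSON, which maps directly to a data frame
api_response <- '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'
staff <- fromJSON(api_response)

staff_names
staff
```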

---
class: transition
# Web Scraping with `rvest` and `polite`

---
# Hypertext Markup Language

Most of the data on the web is still largely available as HTML. While it is structured (hierarchical / tree-based), it is often not available in a form useful for analysis (flat / tidy).
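
A small sketch of going from the HTML tree to a flat table. The table is written inline (an assumption, reusing two rows from Di's music) rather than scraped from a live site:

```r
library(rvest)

# Hierarchical: table > tr > th / td
html <- '<table>
  <tr><th>artist</th><th>type</th></tr>
  <tr><td>Abba</td><td>Rock</td></tr>
  <tr><td>Vivaldi</td><td>Classical</td></tr>
</table>'

# Flat: one row per observation, one column per variable
music <- read_html(html) %>%
  html_element("table") %>%
  html_table()

music
```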