background-color: #006DAE class: middle center hide-slide-number <div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=/>here for PDF <i class="fas fa-file-pdf"></i></a>. </div> <br> .white[Press the **right arrow** to progress to the next slide!] --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 title-slide count: false .column.shade_black[.content[ <br> # .monash-blue.outline-text[ETC1010: Introduction to Data Analysis] <h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;">Week 1, part B</h2> <br> <h2 style="font-weight:900!important;">Week of introduction</h2> .bottom_abs.width100[ Lecturer: *Nicholas Tierney* Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
ETC1010.Clayton-x@monash.edu 11th Mar 2020 <br> ] ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> --- <img src="images/rstudio-cooking-example.jpeg" width="90%" style="display: block; margin: auto;" /> .small.bottom[ From Jessica Ward (@JKRWard) of R Ladies Newcaslte (UK) - @RLadiesNCL https://twitter.com/RLadiesNCL/status/1138812826917724160 ] --- background-image: url(images/unvotes-oz-usa.png) background-size: contain background-position: 50% 50% class: refresher center, bottom --- background-image: url(images/edstem.png) background-size: contain background-position: 50% 50% class: refresher, center, bottom --- class: refresher # R essentials: A short list (for now) - Functions are (most often) verbs, followed by what they will be applied to in parentheses: ```r do_this(to_this) ``` -- - Columns (variables) in data frames are accessed with `$`: ```r dataframe$var_name ``` -- - Packages are installed with `install.packages`, and loaded with `library` , once per session: ```r install.packages("package_name") library(package_name) ``` --- # Today: Outline - Why we care about Reproducibility - R + markdown = Rmarkdown - Controling output and input of rmarkdown - Exercises on creating rmarkdown reports on the humble platypus - Form up assignment groups - Quiz - Assignment 1 released next week. --- class: transition # We are in a tight spot with reproducibility .blockquote[ Only 6 out of 53 landmark results could be reproduced -- [Amgen, 2014*](https://www.nature.com/articles/483531a) ] -- .blockquote[ An estimated 75% - 90% of preclinical results cannot be reproduced -- [Begley, 2015*](https://www.ncbi.nlm.nih.gov/pubmed/25552691) ] -- .blockquote[ Estimated **annual** cost of irreproducibility for biomedical industry = 28 Billion USD -- [Freedman, 2015*](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165) ] .footnote[`*` Heard via Garret Grolemund's [great talk](https://www.youtube.com/watch?v=HVlwNayog-k)] --- background-image: url(gifs/njt-gif-jaws-dolly-zoom.gif) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url(gifs/ice-climber-fall.gif) background-size: contain background-position: 50% 50% class: center, bottom, white --- background-image: url("images/open-science-berg.jpeg") background-size: contain background-position: 50% 50% class: center, bottom, white --- class: transition # So what can we do about it? --- # Reproducibility checklist Near-term goals: - Are the tables and figures reproducible from the code and data? - Does the code actually do what you think it does? - In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?) -- Long-term goals: - Can the code be used for other data? - Can you extend the code to do other things? --- # Literate programming is a partial solution - Literate programming shines some light on this dark area of science. - An idea from [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) where you combine your text with your code output to create a document. - A _blend_ of your literature (**text**), and your programming (**code**), to create something you can read from top to bottom. --- # So -- ## Imagine a report: Introduction, methods, results, discussion, and conclusion, -- All the bits of code that make each section. -- With rmarkdown, you can see all the pieces of your data analysis all together. -- Each time you knit the analysis is ran from the beginning --- # Markdown as a new player to legibility -- In 2004, [John Gruber](https://en.wikipedia.org/wiki/John_Gruber), of [daring fireball](https://daringfireball.net/) created [markdown](https://en.wikipedia.org/wiki/Markdown), a simple way to create text that rendered into a HTML webpage. --- # Markdown as a new player to legibility .pull-left[ ``` - bullet list - bullet list - bullet list ``` ] -- .pull-right[ - bullet list - bullet list - bullet list ] --- # Markdown as a new player to legibility .pull-left[ ``` 1. numbered list 2. numbered list 3. numbered list __bold__, **bold**, _italic_, *italic* > quote of something profound ``` ] -- .pull-right[ 1. numbered list 2. numbered list 3. numbered list __bold__, **bold**, _italic_, *italic* > quote of something profound ] --- # Markdown as a new player to legibility With very little marking up, we can create rich text, that **actually resembles** the text that we want to see. -- **Learn to use markdown** Spend five minutes working through [markdowntutorial.com](https://www.markdowntutorial.com/)
05
:
00
-- ## This the end of part 1 of lecture 1b --- class: transition # Start of part 2 of lecture 1b --- # Rmarkdown helps address the reproducibility problem * Q: How do we take `markdown` + `R code` = "literate programming environment" * A: `Rmarkdown` --- # Rmarkdown... -- Provides an environment where you can write your complete analysis, and marries your text, and code together into a rich document. -- You write your code as code chunks, put your text around that, and then hey presto, you have a document you can reproduce. --- # Reminder: You've already used rmarkdown! <img src="images/unvotes-oz-usa.png" width="90%" style="display: block; margin: auto;" /> --- # How will we use R Markdown? - Every assignment + project / is an R Markdown document - You'll always have a template R Markdown document to start with - The amount of scaffolding in the template will decrease over the semester - These lecture notes are created using R Markdown (!) --- # The anatomy of an rmarkdown document There are three parts to an rmarkdown document. - Metadata (YAML) - Text (markdown formatting) - Code (code formatting) -- .vhuge.center[DEMO] --- # Metadata: YAML (YAML Ain't Markup Language) - The metadata of the document tells you how it is formed - what the **title** is, what **date** to put, and other control information. - If you're familiar with `\(\LaTeX\)`, this is similar to how you specify document type, styles, fonts, options, etc in the front matter / preamble. --- # Metadata: YAML - Rmarkdown documents use YAML to provide the metadata. It looks like this: ```YAML --- title: "An example document" author: "Nicholas Tierney" output: html_document --- ``` It starts an ends with three dashes `---`, and has fields like the following: `title`, `author`, and `output`. --- # Text Is markdown, as we discussed in the earlier section, It provides a simple way to mark up text .pull-left[ ``` 1. bullet list 2. bullet list 3. bullet list ``` ] .pull-right[ 1. bullet list 1. bullet list 1. bullet list ] --- # Code We refer to code in an rmarkdown document in two ways: 1. Code chunks, and 2. Inline code. --- # Code: Code chunks `Code chunks` are marked by three backticks and curly braces with `r` inside them: ````markdown ```{r chunk-name} # a code chunk ``` ```` --- # Code: backtick? **A backtick** is a special character you might not have seen before, it is typically located under the tilde key (`~`). On USA / Australia keyboards, is under the escape key: <img src="images/ansi-keyboard.png" width="80%" style="display: block; margin: auto;" /> .small[ image from https://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_Diagram_with_Form_Factor.svg ] --- # Code: Inline code Sometimes you want to run the code inside a sentence. This is called running the code "inline". -- You might want to run the code inline to name the number of variables or rows in a dataset in a sentence like: > There are XXX observations in the airquality dataset, and XXX variables. --- # Code: Inline code You can call code "inline" like so: ````markdown There are `r nrow(airquality) ` observations in the airquality dataset, and `r ncol(airquality) ` variables. ```` Which gives you the following sentence -- > There are 153 observations in the airquality dataset, and 6 variables. --- # Code: Inline code - If your data changes upstream -- - You don't need to work out where you mentioned your data -- - You just update the document. 🎉 --- # Your Turn: Put it together Go to `rstudio.cloud` and go to "ida-exercise-1b" * open the document "01-oz-atlas.Rmd" * knit the document * Change the data section at the top to be from a different state instead of "New South Wales" * knit the document again * How do the text and figures in the document change?
05
:
00
--- class: transition center bottom background-image: url(https://imgs.xkcd.com/comics/art_project.png) background-size: contain background-position: 50% 50% --- class: transition # End of part 2 of Lecture 1B Make sure you finish the exercise on the rstudio.cloud --- class: transition # Start of part 3 of Lecture 1B --- # Code: Chunk names Straight after the ` ```{r ` you can use a text string to name the chunk: ````markdown ```{r read-crime-data} crime <- read_csv("data/crime-data.csv") ``` ```` --- # Code: Chunk names Naming code chunks has three advantages: 1. Navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor. 2. Graphics produced by chunks now have useful names. 3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run. --- # Code: Chunk names Every chunk should ideally have a name. Naming things is hard, but follow these rules and you'll be fine: 1. One word that describes the action (e.g., "read") 2. One word that describes the thing inside the code (e.g, "gapminder") 3. Separate words with "-" (e.g., `read-gapminder`) --- # Code: Chunk options You can control how the code is output by changing the code chunk options which follow the title. ````markdown ```{r read-gapminder, eval = FALSE, echo = TRUE} gap <- read_csv("gapminder.csv") ``` ```` What do you think this does?
00
:
30
--- # Code: Chunk options The code chunk options you need to know about right now are: * `cache`: TRUE / FALSE. Do you want to save the output of the chunk so it doesn't have to run next time? * `eval`: TRUE / FALSE Do you want to evaluate the code? * `echo`: TRUE / FALSE Do you want to print the code? * `include`: TRUE / FALSE Do you want to include code output in the final output document? Setting to `FALSE` means nothing is put into the output document, but the code is still run. You can read more about the options at the official documentation: https://yihui.name/knitr/options/#code-evaluation --- # Your turn * go to `rstudio.cloud`, open document `01-oz-atlas.Rmd` and change the document so that the code output is hidden, but the graphics are shown. (Hint: Google "rstudio rmarkdown cheatsheet" for some tips!) * Re-Knit the document. * Take a look at the [R Markdown Gallery](https://rmarkdown.rstudio.com/gallery.html).
05
:
00
--- class: transition # End of Part 3 of Lecture 1B --- class: transition # Start of Part 4 of Lecture 1B --- # Global options: Set and forget You can set the default chunk behaviour once at the top of the `.Rmd` file using a chunk like: ```r knitr::opts_chunk$set( echo = FALSE, cache = TRUE ) ``` then you will only need to add chunk options when you have the occasional one that you'd like to behave differently. --- # Your turn * Go to your `01-oz-atlas.Rmd` document on `rstudio.cloud` and change the global settings at the top of the rmarkdown document to `echo = FALSE`, and `cache = TRUE` ```r knitr::opts_chunk$set( echo = FALSE, cache = TRUE ) ``` - Update the other code chunks by removing the code chunk options.
05
:
00
--- class: transition # End of part 4 of lecture 1b --- class: transition # Start of part 5 of lecture 1b --- class: transition middle # DEMO The many different outputs of rmarkdown --- # Your turn: Different types of documents 1. Change the output of your current R Markdown file to produce a **Word document**. Now try to produce pdf - this may not work! That's OK, we do'nt need it right now. 2. Create a new document that will produce a slide show `File > New R Markdown > Presentation` 3. Create a flexdashboard document - see this option in the `File > New R Markdown > From template` list. --- # Recap: - There is a Reproducibility Crisis - rmarkdown = YAML + text + code - rmarkdown has many different output types - Platypus are interesting! - Assignment will be announced next week --- # Learning more: - [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and Markdown Quick Reference (Help -> Markdown Quick Reference) handy, we'll refer to it often as the course progresses --- class: transition # Your Turn - Go to rstudio.cloud to `oz-atlas-final.Rmd` - Follow prompts and questions to learn more about the Australian native platypus - (copy content over from previous doc) - Take the Ed lab quiz for today. --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 count: false .column.shade_black[.content[ <br><br> # That's it! .bottom_abs.width100[ Lecturer: Nicholas Tierney Department of Econometrics and Business Statistics<br>
<i class="fas fa-envelope faa-float animated "></i>
ETC1010.Clayton-x@monash.edu 11th Mar 2020 ] <br /> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a> ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> ??? *Note on ethics:* When you use someone else's work, you need to (1) check if it's allowed, that it has a [creative commons license](http://creativecommons.org/licenses/by/4.0/), (2) reference them as the source. - If you have a big document, build it up in pieces. You can run just one code chunk at a time, or the past several, or even one line of code. The "Run" button has a menu of options of doing the coding in pieces. - The workspace of your R Markdown document is separate from the Console - rmarkdown runs code from the start to finish in a new environment ??? * Reproducibility: Why we care * Rmarkdown * YAML * Code * text * markdown (online quiz) * rmarkdown - edit the existing one on platypus! * code chunks * code chunk names * chunk options * exercise on this * setting different chunk options globally * exercise extending the platypus report * make assignment groups * release assignment * quiz Should be able to answer the questions: How should I start an rmarkdown document? What do I put in the YAML metadata? How do I create a code chunk? What sort of options to I need to worry about for my code? What is the value in a reproducible report? What is markdown? Can I combine my software and my writing?