We have pulled some data on platypuses from the Atlas of Living Australia using the “ALA4R” package, and done some subsetting of the data to only look at a few of the key variables.

Let’s read in data with read_csv:

library(readr)
platypus <- read_csv(here::here("data/platypus.csv"))
## Parsed with column specification:
## cols(
##   id = col_character(),
##   commonName = col_character(),
##   scientificName = col_character(),
##   state = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   eventDate = col_date(format = ""),
##   sex = col_character()
## )
# platypus_vic <- read_csv(here::here("exercises/1b/data/platypus-vic.csv"))

This code can be read as: “read in this .csv file from the data folder”.

(We’ll talk more about what here::here means in an upcoming lecture.)

Each row of the data is a recorded observation (sighting) of a platypus.

Task:

- Add a section header called “About the data” to your document. Write a paragraph about the “Atlas of Living Australia” (use your internet search skills!).
- You could even try adding a picture of a platypus into your report. You can do this using markdown syntax or include_graphics() - try looking online for the rstudio markdown cheatsheet, or typing ?include_graphics() into the console.

About the data

Adding plots

Let’s make some plots. Two of the variables in the data set are the latitude and longitude, indicating where the animal was spotted. These will be used for the first plot, made using the ggplot2 package from the tidyverse suite.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1          ✓ dplyr   0.8.4     
## ✓ tibble  2.1.3          ✓ stringr 1.4.0     
## ✓ tidyr   1.0.2          ✓ forcats 0.5.0     
## ✓ purrr   0.3.3.9000
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
ggplot(data = platypus,
       aes(x = longitude, 
           y = latitude)) + 
  geom_point()

If you are good at recognising the shape of Australia, you might realise that the sightings match Australia!

But we can make it look a bit more like Australia by applying a map projection with coord_map():

ggplot(platypus,
       aes(x = longitude, 
           y = latitude)) + 
  # alpha = 0.1 makes the points transparent, so dense areas appear darker
  geom_point(alpha = 0.1) +
  coord_map()

This changes the dimensions of the plot to more closely match the longitude and latitude lines.

We can add a map underneath using the powerful leaflet package:

library(leaflet)
leaflet(platypus) %>% 
  addTiles() %>%
  addCircleMarkers(clusterOptions = markerClusterOptions())
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
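
The message above appears because leaflet guesses which columns hold the coordinates. As a minimal sketch, the columns can instead be named explicitly with the formula interface. The data frame toy_sightings and its coordinates are made up for illustration and are not real sightings.

```r
library(leaflet)

# Made-up coordinates, for illustration only (not real sightings)
toy_sightings <- data.frame(
  longitude = c(145.0, 146.5, 147.2),
  latitude  = c(-37.8, -36.9, -42.1)
)

# Naming lng and lat explicitly means leaflet does not have to guess
toy_map <- leaflet(toy_sightings) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude,
                   lat = ~latitude,
                   clusterOptions = markerClusterOptions())

toy_map
```

The same lng and lat arguments should work with the platypus data, since its columns have the same names.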

Are there platypus sightings near where you live?

Write a couple of paragraphs about the locations of platypus in Australia, based on the map that you have created.

Subsetting the data

Let’s subset the data to only look at platypus sightings from Victoria:

platypus_vic <- platypus %>% 
  filter(state == "Victoria")
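
Before filtering, it can help to check which values a column actually contains. The sketch below uses a small made-up data frame (toy, not the real platypus data) to show count() listing the values of state, and filter() keeping only the matching rows. Note that filter() also drops rows where state is missing.

```r
library(dplyr)

# Made-up records, for illustration only
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  state = c("Victoria", "Tasmania", "Victoria", NA)
)

# How many records are there for each state (including missing)?
toy %>% count(state)

# Keep only the Victorian records; the row with a missing state is dropped
toy_vic <- toy %>% 
  filter(state == "Victoria")

nrow(toy_vic)
```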

Temporal trend

The date of the sighting is another variable in the data set.

Let’s count how many distinct dates there are using n_distinct():

n_distinct(platypus_vic$eventDate)
## [1] 1415

We can also call summary() on eventDate to see the range of dates.

summary(platypus_vic$eventDate)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "1839-04-01" "1988-02-25" "1996-09-06" "1993-07-21" "2003-02-25" "2020-02-10" 
##         NA's 
##        "125"

We can plot the sightings over time. The variable is called eventDate. The column specification printed by read_csv shows that it was already read in as a date, so the next step is to pull out its components (year, month and day) so that we can summarise by them.

The code below creates a new column called date, which is a tidied-up version of eventDate, along with separate year, month and day columns:

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
platypus_vic_tidy <- platypus_vic %>% 
  mutate(date = ymd(eventDate),
         year = year(date),
         month = month(date),
         day = day(date))
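
To see what these lubridate functions return, here is a small sketch on a made-up vector of date strings (toy_dates is hypothetical, not taken from the data). Missing dates stay missing in every extracted component.

```r
library(lubridate)

# Made-up dates, for illustration only; the last one is missing
toy_dates <- ymd(c("1996-09-06", "2003-02-25", NA))

year(toy_dates)   # the year of each date
month(toy_dates)  # the month as a number from 1 to 12
day(toy_dates)    # the day of the month
```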

We can also explore occurrences over time:

summary(platypus_vic_tidy$date)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "1839-04-01" "1988-02-25" "1996-09-06" "1993-07-21" "2003-02-25" "2020-02-10" 
##         NA's 
##        "125"
summary(platypus_vic_tidy$month)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   2.000   5.000   5.592   9.000  12.000     125
ggplot(platypus_vic_tidy,
       aes(x = date)) + 
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 125 rows containing non-finite values (stat_bin).
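
The stat_bin() message suggests picking a binwidth. On a date axis the bin width is measured in days, so one possible choice is bins of roughly five years. This is a sketch on made-up dates (toy_dates is hypothetical), and the five-year width is an arbitrary assumption rather than a recommended value.

```r
library(ggplot2)

# Made-up dates, for illustration only; one is missing
toy_dates <- data.frame(
  date = as.Date(c("1990-03-01", "1990-11-15", "1995-06-30", "2001-01-20", NA))
)

# binwidth is in days for a date axis: 365 * 5 is roughly five years
# na.rm = TRUE drops missing dates without printing a warning
p_hist <- ggplot(toy_dates,
                 aes(x = date)) + 
  geom_histogram(binwidth = 365 * 5, na.rm = TRUE)

p_hist
```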

There are some records dating back before 1850!

# Take a closer look at the records dated before 1850
platypus_vic_tidy %>% filter(eventDate < ymd("1850-01-01")) 

Let’s focus on records since 1900, and count the number for each year.

platypus_vic_1900 <- platypus_vic_tidy %>% 
  filter(year >= 1900) %>% 
  count(year) 

ggplot(data = platypus_vic_1900) +
  geom_point(aes(x = year, 
                 y = n))
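
To see what count() produces, here is a sketch on a made-up column of years (toy_years is hypothetical). The result has one row per distinct year and a column n holding the number of records. A missing year gets its own row, which is why filtering on year first is useful: it also removes records with no date.

```r
library(dplyr)

# Made-up years, for illustration only; one is missing
toy_years <- data.frame(year = c(1980, 1980, 2004, NA))

# One row per distinct value of year, with the number of records in n
toy_counts <- toy_years %>% 
  count(year)

toy_counts
```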

Add a trend line.

ggplot(data = platypus_vic_1900, 
       aes(x = year, 
           y = n)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Make it interactive with the magic function ggplotly() so that we can investigate some observations:

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
ggplotly()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
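
Called with no arguments, ggplotly() converts the most recently created plot. A slightly more explicit approach, sketched here with made-up counts (toy_counts is hypothetical), is to save the plot to an object and pass it in, so it is always clear which plot is being converted.

```r
library(ggplot2)
library(plotly)

# Made-up yearly counts, for illustration only
toy_counts <- data.frame(
  year = c(1980, 1990, 2000, 2010),
  n    = c(5, 12, 9, 3)
)

# Save the plot to an object ...
p <- ggplot(toy_counts,
            aes(x = year, 
                y = n)) +
  geom_point()

# ... then convert that specific plot to an interactive one
p_interactive <- ggplotly(p)

p_interactive
```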

Discussion question: Was there a population explosion in 1980 and 2004? Has the population of platypus been increasing since 1900, and decreasing in the last decade?

Add a new section to your report, titled “Temporal patterns”. Write a couple of paragraphs on what you have learned about the data over time.

Appendix

The code below was used to download the data.

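# load the ALA4R package, which provides access to data from the Atlas of Living Australia
library(ALA4R)
# dplyr provides %>%, filter() and select(); readr provides write_csv()
library(dplyr)
library(readr)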
# Take a look at what the package does using the code `help(package="ALA4R")`

# Look up the scientific name for platypus using:
specieslist("platypus")

# This returns a lot of different organisms with "platypus" in the name, but you should be able to find one line with the relevant information, that its scientific name is "Ornithorhynchus anatinus". 

platypus <- occurrences("Ornithorhynchus anatinus", download_reason_id=10)

# 518426.7 is NOWHERE near Australia. Let's filter it out
platypus <- platypus$data %>% filter(longitude < 518426)

# subset the data using dplyr filter and select commands
platypus_vic <- platypus %>% 
  filter(state == "Victoria") %>% 
  select(id,
         commonName,
         scientificName,
         state,
         latitude,
         longitude,
         year:day,
         sex)


write_csv(platypus_vic, "data/platypus-vic.csv")

# You can learn more about `read_csv` and `write_csv` by typing `?write_csv`.