ETC1010: Introduction to Data Analysis
Week 1
Week of introduction
Lecturer: Nicholas Tierney
Department of Econometrics and Business Statistics
ETC1010.Clayton-x@monash.edu
9th Mar 2020
This is a course on introduction to data analysis.
You can also think of it as introduction to data science.
Q - What data analysis background does this course assume?
A - None.
Q - Is this an intro stat course?
A - Statistics ≠ data science, BUT they are closely related. This course is a great way to get started with statistics, but it is not your typical high school statistics course.
Q - Will we be doing computing?
A - Yes.
Q - Is this an intro Computer Science course?
A - No, but there are some shared themes.
Q - What computing language will we learn?
A - R.
Q - Why not language X?
A - We can discuss that over ☕.
This course is taught as a lectorial (lecture + tutorial).
Lectorials are not (typically) recorded, because you will be doing hands-on work in class.
You have to show up to class to practice!
This course is brought to you today by the letter "R"!
Grover image sourced from https://en.wikipedia.org/wiki/Grover.
R is a language for data analysis. If R seems a bit confusing, disorganized, and perhaps incoherent at times, in some ways that's because so is data analysis.
-- Roger Peng, 12/07/2018
The R Consortium conducted a survey of R users in 2017.
These are the locations of respondents to an R Consortium survey conducted in 2017.
8% of surveyed R users were aged 18-24, BUT 45% were aged 25-34!
ABS, CSIRO, ATO, Microsoft, Energy Qld, Auto and General, Bank of Qld, BHP, AEMO, Google, Flight Centre, Youi, Amadeus Investment Partners, Yahoo, Sydney Trains, Tennis Australia, Rio Tinto, Reserve Bank of Australia, PwC, Oracle, Netflix, NOAA Fisheries, NAB, Menulog, Macquarie Bank, Honeywell, Geoscience Australia, DFAT, DPI, CBA, Bank of Italy, Australian Red Cross Blood Service, Amazon, Bunnings.
R is a statistical programming language
RStudio is a convenient interface for R (an integrated development environment, IDE)
If R were an airplane, RStudio would be the airport, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer.
Go to http://bit.ly/etc1010-s1-2020 to log in to RStudio Cloud.
Log in with Google / GitHub / other credentials.
This section is based on an exercise from Data Science in a Box by Mine Çetinkaya-Rundel.
Open unvotes.Rmd, then click on the "Knit" button.
Then make a change in the YAML header at the top of the file (we'll talk about what this means later) and knit again.
In R, a function is applied to its arguments, which go inside parentheses:

```r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

For example:

```r
mean(c(1, 2, 1, 2))
## [1] 1.5
```
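Many functions also accept optional arguments that change their behaviour. This is the do_that(to_this, with_those) form. The following minimal sketch uses base R's mean() and its na.rm argument. The vector x is made up for illustration.

```r
# do_this(to_this): a single argument
mean(c(1, 2, 1, 2))

# do_that(to_this, with_those): an extra argument changes the behaviour
x <- c(1, 2, NA)       # NA marks a missing value
mean(x)                # NA, because one value is missing
mean(x, na.rm = TRUE)  # 1.5, after dropping the NA
```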
Variables (columns) in a data frame are accessed with the $ operator:

```r
dataframe$var_name
```

For example:

```r
library(dplyr)  # the starwars data set ships with dplyr
starwars$name
## [1] "Luke Skywalker" "C-3PO" "R2-D2"
## [4] "Darth Vader" "Leia Organa" "Owen Lars"
## [7] "Beru Whitesun lars" "R5-D4" "Biggs Darklighter"
## [10] "Obi-Wan Kenobi" "Anakin Skywalker" "Wilhuff Tarkin"
## [13] "Chewbacca" "Han Solo" "Greedo"
## [16] "Jabba Desilijic Tiure" "Wedge Antilles" "Jek Tono Porkins"
## [19] "Yoda" "Palpatine" "Boba Fett"
## [22] "IG-88" "Bossk" "Lando Calrissian"
## [25] "Lobot" "Ackbar" "Mon Mothma"
## [28] "Arvel Crynyd" "Wicket Systri Warrick" "Nien Nunb"
## [31] "Qui-Gon Jinn" "Nute Gunray" "Finis Valorum"
## [34] "Jar Jar Binks" "Roos Tarpals" "Rugor Nass"
## [37] "Ric Olié" "Watto" "Sebulba"
## [40] "Quarsh Panaka" "Shmi Skywalker" "Darth Maul"
## [43] "Bib Fortuna" "Ayla Secura" "Dud Bolt"
## [46] "Gasgano" "Ben Quadinaros" "Mace Windu"
## [49] "Ki-Adi-Mundi" "Kit Fisto" "Eeth Koth"
## [52] "Adi Gallia" "Saesee Tiin" "Yarael Poof"
## [55] "Plo Koon" "Mas Amedda" "Gregar Typho"
## [58] "Cordé" "Cliegg Lars" "Poggle the Lesser"
## [61] "Luminara Unduli" "Barriss Offee" "Dormé"
## [64] "Dooku" "Bail Prestor Organa" "Jango Fett"
## [67] "Zam Wesell" "Dexter Jettster" "Lama Su"
## [70] "Taun We" "Jocasta Nu" "Ratts Tyerell"
## [73] "R4-P17" "Wat Tambor" "San Hill"
## [76] "Shaak Ti" "Grievous" "Tarfful"
## [79] "Raymus Antilles" "Sly Moore" "Tion Medon"
## [82] "Finn" "Rey" "Poe Dameron"
## [85] "BB8" "Captain Phasma" "Padmé Amidala"
```
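The $ operator works on any data frame, not only starwars. The following minimal sketch uses a small hand-built data frame, so no packages are needed. The name heroes and its values are made up for illustration.

```r
# A small, made-up data frame (values are for illustration only)
heroes <- data.frame(
  name = c("Luke", "Leia", "Han"),
  height = c(172, 150, 180)
)

# Extract one column with dataframe$var_name
heroes$name

# The extracted column can be passed straight into a function
mean(heroes$height)
```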
Packages are installed with the install.packages() function and loaded with the library() function, once per session:

```r
install.packages("package_name")
library(package_name)
```
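A package usually only needs to be installed once per computer, but library() must be run in every new session. One common pattern is to install a package only when it is missing. The sketch below assumes that pattern. install_if_missing() is a hypothetical helper, not part of R. The tools package is used because it ships with every R installation.

```r
# Hypothetical helper (not part of base R): install a package only if it
# is not already installed on this computer.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(pkg)
}
# install_if_missing("dplyr")  # run once per computer

# library() attaches a package for the current session only.
# 'tools' is a base package, so this line always works.
library(tools)
"package:tools" %in% search()
```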
Some of our best final projects:
Data preparation accounts for about 80% of the work of data scientists
The learning goals associated with this unit are to:
If you feed a person a fish, they eat for a day. If you teach a person to fish, they eat for a lifetime.
Any data analysis I show you in class, you can do, too.
We will start out using the RStudio Cloud server.
Later, we will install R and RStudio locally.
This course is also set up as a "MoVE unit", which means you can borrow a laptop from the university for class hours.
It is also possible to set up R and RStudio on a USB stick to use with your borrowed laptop.
Assessment | Weight | Task |
---|---|---|
Reading Quiz | 5% | Complete on Ed before each class for the first 8 weeks. Each quiz must be completed by class time. No mulligans, but one quiz can be missed without penalty. |
Lab Exercise | 5% | Each class period will have a quiz to be completed individually. Two can be missed without penalty. |
There is time at the end of each class to complete the lab exercise on Ed.
Assessment | Weight | Task |
---|---|---|
Assignment | 12% | Team-based data analysis challenge, due in weeks 4 and 8 |
Mid-Sem Theory + Concept exam | 8% | Due week 6 |
Data Analysis Exam | 10% | Due week 11 |
Project | 10% | Due week 11 |
Final Exam | 50% | TBA |
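As a quick check in R, the weights in the two assessment tables add up to 100%, and the final exam is worth half of that. The following is a minimal sketch. The element names are labels chosen here, not official names.

```r
# Assessment weights (%) taken from the two assessment tables
weights <- c(
  reading_quiz = 5,
  lab_exercise = 5,
  assignment = 12,
  mid_sem_exam = 8,
  data_analysis_exam = 10,
  project = 10,
  final_exam = 50
)

sum(weights)                                  # 100
sum(weights[names(weights) != "final_exam"])  # 50, everything except the final exam
```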
(DEMO)
First, search existing discussions for answers. If the question has already been answered, you're done! If it has already been asked but you're not satisfied with the answer, add to the thread.
Give your question context from course concepts, not course assignments.
Error: could not find function "ggplot"
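An error like this usually means the package that provides the function has not been loaded with library() in the current session. That is useful context to include in a question. The following minimal sketch reproduces the message, assuming ggplot2 has not been attached yet.

```r
# Capture the error message instead of stopping.
# Assumes ggplot2 has not been loaded in this R session.
msg <- tryCatch(
  ggplot(),
  error = function(e) conditionMessage(e)
)
msg  # could not find function "ggplot"

# The usual fix is to load the package first (installing it once if needed):
# library(ggplot2)
```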
Do the reading prior to each class period.
Participate actively in this class.
Ask questions on Ed.
Come to consultation if you have questions.
Practice the materials taught in each lectorial by doing more exercises from the textbook.
Be curious, be positive, be engaged.
All information is on the website 😄
Post questions on Ed instead of sending them by email.
Intent: that students from all backgrounds and perspectives be well served by this course, that students' learning needs be addressed both in and out of class, and that the diversity students bring to this class be viewed as a resource, strength and benefit.
It is my intent to present materials and activities that are respectful of diversity: gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, and culture. Let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups.
If you have a name and/or set of pronouns that differ from those that appear in your official Monash records, please let me know!
If you feel like your performance in the class is being impacted by your experiences outside of class, please don't hesitate to come and talk with me. I want to be a resource for you. If you prefer to speak with someone outside of the course, talk to Di Cook, or explore the services offered by Monash student support services.
What we expect:
Conducted according to the Monash policies.
Each member of the group completes the entire assignment, as best they can.
Each student will be randomly assigned another team's submission to provide feedback on three things:
Conflicts can arise in group work.
They can be both productive and destructive.
Teams need to work on managing conflicts and building on the strengths of all team members.
For each assignment, you will be given the option to comment on the efforts of your other group members.
If a team member has not contributed to an assignment submission, they might score a 0.
In this situation the team will need to discuss team function and dysfunction with the instructor.
Assignment 1 will be announced in class on Monday of Week 2.
Check your knowledge and comprehension by taking your first lab quiz on Ed.
Go to the Ed page and complete the lab quiz before next Monday, 16th March.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
RStudio started a trend by writing concise summaries, called cheatsheets, and others have added to the collection. You can find the RStudio collection in the "Help" menu of the IDE, and at https://www.rstudio.com/resources/cheatsheets/.
Start with the RStudio IDE cheatsheet.