ENV221

Author

Peng Zhao

1 Overview

1.1 Learning objectives

In this lecture, you will

  1. know the general information of the module,
  2. get familiar with the learning platforms,
  3. build groups,
  4. know what R can do, and
  5. set up the R environment.

1.2 Module overview

1.2.1 Features

  • NOT a course that only teaches you the concepts but doesn’t show you an easy way to apply them.
  • NOT a course that only covers the details of the R language and introduces clever coding techniques.
  • It is an intersection of statistics and R.

1.2.2 Foolish Assumptions

  • English
    • Computer (e.g. operating system)
    • Maths (e.g. Euler’s number)
  • Computer knowledge
    • File path
    • File name: base name + extension
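
The parts of a file path can be pulled apart in R itself. The sketch below is only an illustration: the path `data/raw/iris.csv` is hypothetical and the file does not need to exist. Note a naming clash: R's `basename()` returns the whole file name (`iris.csv`), while the "base name" in the list above means the file name without its extension.

```r
# Hypothetical path, used only for illustration; the file need not exist.
path <- file.path("data", "raw", "iris.csv")

folder    <- dirname(path)                         # the folders before the file name
file_name <- basename(path)                        # the file name without folders
extension <- tools::file_ext(path)                 # the text after the last dot
base_name <- tools::file_path_sans_ext(file_name)  # the file name without extension

print(c(path = path, folder = folder, file = file_name,
        base = base_name, extension = extension))
```

`file.path()` joins the parts with `/`, which R accepts on Windows, macOS and Linux alike.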

1.2.3 Participants

  • Module leader
    • Dr. Peng Zhao
    • Websites: Search engine, or Connect (contact me if you would like to do SURF or FYP with me).
    • Contact: email (preferred), Tel. 81889185 (for emergencies), WeChat
    • Q: How should you address me at the start of an email?

Proper: Dear/Hello/Hi Dr. Zhao, Mr. Zhao, Peng

Improper: Peng Zhao, Teacher, Teacher Zhao

  • TAs

    • Mr. Bokun Sun replies to students’ emails and takes care of the online forum.
    • Mr. Minhao Wang takes care of the online quizzes.
    • Mr. Shanxing Gong and Mr. Baifeng Zhu take care of the onsite quizzes and help the students with the exercises.
  • Students

    • Environmental Sciences: 20
    • Bioinformatics: 37
    • including off-campus students
    • One student representative in ENV, and one in BIO (Send an email to me with the WeChat IDs by Thursday).

1.2.4 Learning platforms

1.2.5 Teaching schedule and syllabus

See the module handbook.

  • Tuesday: Lectures
  • Wednesday: office hours (only by appointment via emails)
  • Friday: Exercises and quizzes

1.2.6 Assessments

  • Quizzes (45%, 10:00 - 10:15 every Friday in Weeks 3-14 except Week 12; mock quiz in Week 2)
  • Oral presentation (15%, Week 12): Do you agree with group work? The student representative must send me an email to confirm.
  • Final Exam (40%, Exam Week)

Important:

  • For onsite assessments (quizzes + exam), you must sign the attendance table.
  • For online assessments (quizzes + exam), you must be invigilated with one or two cameras (recorded). See “Guide for Online Exam.pdf”.
  • Students who appear neither on the attendance table nor in the recorded invigilation will get 0 marks, even if they submit the task sheets.

1.2.7 Previous experiences

Possible reasons for the failures:

  1. R has a steep learning curve. Difficult for beginners.

Figure 1: Learning curves

  2. Lack of engagement in class: reading materials were not completed; students were absent from class.
  3. Missed submissions or missed submission deadlines.

Question: How can we improve this module?

MQ: Positive feedback from the previous year.

MQ: Suggestions from the previous year.

Actions:

  1. No tight weeks.
  2. Improved html presentation (No PPT/slides).
  3. More learning on R.
  4. More activities, e.g. exercises, Q&A on the forum, office hours, appointments, etc.
  5. More quizzes, less weight in the exam.
  6. Finish your work in advance.
  7. Mid-term MQ.
  8. Group work.

1.2.8 Build Groups

  • 3 students in a group.
  • Group exercises: help each other in your group; present your results
  • Oral presentation.
  • Randomly mixed.
  • Remember your group name.
  • Sit close onsite.

R code used to build the groups:

# Read the student list (reading starts at row 9 of the first sheet) and sort it by name
student_list <- xlsx::read.xlsx('overview/List of Module.xlsx', 1, startRow = 9)
student_list <- student_list[order(student_list$Name), ]
# 21 group names with 3 seats each give 63 seats in total
fruits <- c('Apple','Watermelon','Orange','Pear','Cherry','Strawberry','Grape','Mango','Blueberry','Plum','Banana','Raspberry','Papaya','Kiwi','Pineapple','Lemon','Apricot','Grapefruit','Coconut','Avocado','Peach')
teams <- rep(fruits, each = 3)
# Keep one seat per student, then shuffle the seats among the students
n <- nrow(student_list)
teams <- teams[1:n]
student_list$Team <- sample(teams, n)
# Inspect the result and export it
View(student_list[, c('Name', 'Email', 'Team')])
xlsx::write.xlsx(student_list[, c('Name', 'Email', 'Team')], 'team_list.xlsx', row.names = FALSE)
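
The same idea can be tried without the Excel file. The following is a minimal sketch under stated assumptions: the eight student names and the three group names are placeholders, and `set.seed()` is added (it is not in the code above) so that the random draw can be repeated.

```r
# Hypothetical class of 8 students; the names are placeholders, not real data.
students <- data.frame(Name = paste("Student", 1:8))
group_names <- c("Apple", "Pear", "Plum")   # 3 groups x 3 seats = 9 seats

set.seed(221)                               # fix the seed so the draw is repeatable
seats <- rep(group_names, each = 3)[seq_len(nrow(students))]
students$Team <- sample(seats)              # shuffle the seats among the students

print(students)
print(table(students$Team))                 # number of students in each group
```

Because the seats are cut to the number of students before shuffling, no group can end up with more than three members.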

1.3 R overview and set-up

1.3.1 What is R

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity; as of April 2021, R ranks 16th in the TIOBE index, a measure of popularity of programming languages.

— Wikipedia: R (programming language)

In short, R is a language with which you can talk to your computer.

1.3.2 Why R

Poll: What computer software do you often use for statistics or scientific calculation? R, SPSS, Excel, Python, MATLAB…

Software  Difficulty  Type  Cost       Usage        Support          Best for
Excel     Easy        GUI   Cheap      Wide         Widespread       Graphs
R         Difficult   Code  Free       Increasing   Strongly online  Cutting edge
SPSS      Medium      GUI   Expensive  Social Sci.  Manual           Statistics
SAS       Difficult   Code  Expensive  Decreasing   Manual           Complex

R usages

R: Free, powerful, community-friendly

1.3.3 Installation

1.3.4 RStudio

Demonstration: RStudio interface.

  • Script
  • Console
  • Environment: variables and functions
  • Files, folders, Graphs, Help.

Create an R project:

  • File - New project - New directory - New project
  • Organize your R scripts and products in a project:
    • Save your related files in the project folder
    • Always start your work from the .Rproj file
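
Why start from the .Rproj file? Opening a project through it sets R's working directory to the project folder, so short relative file names such as `iris.csv` are found there. A minimal sketch to check this on your own computer (`iris.csv` is the demo file name used later in this lecture and may not exist yet):

```r
# The working directory is the folder where R looks for relative paths.
project_dir <- getwd()
print(project_dir)

# A relative file name is looked up inside the working directory.
data_file <- "iris.csv"
full_path <- file.path(project_dir, data_file)
print(full_path)

# TRUE only after the file has been saved in that folder.
print(file.exists(data_file))
```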

Create a script (.R):

  • File - New File - R Script. Hotkey ctrl+shift+N

Use hotkeys:

  1. ctrl + enter (run the line where the cursor is)
  2. F1 (get help)
  3. shift + alt + k (see all shortcuts)

1.3.5 Statistics/calculation with R

  • Download the demo dataset iris.csv from the downloading area.
  • Save it in your R project.

# Import data. This line is a comment.
iris <- read.csv("iris.csv")

# Simple Calculation
mean(iris$Petal.Length)

# Summarized data
summary(iris)

# ANOVA
myaov <- aov(Sepal.Length ~ Species,
             data = iris)
summary(myaov)

# Insert a column
iris$Sepal.Length.m <- iris$Sepal.Length / 100

# Export data
write.csv(iris, "iris2.csv")

# Assignment
x <- 3
smr <- summary(iris)
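
To see what the ANOVA above compares, it helps to look at the group means directly. The sketch below assumes R's built-in copy of the iris data (`datasets::iris`) instead of the downloaded `iris.csv`, so it runs on its own; the object names `iris_builtin`, `group_means`, `fit` and `p_value` are made up for this example.

```r
# Assumption: use the built-in iris data so that no CSV file is needed.
iris_builtin <- datasets::iris

# Mean sepal length per species: the numbers that the ANOVA compares.
group_means <- tapply(iris_builtin$Sepal.Length, iris_builtin$Species, mean)
print(group_means)

# One-way ANOVA as in the lecture code, then extract the p-value.
fit <- aov(Sepal.Length ~ Species, data = iris_builtin)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
print(p_value < 0.05)   # TRUE if the species means differ at the 5% level
```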

1.3.6 Graphing with R

# Basic
plot(x = iris$Petal.Width, 
     y = iris$Petal.Length)
boxplot(iris$Sepal.Length)
demo(graphics)
demo(persp)
demo(image)
demo()

# ggplot2
# Install a package (needed only once)
install.packages("ggplot2")
# Load a package
library("ggplot2") # require("ggplot2")
# Use functions in the package
ggplot(iris, aes(Petal.Length, Petal.Width))+ 
  geom_point()
example(qplot)

# GGally
install.packages("GGally")
library("GGally")
ggpairs(iris[, 2:6])
ggpairs(iris[, 2:6], 
        aes(color = Species, alpha = 0.1))
example(ggpairs)

# plotly
library(plotly)
demo('crosstalk-highlight-binned-target-a')
demo(package = 'plotly')

# rgl
library(rgl)
demo(flag)
demo(package = 'rgl')

# MSG (Modern Statistical Graphs)
install.packages("MSG")
library(MSG)
demo(basketball)
demo(pointArts)
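
Graphs shown in the Plots pane can also be written to a file from code. This is a minimal sketch using base R only, with some assumptions: it uses the built-in `datasets::iris`, writes a PDF to a temporary folder, and the file name `petal_scatter.pdf` is hypothetical.

```r
# Assumption: built-in iris data; the graph goes to a temporary folder.
iris_builtin <- datasets::iris
out_file <- file.path(tempdir(), "petal_scatter.pdf")  # hypothetical file name

pdf(out_file, width = 8, height = 6)       # open a PDF graphics device (inches)
plot(x = iris_builtin$Petal.Width,
     y = iris_builtin$Petal.Length,
     xlab = "Petal width (cm)",
     ylab = "Petal length (cm)")
invisible(dev.off())                       # close the device to finish the file

print(file.exists(out_file))
```

Everything drawn between `pdf()` and `dev.off()` ends up in the file instead of on the screen.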

1.3.7 Modeling with R

# A linear model
mylm <- lm(Petal.Width ~ Petal.Length, data = iris)
mylm
summary(mylm)

# Visualization
plot(x = iris$Petal.Length, 
     y = iris$Petal.Width)
abline(mylm)

ggplot(iris, aes(Petal.Length,Petal.Width))+ 
  geom_point() + 
  geom_smooth(method = "lm")

# basic
demo(nlm)
demo(lm.glm)
demo(smooth)

library(MSG)
demo(gradArrows1) # Gradient descent method
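
A fitted model can also be used for prediction. The following sketch assumes the built-in `datasets::iris`; the two petal lengths in `new_flowers` are hypothetical values chosen for illustration. The model is written with a `data =` argument so that `predict()` can match the column name in `newdata`.

```r
# Assumption: built-in iris data, so the sketch runs without iris.csv.
iris_builtin <- datasets::iris
fit <- lm(Petal.Width ~ Petal.Length, data = iris_builtin)

print(coef(fit))                 # intercept and slope of the fitted line
print(summary(fit)$r.squared)    # share of the variance explained by the model

# Predict the petal width for two hypothetical petal lengths (cm).
new_flowers <- data.frame(Petal.Length = c(2, 5))
predicted <- predict(fit, newdata = new_flowers)
print(predicted)
```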

1.3.8 Communication with R: R Markdown

You type                 You get
*italic*                 italic
**bold**                 bold
PM~2.5~                  subscript: \(\text{PM}_{2.5}\)
R^2^                     superscript: \(\text{R}^2\)
$E = mc^2$               equations, e.g. \(E = mc^2\), \(G = \sqrt[n]{\prod_{i = 1}^{n} x_i}\)
[link](http://mpic.de)   hyperlink: link
![](image)               an inserted image
# Chapter 1              heading
1. list...               numbered list
- list...                unnumbered list
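
The syntax above can be tried by writing a small R Markdown file from R. This is a sketch, not the module's template: the file name `demo.Rmd`, the title and the text are invented for illustration, and the rendering step runs only if the rmarkdown package and pandoc are available.

```r
# Hypothetical file name, written to a temporary folder.
rmd_file <- file.path(tempdir(), "demo.Rmd")
fence <- strrep("`", 3)   # three backticks open and close an R code chunk

rmd_lines <- c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "# Chapter 1",
  "",
  "This is *italic*, this is **bold**, and this is PM~2.5~.",
  "",
  "An equation: $E = mc^2$",
  "",
  paste0(fence, "{r}"),
  "mean(c(1, 2, 3))",
  fence
)
writeLines(rmd_lines, rmd_file)

# Rendering needs the rmarkdown package and pandoc; skip it if either is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

In RStudio the same file can be rendered with the Knit button instead of calling `rmarkdown::render()`.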

1.3.9 R Commander (Rcmdr)

install.packages("Rcmdr")
library(Rcmdr)

  • Load data
  • Calculation
  • Graphs
  • Communication

1.4 Further readings

  • A Beginner’s Guide to R, Chapter 1
  • A (very) short introduction to R