ENV221 L02

Author

Peng Zhao

1 Overview of the module and R

2 R Basic Operations

2.1 Learning objectives

In this lecture, you will

  1. do basic maths calculations,
  2. understand the concepts of objects, functions, vectors, and data frames,
  3. import and export data, and
  4. apply functions to data frames.

2.2 Simple Maths

1 + 2 - 4 * 5 / 6 ^ 2
1 + (2 - 4 * (5 / 6)) ^ 2
2 ^ 0.5
7 %/% 3    # integer division: 2
7 %% 3     # remainder (modulo): 1
?`%/%`     # open the help page for the arithmetic operators
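The two operators belong together: for whole numbers, the quotient from %/% and the remainder from %% always rebuild the original number. A small check, using 7 and 3 as above:

```r
a <- 7
b <- 3
q <- a %/% b     # integer quotient: 2
rem <- a %% b    # remainder: 1
# the original number equals divisor * quotient + remainder
b * q + rem == a # TRUE
```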

2.3 Vector

2.3.1 Definition

A vector is a sequence of data elements of the same basic type. It is the simplest type of data structure in R.

— http://www.r-tutor.com/r-introduction/vector

c(1, 3, 5, 7)
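Because all elements must share one basic type, R silently converts mixed input to a common type. class() shows the type of a vector:

```r
v_num <- c(1, 3, 5, 7)
class(v_num)              # "numeric"
v_mixed <- c(1, "a", TRUE)
v_mixed                   # every element becomes text: "1" "a" "TRUE"
class(v_mixed)            # "character"
```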

2.3.2 Calculation of a vector and a number

c(1, 3, 5, 7) + 10
c(1, 3, 5, 7) ^ 2

2.3.3 Calculation of multiple vectors

c(1, 2, 3, 4) + c(10, 20, 30, 40)
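Calculations between two vectors of the same length work element by element: the first element with the first, the second with the second, and so on. The same holds for the other arithmetic operators:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
a + b   # 11 22 33 44
a * b   # 10 40 90 160
b / a   # 10 10 10 10
```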

2.3.4 Sequence

c(1, 2, 3, 4, 5, 6, 7, 8, 9) ^ 2
(1:9) ^ 2 
9:1
1 / (9:1)
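The colon operator always steps by 1. For other steps, base R's seq() function is a common alternative, for example:

```r
seq(1, 9, by = 2)           # 1 3 5 7 9
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
all(seq(1, 9) == 1:9)       # TRUE: without 'by', seq() behaves like the colon
```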

2.4 Function

(1:9) ^ 0.5
sqrt(1:9)
exp(1:9) 
log(1:9)
log10(1:9)
round(3.14)
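Many functions accept more than one argument. round(), for example, has a digits argument that defaults to 0; you can pass it by position or by name:

```r
round(3.14159)              # 3
round(3.14159, 2)           # 3.14
round(3.14159, digits = 2)  # 3.14, the same call with a named argument
```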

2.5 Object

# assignment
x = 1:9
x <- 1:9  # RStudio shortcut for <- : Alt + - (Option + - on macOS)
y <- x <- 1:9
1:9 -> x
assign('x', 1:9)

x
x + 10
x ^ 2

# indexing
x[7]
x[c(1,3,8)]
x[1:6]
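Indexing selects elements by position, not by value. With x <- 1:9 the positions and values coincide, which hides the difference; a vector with other values makes it clear:

```r
z <- c(10, 20, 30, 40, 50)
z[2]         # 20, the second element
z[c(1, 3)]   # 10 30
z[2:4]       # 20 30 40
```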

# multiple vectors
y <- 11:19
y
x + y
c(x, y)

Use functions on objects:

sqrt(x)
exp(x)
log(x)
log10(x)
mean(x) 
min(x)
max(x)
range(x)
sd(x)
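sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. Written out step by step for x <- 1:9, it matches sd():

```r
x <- 1:9
n <- length(x)
dev_sq <- (x - mean(x))^2                  # squared deviations from the mean
sd_manual <- sqrt(sum(dev_sq) / (n - 1))   # sample standard deviation
sd_manual                                  # 2.738613
all.equal(sd_manual, sd(x))                # TRUE
```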

Question: How can I memorize the function names?

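Solution:
  • Use RStudio's auto-completion: type the first few letters of a function name and suggestions appear.
  • Press TAB to show the completion list on demand.
  • Press F1 with the cursor on a function name to open its help page.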
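If you only remember part of a name, base R can also search for it: apropos() lists objects whose names match a pattern, and help() is the function behind the ? shortcut:

```r
apropos("mean")   # names containing "mean", such as colMeans, mean, rowMeans
help("mean")      # the same as ?mean
```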

Action: Missing values

Missing values, represented in R as NA (Not Available), are common in real-world data. Suppose you measure the air temperature at five locations on campus. The temperatures are:

x <- c(29, 28, 28, NA, 30)

The fourth value is missing because of an accident. What is the mean temperature? Use the mean() function.
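One possible solution, as a sketch: by default mean() returns NA as soon as any value is missing, and its na.rm argument tells it to drop the NA values first. is.na() shows which values are missing:

```r
x <- c(29, 28, 28, NA, 30)
is.na(x)                  # FALSE FALSE FALSE TRUE FALSE
mean(x)                   # NA
mean(x, na.rm = TRUE)     # 28.75, the mean of the four measured values
```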

2.6 Data frame

2.6.1 Manual input

df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))
df_manual

2.6.2 Indexing

Use integers:

df_manual[3, 2]
df_manual[2, 1:2]
df_manual[2, ]
df_manual[, 2]
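What you get back depends on the index: selecting a row keeps the data frame structure, while selecting a single column returns a plain vector. is.data.frame() makes the difference visible:

```r
df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))
is.data.frame(df_manual[2, ])   # TRUE: one row, still a data frame
is.data.frame(df_manual[, 2])   # FALSE: a vector of genders
```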

Use column names (recommended):

df_manual$age
df_manual$age[3]
df_manual[, 'age']
df_manual[3, 'age']

Use logical values:

df_manual$age[c(FALSE, TRUE, TRUE, FALSE, FALSE)]
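Typing the TRUE/FALSE vector by hand is rarely necessary. A comparison produces it for you, which is how rows are usually filtered in practice:

```r
df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))
df_manual$age > 30                      # FALSE TRUE FALSE TRUE TRUE
df_manual$age[df_manual$age > 30]       # 31 40 36
df_manual[df_manual$gender == 'f', ]    # all rows where gender is 'f'
```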

A column of a data frame is a vector, so you can calculate with it directly:

df_manual$age - 2
2022 - df_manual$age
mean(df_manual$age)  # mean() and sd() need a numeric vector, not the whole data frame
sd(df_manual$age)

2.6.3 Functions for data frames

summary(df_manual)
names(df_manual)
str(df_manual)
nrow(df_manual)
ncol(df_manual)
dim(df_manual)
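dim() combines the two previous results: it returns the number of rows and the number of columns in one vector. For df_manual:

```r
df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))
dim(df_manual)                                                  # 5 2
identical(dim(df_manual), c(nrow(df_manual), ncol(df_manual)))  # TRUE
```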

2.6.4 Import a data frame

Normally, we edit the data in another program, save it as a .csv file, and import it into R.

Demonstration:

  • Open the ‘airquality.csv’ file with Excel. Take a look. Close it.
  • Open it with Notepad. Take a look. Close it.
  • Import it into R.
df_aq <- read.csv('data/airquality.csv')
summary(df_aq)

If you prefer not to write code, click File > Import Dataset > From Text (base) and choose airquality.csv. The resulting object is named after the file, i.e. airquality.
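If read.csv() cannot find the file, check the working directory with getwd(): a path such as 'data/airquality.csv' is looked up relative to it. The sketch below sidesteps that by first writing a tiny CSV to a temporary file (the column names and values are made up for illustration) and then importing it:

```r
tmp <- tempfile(fileext = ".csv")   # a temporary file path
writeLines(c("site,temp",
             "A,29",
             "B,28"), tmp)          # a small made-up CSV file
df_tmp <- read.csv(tmp)
str(df_tmp)                         # 2 obs. of 2 variables
getwd()                             # the folder that relative paths start from
```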

Indexing:

df_aq[2, 4]
df_aq[2, 1:5]
df_aq[2, ]
df_aq[, 3]

Use column names:

df_aq$Wind
df_aq[, 'Wind']

Each column is a vector, so all the vector operations introduced above apply.

Calculation:

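# Convert Wind from mph (miles per hour) to m/s (metres per second): 1 mile = 1609.344 m, 1 hour = 3600 s
df_aq$Wind * 1.609344 * 1000 / 3600                   # print the result only
wind_ms <- df_aq$Wind * 1.609344 * 1000 / 3600        # save it as a new object
df_aq$wind_ms <- df_aq$Wind * 1.609344 * 1000 / 3600  # save it as a new column of df_aq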
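Since the same conversion is written out three times, it can be wrapped in a small function. mph_to_ms() below is a hypothetical helper written for this example, not part of R; 1 mph is exactly 1609.344 m per 3600 s, i.e. 0.44704 m/s:

```r
# Hypothetical helper: convert speed from miles per hour to metres per second
mph_to_ms <- function(v) {
  v * 1609.344 / 3600
}
mph_to_ms(1)          # 0.44704
mph_to_ms(c(10, 20))  # 4.4704 8.9408
```

With it, the last line above could read df_aq$wind_ms <- mph_to_ms(df_aq$Wind).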

2.6.5 More functions for data frames

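colSums(df_aq)   # sum of each column (NA if the column contains NA)
rowSums(df_aq)   # sum of each row
colMeans(df_aq)  # mean of each column
rowMeans(df_aq)  # mean of each row
apply(df_aq, 1, mean) # the mean of each row (meaningless in this example)
apply(df_aq, 2, max) # the maximum of each column
tapply(df_aq$Temp, df_aq$Month, mean) # the mean of each month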
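Some columns of the air quality data (Ozone and Solar.R) contain missing values, so several of the results above are NA. Most of these functions accept na.rm = TRUE, either directly or passed on through apply() and tapply(). A sketch, assuming airquality.csv holds the same data as R's built-in airquality dataset, which is used here so the code runs on its own:

```r
aq <- airquality                                # built-in copy of the data
colSums(is.na(aq))                              # number of NA values per column
colMeans(aq, na.rm = TRUE)                      # column means, ignoring NA
apply(aq, 2, max, na.rm = TRUE)                 # column maxima, ignoring NA
tapply(aq$Ozone, aq$Month, mean, na.rm = TRUE)  # mean Ozone of each month
```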

2.6.6 R built-in datasets

data()       # list the datasets that come with R
iris         # print the iris dataset
names(iris)  # its column names
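Built-in datasets are ordinary data frames, so the functions above work on them directly. For example, the mean sepal length of each iris species:

```r
head(iris)                                      # the first six rows
tapply(iris$Sepal.Length, iris$Species, mean)   # setosa 5.006, versicolor 5.936, virginica 6.588
```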

2.6.7 Export a data frame

write.csv(df_aq, "data/airquality_new.csv")

Double check:

  1. Open “data/airquality_new.csv” with Excel. Take a look. Close it.

  2. Open “data/airquality_new.csv” with Notepad. Take a look. Close it.
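When you open the new file, you will probably see an extra first column holding the row numbers, because write.csv() writes row names by default. Setting row.names = FALSE leaves them out. A self-contained sketch using a temporary file and the built-in airquality data:

```r
tmp <- tempfile(fileext = ".csv")
write.csv(airquality, tmp)                      # default: row names included
ncol(read.csv(tmp))                             # 7: an extra column "X" holds the row names
write.csv(airquality, tmp, row.names = FALSE)   # row names left out
ncol(read.csv(tmp))                             # 6, the same as airquality
```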