ENV221 L02
1 Overview of the module and R
2 R Basic Operations
2.1 Learning objectives
In this lecture, you will
- do basic maths calculations,
- know the concepts of objects and functions, vectors and data frames,
- learn how to import data and export data, and
- apply functions to data frames.
2.2 Simple Maths
1 + 2 - 4 * 5 / 6 ^ 2
1 + (2 - 4 * (5 / 6)) ^ 2
2 ^ 0.5
7 %/% 3
7 %% 3
?`%/%`
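To make the order of operations concrete, the sketch below writes the expression 1 + 2 - 4 * 5 / 6 ^ 2 with explicit parentheses and then relates integer division (%/%) to the remainder (%%). The variable names are made up for illustration.

```r
# Exponentiation binds tighter than * and /, which bind tighter than + and -
implicit <- 1 + 2 - 4 * 5 / 6 ^ 2
explicit <- (1 + 2) - ((4 * 5) / (6 ^ 2))
implicit == explicit        # TRUE: both follow the same order of operations

# %/% is integer division and %% is the remainder,
# so quotient * divisor + remainder recovers the original number
quotient  <- 7 %/% 3        # 2
remainder <- 7 %% 3         # 1
quotient * 3 + remainder    # 7
```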
2.3 Vector
2.3.1 Definition
A vector is a sequence of data elements of the same basic type. It is the simplest type of data structure in R.
— http://www.r-tutor.com/r-introduction/vector
c(1, 3, 5, 7)
2.3.2 Calculation of a vector and a number
c(1, 3, 5, 7) + 10
c(1, 3, 5, 7) ^ 2
2.3.3 Calculation of multiple vectors
c(1, 2, 3, 4) + c(10, 20, 30, 40)
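Adding a number to a vector is a special case of what R calls recycling: the shorter operand is repeated until it matches the longer one. A small sketch with made-up vectors:

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# Same length: element-by-element addition
long + c(10, 20, 30, 40)    # 11 22 33 44

# The shorter vector is recycled as 10, 20, 10, 20
recycled <- long + short
recycled                    # 11 22 13 24
```

Recycling is convenient, but it can hide mistakes when two vectors were meant to have the same length, so it is worth checking length() when results look odd.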
2.3.4 Sequence
c(1, 2, 3, 4, 5, 6, 7, 8, 9) ^ 2
(1:9) ^ 2
9:1
1 / (9:1)
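The colon operator always steps by 1. For other step sizes, the base R function seq() can be used. A brief sketch (the object names are only for illustration):

```r
odd      <- seq(1, 9, by = 2)            # 1 3 5 7 9
down     <- seq(9, 1, by = -2)           # 9 7 5 3 1
quarters <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

odd ^ 2                                  # 1 9 25 49 81
```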
2.4 Function
(1:9) ^ 0.5
sqrt(1:9)
exp(1:9)
log(1:9)
log10(1:9)
round(3.14)
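Many functions take more than one argument, and extra arguments are usually given by name. The sketch below uses the digits argument of round() and the base argument of log() as examples:

```r
round(3.14159)                           # 3, because digits defaults to 0
rounded <- round(3.14159, digits = 2)
rounded                                  # 3.14

log(8, base = 2)                         # 3
log(100, base = 10)                      # 2, the same as log10(100)

args(round)                              # shows the arguments a function accepts
```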
2.5 Object
# assignment
x = 1:9
x <- 1:9 # Alt + - in RStudio
y <- x <- 1:9
1:9 -> x
assign('x', 1:9)
x + 10
x ^ 2
x
# indexing
x[7]
x[c(1, 3, 8)]
x[1:6]
# multiple vectors
y <- 11:19
y
x + y
c(x, y)
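Besides positive positions, a vector can also be indexed with negative positions (to drop elements) or with a logical condition (to keep the elements that satisfy it). A short sketch, where big is just an illustrative name:

```r
x <- 1:9

x[-1]              # everything except the first element
x[-(1:3)]          # drop the first three elements

big <- x[x > 5]    # keep the elements greater than 5
big                # 6 7 8 9
x[x %% 2 == 0]     # even elements: 2 4 6 8
```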
Use functions on objects:
sqrt(x)
exp(x)
log(x)
log10(x)
mean(x)
min(x)
max(x)
range(x)
sd(x)
Question: How can I memorize the function names?
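- Use RStudio's auto-completion.
- Press Tab in RStudio to list the matching names.
- Press F1 to open the help page of the function under the cursor.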
Action: Missing values
Missing values, which are represented as NA (Not Available), are common in real-world data. Suppose you measure the air temperature at five locations on campus. The temperatures are:
x <- c(29, 28, 28, NA, 30)
The fourth value is missing because of an accident. What is the mean temperature? Use the mean() function.
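One possible way to approach this action, as a sketch: by default mean() returns NA if any value is missing, and its na.rm argument asks it to drop the missing values first. The name temp_mean is only for illustration.

```r
x <- c(29, 28, 28, NA, 30)

mean(x)                               # NA, because one value is missing
is.na(x)                              # FALSE FALSE FALSE TRUE FALSE

temp_mean <- mean(x, na.rm = TRUE)    # mean of the four measured values
temp_mean                             # 28.75
```

Other summary functions such as min(), max(), and sd() accept the same na.rm argument.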
2.6 Data frame
2.6.1 Manual input
df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))
df_manual
2.6.2 Indexing
Use integers:
df_manual[3, 2]
df_manual[2, 1:2]
df_manual[2, ]
df_manual[, 2]
Use column names (recommended):
df_manual$age
df_manual$age[3]
df_manual[, 'age']
df_manual[3, 'age']
Use logical values:
df_manual$age[c(FALSE, TRUE, TRUE, FALSE, FALSE)]
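In practice the logical values are rarely typed by hand. They usually come from a condition on a column. A sketch that rebuilds the data frame from this section and filters it (over_30, older, and females are illustrative names):

```r
df_manual <- data.frame(age = c(21, 31, 23, 40, 36),
                        gender = c('f', 'm', 'm', 'f', 'f'))

over_30 <- df_manual$age > 30      # FALSE TRUE FALSE TRUE TRUE
df_manual$age[over_30]             # 31 40 36

older   <- df_manual[over_30, ]    # rows 2, 4 and 5, all columns
females <- df_manual[df_manual$gender == 'f', ]
nrow(females)                      # 3
```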
Calculation with a column in a data frame as a vector:
df_manual$age - 2
2022 - df_manual$age
mean(df_manual$age)
sd(df_manual$age)
2.6.3 Functions for data frames
summary(df_manual)
names(df_manual)
str(df_manual)
nrow(df_manual)
ncol(df_manual)
dim(df_manual)
2.6.4 Import a data frame
Normally, we edit the data in another program, save it as a .csv file, and import it into R.
Demonstration:
- Open the ‘airquality.csv’ file with Excel. Take a look. Close it.
- Open it with Notepad. Take a look. Close it.
- Import it into R.
df_aq <- read.csv('data/airquality.csv')
summary(df_aq)
If you prefer not to write code, click File - Import Dataset - From Text (base) in RStudio and choose airquality.csv. You will get an object named airquality, after the file name.
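After importing, it is worth checking that the data arrived as expected. The sketch below is self-contained: it assumes the course file resembles R's built-in airquality dataset and writes that dataset to a temporary file as a stand-in for 'data/airquality.csv'. The names csv_path and df_demo are hypothetical.

```r
# Assumption: a stand-in for 'data/airquality.csv', written from the
# built-in airquality dataset to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(airquality, csv_path, row.names = FALSE)

df_demo <- read.csv(csv_path)    # import the CSV file as a data frame
head(df_demo)                    # first six rows
str(df_demo)                     # column names and types
dim(df_demo)                     # 153 rows and 6 columns
```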
Indexing:
df_aq[2, 4]
df_aq[2, 1:5]
df_aq[2, ]
df_aq[, 3]
Use column names:
df_aq$Wind
df_aq[, 'Wind']
Each column is a vector, so the vector operations introduced earlier apply.
Calculation:
# Convert Wind from mph (miles per hour) to m/s (metres per second)
df_aq$Wind * 1.609344 * 1000 / 3600
wind_ms <- df_aq$Wind * 1.609344 * 1000 / 3600
df_aq$wind_ms <- df_aq$Wind * 1.609344 * 1000 / 3600
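When the same conversion is needed more than once, it can be wrapped in a small function. This is only a sketch: mph_to_ms is a hypothetical helper name, and the built-in airquality dataset is used as a stand-in for df_aq.

```r
# mph_to_ms is a hypothetical helper: miles per hour to metres per second
mph_to_ms <- function(mph) {
  mph * 1.609344 * 1000 / 3600
}

mph_to_ms(10)                             # 4.4704

# Built-in dataset used as a stand-in for df_aq
wind_ms <- mph_to_ms(airquality$Wind)
length(wind_ms)                           # 153, one value per row
```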
2.6.5 More functions for data frames
colSums(df_aq)
rowSums(df_aq)
colMeans(df_aq)
rowMeans(df_aq)
apply(df_aq, 1, mean) # the mean of each row (meaningless in this example)
apply(df_aq, 2, max) # the maximum of each column
tapply(df_aq$Temp, df_aq$Month, mean) # the mean of each month
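If a data frame contains missing values, as the built-in airquality dataset does in its Ozone and Solar.R columns, these summaries return NA unless told otherwise. A sketch using the built-in dataset as a stand-in for df_aq (aq, col_means, and monthly_ozone are illustrative names):

```r
aq <- airquality                           # built-in dataset, stand-in for df_aq

colMeans(aq)                               # NA for columns with missing values
col_means <- colMeans(aq, na.rm = TRUE)    # drop missing values first
col_means

# Extra arguments after the function name are passed on to that function
apply(aq, 2, max, na.rm = TRUE)
monthly_ozone <- tapply(aq$Ozone, aq$Month, mean, na.rm = TRUE)
monthly_ozone                              # one mean per month, May to September
```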
2.6.6 R built-in datasets
data()
iris
names(iris)
2.6.7 Export a data frame
write.csv(df_aq, "data/airquality_new.csv")
Double check:
Open “data/airquality_new.csv” with Excel. Take a look. Close it.
Open “data/airquality_new.csv” with Notepad. Take a look. Close it.
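By default, write.csv() also writes the row names as an unnamed first column, which read.csv() then imports as an extra column named X. The sketch below shows the difference, using temporary files and the built-in airquality dataset as stand-ins. The path names are hypothetical.

```r
path_default <- tempfile(fileext = ".csv")
path_clean   <- tempfile(fileext = ".csv")

write.csv(airquality, path_default)                      # row names included
write.csv(airquality, path_clean, row.names = FALSE)     # row names dropped

names(read.csv(path_default))    # starts with "X"
names(read.csv(path_clean))      # the same names as airquality
```

Setting row.names = FALSE is a common choice when the row names are just row numbers.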