Analysis of Variance (ANOVA)

Numerical vs. categorical

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University

1 Learning objectives

In this lecture, you will

  1. Understand the concept of the analysis of variance (ANOVA), and
  2. Carry out one-way and two-way ANOVA for answering scientific questions.

2 Revisit the t-test

Example: Rats on diets

A biologist studies the weight gain of male lab rats on diets over a 4-week period. Three different diets are applied. The results are shown in the following table.

dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))
diet1 diet2 diet3
90 120 125
95 125 130
100 130 135
Weight gain (gram) of male lab rats
  • Are the mean weight gains under the three diets all equal?
  • In other words, do the diets influence the weight gain?

If we run a \(t\)-test for each pair of treatments…

dtt <- data.frame(Number_of_Samples = 2:10)
dtt$Number_of_Tests <- choose(dtt$Number_of_Samples, 2)
dtt$alpha_overall <- round(1 - (1 - 0.05)^dtt$Number_of_Tests, 2)
Number_of_Samples Number_of_Tests alpha_overall
2 1 0.05
3 3 0.14
4 6 0.26
5 10 0.40
6 15 0.54
7 21 0.66
8 28 0.76
9 36 0.84
10 45 0.90
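The number of pairwise \(t\)-tests grows quickly with the number of samples, and so does the probability that at least one of them shows a significant difference by chance alone (assuming independent tests at \(\alpha = 0.05\))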
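The inflation of the overall error rate can also be seen in a small simulation. The sketch below is an illustration, not part of the original analysis: it draws three samples from the same normal population (so \(H_0\) is true), runs the three pairwise \(t\)-tests, and records how often at least one of them is significant. The sample size of 10, the 2000 repetitions and the seed are arbitrary assumptions.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
n_sim <- 2000      # number of simulated experiments (assumption)

any_sig <- replicate(n_sim, {
  # Three samples of size 10 drawn from the SAME population, so H0 is true
  x <- matrix(rnorm(3 * 10), ncol = 3)
  p <- c(t.test(x[, 1], x[, 2])$p.value,
         t.test(x[, 1], x[, 3])$p.value,
         t.test(x[, 2], x[, 3])$p.value)
  any(p < 0.05)    # TRUE if at least one pairwise test is "significant"
})

# Share of experiments with at least one false positive
fwer <- mean(any_sig)
fwer
```

Because the three tests share data they are not independent, so the simulated rate is usually somewhat below the 0.14 given by \(1 - (1 - 0.05)^3\), but it is still clearly above 0.05.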

Instead of running several \(t\)-tests on the means, we can reshape the data into long format:

dtf2 <- stack(dtf)
names(dtf2) <- c("wg", "diet")
wg diet
90 diet1
95 diet1
100 diet1
120 diet2
125 diet2
130 diet2
125 diet3
130 diet3
135 diet3

and analyse the relationship between the weight gain (numerical variable) and the diet treatment (categorical variable).
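As a first look at that relationship, the long-format data can be summarised per diet. This is a minimal sketch that rebuilds `dtf2` from the table above; the names `group_mean` and `group_var` are introduced here for illustration only.

```r
dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))
dtf2 <- stack(dtf)
names(dtf2) <- c("wg", "diet")

# Mean and variance of the numerical variable within each level of the factor
group_mean <- tapply(dtf2$wg, dtf2$diet, mean)
group_var  <- tapply(dtf2$wg, dtf2$diet, var)
rbind(mean = group_mean, variance = group_var)
```

The group means differ while the group variances are identical, which is the situation ANOVA is designed to evaluate.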

3 One-way ANOVA

Analysis of variance (ANOVA):
One of the most widely used statistical techniques. The test partitions the total variation present in a set of data into two or more components. Associated with each of these components is a specific source of variation, so that it is possible to ascertain the contributions of each of these sources to the total variation.
One-way ANOVA:
A test that concerns only one independent variable (\(x\)), which is called a factor and has multiple levels (settings, groups).
  1. Hypotheses:
    • \(H_0: \mu _1 = \mu _2 = \mu _3 = ... = \mu_k\)
    • \(H_1\): at least one mean is different from others
    • Reject \(H_0\)? Given \(\alpha\).
  2. Collect data. Suppose we have \(k\) samples. The \(i\)-th sample has \(n_i\) observations.
A data set with one independent variable that has multiple levels (the samples may differ in size)
Level 1 Level 2 Level 3 ... Level \(k\)
\(x_{1,1}\) \(x_{2,1}\) \(x_{3,1}\) ... \(x_{k,1}\)
\(x_{1,2}\) \(x_{2,2}\) \(x_{3,2}\) ... \(x_{k,2}\)
\(x_{1,3}\) \(x_{2,3}\) \(x_{3,3}\) ... \(x_{k,3}\)
\(\vdots\) \(\vdots\) \(\vdots\) ... \(\vdots\)
\(x_{1,n_1}\) \(x_{2,n_2}\) \(x_{3,n_3}\) ... \(x_{k,n_k}\)
Mean: \(\bar x_1\) \(\bar x_2\) \(\bar x_3\) ... \(\bar x_k\)
  3. Calculate a test statistic: \(F\)-test.
The Entries of One-Way ANOVA Table
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples \(df_W = n - k\) \(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i}(x_{ij}-\bar x_i)^2\) \(MS_W = \frac{SS_W}{df_W}\) \(F = \frac{MS_B}{MS_W}\)
Between samples \(df_B = k - 1\) \(SS_B = \sum_{i = 1}^k n_i(\bar x_i - \bar x)^2\) \(MS_B = \frac {SS_B}{df_B}\)
Total \(df_T = n - 1\) \(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n_i} (x_{ij}-\bar x) ^2\) \(MS_T = \frac{SS_T}{df_T}\)
\(n\)
The total number of the observations. \(n = \sum n_i\)
\(x_{ij}\)
The \(j\)-th observation in the \(i\)-th sample
\(\bar x\)
Overall/grand mean of all the observations. \(\bar x = \frac{\sum x_{ij}}{n}\)
\(MS\)
Mean square: a sum of squares divided by its degrees of freedom
\(SS\)
Sum of squared deviations from the mean
\(df\)
degrees of freedom

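If \(H_0\) is true, then any differences you see between the \(k\) samples are due to chance. In that case \(MS_B\), \(MS_W\) and \(MS_T\) all estimate the same population variance, i.e. \(\sigma^2_\mathrm {between} = \sigma^2_\mathrm {within} = \sigma^2_\mathrm {total}\), so the ratio \(F = MS_B / MS_W\) should be close to 1. A ratio much larger than 1 is evidence against \(H_0\). ==> \(F\)-test.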
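The behaviour of \(F\) under \(H_0\) can be checked by simulation. The sketch below is an illustration under assumed settings (3 groups of 5 observations, 1000 repetitions, an arbitrary seed): all groups are drawn from the same normal population, and the share of \(F\) values above the 5% critical value should then be close to 0.05.

```r
set.seed(42)                               # arbitrary seed
k   <- 3                                   # number of groups (assumption)
n_i <- 5                                   # observations per group (assumption)
g   <- factor(rep(seq_len(k), each = n_i))

f_sim <- replicate(1000, {
  x <- rnorm(k * n_i)                      # H0 true: one common population
  anova(lm(x ~ g))[1, "F value"]           # F = MS_B / MS_W
})

f_crit <- qf(0.95, df1 = k - 1, df2 = k * n_i - k)
reject_rate <- mean(f_sim > f_crit)        # should be near alpha = 0.05
c(mean_F = mean(f_sim), reject_rate = reject_rate)
```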

From the previous table, we can get:

\[df_T = df_W+ df_B\]

\[SS_T = SS_W + SS_B\]

which can be used to double-check the calculations.
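The two identities also hold when the samples have different sizes. The following sketch wraps the sums of squares from the table in a small helper and checks the identity on made-up data; the function name `one_way_ss` and the data are hypothetical and only serve the illustration.

```r
# Hypothetical helper: sums of squares for a one-way layout
one_way_ss <- function(x, g) {
  g <- factor(g)
  grand_mean <- mean(x)
  group_mean <- tapply(x, g, mean)          # one mean per level
  group_n    <- tapply(x, g, length)        # n_i per level
  ss_w <- sum((x - group_mean[as.character(g)])^2)
  ss_b <- sum(group_n * (group_mean - grand_mean)^2)
  ss_t <- sum((x - grand_mean)^2)
  c(SSW = ss_w, SSB = ss_b, SST = ss_t)
}

# Made-up data with unequal sample sizes (3, 4 and 2 observations)
x <- c(4, 6, 5, 9, 10, 8, 7, 3, 2)
g <- c("a", "a", "a", "b", "b", "b", "b", "c", "c")

ss <- one_way_ss(x, g)
ss
all.equal(unname(ss["SSW"] + ss["SSB"]), unname(ss["SST"]))
```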

  4. Decision. Reject or do not reject \(H_0\).
  5. Conclusion. Whether the categorical variable has an effect on the numerical variable.

Example: Rats on diets

  1. Hypotheses and question:

    • \(H_0: \mu _1 = \mu _2 = \mu _3\)
    • \(H_1\): at least one mean is different from others
    • Reject \(H_0\)?
  2. Collect data:

  3. Calculate a test statistic: \(F\)-test.

Action: Fill in the ANOVA table.

The ANOVA table for the weight gain experiment
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples
Between samples
Total
# df
(k <- ncol(dtf))
(n <- length(unlist(dtf)))
(dfW <- n - k)
(dfB <- k - 1)
(dfT <- n-1)

# mean
(xbar <- mean(unlist(dtf)))
(xibar <- colMeans(dtf))

# SS
(SSW1 <- sum((dtf$diet1 - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$diet2 - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$diet3 - xibar[3]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3)
(SSB1 <- length(dtf$diet1) * (xibar[1] - xbar) ^ 2)
(SSB2 <- length(dtf$diet2) * (xibar[2] - xbar) ^ 2)
(SSB3 <- length(dtf$diet3) * (xibar[3] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3)

# Double check
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB

SSB/SST # Eta squared (the squared correlation ratio): share of the total variation explained by diet

# F
(MSW <- SSW / dfW)
(MSB <- SSB / dfB)
(F_score <- MSB / MSW)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfW))
pf(F_score, df1 = dfB, df2 = dfW, lower.tail = FALSE)
Mean and within-/between-samples sum of squares
Diet 1 Diet 2 Diet 3 Total
90 120 125
95 125 130
100 130 135
\(\bar x_i\) 95 125 130
\(SS_W\) 50 50 50 150
\(SS_B\) 1408.33 208.33 533.33 2150
The ANOVA table for the weight gain experiment
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples 6 150 25 43
Between samples 2 2150 1075
Total 8 2300
  4. Decision.

Since the \(F\) value is 43, which exceeds the critical value of 5.14 at the significance level of \(\alpha = 0.05\), we reject \(H_0\).

One step:

wg_aov <- aov(wg ~ diet, data = dtf2)
summary(wg_aov)
            Df Sum Sq Mean Sq F value   Pr(>F)    
diet         2   2150    1075      43 0.000277 ***
Residuals    6    150      25                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  5. Conclusion.

There is a significant effect of diet on the weight gain of male laboratory rats.

The connection between ANOVA and t-test:

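dtf3 <- dtf2[1:6, ]  # keep only diet1 and diet2
# With two groups, one-way ANOVA is equivalent to the pooled-variance t-test:
# F equals t^2 and the p-values agree. var.equal = TRUE is needed because
# t.test() defaults to the Welch test, which does not pool the variances.
t.test(wg ~ diet, data = dtf3, var.equal = TRUE)
summary(aov(wg ~ diet, data = dtf3))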
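The link can be verified numerically. The sketch below rebuilds the first two diets from the example, extracts the \(t\) statistic of the pooled-variance \(t\)-test and the \(F\) value of the one-way ANOVA, and compares \(t^2\) with \(F\). The object names are chosen here for illustration.

```r
dtf2 <- stack(data.frame(diet1 = c(90, 95, 100),
                         diet2 = c(120, 125, 130),
                         diet3 = c(125, 130, 135)))
names(dtf2) <- c("wg", "diet")
dtf3 <- droplevels(dtf2[1:6, ])            # diet1 and diet2 only

# Pooled-variance t-test and one-way ANOVA on the same two groups
t_res   <- t.test(wg ~ diet, data = dtf3, var.equal = TRUE)
aov_tab <- summary(aov(wg ~ diet, data = dtf3))[[1]]

t_sq  <- unname(t_res$statistic)^2
f_val <- aov_tab[["F value"]][1]
c(t_squared = t_sq, F = f_val)

# The p-values agree as well
c(t_test = t_res$p.value, anova = aov_tab[["Pr(>F)"]][1])
```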

4 Understand ANOVA

4.1 The model

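\[x_{i j}=\mu+\tau_{j}+\epsilon_{i j} ; \quad i=1,2, \ldots, n_{j} ; \quad j=1,2, \ldots, k\]

Here \(j\) indexes the \(k\) populations (treatments) and \(i\) indexes the observations within the \(j\)-th sample. Note that the roles of the two subscripts are swapped relative to the ANOVA table in Section 3.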

\(\mu\)
The grand mean. The mean of all \(k\) population means.
\(\tau_{j}\)
The treatment effect. The difference between the mean of the \(j\)-th population and the grand mean (\(\mu_j - \mu\)).
\(\epsilon_{i j}\)
The error term. The amount by which an individual measurement differs from the mean of the population to which it belongs (\(x_{ij} - \mu_j\)).
  • \(H_{0}: \mu_{1}=\mu_{2}=\cdots=\mu_{k}\)

  • \(H_{A}:\) not all \(\mu_{j}\) are equal
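To make the model concrete, the rat data can be split into its three parts. This is a sketch that uses sample estimates in place of the unknown population quantities (the grand mean of the data for \(\mu\), the sample means for \(\mu_j\)); the names `mu_hat`, `tau_hat` and `eps_hat` are introduced here.

```r
dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))
x <- as.matrix(dtf)                       # rows: observations i, columns: diets j

mu_hat  <- mean(x)                        # estimate of the grand mean
tau_hat <- colMeans(x) - mu_hat           # estimated treatment effects
eps_hat <- sweep(x, 2, colMeans(x))       # residuals: x_ij minus its column mean

tau_hat
eps_hat

# Putting the parts back together recovers the data: x_ij = mu + tau_j + eps_ij
rebuilt <- mu_hat + sweep(eps_hat, 2, tau_hat, "+")
all.equal(unname(rebuilt), unname(x))
```

With equal sample sizes the estimated treatment effects sum to zero, and the residuals sum to zero within each diet.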

4.2 Assumptions

  • The \(k\) sets of observed data constitute \(k\) independent random samples from the respective populations.
  • Each of the populations from which the samples come is normally distributed with mean \(\mu_{j}\) and variance \(\sigma_{j}^{2}\).
  • Each of the populations has the same variance. That is, \(\sigma_{1}^{2}=\sigma_{2}^{2}=\cdots=\sigma_{k}^{2}=\sigma^{2},\) the common variance.
  • The \(\tau_{j}\) are unknown constants and \(\sum \tau_{j}=0\) since the sum of all deviations of the \(\mu_{j}\) from their mean, \(\mu,\) is zero.
  • The \(\epsilon_{i j}\) have a mean of \(0,\) since the mean of \(x_{i j}\) is \(\mu_{j}\)
  • The \(\epsilon_{i j}\) have a variance equal to the variance of the \(x_{i j},\) since the \(\epsilon_{i j}\) and \(x_{i j}\) differ only by a constant; that is, the error variance is equal to \(\sigma^{2},\) the common variance specified in the assumption above.
  • The \(\epsilon_{i j}\) are normally (and independently) distributed.
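Some of these assumptions can be checked on the fitted model. The sketch below is one possible approach and not part of the original lecture: it applies the Shapiro-Wilk test to the residuals (normality) and Bartlett's test to the groups (equal variances), both from base R. Other diagnostics, such as residual plots, are equally reasonable, and with only three observations per group these tests have little power.

```r
dtf2 <- stack(data.frame(diet1 = c(90, 95, 100),
                         diet2 = c(120, 125, 130),
                         diet3 = c(125, 130, 135)))
names(dtf2) <- c("wg", "diet")

fit <- aov(wg ~ diet, data = dtf2)

# Normality of the error term, judged from the residuals
norm_check <- shapiro.test(residuals(fit))

# Homogeneity of variance across the k groups
var_check <- bartlett.test(wg ~ diet, data = dtf2)

c(shapiro_p = norm_check$p.value, bartlett_p = var_check$p.value)
```

Large p-values mean that the data give no evidence against the assumption; they do not prove that the assumption holds.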

4.3 Estimating the statistics

The population variance \(\sigma^{2}\) may be estimated in two ways:

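  1. \(\hat\sigma^2 = MS_W = \frac{SS_W}{df_W}\), which is a valid estimate whether or not \(H_0\) is true.
  2. \(\hat\sigma^2 = MS_B = \frac{SS_B}{df_B}\), which is a valid estimate only if \(H_0\) is true; otherwise it tends to overestimate \(\sigma^2\).

Compare the two estimates of the population variance: \(F_\mathrm {score} = MS_B/MS_W\).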

  • The numerator df: \(k - 1\).
  • The denominator df: \(n - k\).

5 Repeated measures

Repeated measures (matched samples, randomized blocks, within subjects):

  • \(k\) samples.
  • Each sample has \(n\) observations.
A data set with repeated measures and one independent variable that has multiple levels
ID Level 1 Level 2 Level 3 ... Level \(k\)
1 \(x_{1,1}\) \(x_{2,1}\) \(x_{3,1}\) ... \(x_{k,1}\)
2 \(x_{1,2}\) \(x_{2,2}\) \(x_{3,2}\) ... \(x_{k,2}\)
3 \(x_{1,3}\) \(x_{2,3}\) \(x_{3,3}\) ... \(x_{k,3}\)
\(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) ... \(\vdots\)
\(n\) \(x_{1,n}\) \(x_{2,n}\) \(x_{3,n}\) ... \(x_{k,n}\)
The Entries of One-Way ANOVA Table for Repeated Measures
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples \(df_W = nk - k\) \(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n}(x_{ij}-\bar x_i)^2\) \(MS_W = \frac{SS_W}{df_W}\) \(F = \frac{MS_B}{MS_{Wcorr}}\)
Between samples \(df_B = k - 1\) \(SS_B = \sum_{i = 1}^k n(\bar x_i - \bar x)^2\) \(MS_B = \frac {SS_B}{df_B}\)
Subjects (Row) \(df_S = n-1\) \(SS_S = k\sum _{j=1}^{n} (\bar x_j - \bar x)^2\)
Within Corrected \(df_{Wcorr} = df_W- df_S\) \(SS_{Wcorr} = SS_W-SS_S\) \(MS_{Wcorr} = \frac{SS_{Wcorr}}{df_{Wcorr}}\)
Total \(df_T = nk - 1\) \(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n} (x_{ij}-\bar x) ^2\) \(MS_T = \frac{SS_T}{df_T}\)

Demo: Weight-loss program

dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
rownames(dtf) <- LETTERS[1:10]
before one month two months three months
A 198 194 191 188
B 201 203 200 196
C 210 200 192 188
D 185 183 180 178
E 204 200 195 191
F 156 153 150 145
G 167 166 167 166
H 197 197 195 192
I 220 215 209 205
J 186 184 179 175

Is the weight-loss program effective?


Action: Follow the steps and fill in the ANOVA table.

  1. Hypotheses and question:

    • \(H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3\)
    • \(H_1:\) Not \(H_0\)
    • Question: Reject \(H_0\)? Given \(\alpha\).
  2. Collect data

  3. Calculate a test statistic.

The ANOVA table for the Weight-loss program
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples
Between samples
Subjects (Row)
Within Corrected
Total
dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)

# df
(k <- ncol(dtf))
(n <- nrow(dtf))
(N <- length(unlist(dtf)))
(dfW <- N - k)
(dfB <- k - 1)
(dfT <- N - 1)
(dfS <- n - 1)
(dfWcorr <- dfW-dfS)

# mean
(xbar <- mean(unlist(dtf)))
(xibar <- colMeans(dtf))
(xjbar <- rowMeans(dtf))

# SS
(SSW1 <- sum((dtf$before - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$one - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$two - xibar[3]) ^ 2))
(SSW4 <- sum((dtf$three - xibar[4]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3 + SSW4)
sum(apply(dtf, 2, function(x) (x - mean(x)) ^2))

(SSB1 <- n * (xibar[1] - xbar) ^ 2)
(SSB2 <- n * (xibar[2] - xbar) ^ 2)
(SSB3 <- n * (xibar[3] - xbar) ^ 2)
(SSB4 <- n * (xibar[4] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3 + SSB4)
sum(n * (xibar - xbar) ^ 2)

(SSS <- k * sum((xjbar - xbar) ^ 2))
(SSWcorr <- SSW-SSS)

# Double check
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB

# F
(MSB <- SSB / dfB)
(MSWcorr <- SSWcorr / dfWcorr)
(F_score <- MSB / MSWcorr)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfWcorr))
pf(F_score, df1 = dfB, df2 = dfWcorr, lower.tail = FALSE)
The ANOVA table for the Weight-loss program
Source \(df\) \(SS\) \(MS\) \(F\)
Within samples 36 11840.9 328.9139 24.4851
Between samples 3 569.075 189.6917
Subjects (Row) 9 11631.725
Within Corrected 27 209.175 7.7472
Total 39 12409.975
  4. Decision.

    With 3 and 27 degrees of freedom, the critical \(F\) for \(\alpha = 0.05\) is 2.96, which is smaller than the calculated \(F\) value of 24.49. Thus, the decision is to reject \(H_0\).

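One step. A plain one-way aov() treats the four columns as independent samples and leaves the large between-subject variation in the residuals, so it finds no effect (\(F = 0.577\)). Adding Error(subject/level) to the formula separates the between-subject variation from the error term and reproduces the repeated-measures result: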

dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
w_aov <- aov(w ~ level, data = dtf2)
summary(w_aov)
            Df Sum Sq Mean Sq F value Pr(>F)
level        3    569   189.7   0.577  0.634
Residuals   36  11841   328.9               
dtf2$subject <- rep(LETTERS[1:10], 4)
w_aov2 <- aov(w ~ level + Error(subject/level), data = dtf2)
summary(w_aov2)

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  11632    1292               

Error: subject:level
          Df Sum Sq Mean Sq F value  Pr(>F)    
level      3  569.1  189.69   24.48 7.3e-08 ***
Residuals 27  209.2    7.75                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  5. Conclusion.

The program has a significant effect on weight loss.
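The corrected within-samples term can also be obtained by treating the subjects as blocks. The sketch below is an alternative formulation, not the lecture's own code: it fits an additive model with `level` and `subject` as factors, whose residual line corresponds to the "Within Corrected" row of the table.

```r
dtf <- data.frame(
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one    = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two    = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three  = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
dtf2$subject <- factor(rep(LETTERS[1:10], times = 4))

# Subjects enter as a blocking factor; no interaction term is fitted
block_tab <- anova(lm(w ~ level + subject, data = dtf2))
block_tab

f_level <- block_tab["level", "F value"]
f_level
```

The `subject` row holds \(SS_S\), the residual row holds \(SS_{Wcorr}\) with 27 degrees of freedom, and the \(F\) value for `level` matches the one from the Error() formulation.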

6 Two-way ANOVA

Demo: Rats on diets

A biologist studies how the weight gain of lab rats depends on diet and gender over a 4-week period. Three different diets are applied. Do diet and gender have an effect on weight gain?

Weight gain of male and female lab rats
Gender Diet 1 Diet 2 Diet 3
Male 90 120 125
Male 95 125 130
Male 100 130 135
Female 75 100 118
Female 78 118 125
Female 90 112 132

Can we apply a one-way ANOVA for diet effect, then another one-way ANOVA for gender effect?

Transform the data frame as:

dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(rep(c("Male", "Female"), each = 3), times = 3)  # written out for all 18 rows instead of relying on recycling
)
w diet gender
90 Diet1 Male
95 Diet1 Male
100 Diet1 Male
75 Diet1 Female
78 Diet1 Female
90 Diet1 Female
120 Diet2 Male
125 Diet2 Male
130 Diet2 Male
100 Diet2 Female
118 Diet2 Female
112 Diet2 Female
125 Diet3 Male
130 Diet3 Male
135 Diet3 Male
118 Diet3 Female
125 Diet3 Female
132 Diet3 Female
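One way to explore the question of two separate one-way ANOVAs is to compare the gender effect in a one-way model with the gender effect in the two-way model. The sketch below is an illustration; the object names are chosen here. In the one-way model the variation caused by diet stays in the residuals, which inflates the error term and hides the gender effect.

```r
dtf <- data.frame(
  w = c(90, 95, 100, 75, 78, 90, 120, 125, 130,
        100, 118, 112, 125, 130, 135, 118, 125, 132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(rep(c("Male", "Female"), each = 3), times = 3)
)

# Gender alone: diet variation is lumped into the residuals
one_way_gender <- anova(lm(w ~ gender, data = dtf))
# Both factors and their interaction
two_way <- anova(lm(w ~ diet * gender, data = dtf))

p_one <- one_way_gender["gender", "Pr(>F)"]
p_two <- two_way["gender", "Pr(>F)"]
c(one_way = p_one, two_way = p_two)
```

The sum of squares for gender is the same in both models; only the error term it is compared against changes. Separate one-way analyses also cannot test the interaction at all.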

Interaction:

Figure: The concept of interaction (panels A to F, summarised below)

A
  • No significant effect of diet,
  • No significant effect of gender,
  • No interaction
B
  • No significant effect of diet,
  • Effect of gender,
  • No interaction
C
  • Effect of diet,
  • Effect of gender,
  • No interaction
D
  • Effect of diet in males,
  • No significant effect of diet in females,
  • Effect of gender,
  • Interaction (positive)
E
  • Effect of diet in males,
  • No significant effect of diet in females,
  • Effect of gender,
  • Interaction (negative)
F
  • Effect of diet in males,
  • Effect of diet in females,
  • Effect of gender,
  • Interaction (negative)
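For the rat data, the same idea can be inspected through the cell means. Without interaction, the difference between males and females would be the same under every diet. The sketch below computes the cell means and that difference; the names `cell_mean` and `gender_gap` are introduced here for illustration.

```r
dtf <- data.frame(
  w = c(90, 95, 100, 75, 78, 90, 120, 125, 130,
        100, 118, 112, 125, 130, 135, 118, 125, 132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(rep(c("Male", "Female"), each = 3), times = 3)
)

# Mean weight gain in each gender x diet cell
cell_mean <- tapply(dtf$w, list(dtf$gender, dtf$diet), mean)
cell_mean

# Male minus female, per diet; equal gaps would mean parallel lines
gender_gap <- cell_mean["Male", ] - cell_mean["Female", ]
gender_gap
```

The gaps are not identical, so the lines are not exactly parallel; whether the departure is larger than chance would produce is what the \(F\)-test on the interaction term decides. In base R, interaction.plot(dtf$diet, dtf$gender, dtf$w) draws these cell means as one line per gender.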
A data set with two independent variables, each with multiple levels. Suppose we have \(K\) levels for independent variable 1 (columns) and \(M\) categories for independent variable 2 (rows), with \(J\) observations in each cell.
Level 1 Level 2 Level \(k\) Level \(K\)
Category 1 \(x_{1,1,1}\) \(x_{1,2,1}\) \(x_{1,k,1}\) \(x_{1, K, 1}\)
\(x_{1,1,2}\) \(x_{1,2,2}\) \(x_{1,k,2}\) \(x_{1,K,2}\)
\(x_{1,1, j}\) \(x_{1,2, j}\) \(x_{1,k, j}\) \(x_{1,K,j}\)
\(x_{1,1, J}\) \(x_{1,2, J}\) \(x_{1,k, J}\) \(x_{1,K,J}\)
Category 2 \(x_{2,1,1}\) \(x_{2,2,1}\) \(x_{2,k,1}\) \(x_{2, K, 1}\)
\(x_{2,1,2}\) \(x_{2,2,2}\) \(x_{2,k,2}\) \(x_{2,K,2}\)
\(x_{2,1, j}\) \(x_{2,2, j}\) \(x_{2,k, j}\) \(x_{2,K,j}\)
\(x_{2,1, J}\) \(x_{2,2, J}\) \(x_{2,k, J}\) \(x_{2,K,J}\)
Category m \(x_{m,1,1}\) \(x_{m,2,1}\) \(x_{m,k,1}\) \(x_{m, K, 1}\)
\(x_{m,1,2}\) \(x_{m,2,2}\) \(x_{m,k,2}\) \(x_{m,K,2}\)
\(x_{m,1, j}\) \(x_{m,2, j}\) \(x_{m,k, j}\) \(x_{m,K,j}\)
\(x_{m,1, J}\) \(x_{m,2, J}\) \(x_{m,k, J}\) \(x_{m,K,J}\)
Category M \(x_{M,1,1}\) \(x_{M,2,1}\) \(x_{M,k,1}\) \(x_{M, K, 1}\)
\(x_{M,1,2}\) \(x_{M,2,2}\) \(x_{M,k,2}\) \(x_{M,K,2}\)
\(x_{M,1, j}\) \(x_{M,2, j}\) \(x_{M,k, j}\) \(x_{M,K,j}\)
\(x_{M,1, J}\) \(x_{M,2, J}\) \(x_{M,k, J}\) \(x_{M,K,J}\)
The Entries of Two-Way ANOVA Table
Source \(df\) \(SS\) \(MS\) \(F\)
V1 (row) \(df_r = M-1\) \(SS_r = KJ\sum_{m = 1}^{M} (\bar x_{m} - \bar x)^2\) \(MS_r = \frac{SS_r}{df_r}\) \(F_r = \frac{MS_r}{MS_W}\)
V2 (column) \(df_c = K - 1\) \(SS_c = MJ\sum_{k = 1}^K (\bar x_k - \bar x)^2\) \(MS_c = \frac{SS_c}{df_c}\) \(F_c = \frac{MS_c}{MS_W}\)
Interaction \(df_I=df_r df_c\) \(SS_I = SS_B-SS_c-SS_r\) \(MS_I = \frac{SS_I}{df_I}\) \(F_I = \frac{MS_I}{MS_W}\)
Within samples \(df_W = MK(J-1)\) \(SS_W = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{j=1}^{J}(x_{m,k,j}-\bar x_{m,k})^2\) \(MS_W = \frac{SS_W}{df_W}\)
Between samples \(df_B = MK - 1\) \(SS_B = J\sum_{m = 1}^M \sum_{k=1}^{K}(\bar x_{m,k} - \bar x)^2\) \(MS_B = \frac {SS_B}{df_B}\)
Total \(df_T = MKJ - 1\) \(SS_T = \sum(x-\bar x) ^2\) \(MS_T = \frac{SS_T}{df_T}\)

Action: Fill in the ANOVA table on the basis of the following data and draw your conclusion (15 minutes).

The ANOVA table for the weight gain experiment
Source \(df\) \(SS\) \(MS\) \(F\)
V1 (row)
V2 (column)
Interaction
Within samples
Between samples
Total
dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(rep(c("Male", "Female"), each = 3), times = 3)  # written out for all 18 rows instead of relying on recycling
)

# df
(M <- nlevels(as.factor(dtf$gender)))
[1] 2
(K <- nlevels(as.factor(dtf$diet)))
[1] 3
(n <- nrow(dtf))
[1] 18
(J <- n / M / K)
[1] 3
(dfr <- M - 1)
[1] 1
(dfc <- K - 1)
[1] 2
(dfI <- dfr * dfc)
[1] 2
(dfW <- M * K * ( J - 1))
[1] 12
(dfB <- M * K - 1)
[1] 5
(dfT <- n - 1)
[1] 17
# mean
(xbar <- mean(dtf$w))
[1] 111
(xmk_bar <- tapply(dtf$w, list(dtf$diet, dtf$gender), mean))
      Female Male
Diet1     81   95
Diet2    110  125
Diet3    125  130
dtf$xmk_bar <- mapply(function(d, g) xmk_bar[d, g], dtf$diet, dtf$gender)
(xm_bar <- colMeans(xmk_bar))
  Female     Male 
105.3333 116.6667 
(xk_bar <- rowMeans(xmk_bar))
Diet1 Diet2 Diet3 
 88.0 117.5 127.5 
# SS
(SSB <- J * sum((xmk_bar - xbar) ^ 2))
[1] 5730
(SSW <- sum((dtf$w - dtf$xmk_bar) ^2))
[1] 542
(SST <- sum((dtf$w - xbar) ^ 2))
[1] 6272
(SSr <- K * J * sum((xm_bar - xbar) ^2))
[1] 578
(SSc <- M * J * sum((xk_bar - xbar) ^ 2))
[1] 5061
(SSI <- SSB - SSc - SSr)
[1] 91
# Double check
SSW + SSB
[1] 6272
SST
[1] 6272
# MS
(MSr <- SSr / dfr)
[1] 578
(MSc <- SSc / dfc)
[1] 2530.5
(MSI <- SSI / dfI)
[1] 45.5
(MSW <- SSW / dfW)
[1] 45.16667
# F
(F_r <- MSr / MSW)
[1] 12.79705
qf(0.95, df1 = dfr, df2 = dfW)
[1] 4.747225
pf(F_r, df1 = dfr, df2 = dfW, lower.tail = FALSE)
[1] 0.0038011
(F_c <- MSc / MSW)
[1] 56.02583
qf(0.95, df1 = dfc, df2 = dfW)
[1] 3.885294
pf(F_c, df1 = dfc, df2 = dfW, lower.tail = FALSE)
[1] 8.193548e-07
(F_I <- MSI / MSW)
[1] 1.00738
qf(0.95, df1 = dfI, df2 = dfW)
[1] 3.885294
pf(F_I, df1 = dfI, df2 = dfW, lower.tail = FALSE)
[1] 0.3940701
The ANOVA table for the weight gain experiment
Source \(df\) \(SS\) \(MS\) \(F\)
V1 (row, gender) 1 578 578 12.7970
V2 (column, diet) 2 5061 2530.5 56.0258
Interaction 2 91 45.5 1.0074
Within samples 12 542 45.1667
Between samples 5 5730 1146
Total 17 6272

Conclusion:

Both diet and gender have a significant effect on the weight gain of rats, and there is no significant interaction between gender and diet in weight gain.

One-step:

aov_wg <- aov(w ~ diet * gender, data = dtf)
summary(aov_wg)
            Df Sum Sq Mean Sq F value   Pr(>F)    
diet         2   5061  2530.5  56.026 8.19e-07 ***
gender       1    578   578.0  12.797   0.0038 ** 
diet:gender  2     91    45.5   1.007   0.3941    
Residuals   12    542    45.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7 Readings

  • The R Book - Chapter 11

8 Highlights

  • Carry out a step-by-step one-way ANOVA (with or without repeated measures) for a scientific question.
    • State \(H_0\) and \(H_1\).
    • Calculate the critical value for a given \(\alpha\).
    • Calculate the test statistic (\(F\) score).
    • Draw a conclusion for the scientific question.