1 Learning objectives

In this lecture, you will

Understand the concept of the analysis of variance (ANOVA), and
Carry out one-way and two-way ANOVA for answering scientific questions.

2 Revisit the t-test

Example: Rats on diets

A biologist studies the weight gain of male lab rats on diets over a 4-week period. Three different diets are applied. The results are shown in the following table.

dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))

diet1	diet2	diet3
90	120	125
95	125	130
100	130	135

Weight gain (gram) of male lab rats

Are the weight gains of the three treatments all equal?
In other words, do the diets influence the weight gain?

If we do a \(t\)-test between each pair of treatments…

dtt <- data.frame(Number_of_Samples = 2:10)
dtt$Number_of_Tests <- choose(dtt$Number_of_Samples, 2)
dtt$alpha_overall <- round(1 - (1 - 0.05)^dtt$Number_of_Tests, 2)

Number_of_Samples	Number_of_Tests	alpha_overall
2	1	0.05
3	3	0.14
4	6	0.26
5	10	0.40
6	15	0.54
7	21	0.66
8	28	0.76
9	36	0.84
10	45	0.90

The number of samples, the number of pairwise \(t\)-tests, and the probability that at least one of the tests gives a significant difference when \(H_0\) is true (\(\alpha = 0.05\) per test)
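The last column assumes that the pairwise tests are independent, each at level \(\alpha = 0.05\). This is an idealisation, since tests that share a sample are not strictly independent, but it shows how quickly the overall error rate grows. As a worked instance for three samples, with \(m\) the number of pairwise tests:

\[m = \binom{3}{2} = 3, \qquad \alpha_\mathrm{overall} = 1 - (1 - \alpha)^m = 1 - 0.95^3 = 1 - 0.857375 = 0.142625 \approx 0.14\]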

Instead of running pairwise \(t\)-tests on the means, we can transform the data as:

dtf2 <- stack(dtf)
names(dtf2) <- c("wg", "diet")

wg	diet
90	diet1
95	diet1
100	diet1
120	diet2
125	diet2
130	diet2
125	diet3
130	diet3
135	diet3

and analyse the relationship between the weight gain (numerical variable) and the diet treatment (categorical variable).
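A minimal sketch of why the long format is convenient: with one numerical column and one categorical column, group summaries take a single call each. The names `wg_mean` and `wg_sd` are introduced here for illustration only.

```r
# Rebuild the example data so that the block runs on its own
dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))
dtf2 <- stack(dtf)
names(dtf2) <- c("wg", "diet")

# Mean and standard deviation of the weight gain for each diet
wg_mean <- tapply(dtf2$wg, dtf2$diet, mean)
wg_sd <- tapply(dtf2$wg, dtf2$diet, sd)
wg_mean
wg_sd
```

The group means differ considerably while the spread within each diet is the same, which is the contrast that ANOVA quantifies.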

3 One-way ANOVA

Analysis of variance (ANOVA): One of the most widely used statistical techniques. The test partitions the total variation present in a set of data into two or more components. Associated with each of these components is a specific source of variation, so that it is possible to ascertain the contributions of each of these sources to the total variation.
One-way ANOVA: A test that concerns only one independent variable (\(x\)), which is called a factor and has multiple levels (settings, groups).

Hypotheses:
- \(H_0: \mu _1 = \mu _2 = \mu _3 = ... = \mu_k\)
- \(H_1\): at least one mean is different from others
- Reject \(H_0\)? Given \(\alpha\).
Collect data. Suppose we have \(k\) samples. The \(i\)-th sample has \(n_i\) observations.

A data set which has only one independent variable with multiple levels
	Level 1	Level 2	Level 3	…	Level \(k\)
	\(x_{1,1}\)	\(x_{2, 1}\)	\(x_{3,1}\)	…	\(x_{k,1}\)
	\(x_{1,2}\)	\(x_{2,2}\)	\(x_{3,2}\)	…	\(x_{k,2}\)
	\(x_{1,3}\)	\(x_{2, 3}\)	\(x_{3,3}\)	…	\(x_{k, 3}\)
	.	.	.	.	.
	.	\(x_{2, n_2}\)	.	.	.
	.		\(x_{3,n_3}\)	.	.
	\(x_{1, n_1}\)			…	\(x_{k, n_k}\)
Mean	\(\bar x_1\)	\(\bar x_2\)	\(\bar x_3\)		\(\bar x_k\)

Calculate a test statistic: \(F\)-test.

The Entries of One-Way ANOVA Table
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples	\(df_W = n - k\)	\(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i}(x_{ij}-\bar x_i)^2\)	\(MS_W = \frac{SS_W}{df_W}\)	\(F = \frac{MS_B}{MS_W}\)
Between samples	\(df_B = k - 1\)	\(SS_B = \sum_{i = 1}^k n_i(\bar x_i - \bar x)^2\)	\(MS_B = \frac {SS_B}{df_B}\)
Total	\(df_T = n - 1\)	\(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n_i} (x_{ij}-\bar x) ^2\)	\(MS_T = \frac{SS_T}{df_T}\)

\(n\): The total number of the observations. \(n = \sum n_i\)
\(x_{ij}\): The \(j\)-th observation in the \(i\)-th sample
\(\bar x\): Overall/grand mean of all the observations. \(\bar x = \frac{\sum x_{ij}}{n}\)
\(MS\): Mean square (the mean of the squared deviations from the mean)
\(SS\): Sum of squares (the sum of the squared deviations from the mean)
\(df\): degrees of freedom

If \(H_0\) is true, then any differences you see between the \(k\) samples are due to chance, i.e. \(\sigma^2_\mathrm {between} = \sigma^2_\mathrm {within} = \sigma^2_\mathrm {total}\), so \(MS_B\), \(MS_W\) and \(MS_T\) all estimate the same variance and \(MS_B / MS_W\) should be close to 1. ==> \(F\)-test.

From the previous table, we can get:

\[df_T = df_W+ df_B\]

\[SS_T = SS_W + SS_B\]

which can be used as a double check.
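A short derivation of the second identity, using only the definitions in the table above. Each deviation from the grand mean is split into a within-sample part and a between-sample part:

\[x_{ij} - \bar x = (x_{ij} - \bar x_i) + (\bar x_i - \bar x)\]

Squaring and summing over all observations gives

\[\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)^2 + \sum_{i=1}^{k} n_i(\bar x_i - \bar x)^2 + 2\sum_{i=1}^{k}(\bar x_i - \bar x)\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)\]

The last term is zero because \(\sum_{j=1}^{n_i}(x_{ij} - \bar x_i) = 0\) within every sample, which leaves \(SS_T = SS_W + SS_B\).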

Decision. Reject or not reject \(H_0\).
Conclusion. Whether the categorical variable has an effect on the numerical variable.

Example: Rats on diets

Hypotheses and question:
- \(H_0: \mu _1 = \mu _2 = \mu _3\)
- \(H_1\): at least one mean is different from others
- Reject \(H_0\)?
Collect data:
Calculate a test statistic: \(F\)-test.

Action: Fill in the ANOVA table.

The ANOVA table for the weight gain experiment
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples
Between samples
Total


# df
(k <- ncol(dtf))
(n <- length(unlist(dtf)))
(dfW <- n - k)
(dfB <- k - 1)
(dfT <- n-1)

# mean
(xbar <- mean(unlist(dtf)))
(xibar <- colMeans(dtf))

# SS
(SSW1 <- sum((dtf$diet1 - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$diet2 - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$diet3 - xibar[3]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3)
(SSB1 <- length(dtf$diet1) * (xibar[1] - xbar) ^ 2)
(SSB2 <- length(dtf$diet2) * (xibar[2] - xbar) ^ 2)
(SSB3 <- length(dtf$diet3) * (xibar[3] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3)

# Double check
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB

SSB/SST # Correlation ratio

# F
(MSW <- SSW / dfW)
(MSB <- SSB / dfB)
(F_score <- MSB / MSW)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfW))
pf(F_score, df1 = dfB, df2 = dfW, lower.tail = FALSE)

Mean and within-/between-samples sum of squares
	Diet 1	Diet2	Diet3	Total
	90	120	125
	95	125	130
	100	130	135
\(\bar x_i\)	95	125	130
\(SS_W\)	50	50	50	150
\(SS_B\)	1408.3333333	208.3333333	533.3333333	2150

The ANOVA table for the weight gain experiment
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples	6	150	25	43
Between samples	2	2150	1075
Total	8	2300

Decision.

Since the F value is 43, which exceeds the critical value of 5.1432528 at the significance level of \(\alpha = 0.05\), we can reject \(H_0\).

One step:

wg_aov <- aov(wg ~ diet, data = dtf2)
summary(wg_aov)

            Df Sum Sq Mean Sq F value   Pr(>F)    
diet         2   2150    1075      43 0.000277 ***
Residuals    6    150      25                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion.

There is a significant effect of diet on the weight gain of male laboratory rats.
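The hand calculation above can be wrapped in a small function so that it also covers samples of unequal size. This is a sketch only: `one_way_anova` is a hypothetical helper written for this note, not a function from base R or any package.

```r
# Hypothetical helper: one-way ANOVA by hand.
# `samples` is a list of numeric vectors, one per group; sizes may differ.
one_way_anova <- function(samples, alpha = 0.05) {
  k <- length(samples)              # number of samples
  n_i <- lengths(samples)           # observations per sample
  n <- sum(n_i)                     # total number of observations
  xbar <- mean(unlist(samples))     # grand mean
  xibar <- sapply(samples, mean)    # sample means

  # Within- and between-samples sums of squares
  SSW <- sum(mapply(function(x, m) sum((x - m) ^ 2), samples, xibar))
  SSB <- sum(n_i * (xibar - xbar) ^ 2)
  dfW <- n - k
  dfB <- k - 1
  F_score <- (SSB / dfB) / (SSW / dfW)

  list(dfB = dfB, dfW = dfW, SSB = SSB, SSW = SSW,
       F_score = F_score,
       F_critical = qf(1 - alpha, df1 = dfB, df2 = dfW),
       p_value = pf(F_score, df1 = dfB, df2 = dfW, lower.tail = FALSE))
}

res <- one_way_anova(list(diet1 = c(90, 95, 100),
                          diet2 = c(120, 125, 130),
                          diet3 = c(125, 130, 135)))
res$F_score   # 43, as in the ANOVA table above
res$p_value
```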

The connection between ANOVA and t-test:


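dtf3 <- droplevels(dtf2[1:6, ])  # keep diet1 and diet2 only
# Pooled-variance t-test, matching the equal-variance assumption of ANOVA
t.test(wg ~ diet, data = dtf3, var.equal = TRUE)
# Same p-value as the t-test; F equals the squared t statistic
summary(aov(wg ~ diet, data = dtf3))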
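With only two samples, the one-way ANOVA and the pooled-variance \(t\)-test are the same test: \(F = t^2\) and the p-values coincide. The sketch below checks this numerically on the first two diets; the object names (`tt`, `aov_tab`, `t_score`, `F_score`) are chosen here for illustration.

```r
dtf2 <- stack(data.frame(diet1 = c(90, 95, 100),
                         diet2 = c(120, 125, 130),
                         diet3 = c(125, 130, 135)))
names(dtf2) <- c("wg", "diet")
dtf3 <- droplevels(dtf2[1:6, ])   # diet1 and diet2 only

# Pooled-variance t-test (var.equal = TRUE matches the ANOVA assumption)
tt <- t.test(wg ~ diet, data = dtf3, var.equal = TRUE)
t_score <- unname(tt$statistic)

# One-way ANOVA on the same two groups
aov_tab <- summary(aov(wg ~ diet, data = dtf3))[[1]]
F_score <- aov_tab[1, "F value"]

t_score ^ 2            # squared t statistic
F_score                # F statistic: the same number
tt$p.value             # p-value of the t-test
aov_tab[1, "Pr(>F)"]   # p-value of the ANOVA: the same number
```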

4 Understand ANOVA

4.1 The model

\[x_{i j}=\mu+\tau_{j}+\epsilon_{i j} ; \quad i=1,2, \ldots, n_{j} ; \quad j=1,2, \ldots, k\]

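\(\mu\): The grand mean. The mean of all \(k\) population means.
\(\tau_{j}\): The treatment effect. The difference between the mean of the \(j\)-th population and the grand mean (\(\mu_j - \mu\)), estimated by \(\bar x_j - \bar x\).
\(\epsilon_{i j}\): The error term. The amount by which an individual measurement differs from the mean of the population to which it belongs (\(x_{ij} - \mu_j\)), estimated by \(x_{ij} - \bar x_j\).
Note that in this section \(j\) indexes the sample (population) and \(i\) the observation within it, the reverse of the notation used in the ANOVA table of the previous section.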

\(H_{0}: \mu_{1}=\mu_{2}=\cdots=\mu_{k}\)
\(H_{A}:\) not all \(\mu_{j}\) are equal

4.2 Assumptions

The \(k\) sets of observed data constitute \(k\) independent random samples from the respective populations.
Each of the populations from which the samples come is normally distributed with mean \(\mu_{j}\) and variance \(\sigma_{j}^{2}\).
Each of the populations has the same variance. That is, \(\sigma_{1}^{2}=\sigma_{2}^{2}=\cdots=\sigma_{k}^{2}=\sigma^{2},\) the common variance.
The \(\tau_{j}\) are unknown constants and \(\sum \tau_{j}=0\) since the sum of all deviations of the \(\mu_{j}\) from their mean, \(\mu,\) is zero.
The \(\epsilon_{i j}\) have a mean of \(0,\) since the mean of \(x_{i j}\) is \(\mu_{j}\)
The \(\epsilon_{i j}\) have a variance equal to the variance of the \(x_{i j},\) since the \(\epsilon_{i j}\) and \(x_{i j}\) differ only by a constant; that is, the error variance is equal to \(\sigma^{2},\) the common variance specified in the assumption above.
The \(\epsilon_{i j}\) are normally (and independently) distributed.
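Two of these assumptions, normal errors and a common variance, can be examined on the data. The sketch below uses `shapiro.test()` on the residuals and `bartlett.test()` on the groups, both from base R's `stats` package. Choosing these two particular tests is an assumption of this note rather than something the lecture prescribes, and with only three observations per diet they have very little power.

```r
dtf2 <- stack(data.frame(diet1 = c(90, 95, 100),
                         diet2 = c(120, 125, 130),
                         diet3 = c(125, 130, 135)))
names(dtf2) <- c("wg", "diet")
wg_aov <- aov(wg ~ diet, data = dtf2)

# Normality of the error terms: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(wg_aov))
sw$p.value

# Equal variances across the k populations: Bartlett's test
bt <- bartlett.test(wg ~ diet, data = dtf2)
bt$p.value

# Sample variances per diet, for a direct look
wg_var <- tapply(dtf2$wg, dtf2$diet, var)
wg_var
```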

4.3 Estimating the statistics

The population variance \(\sigma^{2}\) may be estimated in two ways:

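\(\hat\sigma ^2_W = MS_W = \frac{SS_W}{df_W}\): based on the variation within the samples; a valid estimate whether or not \(H_0\) is true.
\(\hat\sigma ^2_B = MS_B = \frac{SS_B}{df_B}\): based on the variation between the sample means; a valid estimate only if \(H_0\) is true, and too large otherwise.

Compare the two estimates of the population variance: \(F_\mathrm {score} = MS_B/MS_W\).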

The numerator df: \(k - 1\).
The denominator df: \(n - k\).
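A small simulation can illustrate why comparing the two estimates works. When \(H_0\) is true, the ratio \(MS_B / MS_W\) exceeds the critical \(F\) value in about 5% of experiments. The design (3 samples of 10 observations), the population mean and standard deviation, the seed and the number of simulations are all arbitrary choices made for this sketch.

```r
set.seed(1)          # arbitrary seed, for reproducibility
k <- 3               # number of samples (assumed)
n_i <- 10            # observations per sample (assumed)
n <- k * n_i
n_sim <- 2000        # number of simulated experiments

F_sim <- replicate(n_sim, {
  # All k samples come from the same normal population, so H0 is true
  x <- matrix(rnorm(n, mean = 100, sd = 5), ncol = k)
  xibar <- colMeans(x)
  MSW <- sum(sweep(x, 2, xibar) ^ 2) / (n - k)        # within-samples estimate
  MSB <- n_i * sum((xibar - mean(x)) ^ 2) / (k - 1)   # between-samples estimate
  MSB / MSW
})

F_critical <- qf(0.95, df1 = k - 1, df2 = n - k)
prop_rejected <- mean(F_sim > F_critical)
prop_rejected   # should be close to 0.05
```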

5 Repeated measures

Repeated measures (matched samples, randomized blocks, within subjects):

\(k\) samples.
Each sample has \(n\) observations.

A data set which has repeated measures with only one independent variable with multiple levels
ID	Level 1	Level 2	Level 3	…	Level \(k\)
1	\(x_{1,1}\)	\(x_{2, 1}\)	\(x_{3,1}\)	…	\(x_{k,1}\)
2	\(x_{1,2}\)	\(x_{2,2}\)	\(x_{3,2}\)	…	\(x_{k,2}\)
3	\(x_{1,3}\)	\(x_{2, 3}\)	\(x_{3,3}\)	…	\(x_{k, 3}\)
.	.	.	.	.	.
n	\(x_{1, n}\)	\(x_{2, n}\)	\(x_{3, n}\)	…	\(x_{k, n}\)

The Entries of One-Way ANOVA Table for Repeated Measures
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples	\(df_W = nk - k\)	\(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n}(x_{ij}-\bar x_i)^2\)	\(MS_W = \frac{SS_W}{df_W}\)	\(F = \frac{MS_B}{MS_{Wcorr}}\)
Between samples	\(df_B = k - 1\)	\(SS_B = \sum_{i = 1}^k n(\bar x_i - \bar x)^2\)	\(MS_B = \frac {SS_B}{df_B}\)
Subjects (Row)	\(df_S = n-1\)	\(SS_S = k\sum _{j=1}^{n} (\bar x_j - \bar x)^2\)
Within Corrected	\(df_{Wcorr} = df_W- df_S\)	\(SS_{Wcorr} = SS_W-SS_S\)	\(MS_{Wcorr} = \frac{SS_{Wcorr}}{df_{Wcorr}}\)
Total	\(df_T = nk - 1\)	\(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n} (x_{ij}-\bar x) ^2\)	\(MS_T = \frac{SS_T}{df_T}\)
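The entries of this table fit together as follows: the subject-to-subject variation is removed from the within-samples variation, and what remains serves as the error term. Written out with the symbols of the table:

\[SS_T = SS_B + SS_W = SS_B + SS_S + SS_{Wcorr}\]

\[df_{Wcorr} = df_W - df_S = (nk - k) - (n - 1) = (n - 1)(k - 1)\]

For example, with \(n = 10\) subjects and \(k = 4\) levels, \(df_{Wcorr} = 36 - 9 = 27 = 9 \times 3\).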

Demo: Weight-loss program

dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
rownames(dtf) <- LETTERS[1:10]

	before	one month	two months	three months
A	198	194	191	188
B	201	203	200	196
C	210	200	192	188
D	185	183	180	178
E	204	200	195	191
F	156	153	150	145
G	167	166	167	166
H	197	197	195	192
I	220	215	209	205
J	186	184	179	175

Is the weight-loss program effective?


Action: Follow the steps and fill in the ANOVA table.

Hypotheses and question:
- \(H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3\)
- \(H_1:\) Not \(H_0\)
- Question: Reject \(H_0\)? Given \(\alpha\).
Collect data
Calculate a test statistic.

The ANOVA table for the Weight-loss program
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples
Between samples
Subjects (Row)
Within Corrected
Total


dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)

# df
(k <- ncol(dtf))
(n <- nrow(dtf))
(N <- length(unlist(dtf)))
(dfW <- N - k)
(dfB <- k - 1)
(dfT <- N - 1)
(dfS <- n - 1)
(dfWcorr <- dfW-dfS)

# mean
(xbar <- mean(unlist(dtf)))
(xibar <- colMeans(dtf))
(xjbar <- rowMeans(dtf))

# SS
(SSW1 <- sum((dtf$before - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$one - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$two - xibar[3]) ^ 2))
(SSW4 <- sum((dtf$three - xibar[4]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3 + SSW4)
sum(apply(dtf, 2, function(x) (x - mean(x)) ^2))

(SSB1 <- n * (xibar[1] - xbar) ^ 2)
(SSB2 <- n * (xibar[2] - xbar) ^ 2)
(SSB3 <- n * (xibar[3] - xbar) ^ 2)
(SSB4 <- n * (xibar[4] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3 + SSB4)
sum(n * (xibar - xbar) ^ 2)

(SSS <- k * sum((xjbar - xbar) ^ 2))
(SSWcorr <- SSW-SSS)

# Double check
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB

# F
(MSB <- SSB / dfB)
(MSWcorr <- SSWcorr / dfWcorr)
(F_score <- MSB / MSWcorr)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfWcorr))
pf(F_score, df1 = dfB, df2 = dfWcorr, lower.tail = FALSE)

The ANOVA table for the Weight-loss program
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
Within samples	36	11840.9	328.9138889	24.4851201
Between samples	3	569.075	189.6916667
Subjects (Row)	9	11631.725
Within Corrected	27	209.175	7.7472222
Total	39	12409.975

Decision.

With 3 and 27 degrees of freedom, the critical \(F\) for \(\alpha = 0.05\) is 2.9603513, which is smaller than the calculated \(F\) value 24.4851201. Thus, the decision is to reject \(H_0\).

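One step (first ignoring the subjects, which treats the four samples as independent and misses the effect; then with the subject error term, which reproduces the hand calculation):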

dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
w_aov <- aov(w ~ level, data = dtf2)
summary(w_aov)

            Df Sum Sq Mean Sq F value Pr(>F)
level        3    569   189.7   0.577  0.634
Residuals   36  11841   328.9

dtf2$subject <- rep(LETTERS[1:10], 4)
w_aov2 <- aov(w ~ level + Error(subject/level), data = dtf2)
summary(w_aov2)


Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  11632    1292               

Error: subject:level
          Df Sum Sq Mean Sq F value  Pr(>F)    
level      3  569.1  189.69   24.48 7.3e-08 ***
Residuals 27  209.2    7.75                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion.

The program has a significant effect on weight loss.
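For a balanced design like this one, the same \(F\) value can also be obtained by entering the subject as an ordinary blocking factor instead of an `Error()` term. The sketch below is an alternative formulation for comparison, not the lecture's own method; `block_tab` and `F_level` are names introduced here.

```r
dtf <- data.frame(
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
dtf2$subject <- factor(rep(LETTERS[1:10], times = 4))

# Subject enters as a blocking factor; its variation leaves the residuals
block_tab <- summary(aov(w ~ level + subject, data = dtf2))[[1]]
block_tab

F_level <- block_tab[1, "F value"]   # row 1 is `level`
F_level
```

The residual row of `block_tab` corresponds to the "Within Corrected" row of the hand-made table.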

6 Two-way ANOVA

Demo: Rats on diets

A biologist studies the weight gain of lab rats depending on diet and gender over a 4-week period. Three different diets are applied. Do diet and gender have an effect on weight gain?

Weight gain of male and female lab rats
	Diet 1	Diet2	Diet3
Male	90	120	125
	95	125	130
	100	130	135
Female	75	100	118
	78	118	125
	90	112	132

Can we apply a one-way ANOVA for diet effect, then another one-way ANOVA for gender effect?

Transform the data frame as:

dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(c("Male", "Female"), each = 3)
)

w	diet	gender
90	Diet1	Male
95	Diet1	Male
100	Diet1	Male
75	Diet1	Female
78	Diet1	Female
90	Diet1	Female
120	Diet2	Male
125	Diet2	Male
130	Diet2	Male
100	Diet2	Female
118	Diet2	Female
112	Diet2	Female
125	Diet3	Male
130	Diet3	Male
135	Diet3	Male
118	Diet3	Female
125	Diet3	Female
132	Diet3	Female
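To see what is lost with two separate one-way ANOVAs, the sketch below tests gender alone and then gender within the two-way model. In the one-way analysis the variation caused by diet ends up in the residuals and hides the gender effect. The names `p_gender_alone` and `p_gender_two_way` are introduced here for illustration.

```r
dtf <- data.frame(
  w = c(90, 95, 100, 75, 78, 90, 120, 125, 130,
        100, 118, 112, 125, 130, 135, 118, 125, 132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(rep(c("Male", "Female"), each = 3), times = 3)
)

# One-way ANOVA on gender alone: diet variation inflates the residuals
one_way <- summary(aov(w ~ gender, data = dtf))[[1]]
p_gender_alone <- one_way[1, "Pr(>F)"]

# Gender tested with diet and the interaction in the same model
two_way <- summary(aov(w ~ diet * gender, data = dtf))[[1]]
p_gender_two_way <- two_way[2, "Pr(>F)"]   # row 2 is `gender`

p_gender_alone     # not significant at alpha = 0.05
p_gender_two_way   # significant at alpha = 0.05
```

Separate one-way analyses also cannot say anything about an interaction between the two factors.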

Interaction (six scenarios, A to F):

A: No significant effect of diet, no significant effect of gender, no interaction.
B: No significant effect of diet, effect of gender, no interaction.
C: Effect of diet, effect of gender, no interaction.
D: Effect of diet in males, no significant effect of diet in females, effect of gender, interaction (positive).
E: Effect of diet in males, no significant effect of diet in females, effect of gender, interaction (negative).
F: Effect of diet in males, effect of diet in females, effect of gender, interaction (negative).
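For the rat data, a quick way to judge which scenario applies is to look at the cell means and draw one line per gender across the diets. Roughly parallel lines suggest little interaction. The sketch uses `interaction.plot()` from base R's `stats` package; `cell_means` and `gender_gap` are names introduced here.

```r
dtf <- data.frame(
  w = c(90, 95, 100, 75, 78, 90, 120, 125, 130,
        100, 118, 112, 125, 130, 135, 118, 125, 132),
  diet = factor(rep(c("Diet1", "Diet2", "Diet3"), each = 6)),
  gender = factor(rep(rep(c("Male", "Female"), each = 3), times = 3))
)

# Cell means: rows are diets, columns are genders
cell_means <- tapply(dtf$w, list(dtf$diet, dtf$gender), mean)
cell_means

# Male minus female difference within each diet;
# similar differences correspond to roughly parallel lines
gender_gap <- cell_means[, "Male"] - cell_means[, "Female"]
gender_gap

# One line per gender across the diets
interaction.plot(x.factor = dtf$diet, trace.factor = dtf$gender,
                 response = dtf$w, xlab = "Diet",
                 ylab = "Mean weight gain", trace.label = "Gender")
```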

A data set which has two independent variables with multiple levels. Suppose we have \(K\) levels for independent variable 1 (columns), \(M\) categories for independent variable 2 (rows), and \(J\) observations in each cell.
	Level 1	Level 2	…	Level \(k\)	…	Level \(K\)
Category 1	\(x_{1,1,1}\)	\(x_{1,2,1}\)	…	\(x_{1,k,1}\)	…	\(x_{1, K, 1}\)
	\(x_{1,1,2}\)	\(x_{1,2,2}\)	…	\(x_{1,k,2}\)	…	\(x_{1,K,2}\)
	…	…	…	…	…	…
	\(x_{1,1, j}\)	\(x_{1,2, j}\)	…	\(x_{1,k, j}\)	…	\(x_{1,K,j}\)
	…	…	…	…	…	…
	\(x_{1,1, J}\)	\(x_{1,2, J}\)	…	\(x_{1,k, J}\)	…	\(x_{1,K,J}\)
Category 2	\(x_{2,1,1}\)	\(x_{2,2,1}\)	…	\(x_{2,k,1}\)	…	\(x_{2, K, 1}\)
	\(x_{2,1,2}\)	\(x_{2,2,2}\)	…	\(x_{2,k,2}\)	…	\(x_{2,K,2}\)
	…	…	…	…	…	…
	\(x_{2,1, j}\)	\(x_{2,2, j}\)	…	\(x_{2,k, j}\)	…	\(x_{2,K,j}\)
	…	…	…	…	…	…
	\(x_{2,1, J}\)	\(x_{2,2, J}\)	…	\(x_{2,k, J}\)	…	\(x_{2,K,J}\)
…	…	…	…	…	…	…
Category m	\(x_{m,1,1}\)	\(x_{m,2,1}\)	…	\(x_{m,k,1}\)	…	\(x_{m, K, 1}\)
	\(x_{m,1,2}\)	\(x_{m,2,2}\)	…	\(x_{m,k,2}\)	…	\(x_{m,K,2}\)
	…	…	…	…	…	…
	\(x_{m,1, j}\)	\(x_{m,2, j}\)	…	\(x_{m,k, j}\)	…	\(x_{m,K,j}\)
	…	…	…	…	…	…
	\(x_{m,1, J}\)	\(x_{m,2, J}\)	…	\(x_{m,k, J}\)	…	\(x_{m,K,J}\)
…	…	…	…	…	…	…
Category M	\(x_{M,1,1}\)	\(x_{M,2,1}\)	…	\(x_{M,k,1}\)	…	\(x_{M, K, 1}\)
	\(x_{M,1,2}\)	\(x_{M,2,2}\)	…	\(x_{M,k,2}\)	…	\(x_{M,K,2}\)
	…	…	…	…	…	…
	\(x_{M,1, j}\)	\(x_{M,2, j}\)	…	\(x_{M,k, j}\)	…	\(x_{M,K,j}\)
	…	…	…	…	…	…
	\(x_{M,1, J}\)	\(x_{M,2, J}\)	…	\(x_{M,k, J}\)	…	\(x_{M,K,J}\)

The Entries of Two-Way ANOVA Table
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
V1 (row)	\(df_r = M-1\)	\(SS_r = KJ\sum_{m = 1}^{M} (\bar x_{m} - \bar x)^2\)	\(MS_r = \frac{SS_r}{df_r}\)	\(F_r = \frac{MS_r}{MS_W}\)
V2 (column)	\(df_c = K - 1\)	\(SS_c = MJ\sum_{k = 1}^K (\bar x_k - \bar x)^2\)	\(MS_c = \frac{SS_c}{df_c}\)	\(F_c = \frac{MS_c}{MS_W}\)
Interaction	\(df_I=df_r df_c\)	\(SS_I = SS_B-SS_c-SS_r\)	\(MS_I = \frac{SS_I}{df_I}\)	\(F_I = \frac{MS_I}{MS_W}\)
Within samples	\(df_W = MK(J-1)\)	\(SS_W = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{j=1}^{J}(x_{m,k,j}-\bar x_{m,k})^2\)	\(MS_W = \frac{SS_W}{df_W}\)
Between samples	\(df_B = MK - 1\)	\(SS_B = J\sum_{m = 1}^M \sum_{k=1}^{K}(\bar x_{m,k} - \bar x)^2\)	\(MS_B = \frac {SS_B}{df_B}\)
Total	\(df_T = MKJ - 1\)	\(SS_T = \sum(x-\bar x) ^2\)	\(MS_T = \frac{SS_T}{df_T}\)
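As in the one-way case, the rows of this table add up, which gives a convenient double check. With the symbols defined in the table:

\[SS_T = SS_B + SS_W = SS_r + SS_c + SS_I + SS_W\]

\[df_T = MKJ - 1 = (M - 1) + (K - 1) + (M - 1)(K - 1) + MK(J - 1)\]

For example, with \(M = 2\), \(K = 3\) and \(J = 3\): \(17 = 1 + 2 + 2 + 12\).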

Action: Fill in the ANOVA table on the basis of the following data and draw your conclusion (15 minutes).

The ANOVA table for the weight gain experiment
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
V1 (row)
V2 (column)
Interaction
Within samples
Between samples
Total


dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(c("Male", "Female"), each = 3)
)

# df
(M <- nlevels(as.factor(dtf$gender)))

[1] 2

(K <- nlevels(as.factor(dtf$diet)))

[1] 3

(n <- nrow(dtf))

[1] 18

(J <- n / M / K)

[1] 3

(dfr <- M - 1)

[1] 1

(dfc <- K - 1)

[1] 2

(dfI <- dfr * dfc)

[1] 2

(dfW <- M * K * ( J - 1))

[1] 12

(dfB <- M * K - 1)

[1] 5

(dfT <- n - 1)

[1] 17

# mean
(xbar <- mean(dtf$w))

[1] 111

(xmk_bar <- tapply(dtf$w, list(dtf$diet, dtf$gender), mean))

      Female Male
Diet1     81   95
Diet2    110  125
Diet3    125  130

dtf$xmk_bar <- mapply(function(d, g) xmk_bar[d, g], dtf$diet, dtf$gender)
(xm_bar <- colMeans(xmk_bar))

  Female     Male 
105.3333 116.6667

(xk_bar <- rowMeans(xmk_bar))

Diet1 Diet2 Diet3 
 88.0 117.5 127.5

# SS
(SSB <- J * sum((xmk_bar - xbar) ^ 2))

[1] 5730

(SSW <- sum((dtf$w - dtf$xmk_bar) ^2))

[1] 542

(SST <- sum((dtf$w - xbar) ^ 2))

[1] 6272

(SSr <- K * J * sum((xm_bar - xbar) ^2))

[1] 578

(SSc <- M * J * sum((xk_bar - xbar) ^ 2))

[1] 5061

(SSI <- SSB - SSc - SSr)

[1] 91

# Double check
SSW + SSB

[1] 6272

SST

[1] 6272

# MS
(MSr <- SSr / dfr)

[1] 578

(MSc <- SSc / dfc)

[1] 2530.5

(MSI <- SSI / dfI)

[1] 45.5

(MSW <- SSW / dfW)

[1] 45.16667

# F
(F_r <- MSr / MSW)

[1] 12.79705

qf(0.95, df1 = dfr, df2 = dfW)

[1] 4.747225

pf(F_r, df1 = dfr, df2 = dfW, lower.tail = FALSE)

[1] 0.0038011

(F_c <- MSc / MSW)

[1] 56.02583

qf(0.95, df1 = dfc, df2 = dfW)

[1] 3.885294

pf(F_c, df1 = dfc, df2 = dfW, lower.tail = FALSE)

[1] 8.193548e-07

(F_I <- MSI / MSW)

[1] 1.00738

qf(0.95, df1 = dfI, df2 = dfW)

[1] 3.885294

pf(F_I, df1 = dfI, df2 = dfW, lower.tail = FALSE)

[1] 0.3940701

The ANOVA table for the weight gain experiment
Source	\(df\)	\(SS\)	\(MS\)	\(F\)
V1 (row)	1	578	578	12.797048
V2 (column)	2	5061	2530.5	56.0258303
Interaction	2	91	45.5	1.0073801
Within samples	12	542	45.1666667
Between samples	5	5730	1146
Total	17	6272

Conclusion:

Both diet and gender have a significant effect on the weight gain of rats, and there is no significant interaction between gender and diet in weight gain.

One-step:

aov_wg <- aov(w ~ diet * gender, data = dtf)
summary(aov_wg)

            Df Sum Sq Mean Sq F value   Pr(>F)    
diet         2   5061  2530.5  56.026 8.19e-07 ***
gender       1    578   578.0  12.797   0.0038 ** 
diet:gender  2     91    45.5   1.007   0.3941    
Residuals   12    542    45.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7 Readings

The R Book - Chapter 11

8 Highlights

Carry out a step-by-step one-way ANOVA (with or without repeated measures) for a scientific question.
- State \(H_0\) and \(H_1\).
- Calculate the critical value for a given \(\alpha\).
- Calculate the test statistic (\(F\) score).
- Draw a conclusion for the scientific question.
