dtf <- data.frame(diet1 = c(90, 95, 100),
                  diet2 = c(120, 125, 130),
                  diet3 = c(125, 130, 135))
Analysis of Variance (ANOVA)
Numerical vs. categorical
Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)
Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University
1 Learning objectives
In this lecture, you will
- Understand the concept of the analysis of variance (ANOVA), and
- Carry out one-way and two-way ANOVA for answering scientific questions.
2 Revisit the t-test
Example: Rats on diets
A biologist studies the weight gain of male lab rats on diets over a 4-week period. Three different diets are applied. The results are shown in the following table.
diet1 | diet2 | diet3 |
---|---|---|
90 | 120 | 125 |
95 | 125 | 130 |
100 | 130 | 135 |
- Are the weight gains of the three treatments all equal?
- In other words, do the diets influence the weight gain?
If we run a \(t\)-test between each pair of treatments, the number of tests and the overall type I error rate grow quickly with the number of samples:
dtt <- data.frame(Number_of_Samples = 2:10)
dtt$Number_of_Tests <- choose(dtt$Number_of_Samples, 2)
dtt$alpha_overall <- round(1 - (1 - 0.05)^dtt$Number_of_Tests, 2)
dtt
Number_of_Samples | Number_of_Tests | alpha_overall |
---|---|---|
2 | 1 | 0.05 |
3 | 3 | 0.14 |
4 | 6 | 0.26 |
5 | 10 | 0.40 |
6 | 15 | 0.54 |
7 | 21 | 0.66 |
8 | 28 | 0.76 |
9 | 36 | 0.84 |
10 | 45 | 0.90 |
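The `alpha_overall` column follows from a short calculation. It is sketched here under the assumption that the \(m\) pairwise tests are independent; they are not exactly independent in practice, so the values are an approximation. Here \(k\) is the number of samples and \(\alpha = 0.05\) is the significance level of each single test.
\[m = \binom{k}{2}, \qquad \alpha_\mathrm{overall} = 1 - (1 - \alpha)^{m}\]
\[k = 3: \quad m = 3, \quad \alpha_\mathrm{overall} = 1 - 0.95^{3} \approx 0.14\]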
Instead of \(t\)-test for testing the means, we can transform the data as:
dtf2 <- stack(dtf)
names(dtf2) <- c("wg", "diet")
wg | diet |
---|---|
90 | diet1 |
95 | diet1 |
100 | diet1 |
120 | diet2 |
125 | diet2 |
130 | diet2 |
125 | diet3 |
130 | diet3 |
135 | diet3 |
and analyse the relationship between the weight gain (numerical variable) and the diet treatment (categorical variable).
3 One-way ANOVA
- Analysis of variance (ANOVA):
- One of the most widely used statistical techniques. The test partitions the total variation present in a set of data into two or more components. Associated with each of these components is a specific source of variation, so that it is possible to ascertain the contributions of each of these sources to the total variation.
- One-way ANOVA:
- A test that concerns only one independent variable (\(x\)), which is called a factor and has multiple levels (settings, groups).
- Hypotheses:
- \(H_0: \mu _1 = \mu _2 = \mu _3 = ... = \mu_k\)
- \(H_1\): at least one mean is different from others
- Reject \(H_0\)? Given \(\alpha\).
- Collect data. Suppose we have \(k\) samples. The \(i\)-th sample has \(n_i\) observations.
 | Level 1 | Level 2 | Level 3 | … | Level \(k\) |
---|---|---|---|---|---|
 | \(x_{1,1}\) | \(x_{2,1}\) | \(x_{3,1}\) | … | \(x_{k,1}\) |
 | \(x_{1,2}\) | \(x_{2,2}\) | \(x_{3,2}\) | … | \(x_{k,2}\) |
 | \(x_{1,3}\) | \(x_{2,3}\) | \(x_{3,3}\) | … | \(x_{k,3}\) |
 | . | . | . | . | . |
 | \(x_{1,n_1}\) | \(x_{2,n_2}\) | \(x_{3,n_3}\) | … | \(x_{k,n_k}\) |
Mean | \(\bar x_1\) | \(\bar x_2\) | \(\bar x_3\) | … | \(\bar x_k\) |
- Calculate a test statistic: \(F\)-test.
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | \(df_W = n - k\) | \(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i}(x_{ij}-\bar x_i)^2\) | \(MS_W = \frac{SS_W}{df_W}\) | \(F = \frac{MS_B}{MS_W}\) |
Between samples | \(df_B = k - 1\) | \(SS_B = \sum_{i = 1}^k n_i(\bar x_i - \bar x)^2\) | \(MS_B = \frac {SS_B}{df_B}\) | |
Total | \(df_T = n - 1\) | \(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n_i} (x_{ij}-\bar x) ^2\) | \(MS_T = \frac{SS_T}{df_T}\) |
- \(n\)
- The total number of the observations. \(n = \sum n_i\)
- \(x_{ij}\)
- The \(j\)-th observation in the \(i\)-th sample
- \(\bar x\)
- Overall/grand mean of all the observations. \(\bar x = \frac{\sum x_{ij}}{n}\)
- \(MS\)
- Mean of squared deviations from the mean
- \(SS\)
- Sum of squared deviations from the mean
- \(df\)
- Degrees of freedom
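If \(H_0\) is true, then any difference you see between the \(k\) samples is due to chance, i.e. \(\sigma^2_\mathrm {between} = \sigma^2_\mathrm {within} = \sigma^2_\mathrm {total}\), so \(MS_B\), \(MS_W\) and \(MS_T\) all estimate the same variance and \(MS_B \approx MS_W\). The \(F\)-test checks whether the ratio \(MS_B / MS_W\) is larger than chance alone would explain.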
From the previous table, we can get:
\[df_T = df_W+ df_B\]
\[SS_T = SS_W + SS_B\]
which can be used as a double check.
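Why the sums of squares add up can be seen from a short derivation, sketched here with the notation of the ANOVA table. Each deviation from the grand mean is split into a within-sample part and a between-sample part:
\[x_{ij} - \bar x = (x_{ij} - \bar x_i) + (\bar x_i - \bar x)\]
Squaring and summing over all observations gives
\[\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)^2 + \sum_{i=1}^{k} n_i(\bar x_i - \bar x)^2 + 2\sum_{i=1}^{k}(\bar x_i - \bar x)\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)\]
The last term is zero because \(\sum_{j=1}^{n_i}(x_{ij} - \bar x_i) = 0\) within every sample, which leaves \(SS_T = SS_W + SS_B\).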
- Decision. Reject or not reject \(H_0\).
- Conclusion. Whether the categorical variable has an effect on the numerical variable.
Example: Rats on diets
Hypotheses and question:
- \(H_0: \mu _1 = \mu _2 = \mu _3\)
- \(H_1\): at least one mean is different from others
- Reject \(H_0\)?
Collect data:
Calculate a test statistic: \(F\)-test.
Action: Fill in the ANOVA table.
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | ||||
Between samples | ||||
Total |
# df
(k <- ncol(dtf))                 # number of samples
(n <- length(unlist(dtf)))       # total number of observations
(dfW <- n - k)
(dfB <- k - 1)
(dfT <- n - 1)
# mean
(xbar <- mean(unlist(dtf)))      # grand mean
(xibar <- colMeans(dtf))         # sample means
# SS
(SSW1 <- sum((dtf$diet1 - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$diet2 - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$diet3 - xibar[3]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3)
(SSB1 <- length(dtf$diet1) * (xibar[1] - xbar) ^ 2)
(SSB2 <- length(dtf$diet2) * (xibar[2] - xbar) ^ 2)
(SSB3 <- length(dtf$diet3) * (xibar[3] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3)
# Double check: SST should equal SSW + SSB
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB
SSB / SST  # Correlation ratio (eta squared)
# F
(MSW <- SSW / dfW)
(MSB <- SSB / dfB)
(F_score <- MSB / MSW)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfW))
pf(F_score, df1 = dfB, df2 = dfW, lower.tail = FALSE)  # p-value
 | Diet 1 | Diet 2 | Diet 3 | Total |
---|---|---|---|---|
 | 90 | 120 | 125 | |
 | 95 | 125 | 130 | |
 | 100 | 130 | 135 | |
\(\bar x_i\) | 95 | 125 | 130 | |
\(SS_W\) | 50 | 50 | 50 | 150 |
\(SS_B\) | 1408.33 | 208.33 | 533.33 | 2150 |
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | 6 | 150 | 25 | 43 |
Between samples | 2 | 2150 | 1075 | |
Total | 8 | 2300 | | |
- Decision.
Since the F value is 43, which exceeds the critical value of 5.1432528 at the significance level of \(\alpha = 0.05\), we can reject \(H_0\).
One step:
wg_aov <- aov(wg ~ diet, data = dtf2)
summary(wg_aov)
Df Sum Sq Mean Sq F value Pr(>F)
diet 2 2150 1075 43 0.000277 ***
Residuals 6 150 25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Conclusion.
There is a significant effect of diet on the weight gain of male laboratory rats.
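The step-by-step calculation can also be wrapped into a small function. This is only a sketch: `one_way_anova()` is a hypothetical helper written for these notes, not a base R function, and it assumes a numeric response `y` and a grouping vector `g` of the same length.

```r
# Sketch: one-way ANOVA quantities for stacked data.
# `one_way_anova` is a hypothetical helper, not part of base R.
one_way_anova <- function(y, g) {
  g <- factor(g)
  n <- length(y)                    # total number of observations
  k <- nlevels(g)                   # number of samples (levels)
  xbar  <- mean(y)                  # grand mean
  xibar <- tapply(y, g, mean)       # sample means
  ni    <- tapply(y, g, length)     # sample sizes
  SSW <- sum((y - xibar[as.character(g)]) ^ 2)
  SSB <- sum(ni * (xibar - xbar) ^ 2)
  dfW <- n - k
  dfB <- k - 1
  F_score <- (SSB / dfB) / (SSW / dfW)
  list(SSW = SSW, SSB = SSB, dfW = dfW, dfB = dfB, F = F_score,
       p = pf(F_score, df1 = dfB, df2 = dfW, lower.tail = FALSE))
}

# Rats-on-diets data in stacked form
dtf2 <- data.frame(
  wg   = c(90, 95, 100, 120, 125, 130, 125, 130, 135),
  diet = rep(c("diet1", "diet2", "diet3"), each = 3)
)
res <- one_way_anova(dtf2$wg, dtf2$diet)
res$F

# Compare with the F value reported by aov()
summary(aov(wg ~ diet, data = dtf2))[[1]][["F value"]][1]
```

Both calls should give the \(F\) value of 43 from the ANOVA table above.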
The connection between ANOVA and t-test:
dtf3 <- dtf2[1:6, ]                        # diet1 and diet2 only
t.test(wg ~ diet, dtf3, var.equal = TRUE)  # pooled-variance t-test, as ANOVA assumes
summary(aov(wg ~ diet, dtf3))
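With only two samples, the \(F\) score of a one-way ANOVA is the square of the \(t\) score of a pooled-variance \(t\)-test. A minimal sketch of this check on the first two diets (the variable names `t_score` and `F_score` are chosen here for illustration):

```r
# Two groups only: diet1 vs. diet2 from the rats example
wg   <- c(90, 95, 100, 120, 125, 130)
diet <- factor(rep(c("diet1", "diet2"), each = 3))

# Pooled-variance t-test (var.equal = TRUE matches the ANOVA assumption)
tt <- t.test(wg ~ diet, var.equal = TRUE)
t_score <- unname(tt$statistic)

# One-way ANOVA on the same two groups
F_score <- summary(aov(wg ~ diet))[[1]][["F value"]][1]

t_score ^ 2                       # squared t score
F_score                           # F score
all.equal(t_score ^ 2, F_score)   # the two agree
```

The two \(p\)-values agree as well, so for two samples the tests lead to the same decision.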
4 Understand ANOVA
4.1 The model
\[x_{i j}=\mu+\tau_{j}+\epsilon_{i j} ; \quad i=1,2, \ldots, n_{j} ; \quad j=1,2, \ldots, k\]
- \(\mu\)
- The grand mean. The mean of all \(k\) population means.
- \(\tau_{j}\)
- The treatment effect. The difference between the mean of the \(j\)-th population and the grand mean (\(\tau_j = \mu_j - \mu\)). It is estimated by \(\bar x_j - \bar x\).
- \(\epsilon_{i j}\)
- The error term. The amount by which an individual measurement differs from the mean of the population to which it belongs (\(\epsilon_{ij} = x_{ij} - \mu_j\)). It is estimated by \(x_{ij} - \bar x_j\).
\(H_{0}: \mu_{1}=\mu_{2}=\cdots=\mu_{k}\)
\(H_{A}:\) not all \(\mu_{j}\) are equal
4.2 Assumptions
- The \(k\) sets of observed data constitute \(k\) independent random samples from the respective populations.
- Each of the populations from which the samples come is normally distributed with mean \(\mu_{j}\) and variance \(\sigma_{j}^{2}\).
- Each of the populations has the same variance. That is, \(\sigma_{1}^{2}=\sigma_{2}^{2}=\cdots=\sigma_{k}^{2}=\sigma^{2},\) the common variance.
- The \(\tau_{j}\) are unknown constants and \(\sum \tau_{j}=0\) since the sum of all deviations of the \(\mu_{j}\) from their mean, \(\mu,\) is zero.
- The \(\epsilon_{i j}\) have a mean of \(0,\) since the mean of \(x_{i j}\) is \(\mu_{j}\)
- The \(\epsilon_{i j}\) have a variance equal to the variance of the \(x_{i j},\) since the \(\epsilon_{i j}\) and \(x_{i j}\) differ only by a constant; that is, the error variance is equal to \(\sigma^{2},\) the common variance specified in the assumption above.
- The \(\epsilon_{i j}\) are normally (and independently) distributed.
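Two of these assumptions can be checked on the data. The sketch below uses the Shapiro-Wilk test on the residuals (normality) and Bartlett's test (equal variances); these particular tests are a choice made here and are not prescribed by the lecture. With only three observations per sample they have very little power, so the result is illustrative at best.

```r
# Rats-on-diets data in stacked form
dtf2 <- data.frame(
  wg   = c(90, 95, 100, 120, 125, 130, 125, 130, 135),
  diet = factor(rep(c("diet1", "diet2", "diet3"), each = 3))
)
fit <- aov(wg ~ diet, data = dtf2)

# Normality of the error terms: Shapiro-Wilk test on the residuals
shapiro.test(residuals(fit))

# Equal variances across the populations: Bartlett's test
bt <- bartlett.test(wg ~ diet, data = dtf2)
bt$p.value
```

In both tests a large \(p\)-value means there is no evidence against the assumption; it does not prove that the assumption holds.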
4.3 Estimating the statistics
The population variance \(\sigma^{2}\) may be estimated in two ways:
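- \(\hat\sigma ^2 = MS_W = \frac{SS_W}{df_W}\), which estimates \(\sigma^2\) whether or not \(H_0\) is true.
- \(\hat\sigma ^2 = MS_B = \frac{SS_B}{df_B}\), which estimates \(\sigma^2\) only if \(H_0\) is true; otherwise it tends to be larger.
Compare the two estimates of the population variance: \(F_\mathrm {score} = MS_B/MS_W\).
- The numerator df: \(k - 1\).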
- The denominator df: \(n - k\).
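A small simulation can illustrate why the ratio is compared with the \(F\) distribution. It is a sketch under assumed settings: three samples of three observations, all drawn from the same normal population (mean 100 and standard deviation 5 are arbitrary), so that \(H_0\) is true by construction.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
k  <- 3                            # number of samples
ni <- 3                            # observations per sample
g  <- rep(seq_len(k), each = ni)   # sample labels 1, 1, 1, 2, 2, 2, 3, 3, 3
dfB <- k - 1
dfW <- k * ni - k

# F score of one simulated data set in which H0 is true
sim_F <- function() {
  x <- rnorm(k * ni, mean = 100, sd = 5)   # all populations identical
  xibar <- tapply(x, g, mean)              # sample means
  MSB <- sum(ni * (xibar - mean(x)) ^ 2) / dfB
  MSW <- sum((x - xibar[g]) ^ 2) / dfW
  MSB / MSW
}

F_sim <- replicate(2000, sim_F())
F_critical <- qf(0.95, df1 = dfB, df2 = dfW)

# Share of simulated F scores beyond the critical value (expected near 0.05)
(rate <- mean(F_sim > F_critical))
```

When \(H_0\) is true, only about 5% of the simulated \(F\) scores should exceed the critical value, which is the meaning of \(\alpha = 0.05\).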
5 Repeated measures
Repeated measures (matched samples, randomized blocks, within subjects):
- \(k\) samples.
- Each sample has \(n\) observations.
ID | Level 1 | Level 2 | Level 3 | … | Level \(k\) |
---|---|---|---|---|---|
1 | \(x_{1,1}\) | \(x_{2, 1}\) | \(x_{3,1}\) | … | \(x_{k,1}\) |
2 | \(x_{1,2}\) | \(x_{2,2}\) | \(x_{3,2}\) | … | \(x_{k,2}\) |
3 | \(x_{1,3}\) | \(x_{2, 3}\) | \(x_{3,3}\) | … | \(x_{k, 3}\) |
. | . | . | . | . | . |
n | \(x_{1, n}\) | \(x_{2, n}\) | \(x_{3, n}\) | … | \(x_{k, n}\) |
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | \(df_W = nk - k\) | \(SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n}(x_{ij}-\bar x_i)^2\) | \(MS_W = \frac{SS_W}{df_W}\) | \(F = \frac{MS_B}{MS_{Wcorr}}\) |
Between samples | \(df_B = k - 1\) | \(SS_B = \sum_{i = 1}^k n(\bar x_i - \bar x)^2\) | \(MS_B = \frac {SS_B}{df_B}\) | |
Subjects (Row) | \(df_S = n-1\) | \(SS_S = k\sum _{j=1}^{n} (\bar x_j - \bar x)^2\), where \(\bar x_j\) is the mean of row (subject) \(j\) | | |
Within Corrected | \(df_{Wcorr} = df_W- df_S\) | \(SS_{Wcorr} = SS_W-SS_S\) | \(MS_{Wcorr} = \frac{SS_{Wcorr}}{df_{Wcorr}}\) | |
Total | \(df_T = nk - 1\) | \(SS_T = \sum_{i = 1}^k \sum _{j=1}^{n} (x_{ij}-\bar x) ^2\) | \(MS_T = \frac{SS_T}{df_T}\) |
Demo: Weight-loss program
dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
rownames(dtf) <- LETTERS[1:10]
ID | before | one month | two months | three months |
---|---|---|---|---|
A | 198 | 194 | 191 | 188 |
B | 201 | 203 | 200 | 196 |
C | 210 | 200 | 192 | 188 |
D | 185 | 183 | 180 | 178 |
E | 204 | 200 | 195 | 191 |
F | 156 | 153 | 150 | 145 |
G | 167 | 166 | 167 | 166 |
H | 197 | 197 | 195 | 192 |
I | 220 | 215 | 209 | 205 |
J | 186 | 184 | 179 | 175 |
Is the weight-loss program effective?
Action: Follow the steps and fill in the ANOVA table.
Hypotheses and question:
- \(H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3\)
- \(H_1:\) Not \(H_0\)
- Question: Reject \(H_0\)? Given \(\alpha\).
Collect data
Calculate a test statistic.
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | ||||
Between samples | ||||
Subjects (Row) | ||||
Within Corrected | ||||
Total |
Results:
dtf <- data.frame(
  # id = LETTERS[1:10],
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
# df
(k <- ncol(dtf))               # number of samples (time points)
(n <- nrow(dtf))               # number of subjects
(N <- length(unlist(dtf)))     # total number of observations
(dfW <- N - k)
(dfB <- k - 1)
(dfT <- N - 1)
(dfS <- n - 1)
(dfWcorr <- dfW - dfS)
# mean
(xbar <- mean(unlist(dtf)))    # grand mean
(xibar <- colMeans(dtf))       # sample (column) means
(xjbar <- rowMeans(dtf))       # subject (row) means
# SS
(SSW1 <- sum((dtf$before - xibar[1]) ^ 2))
(SSW2 <- sum((dtf$one - xibar[2]) ^ 2))
(SSW3 <- sum((dtf$two - xibar[3]) ^ 2))
(SSW4 <- sum((dtf$three - xibar[4]) ^ 2))
(SSW <- SSW1 + SSW2 + SSW3 + SSW4)
sum(apply(dtf, 2, function(x) (x - mean(x)) ^ 2))  # same as SSW, in one line
(SSB1 <- n * (xibar[1] - xbar) ^ 2)
(SSB2 <- n * (xibar[2] - xbar) ^ 2)
(SSB3 <- n * (xibar[3] - xbar) ^ 2)
(SSB4 <- n * (xibar[4] - xbar) ^ 2)
(SSB <- SSB1 + SSB2 + SSB3 + SSB4)
sum(n * (xibar - xbar) ^ 2)                        # same as SSB, in one line
(SSS <- k * sum((xjbar - xbar) ^ 2))
(SSWcorr <- SSW - SSS)
# Double check: SST should equal SSW + SSB
(SST <- sum((unlist(dtf) - xbar) ^ 2))
SSW + SSB
# F
(MSB <- SSB / dfB)
(MSWcorr <- SSWcorr / dfWcorr)
(F_score <- MSB / MSWcorr)
(F_critical <- qf(0.95, df1 = dfB, df2 = dfWcorr))
pf(F_score, df1 = dfB, df2 = dfWcorr, lower.tail = FALSE)  # p-value
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
Within samples | 36 | 11840.9 | 328.9139 | 24.4851201 |
Between samples | 3 | 569.075 | 189.6916667 | |
Subjects (Row) | 9 | 11631.725 | | |
Within Corrected | 27 | 209.175 | 7.7472222 | |
Total | 39 | 12409.975 | | |
Decision.
With 3 and 27 degrees of freedom, the critical \(F\) for \(\alpha = 0.05\) is 2.9603513, which is smaller than the calculated \(F\) value 24.4851201. Thus, the decision is to reject \(H_0\).
One step:
dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
# Ordinary one-way ANOVA, which ignores that the same subjects are measured repeatedly
w_aov <- aov(w ~ level, data = dtf2)
summary(w_aov)
Df Sum Sq Mean Sq F value Pr(>F)
level 3 569 189.7 0.577 0.634
Residuals 36 11841 328.9
# Repeated-measures ANOVA: add the subject ID and declare it as an error stratum
dtf2$subject <- rep(LETTERS[1:10], 4)
w_aov2 <- aov(w ~ level + Error(subject/level), data = dtf2)
summary(w_aov2)
Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 11632 1292
Error: subject:level
Df Sum Sq Mean Sq F value Pr(>F)
level 3 569.1 189.69 24.48 7.3e-08 ***
Residuals 27 209.2 7.75
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Conclusion.
The program has a significant effect on weight loss.
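The ordinary one-way ANOVA above is not significant, while the repeated-measures ANOVA is, because the large differences between subjects are removed from the error term. As a sketch of the same idea, the subject can be entered as an additive blocking factor instead of an `Error()` stratum; for this balanced design the \(F\) score for `level` should be the same.

```r
dtf <- data.frame(
  before = c(198, 201, 210, 185, 204, 156, 167, 197, 220, 186),
  one    = c(194, 203, 200, 183, 200, 153, 166, 197, 215, 184),
  two    = c(191, 200, 192, 180, 195, 150, 167, 195, 209, 179),
  three  = c(188, 196, 188, 178, 191, 145, 166, 192, 205, 175)
)
dtf2 <- stack(dtf)
names(dtf2) <- c("w", "level")
dtf2$subject <- factor(rep(LETTERS[1:10], 4))

# Subject enters as an additive blocking factor
w_block <- aov(w ~ level + subject, data = dtf2)
(tab <- summary(w_block)[[1]])

# F score for `level` (first row of the table)
tab[["F value"]][1]
```

The residual row has 27 degrees of freedom, matching the corrected within-sample row of the ANOVA table.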
6 Two-way ANOVA
Demo: Rats on diets
A biologist studies how the weight gain of lab rats depends on diet and gender over a 4-week period. Three different diets are applied. Do diet and gender have an effect on weight gain?
 | Diet 1 | Diet 2 | Diet 3 |
---|---|---|---|
Male | 90 | 120 | 125 |
 | 95 | 125 | 130 |
 | 100 | 130 | 135 |
Female | 75 | 100 | 118 |
 | 78 | 118 | 125 |
 | 90 | 112 | 132 |
Can we apply a one-way ANOVA for diet effect, then another one-way ANOVA for gender effect?
Transform the data frame as:
dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(c("Male", "Female"), each = 3)
)
w | diet | gender |
---|---|---|
90 | Diet1 | Male |
95 | Diet1 | Male |
100 | Diet1 | Male |
75 | Diet1 | Female |
78 | Diet1 | Female |
90 | Diet1 | Female |
120 | Diet2 | Male |
125 | Diet2 | Male |
130 | Diet2 | Male |
100 | Diet2 | Female |
118 | Diet2 | Female |
112 | Diet2 | Female |
125 | Diet3 | Male |
130 | Diet3 | Male |
135 | Diet3 | Male |
118 | Diet3 | Female |
125 | Diet3 | Female |
132 | Diet3 | Female |
Interaction:
- A
  - No significant effect of diet,
  - No significant effect of gender,
  - No interaction
- B
  - No significant effect of diet,
  - Effect of gender,
  - No interaction
- C
  - Effect of diet,
  - Effect of gender,
  - No interaction
- D
  - Effect of diet in males,
  - No significant effect of diet in females,
  - Effect of gender,
  - Interaction (positive)
- E
  - Effect of diet in males,
  - No significant effect of diet in females,
  - Effect of gender,
  - Interaction (negative)
- F
  - Effect of diet in males,
  - Effect of diet in females,
  - Effect of gender,
  - Interaction (negative)
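Scenarios like these are usually judged from an interaction plot of the cell means. A minimal sketch for the two-way rats data of this section, using base R's `interaction.plot()`; the axis and legend labels are chosen here for illustration.

```r
dtf <- data.frame(
  w = c(90, 95, 100, 75, 78, 90, 120, 125, 130,
        100, 118, 112, 125, 130, 135, 118, 125, 132),
  diet   = factor(rep(c("Diet1", "Diet2", "Diet3"), each = 6)),
  gender = factor(rep(c("Male", "Female"), each = 3, times = 3))
)

# Mean weight gain in each diet-by-gender cell
(cell_means <- tapply(dtf$w, list(dtf$diet, dtf$gender), mean))

# One line per gender; roughly parallel lines suggest no interaction
interaction.plot(x.factor = dtf$diet, trace.factor = dtf$gender,
                 response = dtf$w, xlab = "Diet",
                 ylab = "Mean weight gain", trace.label = "Gender")
```

Lines that run roughly parallel correspond to the "no interaction" scenarios; lines that diverge or cross point to an interaction, which the \(F\)-test for the interaction term then assesses formally.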
 | Level 1 | Level 2 | … | Level \(k\) | … | Level \(K\) |
---|---|---|---|---|---|---|
Category 1 | \(x_{1,1,1}\) | \(x_{1,2,1}\) | … | \(x_{1,k,1}\) | … | \(x_{1,K,1}\) |
 | \(x_{1,1,2}\) | \(x_{1,2,2}\) | … | \(x_{1,k,2}\) | … | \(x_{1,K,2}\) |
 | … | … | … | … | … | … |
 | \(x_{1,1,j}\) | \(x_{1,2,j}\) | … | \(x_{1,k,j}\) | … | \(x_{1,K,j}\) |
 | … | … | … | … | … | … |
 | \(x_{1,1,J}\) | \(x_{1,2,J}\) | … | \(x_{1,k,J}\) | … | \(x_{1,K,J}\) |
Category 2 | \(x_{2,1,1}\) | \(x_{2,2,1}\) | … | \(x_{2,k,1}\) | … | \(x_{2,K,1}\) |
 | \(x_{2,1,2}\) | \(x_{2,2,2}\) | … | \(x_{2,k,2}\) | … | \(x_{2,K,2}\) |
 | … | … | … | … | … | … |
 | \(x_{2,1,j}\) | \(x_{2,2,j}\) | … | \(x_{2,k,j}\) | … | \(x_{2,K,j}\) |
 | … | … | … | … | … | … |
 | \(x_{2,1,J}\) | \(x_{2,2,J}\) | … | \(x_{2,k,J}\) | … | \(x_{2,K,J}\) |
… | … | … | … | … | … | … |
Category m | \(x_{m,1,1}\) | \(x_{m,2,1}\) | … | \(x_{m,k,1}\) | … | \(x_{m,K,1}\) |
 | \(x_{m,1,2}\) | \(x_{m,2,2}\) | … | \(x_{m,k,2}\) | … | \(x_{m,K,2}\) |
 | … | … | … | … | … | … |
 | \(x_{m,1,j}\) | \(x_{m,2,j}\) | … | \(x_{m,k,j}\) | … | \(x_{m,K,j}\) |
 | … | … | … | … | … | … |
 | \(x_{m,1,J}\) | \(x_{m,2,J}\) | … | \(x_{m,k,J}\) | … | \(x_{m,K,J}\) |
… | … | … | … | … | … | … |
Category M | \(x_{M,1,1}\) | \(x_{M,2,1}\) | … | \(x_{M,k,1}\) | … | \(x_{M,K,1}\) |
 | \(x_{M,1,2}\) | \(x_{M,2,2}\) | … | \(x_{M,k,2}\) | … | \(x_{M,K,2}\) |
 | … | … | … | … | … | … |
 | \(x_{M,1,j}\) | \(x_{M,2,j}\) | … | \(x_{M,k,j}\) | … | \(x_{M,K,j}\) |
 | … | … | … | … | … | … |
 | \(x_{M,1,J}\) | \(x_{M,2,J}\) | … | \(x_{M,k,J}\) | … | \(x_{M,K,J}\) |
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
V1 (row) | \(df_r = M-1\) | \(SS_r = KJ\sum_{m = 1}^{M} (\bar x_{m} - \bar x)^2\) | \(MS_r = \frac{SS_r}{df_r}\) | \(F_r = \frac{MS_r}{MS_W}\) |
V2 (column) | \(df_c = K - 1\) | \(SS_c = MJ\sum_{k = 1}^K (\bar x_k - \bar x)^2\) | \(MS_c = \frac{SS_c}{df_c}\) | \(F_c = \frac{MS_c}{MS_W}\) |
Interaction | \(df_I=df_r df_c\) | \(SS_I = SS_B-SS_c-SS_r\) | \(MS_I = \frac{SS_I}{df_I}\) | \(F_I = \frac{MS_I}{MS_W}\) |
Within samples | \(df_W = MK(J-1)\) | \(SS_W = \sum_{m=1}^{M} \sum_{k=1}^{K} \sum_{j=1}^{J}(x_{m,k,j}-\bar x_{m,k})^2\) | \(MS_W = \frac{SS_W}{df_W}\) | |
Between samples | \(df_B = MK - 1\) | \(SS_B = J\sum_{m = 1}^M \sum_{k=1}^{K}(\bar x_{m,k} - \bar x)^2\) | \(MS_B = \frac {SS_B}{df_B}\) | |
Total | \(df_T = MKJ - 1\) | \(SS_T = \sum(x-\bar x) ^2\) | \(MS_T = \frac{SS_T}{df_T}\) |
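As in the one-way case, the rows of the table are linked by identities that can serve as a double check. They are written here for the balanced design of the table, where every cell has the same number of observations \(J\):
\[SS_T = SS_B + SS_W, \qquad SS_B = SS_r + SS_c + SS_I\]
\[df_T = df_B + df_W, \qquad df_B = df_r + df_c + df_I\]
For example, with \(M = 2\), \(K = 3\) and \(J = 3\): \(df_r = 1\), \(df_c = 2\), \(df_I = 2\), \(df_W = 12\), so \(df_B = 5\) and \(df_T = 17\).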
Action: Fill in the ANOVA table on the basis of the following data and draw your conclusion (15 minutes).
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
V1 (row) | ||||
V2 (column) | ||||
Interaction | ||||
Within samples | ||||
Between samples | ||||
Total |
Results:
dtf <- data.frame(
  w = c(90,95,100,75,78,90,120,125,130,100,118,112,125,130,135,118,125,132),
  diet = rep(c("Diet1", "Diet2", "Diet3"), each = 6),
  gender = rep(c("Male", "Female"), each = 3)
)
# df
(M <- nlevels(as.factor(dtf$gender)))
[1] 2
(K <- nlevels(as.factor(dtf$diet)))
[1] 3
(n <- nrow(dtf))
[1] 18
(J <- n / M / K)
[1] 3
(dfr <- M - 1)
[1] 1
(dfc <- K - 1)
[1] 2
(dfI <- dfr * dfc)
[1] 2
(dfW <- M * K * (J - 1))
[1] 12
(dfB <- M * K - 1)
[1] 5
(dfT <- n - 1)
[1] 17
# mean
(xbar <- mean(dtf$w))
[1] 111
(xmk_bar <- tapply(dtf$w, list(dtf$diet, dtf$gender), mean))
      Female Male
Diet1     81   95
Diet2    110  125
Diet3    125  130
# cell mean attached to every observation
dtf$xmk_bar <- mapply(function(d, g) xmk_bar[d, g], dtf$diet, dtf$gender)
(xm_bar <- colMeans(xmk_bar))   # gender means
  Female     Male
105.3333 116.6667
(xk_bar <- rowMeans(xmk_bar))   # diet means
Diet1 Diet2 Diet3
 88.0 117.5 127.5
# SS
(SSB <- J * sum((xmk_bar - xbar) ^ 2))
[1] 5730
(SSW <- sum((dtf$w - dtf$xmk_bar) ^ 2))
[1] 542
(SST <- sum((dtf$w - xbar) ^ 2))
[1] 6272
(SSr <- K * J * sum((xm_bar - xbar) ^ 2))
[1] 578
(SSc <- M * J * sum((xk_bar - xbar) ^ 2))
[1] 5061
(SSI <- SSB - SSc - SSr)
[1] 91
# Double check: SSW + SSB should equal SST
SSW + SSB
[1] 6272
SST
[1] 6272
# MS
(MSr <- SSr / dfr)
[1] 578
(MSc <- SSc / dfc)
[1] 2530.5
(MSI <- SSI / dfI)
[1] 45.5
(MSW <- SSW / dfW)
[1] 45.16667
# F
(F_r <- MSr / MSW)
[1] 12.79705
qf(0.95, df1 = dfr, df2 = dfW)
[1] 4.747225
pf(F_r, df1 = dfr, df2 = dfW, lower.tail = FALSE)
[1] 0.0038011
(F_c <- MSc / MSW)
[1] 56.02583
qf(0.95, df1 = dfc, df2 = dfW)
[1] 3.885294
pf(F_c, df1 = dfc, df2 = dfW, lower.tail = FALSE)
[1] 8.193548e-07
(F_I <- MSI / MSW)
[1] 1.00738
qf(0.95, df1 = dfI, df2 = dfW)
[1] 3.885294
pf(F_I, df1 = dfI, df2 = dfW, lower.tail = FALSE)
[1] 0.3940701
Source | \(df\) | \(SS\) | \(MS\) | \(F\) |
---|---|---|---|---|
V1 (row: gender) | 1 | 578 | 578 | 12.797048 |
V2 (column: diet) | 2 | 5061 | 2530.5 | 56.0258303 |
Interaction | 2 | 91 | 45.5 | 1.0073801 |
Within samples | 12 | 542 | 45.1666667 | |
Between samples | 5 | 5730 | 1146 | |
Total | 17 | 6272 |
Conclusion:
Both diet and gender have a significant effect on the weight gain of rats, and there is no significant interaction between gender and diet in weight gain.
One-step:
aov_wg <- aov(w ~ diet * gender, data = dtf)
summary(aov_wg)
Df Sum Sq Mean Sq F value Pr(>F)
diet 2 5061 2530.5 56.026 8.19e-07 ***
gender 1 578 578.0 12.797 0.0038 **
diet:gender 2 91 45.5 1.007 0.3941
Residuals 12 542 45.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7 Readings
- The R Book - Chapter 11
8 Highlights
- Carry out a step-by-step one-way ANOVA (with or without repeated measures) for a scientific question.
- State \(H_0\) and \(H_1\).
- Calculate the critical value for a given \(\alpha\).
- Calculate the test statistic (\(F\) score).
- Draw a conclusion for the scientific question.