Use the \(t\)-test for one-sample and two-sample hypothesis testing.
9.2 Limitations of \(z\) distribution and \(z\)-test
When we know \(\sigma\), we can
estimate population parameters with the \(z\)-distribution (see Lecture 7.8), and
do the \(z\)-test for the mean.
What if we do not know \(\sigma\)?
9.3 Student’s \(t\)-distribution
Student’s \(t\)-distribution:
If the distribution of a random variable \(x\) is (approximately) normal, then the statistic \(t = \frac{\bar x - \mu}{s / \sqrt n}\), where \(\bar x\) is the sample mean and \(s\) is the sample standard deviation, follows a \(t\)-distribution.
Degrees of freedom (\(\nu\) (nu) or \(df\)):
The number of free choices left after a sample statistic such as \(\bar x\) is calculated. For the \(t\)-distribution, when your sample size is \(n\), \(df = n - 1\).
In Graphs:
mycol <- c("black", "darkblue", "blue", "green", "red")
curve(dt(x, df = 3), -4, 4, ylim = c(0, 0.4), col = mycol[1], xlab = "t", ylab = "f(t)", las = 1)
i <- 2
for (df in c(5, 10, 30)) {
  curve(dt(x, df = df), add = TRUE, col = mycol[i])
  i <- i + 1
}
curve(dnorm(x), add = TRUE, col = mycol[i])
legend("topright", lty = 1, col = mycol, title = "d.f.", legend = c(3, 5, 10, 30, "Inf"), bty = "n")
Symmetric and bell shaped, like the normal distribution, but has heavier tails.
Mean = median = 0.
The larger the sample size (\(n\)), the more closely the distribution resembles the standard normal distribution. When \(\nu = \infty\), \(f(t)=\frac{1}{\sqrt{2 \pi }} e^{-t^{2} / 2}\).
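The convergence toward the normal distribution can also be seen in the critical values. The following is a minimal sketch, assuming a two-sided 95 % level and the same degrees of freedom as in the graph; the variable names are illustrative only.

```r
# Two-sided 95 % critical values for the degrees of freedom used in the graph
dfs <- c(3, 5, 10, 30, Inf)
t_crit <- qt(0.975, df = dfs)   # t critical values, one per df
z_crit <- qnorm(0.975)          # corresponding standard normal critical value
data.frame(df = dfs, t_critical = round(t_crit, 3), z_critical = round(z_crit, 3))
```

The heavier tails at small \(df\) show up as larger critical values, which shrink toward the normal value as \(df\) grows.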
9.4 Estimating Population Parameters with \(t\)-distribution
Example: Battery life
A manufacturer develops a new battery and wants to know how long a battery lasts on average before it burns out. They test a sample of 30 batteries and find a sample mean of 90 hours with a standard deviation of 25 hours. Estimate the population mean and its 95 % confidence interval.
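One way to work this example is sketched below, assuming the interval \(\bar x \pm t_{1-\alpha/2,\, n-1} \cdot s / \sqrt n\) and using only the summary statistics given in the question. The variable names are illustrative.

```r
# Summary statistics from the example
n <- 30
sample_mean <- 90   # hours; also the point estimate of the population mean
s <- 25             # hours

se <- s / sqrt(n)                      # standard error of the mean
t_critical <- qt(0.975, df = n - 1)    # two-sided 95 % critical value
ci <- c(sample_mean - t_critical * se, sample_mean + t_critical * se)
ci
```

Under these assumptions the interval runs from roughly 80.7 to 99.3 hours.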
Example: Heavy metal concentration (mg/L) in sediments of a river
A researcher sampled sediments of a river. She took 10 observations as a random sample. The Fe concentrations (mg/L) are 24.29, 21.47, 28.58, 22.41, 23.12, 23.91, 28.51, 16.25, 22.57, 23.17.
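The quantities that enter the confidence interval can be set up as follows. This is a minimal sketch that assumes the variable names `x`, `sample_mean`, `se`, and `t_critical` used in the surrounding code, and a two-sided 95 % level.

```r
# Fe concentrations (mg/L) from the example
x <- c(24.29, 21.47, 28.58, 22.41, 23.12, 23.91, 28.51, 16.25, 22.57, 23.17)
n <- length(x)

sample_mean <- mean(x)                        # point estimate of the population mean
se <- sd(x) / sqrt(n)                         # standard error of the mean
t_critical <- qt(1 - 0.05 / 2, df = n - 1)    # two-sided 95 % critical value, df = 9
```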
c(sample_mean - t_critical * se, sample_mean + t_critical * se)
[1] 20.91987 25.93613
Use t.test():
t.test(x)
One Sample t-test
data: x
t = 21.13, df = 9, p-value = 5.588e-09
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
20.91987 25.93613
sample estimates:
mean of x
23.428
How do we get the t score?
t_score <- (sample_mean - 0) / se
t_score
[1] 21.13037
9.5 One-sample test
One-sample test
Test whether a population mean is significantly different from some hypothesized value.
9.5.1 One-tailed
Example: Battery life
A manufacturer develops a new battery and claims that the average battery life is not shorter than 100 hours. A consumer group believes this average is shorter. They test a sample of 10 batteries: 64, 100, 80, 137, 80, 125, 96, 127, 105, 90 hours. Can they prove that this average life is shorter than the manufacturer claims?
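With \(df = 9\), the absolute value of \(t\) (1.548) lies between the tabulated critical values 1.383 (one-tailed \(\alpha = 0.10\)) and 1.833 (one-tailed \(\alpha = 0.05\)).
Thus, the one-tailed \(p\) value is between 0.05 and 0.10.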
Decision
The consumer group cannot reject \(H_0\) at a significance level of \(\alpha = 0.05\).
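The table lookup can be cross-checked numerically. The sketch below takes the test statistic \(t = -1.5478\) and \(df = 9\) reported by `t.test()` for this example as given, rather than recomputing them; the variable names are illustrative.

```r
t_score <- -1.5478   # test statistic reported by t.test() for this example
nu <- 9              # degrees of freedom, n - 1
alpha <- 0.05

t_critical <- qt(alpha, df = nu)   # lower-tail critical value, about -1.833
p_value <- pt(t_score, df = nu)    # lower-tail p-value for H1: mean < 100
c(t_critical = t_critical, p_value = p_value)

# Reject H0 only if t falls below the lower-tail critical value
reject_h0 <- t_score < t_critical
reject_h0
```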
One step in R:
t.test(x, mu = 100, alternative = "less")
One Sample t-test
data: x
t = -1.5478, df = 9, p-value = 0.07804
alternative hypothesis: true mean is less than 100
95 percent confidence interval:
-Inf 101.3459
sample estimates:
mean of x
92.7
Conclusion
There is not enough evidence to conclude that the average battery life is shorter than 100 hours.
9.5.2 Two-tailed
Example: Zn in the water
A previous study reported that the mean concentration of zinc (Zn) in the water of a river is 0.02 mg/L. A scientist thinks the real Zn concentration is not 0.02 mg/L. She collects 8 samples and tests them. The concentrations are 0.011, 0.021, 0.001, 0.007, 0.031, 0.023, 0.026, 0.019. Is there evidence to support her hypothesis? Use \(\alpha = 0.10\) to carry out the analysis.
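With \(df = 7\), the absolute value of \(t\) (0.730) lies between the tabulated critical values 0.711 (one-tailed area 0.25) and 0.896 (one-tailed area 0.20).
Thus, the two-tailed \(p\) value is between 0.4 (0.2 * 2) and 0.5 (0.25 * 2).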
Decision.
She cannot reject \(H_0\) at \(\alpha = 0.10\).
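The step-by-step calculation behind this decision can be sketched as follows, assuming \(H_0: \mu = 0.02\) against \(H_1: \mu \neq 0.02\). The names `mu0`, `t_score`, `t_critical`, and `p_value` are illustrative.

```r
# Zn concentrations (mg/L) from the example
x <- c(0.011, 0.021, 0.001, 0.007, 0.031, 0.023, 0.026, 0.019)
mu0 <- 0.02      # hypothesized mean under H0
alpha <- 0.10
n <- length(x)

t_score <- (mean(x) - mu0) / (sd(x) / sqrt(n))                    # test statistic
t_critical <- qt(1 - alpha / 2, df = n - 1)                       # two-tailed critical value
p_value <- 2 * pt(abs(t_score), df = n - 1, lower.tail = FALSE)   # two-tailed p-value
c(t_score = t_score, t_critical = t_critical, p_value = p_value)

# Reject H0 only if |t| exceeds the critical value
reject_h0 <- abs(t_score) > t_critical
reject_h0
```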
One step in R:
t.test(x, mu = 0.02, alternative = "two.sided", conf.level = 0.9)
One Sample t-test
data: x
t = -0.73012, df = 7, p-value = 0.489
alternative hypothesis: true mean is not equal to 0.02
90 percent confidence interval:
0.01056338 0.02418662
sample estimates:
mean of x
0.017375
Conclusion.
There is not enough evidence to conclude that the mean Zn concentration differs from 0.02 mg/L.
9.6 Two-sample test
9.6.1 Sampling distribution of the difference
Two-sample Test
Test the difference between two population means.
The sampling distribution of the difference between means
The distribution of all possible values of differences between pairs of sample means with the sample sizes held constant from pair to pair.
Example: Two 3D printers
A scientist wants to choose between two 3D printers to produce a 3D model. The printing time is 24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83 hours for Printer 1, and 21.61, 19.06, 20.72, 15.77, 19, 25.88, 21.48, 17.85, 20.86, 17.77 hours for Printer 2. Do these two printers have the same mean printing time?
Before we compare the two sample means, we have to test whether the two population variances are equal with an \(F\)-test (see Lecture 8.6).
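A call that matches the output that follows can be sketched as below, assuming the printing times are stored in vectors named `x1` and `x2` as in the rest of this section. The argument order `x2, x1` follows the `data:` line of the output.

```r
# Printing times (hours) from the example
x1 <- c(24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83)  # Printer 1
x2 <- c(21.61, 19.06, 20.72, 15.77, 19, 25.88, 21.48, 17.85, 20.86, 17.77)     # Printer 2

# F-test of H0: the ratio of the two population variances equals 1
f_result <- var.test(x2, x1)
f_result
```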
F test to compare two variances
data: x2 and x1
F = 1.0586, num df = 9, denom df = 9, p-value = 0.9338
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.2629492 4.2620464
sample estimates:
ratio of variances
1.058632
Conclusion: There is no evidence that the two printers differ in the variance of printing time, so we treat the variances as equal.
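With equal variances assumed, the test statistic is usually built from a pooled variance: \(s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}\) and \(t = \frac{\bar x_1 - \bar x_2}{s_p \sqrt{1/n + 1/m}}\) with \(df = n + m - 2\). The sketch below assumes this standard form and the names `n`, `m`, and `t_score` used in the surrounding code; `sp2`, `se_diff`, and `t_critical` are illustrative.

```r
# Printing times (hours) from the example
x1 <- c(24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83)  # Printer 1
x2 <- c(21.61, 19.06, 20.72, 15.77, 19, 25.88, 21.48, 17.85, 20.86, 17.77)     # Printer 2
n <- length(x1)
m <- length(x2)

sp2 <- ((n - 1) * var(x1) + (m - 1) * var(x2)) / (n + m - 2)   # pooled variance
se_diff <- sqrt(sp2 * (1 / n + 1 / m))                         # standard error of the difference
t_score <- (mean(x1) - mean(x2)) / se_diff                     # test statistic
t_critical <- qt(1 - 0.05 / 2, df = n + m - 2)                 # two-tailed critical value
c(t_score = t_score, t_critical = t_critical)
```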
pt(t_score, df = n - 1 + m - 1, lower.tail = FALSE) * 2
[1] 0.02528077
Decision
As \(|t_{\mathrm{score}}| > t_{\mathrm{critical}, 1-\alpha/2}\) at \(\alpha = 0.05\) (2.4396 > 2.101), we can reject \(H_0\).
One step:
t.test(x1, x2, var.equal = TRUE, alternative = "two.sided", mu = 0)
Two Sample t-test
data: x1 and x2
t = 2.4396, df = 18, p-value = 0.02528
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.4164695 5.5835305
sample estimates:
mean of x mean of y
23 20
In a data frame:
dtf <- data.frame(x = c(x1, x2), printer = rep(c("Printer1", "Printer2"), c(length(x1), length(x2))))
t.test(x ~ printer, var.equal = TRUE, alternative = "two.sided", mu = 0, data = dtf)
Two Sample t-test
data: x by printer
t = 2.4396, df = 18, p-value = 0.02528
alternative hypothesis: true difference in means between group Printer1 and group Printer2 is not equal to 0
95 percent confidence interval:
0.4164695 5.5835305
sample estimates:
mean in group Printer1 mean in group Printer2
23 20
Conclusion.
The two printers have different mean printing times.
9.6.3 Two-sample test with unequal variances
If the variances of the two populations are unequal (based on \(F\)-test):
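v1 <- var(x1) / n; v2 <- var(x2) / m   # squared standard errors of the two means
df_welch <- (v1 + v2)^2 / (v1^2 / (n - 1) + v2^2 / (m - 1))   # Welch-Satterthwaite df
t_welch <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)   # no pooling of variances
pt(t_welch, df = df_welch, lower.tail = FALSE) * 2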
t.test(x ~ printer, var.equal = FALSE, alternative = "two.sided", mu = 0, data = dtf)
Welch Two Sample t-test
data: x by printer
t = 2.4396, df = 17.985, p-value = 0.02529
alternative hypothesis: true difference in means between group Printer1 and group Printer2 is not equal to 0
95 percent confidence interval:
0.4163193 5.5836807
sample estimates:
mean in group Printer1 mean in group Printer2
23 20
9.6.4 Paired samples
When the samples are matched, apply a one-sample \(t\)-test to the paired differences.
Example: Weight-loss program.
Ten people take part in a weight-loss program. They are weighed before starting the program and again after completing the one-month program. Did they really lose weight?
As \(t_\mathrm{score} > t_\mathrm{critical, 1-\alpha}\), we can reject \(H_0\) at \(\alpha = 0.05\).
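The individual weights are not listed in this section, so the comparison cannot be recomputed from raw data here. The sketch below instead takes the mean difference (2.9), the test statistic (\(t = 2.8241\)), and \(df = 9\) from the `t.test()` output for this example as given, and recovers the critical value, the \(p\) value, and the lower confidence bound. The variable names are illustrative.

```r
mean_diff <- 2.9    # mean of (before - after), as reported by t.test()
t_score <- 2.8241   # test statistic, as reported by t.test()
nu <- 9             # degrees of freedom, number of pairs - 1
alpha <- 0.05

se_diff <- mean_diff / t_score                        # implied by t = mean_diff / se_diff
t_critical <- qt(1 - alpha, df = nu)                  # one-tailed critical value, about 1.833
p_value <- pt(t_score, df = nu, lower.tail = FALSE)   # upper-tail p-value
lower_bound <- mean_diff - t_critical * se_diff       # one-sided 95 % lower confidence bound
c(t_critical = t_critical, p_value = p_value, lower_bound = lower_bound)
```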
One step:
t.test(wl$before, wl$after, alternative = "greater", paired = TRUE)
Paired t-test
data: wl$before and wl$after
t = 2.8241, df = 9, p-value = 0.009956
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
1.017647 Inf
sample estimates:
mean difference
2.9
Conclusion.
On average, the participants lost weight during the program.
9.7 Readings
Applied Environmental Statistics with R - Chapter 9
9.8 Highlights
Carry out a step-by-step \(t\)-test for a scientific question.
State \(H_0\) and \(H_1\).
Calculate the critical value for a given \(\alpha\).