ENV221 L08

Author

Peng Zhao


8 Hypothesis test

8.1 Learning objectives

In this lecture, you will

  1. Understand the objective and procedure of a hypothesis test.
  2. Use the z test for testing the population mean.
  3. Use the F test for comparing two population variances.

8.2 Definitions

8.2.1 A story of tea

  • Book: The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, by David Salsburg
Statistical hypothesis:
A statement about a population parameter.
Null hypothesis (H0):
A statistical hypothesis that contains a statement of equality, such as \(\ge\), =, or \(\le\).
Alternative hypothesis (Ha, or H1):
The complement of H0. It contains a statement of strict inequality, such as >, \(\ne\), or <.
Hypothesis test:
A process that uses sample statistics to test a claim about the value of a population parameter.

8.2.2 A fair coin

Question: Is the coin fair?

  • H0: The coin is fair. \(\mu_p = 0.5\).
  • H1: The coin is unfair. \(\mu_p \ne 0.5\).

Collect evidence: toss a coin 100 times.

  • If you get 20 heads and 80 tails, the departure from a 50/50 split is large enough (significant) to conclude that the coin is unfair (reject H0, claim H1).
  • If you get 35 heads, the result is still significant. But how significant is it?
  • If 45 heads…
  • If 48 heads…
  • Type I and Type II errors. Significance level. Rejection region. Critical value.
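
A quick preview of how such evidence can be quantified in R: binom.test() runs an exact binomial test of H0: p = 0.5. The count of 35 heads below is illustrative only.

# Exact binomial test of H0: p = 0.5 against H1: p != 0.5,
# for 35 heads observed in 100 tosses
binom.test(x = 35, n = 100, p = 0.5, alternative = "two.sided")
# The p-value is the probability, if the coin were fair, of a result at least
# as extreme as 35 heads; a small p-value is evidence against H0.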

8.2.3 Type I and Type II errors

A hypothesis test compared with a criminal trial.

Step | Hypothesis test              | Criminal trial
1    | Reject or not reject H0?     | Guilty or not guilty?
2    | Gather data.                 | Gather evidence.
3    | Calculate a test statistic.  | Jury's discussion.
4    | Reject or not reject H0.     | Guilty or not guilty.
                         | \(H_0\) is actually true                  | \(H_0\) is actually false
We reject \(H_0\)        | Type I Error. False positive. \(\alpha\)  | Correct decision
We do not reject \(H_0\) | Correct decision                          | Type II Error. False negative. \(\beta\)

                                | Defendant is actually not guilty          | Defendant is actually guilty
Jury finds defendant guilty     | Type I Error. False positive. \(\alpha\)  | Correct decision
Jury finds defendant not guilty | Correct decision                          | Type II Error. False negative. \(\beta\)

Figure 1: Andy Dufresne (Type I Error) and O. J. Simpson (Type II Error)

\(\alpha\) (Significance level):
The maximum allowable probability of making a Type I error. Common choices are 0.10, 0.05, and 0.01.
\(\beta\):
The probability of making a Type II error.
\(P\)-value (Probability value):
Assuming \(H_0\) is true, the probability of obtaining a sample statistic as extreme as or more extreme than the one determined from the sample data. If \(P \le \alpha\), reject \(H_0\).
Rejection region, critical region:
The range of values of the test statistic for which \(H_0\) is rejected.
Critical value:
The value that separates the rejection region from the nonrejection region.

8.2.4 IQs

  • The population mean of IQ is 100 and the standard deviation is 15.
  • Take a random sample of 9 XJTLUers and give them IQ tests.
  • Sample mean: 109. Obviously, 109 > 100.

Questions:

  • Q1. Do the XJTLUers really have a different mean IQ from normal people? Or does it happen by chance? (Two-tailed z-test)
  • Q2. Do XJTLUers really have higher mean IQ than normal people? Or does it happen by chance? (One-tailed z-test)
\(H_0\)       | \(H_1\)       | Tails
\(\mu = M\)   | \(\mu \ne M\) | 2
\(\mu \ge M\) | \(\mu < M\)   | 1
\(\mu \le M\) | \(\mu > M\)   | 1
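
The tails determine where the rejection region lies. A minimal sketch of the critical z values for each of the three forms above at \(\alpha = 0.05\), using the standard normal as the reference distribution:

alpha <- 0.05
# Two-tailed (H1: mu != M): reject when |z| exceeds the upper alpha/2 quantile
qnorm(c(alpha / 2, 1 - alpha / 2))   # about -1.96 and 1.96
# Left-tailed (H1: mu < M): reject when z is below the lower alpha quantile
qnorm(alpha)                         # about -1.64
# Right-tailed (H1: mu > M): reject when z is above the upper alpha quantile
qnorm(1 - alpha)                     # about 1.64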

8.3 Two-tailed z-test

Q1. Do the XJTLUers really have a different mean IQ from normal people? Or does it happen by chance?

CLT: What is the sampling distribution of the mean IQ of normal people for \(n = 9\)?

Code
# The population mean
mu <- 100
# The population sd
sigma <- 15

# According to CLT:
# The mean of the sampling distribution of the mean
mu_xbar <- mu
# The sample size
n <- 9
# The standard error of the mean
se <- sigma / sqrt(n)

x <- seq(70, 130, 0.1)
y <- dnorm(x, mean = mu_xbar, sd = se)
plot(x, y, type = "l", las = 1, ylab = 'f(x)')
abline(v = c(mu_xbar + se * (-2):2), 
       col = c('red', 'blue', 'grey', 'blue', 'red'))

  • If the population (all the XJTLUers) really has the same mean IQ as normal people (the null hypothesis, \(H_0\)), the probability of \(90 \le \bar x \le 110\) is about 95%. Why? By the CLT, \(\mathrm{SE} = 15/\sqrt 9 = 5\), so \([90, 110]\) is \(\mu \pm 2\,\mathrm{SE}\).
  • Therefore, \(\bar x = 109\) is plausible under \(H_0\).
  • Conclusion: We do not have enough evidence to say that the XJTLUers have a different mean IQ from normal people.
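
The same decision can be reached with a two-tailed p-value. A minimal sketch, reusing mu_xbar and se from the code chunk above:

xbar <- 109
# z score of the observed sample mean under H0
z <- (xbar - mu_xbar) / se
# Two-tailed p-value: the probability of a sample mean at least this far from 100
2 * pnorm(abs(z), lower.tail = FALSE)
# about 0.072, which is larger than 0.05, so H0 is not rejected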

8.4 One-tailed z-test

Q2. Do XJTLUers really have a higher mean IQ than normal people? Or does it happen by chance? (One-tailed z-test)

  • If XJTLUers’ mean IQ is the same as or lower than that of normal people (\(H_0\)), then there is a probability of 95% or larger that \(\bar x \le 108.2\).
  • In other words, the chance of \(\bar x > 108.2\) is less than 5%, so it would rarely happen by chance.
  • We have evidence to say that XJTLUers have a higher mean IQ than normal people. (Conclusion)
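
The cut-off 108.2 is simply the 95th percentile of the sampling distribution of the mean under \(H_0\). A quick check, reusing mu_xbar and se from above:

# The sample mean below which 95% of sample means fall if H0 is true
qnorm(0.95, mean = mu_xbar, sd = se)
# about 108.22; the observed mean of 109 lies beyond it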

8.5 Rephrase the procedure

Five steps:

  1. Hypotheses:

    • \(H_0: \mu_c \le 100\). Their mean IQ is not higher than the normal mean. The high mean IQ in the sample happens by chance.
    • \(H_1: \mu_c > 100\). Their mean IQ is higher than the normal mean.
    • Reject or not reject \(H_0\)? Given \(\alpha = 0.05\).
  2. Gather data: You take a sample of 9 XJTLUers and give them IQ tests. You get a sample mean of 109.

  3. Find the critical value and the rejection region. Calculate a test statistic for decision: \(z_\mathrm{score}\).

\[z_\mathrm{score} = \frac{\bar x - \mu_{\bar x}}{\mathrm{SE}} = \frac{\bar x - \mu}{\sigma / \sqrt{n}}\]

xbar <- 109
z_score <- (xbar - mu_xbar)/se
z_score
[1] 1.8
alpha <- 0.05
# Critical value for a right-tailed test at significance level alpha
z_critical <- qnorm(1 - alpha)
z_critical
[1] 1.644854
# One-tailed p-value of the observed z score
pnorm(z_score, lower.tail = FALSE)
[1] 0.03593032
  4. Decision (Decide whether to reject or not reject \(H_0\) based on the statistic)

As \(z_\mathrm{score} > z_\mathrm{critical}\), we can reject \(H_0\) at \(\alpha = 0.05\).

Might we be wrong? Yes, but the probability (\(p\)) is less than 5% (\(\alpha\), the significance level). Thus we can reject \(H_0\).

  5. Conclusion (Answer to the original question):

The XJTLUers have a higher mean IQ than normal people.

Example: Nitrogen dioxide (NO2), a trace gas which is generally considered a useful indicator for measuring and judging air pollution stemming from motor vehicle sources.

  • A scientist estimates that the mean NO2 level in a city is greater than 29 ppb (parts per billion). You want to test this estimate. To do so, you determine the NO2 levels for 34 randomly selected days. The results (in ppb) are:

  • 24, 36, 44, 35, 44, 34, 29, 40, 39, 43, 41, 32, 33, 29, 29, 43, 25, 39, 25, 42, 29, 22, 22, 25, 14, 15, 14, 29, 25, 27, 22, 24, 18, 17

    Assume the population standard deviation is 9 ppb. At \(\alpha = 0.05\), can you support the scientist’s estimate?

    1. Hypotheses:
      • \(H_0\): \(\mu \le 29\)
      • \(H_1\): \(\mu > 29\)
      • Reject or not reject \(H_0\)? Given \(\alpha = 0.05\).
    2. Gather data.
    x <- c(24, 36, 44, 35, 44, 34, 29, 40, 39, 43, 
           41, 32, 33, 29, 29, 43, 25, 39, 25, 42, 
           29, 22, 22, 25, 14, 15, 14, 29, 25, 27, 
           22, 24, 18, 17)
    3. Calculate a test statistic.
    mu <- 29         # hypothesised population mean under H0
    sd_p <- 9        # known population standard deviation
    x_bar <- mean(x)
    se <- sd_p / sqrt(length(x))   # standard error of the mean
    z_score <- (x_bar - mu) / se
    z_score
    [1] 0.4382742
    z_critical <- qnorm(1 - 0.05)
    z_critical
    [1] 1.644854
    # One-tailed p-value for H1: mu > 29
    p_value <- pnorm(z_score, lower.tail = FALSE)
    p_value
    [1] 0.3305938

    Rejection region: \(z >\) 1.6448536.

    4. Decide whether to reject \(H_0\) based on the statistic: do not reject \(H_0\), because \(z_\mathrm{score} < z_\mathrm{critical}\) (equivalently, \(p > \alpha\)).

    5. Conclusion: There is not enough evidence at the 5% level of significance to support the scientist’s claim that the mean NO2 level is greater than 29 ppb (\(p > 0.05\)). Note that failing to reject \(H_0\) does not prove that the mean is at most 29 ppb; the data simply do not support the claim.
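
    For reference, the whole test can be run in one call with the z.test() function from the add-on BSDA package. This is a sketch only, assuming BSDA is installed; it is not used elsewhere in this module.

    # install.packages("BSDA")   # if not yet installed
    library(BSDA)
    # One-step z test for a single mean with a known population sd (sigma.x)
    z.test(x, mu = 29, sigma.x = 9, alternative = "greater")
    # The z statistic and p-value should match the step-by-step values above.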

    Exercises: Rephrase the previous IQ \(z\)-tests in the five steps.

    8.6 F-test

    8.6.1 F-distribution

    \[F = \frac{s_1^2}{s_2^2}\]

    where \(s^2_1\) and \(s^2_2\) are the sample variances of two different populations. If both populations are normal and the population variances \(\sigma_1^2 = \sigma_2^2\), then the sampling distribution of F is an F-distribution.
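
    This claim can be checked by simulation: repeatedly draw two samples from normal populations with a common variance, form the ratio of the sample variances, and compare its distribution with the theoretical F density. A minimal sketch, with arbitrary sample sizes:

    set.seed(1)
    n1 <- 10    # sample size from population 1
    n2 <- 16    # sample size from population 2
    # 10000 simulated variance ratios under equal population variances (sd = 2)
    f_sim <- replicate(10000, var(rnorm(n1, sd = 2)) / var(rnorm(n2, sd = 2)))
    hist(f_sim, breaks = 60, freq = FALSE, xlim = c(0, 6), main = "", xlab = "F")
    # Theoretical F density with df1 = n1 - 1 and df2 = n2 - 1
    curve(df(x, df1 = n1 - 1, df2 = n2 - 1), add = TRUE, col = "red")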

    PDF:

    \[f(x) = \frac{\Gamma(\frac{\nu_{1} + \nu_{2}} {2}) (\frac{\nu_{1}} {\nu_{2}})^{\frac{\nu_{1}} {2}} x^{\frac{\nu_{1}} {2} - 1 }} {\Gamma(\frac{\nu_{1}} {2}) \Gamma(\frac{\nu_{2}} {2}) (1 + \frac{\nu_{1}x} {\nu_{2}})^{\frac{\nu_{1} + \nu_{2}} {2}} }\]

    \[\Gamma(a) = \int_{0}^{\infty} {t^{a-1}e^{-t}dt}\]

    Graphs:

    Code
    par(mfrow = c(1, 2))
    n <- c(5, 10, 15, 20)
    m <- n
    
    coln <- 1
    curve(df(x, df1 = n[1], df2 = m[1]), xlim = c(0, 5), ylim = c(0, 0.7), 
          xlab = "F", ylab = "Density", col = coln, las = 1)
    for (nn in n[2:4]) {
      coln <- coln + 1
      curve(df(x, df1 = nn, df2 = m[1]), add = TRUE, col = coln)
    }
    legend("topright", legend = paste(n, m[1]), col = 1:4, lty = 1, bty = "n")
    
    coln <- 1
    curve(df(x, df1 = n[1], df2 = m[1]), xlim = c(0, 5), ylim = c(0, 0.7),
          xlab = "F", ylab = "Density", col = coln, las = 1)
    for (mm in m[2:4]) {
      coln <- coln + 1
      curve(df(x, df1 = n[1], df2 = mm), add = TRUE, col = coln)
    }
    legend("topright", legend = paste(n[1], m), col = 1:4, lty = 1, bty = "n")

    Properties:

    • A family of curves.
    • Each curve is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator \(df_N\), and the degrees of freedom corresponding to the variance in the denominator \(df_D\).
    • Positively skewed.
    • The total area under each curve is 1.
    • \(F\ge0\).
    • \(\mu_F \approx 1\).
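
    Two of these properties (the total area of 1 and \(\mu_F \approx 1\)) can be checked numerically. A quick sketch; the degrees of freedom are chosen arbitrarily:

    # Total area under an F density is 1
    integrate(df, lower = 0, upper = Inf, df1 = 5, df2 = 10)
    # The mean of the F-distribution is df2 / (df2 - 2), close to 1 unless df2 is small
    mean(rf(1e5, df1 = 5, df2 = 30))   # compare with 30 / 28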

    8.6.2 Compare two variances

    Demo: Two 3D printers

    A scientist wants to choose between two 3D printers to produce a 3D model. The printing time is 24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83 hours for Printer 1, and 21.61, 19.06, 20.72, 15.77, 19, 25.88, 21.48, 17.85, 20.86, 17.77 hours for Printer 2. Do these two printers have the same variances in printing time?

    1. Hypotheses:
    • \(H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\)
    • \(H_1: \frac{\sigma_1^2}{\sigma_2^2} \ne 1\)
    • Reject \(H_0\)? Given \(\alpha = 0.05\).
    2. Collect data:
    x1 <- c(24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83)
    x2 <- c(21.61, 19.06, 20.72, 15.77, 19, 25.88, 21.48, 17.85, 20.86, 17.77)
    3. Calculate a test statistic. First, the sample variances:
    n <- length(x1)
    m <- length(x2)
    xbar1 <- mean(x1)
    xbar2 <- mean(x2)
    var1 <- var(x1)
    var1
    [1] 7.345622
    var2 <- var(x2)
    var2
    [1] 7.776311
    Then calculate the \(F\) statistic, with the larger sample variance in the numerator:

    \[F = \frac{s_\mathrm{large}^2}{s_\mathrm{small}^2}\]

    F_score <- var2 / var1
    F_score
    [1] 1.058632
    F_critical <- qf(0.975, df1 = m-1, df2 = n-1)
    F_critical
    [1] 4.025994
    # df1 numerator, df2 denominator
    pf(F_score, df1 = m-1, df2 = n-1, lower.tail = FALSE) * 2
    [1] 0.9337528
    4. Decision:

    As \(F_\mathrm{score} < F_\mathrm{critical}\) (the \(1 - \alpha/2\) quantile, since the test is two-tailed), we cannot reject \(H_0\) at the 5% significance level.

    5. Conclusion:

    There is not enough evidence at the 5% significance level to conclude that the two printers differ in the variance of printing time.

    One step:

    var.test(x2, x1, ratio = 1, alternative = "two.sided", conf.level = 0.95)
    
        F test to compare two variances
    
    data:  x2 and x1
    F = 1.0586, num df = 9, denom df = 9, p-value = 0.9338
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.2629492 4.2620464
    sample estimates:
    ratio of variances 
              1.058632 

    8.7 Readings

    • Elementary Statistics, Chapters 7.1, 7.2, and 10.3

    8.8 Highlights

    • Carry out a step-by-step \(z\)-test and/or \(F\)-test for a scientific question.
      • State \(H_0\) and \(H_1\).
      • Calculate the critical value for a given \(\alpha\).
      • Calculate the test statistics (the \(z\) score for a \(z\)-test and the \(F\) score for an \(F\)-test).
      • Draw a conclusion for the scientific question.