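Given two sets of samples $\{x_1,\dots,x_{n_x}\}$ and $\{y_1,\dots,y_{n_y}\}$ drawn independently from their respective normal distributions $\mathcal{N}(\mu_x,\sigma_x^2)$ and $\mathcal{N}(\mu_y,\sigma_y^2)$, we can carry out either a Z-test or a t-test to accept or reject the null hypothesis $H_0$: the two populations have the same mean, i.e., $\mu_x=\mu_y$.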
The two-sample t-test is widely used to compare two populations and decide whether their means differ significantly. Examples include comparing public and private schools in test scores, American and Japanese cars in fuel efficiency, and treatment (drug) and control (placebo) groups in the effectiveness of a drug.
We first find the sample means

$$\bar{x}=\frac{1}{n_x}\sum_{i=1}^{n_x}x_i,\qquad \bar{y}=\frac{1}{n_y}\sum_{i=1}^{n_y}y_i \tag{37}$$
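Consider the difference $\bar{x}-\bar{y}$. Because $\bar{x}$ and $\bar{y}$ are independent, with variances $\sigma_x^2/n_x$ and $\sigma_y^2/n_y$, its mean and variance are

$$E[\bar{x}-\bar{y}]=\mu_x-\mu_y,\qquad \operatorname{Var}[\bar{x}-\bar{y}]=\frac{\sigma_x^2}{n_x}+\frac{\sigma_y^2}{n_y} \tag{38}$$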
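Being a linear combination of two independent, normally distributed variables, $\bar{x}-\bar{y}$ is itself normally distributed. Consequently, under the null hypothesis $H_0:\ \mu_x=\mu_y$, the test statistic below has a standard normal distribution:

$$z=\frac{\bar{x}-\bar{y}}{\sqrt{\sigma_x^2/n_x+\sigma_y^2/n_y}}\sim\mathcal{N}(0,1) \tag{39}$$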
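Eq. (38) can be checked numerically. The sketch below is a Monte Carlo simulation under hypothetical parameters ($\mu_x=5$, $\sigma_x=2$, $n_x=10$, $\mu_y=3$, $\sigma_y=1$, $n_y=20$). It repeatedly draws independent normal samples and records $\bar{x}-\bar{y}$. The function name `simulate_mean_difference` is our own illustration, not part of any library.

```python
import random
import statistics

def simulate_mean_difference(mu_x, sigma_x, n_x, mu_y, sigma_y, n_y, trials, seed=0):
    """Return `trials` simulated values of xbar - ybar from independent normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n_x))
        ybar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(n_y))
        diffs.append(xbar - ybar)
    return diffs

# Hypothetical parameters, for illustration only
diffs = simulate_mean_difference(5.0, 2.0, 10, 3.0, 1.0, 20, trials=10_000)
print(statistics.fmean(diffs))     # should be near mu_x - mu_y = 2.0
print(statistics.variance(diffs))  # should be near 2.0**2/10 + 1.0**2/20 = 0.45
```

The empirical mean and variance should approach $\mu_x-\mu_y$ and $\sigma_x^2/n_x+\sigma_y^2/n_y$ as `trials` grows.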
Given a significance level $\alpha$, we can find the critical value $z_{\alpha/2}$ from the z-table, and the null hypothesis $H_0$ is rejected if $|z|>z_{\alpha/2}$.
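The sketch below implements the Z-test of Eq. (39) and the rejection rule $|z|>z_{\alpha/2}$. It assumes the population standard deviations are known. Python's `statistics.NormalDist().inv_cdf` stands in for the z-table. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def two_sample_z_test(x, y, sigma_x, sigma_y, alpha=0.05):
    """Two-sided Z-test of H0: mu_x == mu_y with known population std devs."""
    n_x, n_y = len(x), len(y)
    se = math.sqrt(sigma_x**2 / n_x + sigma_y**2 / n_y)  # std of xbar - ybar, Eq. (38)
    z = (fmean(x) - fmean(y)) / se                        # Eq. (39)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}, about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit                     # True means reject H0

# Hypothetical data and known sigmas, for illustration only
z, z_crit, reject = two_sample_z_test([5.1, 4.9, 5.3, 5.0], [4.2, 4.4, 4.1, 4.3],
                                      sigma_x=0.2, sigma_y=0.2)
```

For these made-up numbers, $z$ is about 5.83, well beyond $z_{0.025}\approx 1.96$, so $H_0$ would be rejected.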
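If $\sigma_x^2$ and $\sigma_y^2$ are unknown, we need to estimate them by the sample variances:

$$s_x^2=\frac{1}{n_x-1}\sum_{i=1}^{n_x}(x_i-\bar{x})^2,\qquad s_y^2=\frac{1}{n_y-1}\sum_{i=1}^{n_y}(y_i-\bar{y})^2 \tag{40}$$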
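and construct the test statistic

$$t=\frac{\bar{x}-\bar{y}}{\mathrm{ESE}},\qquad \mathrm{ESE}=\sqrt{\frac{s_x^2}{n_x}+\frac{s_y^2}{n_y}} \tag{41}$$

where ESE denotes the estimated standard error of $\bar{x}-\bar{y}$.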
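This is called Welch's t-test. The distribution of the test statistic $t$ is approximately a t-distribution with the following degrees of freedom (Welch-Satterthwaite equation):

$$\nu=\frac{\left(\dfrac{s_x^2}{n_x}+\dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x-1}+\dfrac{(s_y^2/n_y)^2}{n_y-1}} \tag{42}$$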
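The sketch below computes Welch's statistic (41) and the Welch-Satterthwaite degrees of freedom (42) directly from the formulas. `statistics.variance` uses the $n-1$ divisor, which matches Eq. (40). The name `welch_t` and the data are hypothetical.

```python
import math
from statistics import fmean, variance

def welch_t(x, y):
    """Welch's t statistic (Eq. 41) and Welch-Satterthwaite degrees of freedom (Eq. 42)."""
    n_x, n_y = len(x), len(y)
    a = variance(x) / n_x   # s_x^2 / n_x, with variance() dividing by n - 1 as in Eq. (40)
    b = variance(y) / n_y   # s_y^2 / n_y
    ese = math.sqrt(a + b)  # estimated standard error of xbar - ybar
    t = (fmean(x) - fmean(y)) / ese
    nu = (a + b) ** 2 / (a**2 / (n_x - 1) + b**2 / (n_y - 1))
    return t, nu

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t, nu = welch_t(x, y)
print(t, nu)  # about -1.897 and 5.882
```

In general, $\nu$ is not an integer. When reading a t-table, a common conservative choice is to round it down.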
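If the sample variances $s_x^2$ and $s_y^2$ are similar to each other, we can assume that the two data sets have the same variance. We then use the pooled variance $s_p^2$ to approximate both $\sigma_x^2$ and $\sigma_y^2$:

$$s_p^2=\frac{(n_x-1)\,s_x^2+(n_y-1)\,s_y^2}{n_x+n_y-2} \tag{43}$$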
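and the test statistic becomes

$$t=\frac{\bar{x}-\bar{y}}{\mathrm{ESE}},\qquad \mathrm{ESE}=s_p\sqrt{\frac{1}{n_x}+\frac{1}{n_y}} \tag{44}$$

with a t-distribution of $n_x+n_y-2$ degrees of freedom.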
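The equal-variance version, Eqs. (43) and (44), can be sketched the same way. The function name `pooled_t` and the data, which deliberately use unequal sample sizes, are hypothetical.

```python
import math
from statistics import fmean, variance

def pooled_t(x, y):
    """Pooled-variance t statistic (Eqs. 43-44) and its n_x + n_y - 2 degrees of freedom."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    sp2 = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df  # pooled variance, Eq. (43)
    ese = math.sqrt(sp2) * math.sqrt(1 / n_x + 1 / n_y)              # denominator of Eq. (44)
    t = (fmean(x) - fmean(y)) / ese
    return t, df

# Hypothetical data, for illustration only
t, df = pooled_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0])
print(t, df)  # about -0.791 and 6
```

When $n_x=n_y$, the pooled and Welch statistics take the same value, because $s_p^2\,(2/n)=(s_x^2+s_y^2)/n$. In that case only their degrees of freedom differ.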
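Substituting the calculated $\bar{x}$, $\bar{y}$ and the estimated standard error (ESE) of $\bar{x}-\bar{y}$, i.e., the denominator of $t$, into the expression of $t$ (Eq. 41 or Eq. 44), we get the specific value $t_0$. From it we can find the p-value using the t-table:

$$p=P(|t|>|t_0|)=2\,P(t>|t_0|) \tag{45}$$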
If this p-value is smaller than a pre-determined significance level $\alpha$, we conclude that the data provide significant evidence to reject the null hypothesis $H_0$.
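Without a t-table, the two-sided p-value of Eq. (45) can be computed from the identity $P(|t|>|t_0|)=I_{\nu/(\nu+t_0^2)}(\nu/2,\,1/2)$. Here $I$ is the regularized incomplete beta function. The sketch below evaluates $I$ with the standard continued-fraction (Lentz) method using only the Python standard library. The helper names are hypothetical, and a library routine such as SciPy's would normally be preferred.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),               # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):     # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t0, nu):
    """p = P(|t| > |t0|) for a t-distribution with nu degrees of freedom (Eq. 45)."""
    return reg_inc_beta(nu / 2.0, 0.5, nu / (nu + t0 * t0))

# Decision rule with hypothetical values of t0 and nu
alpha = 0.05
p = t_two_sided_p(-1.897, 5.88)
reject_h0 = p < alpha
```

The result can be checked against closed forms. For $\nu=1$, $p=1-\tfrac{2}{\pi}\arctan|t_0|$. For $\nu=2$, $p=1-|t_0|/\sqrt{2+t_0^2}$.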
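We can also find the lower and upper limits of the confidence interval of the difference $\mu_x-\mu_y$, given a significance level $\alpha$:

$$L=(\bar{x}-\bar{y})-t_{\alpha/2}\,\mathrm{ESE},\qquad U=(\bar{x}-\bar{y})+t_{\alpha/2}\,\mathrm{ESE} \tag{46}$$

so that $P(L<\mu_x-\mu_y<U)=1-\alpha$. Here ESE is the estimated standard error of $\bar{x}-\bar{y}$ (the denominator of $t$). The critical value $t_{\alpha/2}$ is taken from the t-distribution with the corresponding degrees of freedom.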
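A minimal sketch of Eq. (46) follows. It assumes the critical value $t_{\alpha/2}$ has already been read from a t-table. For example, $t_{0.025}\approx 2.306$ for 8 degrees of freedom. The function name `mean_difference_ci`, the `equal_var` switch, and the data are hypothetical.

```python
import math
from statistics import fmean, variance

def mean_difference_ci(x, y, t_crit, equal_var=True):
    """(1 - alpha) confidence interval for mu_x - mu_y, Eq. (46).

    t_crit is t_{alpha/2} from a t-table: n_x + n_y - 2 degrees of freedom
    if equal_var is True, otherwise the Welch-Satterthwaite nu of Eq. (42).
    """
    n_x, n_y = len(x), len(y)
    if equal_var:
        sp2 = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
        ese = math.sqrt(sp2 * (1 / n_x + 1 / n_y))              # pooled ESE, Eq. (44)
    else:
        ese = math.sqrt(variance(x) / n_x + variance(y) / n_y)  # Welch ESE, Eq. (41)
    d = fmean(x) - fmean(y)
    return d - t_crit * ese, d + t_crit * ese

# Hypothetical data; 2.306 is t_{0.025} for 5 + 5 - 2 = 8 degrees of freedom
lo, hi = mean_difference_ci([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0], t_crit=2.306)
```

For these made-up data the interval contains 0. This agrees with $|t|\approx 1.897<2.306$: $H_0$ is rejected at level $\alpha$ exactly when 0 falls outside $(L, U)$.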
The Matlab function ttest2
can be used to carry out two-sample
t-tests.
Example: Given two sets of data below and
,

(
47)
We get
,
,
,
- If we assume the variances of the two data sets are different,
then
,
is in the critical
region, and
.
The confidence interval of the difference between the two means
is
.
- If we assume the variances of the two data sets are the same,
then
,
is in the critical
region, and
.
The confidence interval of the difference between the two means is
.
In either case, the null hypothesis claiming that the two data sets
have the same mean is rejected.