t-Test for Linear Regression

The null hypothesis for the linear regression model states that the parameters equal hypothesized true values, $w_0=w_{00}$ and $w_1=w_{10}$. If the residual $r_i=y_i-\hat{y}_i=y_i-(w_0+w_1x_i)$ is assumed to be normally distributed as ${\cal N}(0,\sigma^2)$, then the estimated intercept and slope also have normal distributions:

  $\displaystyle w_0 \sim {\cal N}\left(w_{00}, \;
\left(\frac{1}{N}+\frac{\bar{x}^2}{\sum_i(x_i-\bar{x})^2}\right)\sigma^2\right),
\;\;\;\;\;\;
w_1 \sim {\cal N}\left(w_{10},\;\frac{\sigma^2}{\sum_i(x_i-\bar{x})^2}\right)
$ (87)
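As a quick sanity check of (87), the following sketch (a minimal simulation assuming NumPy is available; the sample size, true parameters, and noise level are arbitrary illustrative choices) refits the line on many noisy samples and compares the empirical variance of $w_1$ with $\sigma^2/\sum_i(x_i-\bar{x})^2$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N, w00, w10, sigma = 50, 1.0, 2.0, 0.5    # arbitrary true values
x = np.linspace(0, 1, N)
Sxx = np.sum((x - x.mean())**2)

# Refit the model on many noisy samples, collecting the slope estimates.
slopes = []
for _ in range(10000):
    y = w00 + w10 * x + rng.normal(0, sigma, N)
    w1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # OLS slope
    slopes.append(w1)

print("empirical Var(w1):  ", np.var(slopes))
print("theoretical s^2/Sxx:", sigma**2 / Sxx)
\end{verbatim}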
The unbiased mean square error is
  $\displaystyle S_e^2=\frac{\sum_i r_i^2}{N-2},\;\;\;\;\;\;S_e=\sqrt{\frac{\sum_i r_i^2}{N-2}}
$ (88)
and the standard errors of $w_0$ and $w_1$ are:
  $\displaystyle SE_{w_0}=S_e
\sqrt{\frac{1}{N}+\frac{\bar{x}^2}{\sum_{i=1}^N(x_i-\bar{x})^2}}
=\sqrt{\left(\frac{1}{N-2}\sum_{i=1}^N r_i^2\right)
\left(\frac{1}{N}+\frac{\bar{x}^2}{\sum_{i=1}^N(x_i-\bar{x})^2}\right)}
$ (89)
and
  $\displaystyle SE_{w_1}=S_e \sqrt{\frac{1}{\sum_{i=1}^N(x_i-\bar{x})^2}}
=\sqrt{ \frac{\frac{1}{N-2}\sum_{i=1}^N r_i^2}
{\sum_{i=1}^N(x_i-\bar{x})^2}}
$ (90)
The test statistic for $w_j\;(j\in\{0,\,1\})$ is
  $\displaystyle t=\frac{w_j-w_{j0}}{SE_{w_j}}
$ (91)
which has a t-distribution with $\nu=N-2$ degrees of freedom. Given a significance level $\alpha$, we can then either reject or accept the null hypothesis that $w_j=w_{j0}$.
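For concreteness, here is a minimal sketch (assuming NumPy and SciPy, on synthetic data; all variable names are illustrative) that computes $S_e$ of (88), the standard errors (89) and (90), and the t statistic (91) with its two-sided p-value for $H_0:\;w_1=0$:

\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, x.size)   # synthetic data
N = x.size

# OLS estimates of the intercept w0 and slope w1
Sxx = np.sum((x - x.mean())**2)
w1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
w0 = y.mean() - w1 * x.mean()

# Unbiased mean square error (88) and standard errors (89), (90)
r = y - (w0 + w1 * x)                     # residuals
Se = np.sqrt(np.sum(r**2) / (N - 2))
SE_w0 = Se * np.sqrt(1/N + x.mean()**2 / Sxx)
SE_w1 = Se / np.sqrt(Sxx)

# t statistic (91) for H0: w1 = 0, with two-sided p-value at nu = N-2
t = w1 / SE_w1
p = 2 * stats.t.sf(abs(t), df=N - 2)
print(f"t = {t:.3f}, p = {p:.3g}")
\end{verbatim}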

Typically we assume $w_{10}=0$ and test the null hypothesis $H_0:\;w_1=0$, i.e., there is no relationship between variables $x$ and $y$. The alternative hypothesis is $H_a:\; w_1\ne 0$, i.e., there is some relationship between $x$ and $y$.

We can also find the upper and lower limits $\pm t_{\alpha/2}$ from the t-table (with $\nu=N-2$) so that

  $\displaystyle P(-t_{\alpha/2}\le t_j\le t_{\alpha/2})
=\int_{-t_{\alpha/2}}^{t_{\alpha/2}} {\cal T}_{N-2}(\tau)\;d\tau =1-\alpha
$ (92)
which is equivalent to
    $\displaystyle P(-t_{\alpha/2}\le t_j\le t_{\alpha/2})
=P\left(-t_{\alpha/2}\le \frac{w_j-w_{j0}}{SE_{w_j}}
\le t_{\alpha/2}\right)
=P(w_j-t_{\alpha/2}\,SE_{w_j}\le w_{j0} \le w_j+t_{\alpha/2}\,SE_{w_j})
=1-\alpha$

i.e., the confidence interval for $w_{j0}$ is $w_j\pm t_{\alpha/2}\,SE_{w_j}$.
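Continuing the sketch above, the critical value $t_{\alpha/2}$ and the resulting confidence interval for the slope can be obtained with scipy.stats.t.ppf:

\begin{verbatim}
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha/2, df=N - 2)   # upper alpha/2 critical value
lo, hi = w1 - t_crit * SE_w1, w1 + t_crit * SE_w1
print(f"{100*(1 - alpha):.0f}% CI for the slope: [{lo:.3f}, {hi:.3f}]")
\end{verbatim}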

For multivariate linear regression, we can also carry out tests to answer questions such as which subset of the independent variables $\{ x_1,\cdots,x_D\}$ is most important in affecting $y$. For simplicity, let $\{x_1,\cdots,x_p\}$ be a subset of $p<D$ variables out of the $D$ variables. The corresponding null hypothesis is $H_0:\;w_1=\cdots=w_p=0$, i.e., variable $y$ is not related to any of the variables in the subset $\{x_1,\cdots,x_p\}$. The corresponding test statistic is

  $\displaystyle f=\frac{(SSR_p-SSR)/p}{SSR/(N-D-1)}
$ (93)
with an f-distribution ${\cal F}_{p,N-D-1}(f)$, where $SSR$ is the residual sum of squares of the full model containing all $D$ variables and $SSR_p$ is that of the reduced model with the $p$ tested variables removed. The null hypothesis is rejected or accepted depending on whether the p-value is smaller than a pre-specified significance level $\alpha$.
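A minimal sketch of this partial F-test follows, again assuming NumPy and SciPy; the synthetic data with $D=3$ predictors, the tested subset of size $p=2$, and the helper ssr are all illustrative choices:

\begin{verbatim}
import numpy as np
from scipy import stats

def ssr(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ w)**2)

rng = np.random.default_rng(2)
N, D, p = 100, 3, 2
X = rng.normal(size=(N, D))
y = 0.5 + 1.5 * X[:, 2] + rng.normal(0, 1.0, N)  # y depends only on x_3

ones = np.ones((N, 1))
X_full = np.hstack([ones, X])          # full model: all D variables
X_red  = np.hstack([ones, X[:, p:]])   # reduced model: x_1,...,x_p removed

SSR   = ssr(X_full, y)
SSR_p = ssr(X_red, y)
f = ((SSR_p - SSR) / p) / (SSR / (N - D - 1))
p_value = stats.f.sf(f, p, N - D - 1)
print(f"f = {f:.3f}, p-value = {p_value:.3g}")
\end{verbatim}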