Newton-Raphson Method (Univariate)

To solve equation $f(x)=0$ , we first consider the Taylor series expansion of $f(x)$ at any point $x_0$ :

$\displaystyle f(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!}f''(x)(x-x_0)^2+\cdots +\frac{1}{n!} f^{(n)}(x_0)(x-x_0)^n +\cdots$

(66)

is linear, i.e., its slope $f'(x)$

is a constant for any $x$

, then the second and higher order terms are all zero, and the equation $f(x)=0$

becomes

$\displaystyle f(x)=f(x_0)+f'(x)(x-x_0)=0$

(67)

Solving this equation we get the root $x^*$

at which

$\displaystyle x^*=x=x_0-\frac{f(x_0)}{f'(x_0)}=x_0+\Delta x_0$

(68)

where $\Delta x_0=-f(x_0)/f'(x_0)$ is the step we need to take to go from the initial point $x_0$

to the root $x^*$

$\displaystyle \Delta x_0=-\frac{f(x_0)}{f'(x_0)}\;\left\{\begin{array}{ll} <0 &... ... & \mbox{if $f(x_0)$\ and $f'(x_0)$\ are of different signs} \end{array}\right.$

(69)

If $f(x)$ is nonlinear, the sum of the first two terms of the Taylor expansion is only an approximation of $f(x)$ , and the resulting $x$ found above can be treated as an approximation of the root, which can be improved iteratively to move from the initial point $x_0$ towards the root, as illustrated in the figure below:

$\displaystyle x_{n+1}=x_n+\Delta x_n =x_n-\frac{f(x_n)}{f'(x_n)}=g(x_n),\;\;\;\;\;\; n=0,\,1,\,2,\cdots$

(70)

The Newton-Raphson method can be considered as the fixed point iteration $x_{n+1}=g(x_n)$ based on

$\displaystyle g(x)=x-f(x)/f'(x)$

(71)

The root

at which

is also the fixed point of $g(x)$

, i.e.,

. For the iteration to converge, $g(x)$

needs to be a contraction with $\vert g'(x)\vert<1$ . Consider

$\displaystyle g'(x)=\left(x-\frac{f(x)}{f'(x)}\right)' =1-\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2} =\frac{f(x)f''(x)}{(f'(x))^2}$

(72)

We make the following comments:

At the root where , if $f'(x^*)\ne 0$ , then , i.e., is a contraction and the iteration $x_{n+1}=g(x_n)$ converges quadratically when is close to .
If (the tangent is horizontal), the iteration cannot proceed, but we can modify by adding a small value $\epsilon$ to so that $f'(x+\epsilon)\ne 0$ .
It is difficult to know the number of roots of a nonlinear equation (unlike the case of a linear equation), which can vary from zero (e.g., ) to infinity (e.g., $f(x)=\cos(x)$ ). One can try different initial guesses in the range of interest to see if different roots can be found.
Sometime a parameter $\delta$ can be used to control the step size of the iteration:

$\displaystyle x_{n+1}=x_n-\delta\,\frac{f(x_n)}{f'(x_n)}$ (73)
- If $\delta<1$ , the iteration is de-accelerated. Although the convergence becomes slower, this may be desirable if the function is not smooth with many local variations.
- If $\delta>1$ , the iteration is accelerated. The convergence may or may not be accelerated. Due to the greater step size, the root may be skipped and missed. Sometimes the convergence may become significantly slowed or even oscillate around the true root, such as the example shown in the figure below with $\delta=2$ .

The order of convergence of the Newton-Raphson iteration can be found based on the Taylor expansion of $f(x)$ at the neighborhood of the root $x^*=x_n+e_n$ :

$\displaystyle 0=f(x^*)=f(x_n)+f'(x_n)e_n+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$

(74)

where

is the error at the nth step. Substituting the Newton-Raphson's iteration

$\displaystyle x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)},\;\;\;\;\;$ i.e. $\displaystyle \;\;\;\; f(x_n)=f'(x_n)(x_n-x_{n+1})$

(75)

into the equation above, we get

0	$\displaystyle =$	$\displaystyle f'(x_n)(x_n-x_{n+1})+f'(x_n)(x^*-x_n)+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$
	$\displaystyle =$	$\displaystyle f'(x_n)(x^*-x_{n+1})+\frac{f''(x)}{2}e_n^2+O(e_n^3) =f'(x_n)e_{n+1}+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$	(76)

i.e.

$\displaystyle e_{n+1}=-\frac{f''(x_n)}{2f'(x_n)}e_n^2+O(e_n^3)$

(77)

When $n\rightarrow\infty$ all the higher order terms disappear, and the above can be written as

$\displaystyle \lim\limits_{n\rightarrow\infty}\frac{\vert e_{n+1}\vert}{\vert e_n\vert^2} \le \frac{\vert f''(x^*)\vert}{2\vert f'(x^*)\vert}=\mu$

(78)

Alternatively, we can get the Taylor expansion in terms of $g(x)$

$\displaystyle x_{n+1}=x^*-e_{n+1}=g(x_n)=g(x^*-e_n) =g(x^*)-g'(x^*)e_n+\frac{g''(x^*)}{2}e_n^2+O(e_n^3)$

(79)

Subtracting $g(x^*)=x^*$

from both sides we get:

$\displaystyle e_{n+1}=g'(x^*)e_n-\frac{g''(x^*)}{2}e_n^2+O(e_n^3)$

(80)

Now we find $g'(x)$

and

$\displaystyle g'(x)=\left(x-\frac{f(x)}{f'(x)}\right)' =1-\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2} =\frac{f(x)f''(x)}{(f'(x))^2}$

(81)

and

$\displaystyle g''(x)$	$\displaystyle =$	$\displaystyle \left(\frac{f(x)f''(x)}{(f'(x))^2}\right)' =\frac{(f'(x)f''(x)+f(x)f'''(x))(f'(x))^2-f(x)f''(x)2f'(x)f''(x)}{(f'(x))^4}$
	$\displaystyle =$	$\displaystyle \frac{(f'(x))^3f''(x)}{(f'(x))^4} =\frac{f''(x)}{f'(x)}$	(82)

Evaluating these at $x=x^*$

at which

, and substituting them back into the expression for $e_{n+1}$ above, the linear term is zero as $g'(x^*)=0$

, i.e., the convergence is quadratic, and we get the same result:

$\displaystyle \lim\limits_{n\rightarrow\infty}\frac{\vert e_{n+1}\vert}{\vert e... ...e \frac{\vert f''(x^*)\vert}{2\vert f'(x^*)\vert}=\frac{\vert g''(x^*)\vert}{2}$

(83)

We see that, if $f'(x^*)\ne 0$ , then the order of convergence of the Newton-Raphson method is $p=2$ , and the rate of convergence is $\mu=\vert f''(x)\vert/2\vert f'(x)\vert$ . However, if $f'(x^*)=0$ , the convergence is linear rather than quadratic, as shown in the example below.

Example: Consider solving the equation

$\displaystyle f(x)=x^3-4x^2+5x-2=(x-1)^2(x-2)=0$

(84)

which has a repeated root $x=1$

as well as a single root $x=2$

. We have

$\displaystyle f'(x)=3x^2-8x+5=(3x-5)(x-1),\;\;\;\;\;\;\;f''(x)=6x-8$

(85)

Note that at the root $x^*=1$

we have

. We further find:

$\displaystyle g(x)=x-\frac{f(x)}{f'(x)}=x-\frac{(x-1)^2(x-2)}{(3x-5)(x-1)} =x-\frac{(x-1)(x-2)}{3x-5}$

(86)

and

$\displaystyle g'(x)$	$\displaystyle =$	$\displaystyle \frac{f(x)f''(x)}{(f'(x))^2} =\frac{(x-1)^2(x-2)(6x-8)}{(3x-5)^2(x-1)^2}$
	$\displaystyle =$	$\displaystyle \frac{(x-2)(6x-8)}{(3x-5)^2} =\left\{\begin{array}{ll}1/2 & x=1\\ 0 & x=2\end{array}\right.$

We therefore have

$\displaystyle e_{n+1}=g'(x)e_n-\frac{1}{2}g''(x)e_n^2+O(e_n^3) \stackrel{n\righ... ...} \left\{\begin{array}{ll}e_n/2 & x=1 \\ -g''(x)e_n^2/2 & x=2\end{array}\right.$

(87)

We see the iteration converges quadratically to the single root $x=2$

, but only linearly to the repeated root $x=1$

We consider in general a function with a repeated root at $x=a$ of multiplicity $k$ :

$\displaystyle f(x)=(x-a)^kh(x)$

(88)

its derivative is

$\displaystyle f'(x)=k(x-a)^{k-1}h(x)+(x-a)^kh'(x)=(x-a)^{k-1}[kh(x)+(x-a)h'(x)]$

(89)

, the convergence of the Newton-Raphson method to $x=a$

is linear, rather than quadratic.

In such case, we can accelerate the iteration by using a step size $\delta=k>1$ :

$\displaystyle g(x)=x-k\;\frac{f(x)}{f'(x)}$

(90)

and

$\displaystyle g'(x)=1-k\;\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2} =1-k+k\,\frac{f(x)f''(x)}{(f'(x))^2}$

(91)

Now we show that this $g'(x)$

is zero at the repeated root $x=a$

, therefore the convergence to this root is still quadratic.

We substitute $f(x)=(x-a)^kh(x)$ , $f'(x)=k(x-a)^{k-1}[kh+(x-a)h']$ , and

	$\displaystyle f''(x)=[(x-a)^{k-1}[kh+(x-a)h']]'$
$\displaystyle =$	$\displaystyle 1-k+k\,(k-1)(x-a)^{k-2}[kh+(x-a)h']$
	$\displaystyle +(x-a)^{k-1}[(k+1)h'+(x-a)h'']$	(92)

into the expression for $g'(x)$

above to get (after some algebra):

$\displaystyle g'(x)$	$\displaystyle =$	$\displaystyle 1-k+k\frac{f(x)f''(x)}{(f'(x))^2}$
	$\displaystyle =$	$\displaystyle 1-k+k\,\frac{h(k-1)(kh+(x-a)h')+h(x-a)((k+1)h'+(x-a)h')}{(kh+(x-a)h')^2}$	(93)

, we get

$\displaystyle g'(x)\bigg\vert _{x=a}=1-k+k\frac{(k-1)kh^2}{(kh)^2}=0$

(94)

i.e., the convergence to the repeated root at $x=a$

is no longer linear but quadratic.

The difficulty, however, is that the multiplicity $k$ of a root is unknown ahead of time. If $\delta>1$ is used blindly some root may be skipped, and the iteration may oscillate around the real root.

Example: Consider solving $f(x)=x^3-4x^2+5x-2=(x-1)^2(x-2)=0$ , with a double root $x=1$ and a single root $x=2$ . In the following, we compare the performance of both $g_1(x)=x-f(x)/f'(x)$ and $g_2(x)=x-2f(x)/f'(x)$ .

First use an initial guess .
- When is used, the convergence is linear around the double root . It takes 16 iterations to get $x_{16}=0.999983$ with $\vert e_{16}\vert<10^{-9}$ :
  
  $\begin{displaymath}\begin{array}{r\vert c\vert c\vert c}\hline n & x_n & f(x_n) ... ...0069 \\ \hline 16 & 0.999983& -0.000000 & \\ \hline \end{array}\end{displaymath}$
- When is used, the convergence is quadratic around the double root . It takes only 3 iterations to get with $\vert e_3\vert<10^{-9}$ :
  
  $\begin{displaymath}\begin{array}{r\vert c\vert c\vert c}\hline n & x_n & f(x_n) ... ...2222 \\ \hline 3 & 0.999982 & -0.000000 & \\ \hline \end{array}\end{displaymath}$
Next use a different initial guess .
- If is used, it takes 7 iterations to get with $\vert e_7\vert<10^{-9}$ , the convergence is quadratic.
  
  $\begin{displaymath}\begin{array}{r\vert c\vert c\vert c}\hline n & x_n & f(x_n) ... ...00000 \\ \hline 8 & x=2.00000 & 0.00000 & \\ \hline \end{array}\end{displaymath}$
- If is used, oscillation happens as shown in the figure below. However, if a better initial guess is used instead of , it takes only one step to get with $\vert e_1\vert<10^{-9}$ , the convergence is significantly accelerated.
  
  $\begin{displaymath}\begin{array}{r\vert c\vert c\vert c}\hline n & x_n & f(x_n) ... ...8.00000 \\ \hline 2 & 2.00000 & 0.00000 & \\ \hline \end{array}\end{displaymath}$