Newton-Raphson Method (Univariate)

To solve equation $f(x)=0$, we first consider the Taylor series expansion of $f(x)$ at any point $x_0$:

$\displaystyle f(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!}f''(x)(x-x_0)^2+\cdots
+\frac{1}{n!} f^{(n)}(x_0)(x-x_0)^n +\cdots$ (66)

If $f(x)$ is linear, i.e., its slope $f'(x)$ is a constant for any $x$, then the second and higher order terms are all zero, and the equation $f(x)=0$ becomes

$\displaystyle f(x)=f(x_0)+f'(x)(x-x_0)=0$ (67)

Solving this equation we get the root $x^*$ at which $f(x^*)=0$:

$\displaystyle x^*=x=x_0-\frac{f(x_0)}{f'(x_0)}=x_0+\Delta x_0$ (68)

where $\Delta x_0=-f(x_0)/f'(x_0)$ is the step we need to take to go from the initial point $x_0$ to the root $x^*$:

$\displaystyle \Delta x_0=-\frac{f(x_0)}{f'(x_0)}\;\left\{\begin{array}{ll}
<0 &...
... & \mbox{if $f(x_0)$\ and $f'(x_0)$\ are of different signs}
\end{array}\right.$ (69)

If $f(x)$ is nonlinear, the sum of the first two terms of the Taylor expansion is only an approximation of $f(x)$, and the resulting $x$ found above can be treated as an approximation of the root, which can be improved iteratively to move from the initial point $x_0$ towards the root, as illustrated in the figure below:

$\displaystyle x_{n+1}=x_n+\Delta x_n =x_n-\frac{f(x_n)}{f'(x_n)}=g(x_n),\;\;\;\;\;\;
n=0,\,1,\,2,\cdots$ (70)

NewtonRaphson.png

The Newton-Raphson method can be considered as the fixed point iteration $x_{n+1}=g(x_n)$ based on

$\displaystyle g(x)=x-f(x)/f'(x)$ (71)

The root $x^*$ at which $f(x^*)=0$ is also the fixed point of $g(x)$, i.e., $g(x^*)=x^*$. For the iteration to converge, $g(x)$ needs to be a contraction with $\vert g'(x)\vert<1$. Consider

$\displaystyle g'(x)=\left(x-\frac{f(x)}{f'(x)}\right)'
=1-\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2}
=\frac{f(x)f''(x)}{(f'(x))^2}$ (72)

We make the following comments:

The order of convergence of the Newton-Raphson iteration can be found based on the Taylor expansion of $f(x)$ at the neighborhood of the root $x^*=x_n+e_n$:

$\displaystyle 0=f(x^*)=f(x_n)+f'(x_n)e_n+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$ (74)

where $e_n=x^*-x_n$ is the error at the nth step. Substituting the Newton-Raphson's iteration

$\displaystyle x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)},\;\;\;\;\;$i.e.$\displaystyle \;\;\;\;
f(x_n)=f'(x_n)(x_n-x_{n+1})$ (75)

into the equation above, we get
0 $\displaystyle =$ $\displaystyle f'(x_n)(x_n-x_{n+1})+f'(x_n)(x^*-x_n)+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$  
  $\displaystyle =$ $\displaystyle f'(x_n)(x^*-x_{n+1})+\frac{f''(x)}{2}e_n^2+O(e_n^3)
=f'(x_n)e_{n+1}+\frac{f''(x_n)}{2}e_n^2+O(e_n^3)$ (76)

i.e.

$\displaystyle e_{n+1}=-\frac{f''(x_n)}{2f'(x_n)}e_n^2+O(e_n^3)$ (77)

When $n\rightarrow\infty$ all the higher order terms disappear, and the above can be written as

$\displaystyle \lim\limits_{n\rightarrow\infty}\frac{\vert e_{n+1}\vert}{\vert e_n\vert^2}
\le \frac{\vert f''(x^*)\vert}{2\vert f'(x^*)\vert}=\mu$ (78)

Alternatively, we can get the Taylor expansion in terms of $g(x)$:

$\displaystyle x_{n+1}=x^*-e_{n+1}=g(x_n)=g(x^*-e_n)
=g(x^*)-g'(x^*)e_n+\frac{g''(x^*)}{2}e_n^2+O(e_n^3)$ (79)

Subtracting $g(x^*)=x^*$ from both sides we get:

$\displaystyle e_{n+1}=g'(x^*)e_n-\frac{g''(x^*)}{2}e_n^2+O(e_n^3)$ (80)

Now we find $g'(x)$ and $g''(x)$:

$\displaystyle g'(x)=\left(x-\frac{f(x)}{f'(x)}\right)'
=1-\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2}
=\frac{f(x)f''(x)}{(f'(x))^2}$ (81)

and
$\displaystyle g''(x)$ $\displaystyle =$ $\displaystyle \left(\frac{f(x)f''(x)}{(f'(x))^2}\right)'
=\frac{(f'(x)f''(x)+f(x)f'''(x))(f'(x))^2-f(x)f''(x)2f'(x)f''(x)}{(f'(x))^4}$  
  $\displaystyle =$ $\displaystyle \frac{(f'(x))^3f''(x)}{(f'(x))^4}
=\frac{f''(x)}{f'(x)}$ (82)

Evaluating these at $x=x^*$ at which $f(x^*)=0$, and substituting them back into the expression for $e_{n+1}$ above, the linear term is zero as $g'(x^*)=0$, i.e., the convergence is quadratic, and we get the same result:

$\displaystyle \lim\limits_{n\rightarrow\infty}\frac{\vert e_{n+1}\vert}{\vert e...
...e \frac{\vert f''(x^*)\vert}{2\vert f'(x^*)\vert}=\frac{\vert g''(x^*)\vert}{2}$ (83)

We see that, if $f'(x^*)\ne 0$, then the order of convergence of the Newton-Raphson method is $p=2$, and the rate of convergence is $\mu=\vert f''(x)\vert/2\vert f'(x)\vert$. However, if $f'(x^*)=0$, the convergence is linear rather than quadratic, as shown in the example below.

Example: Consider solving the equation

$\displaystyle f(x)=x^3-4x^2+5x-2=(x-1)^2(x-2)=0$ (84)

which has a repeated root $x=1$ as well as a single root $x=2$. We have

$\displaystyle f'(x)=3x^2-8x+5=(3x-5)(x-1),\;\;\;\;\;\;\;f''(x)=6x-8$ (85)

Note that at the root $x^*=1$ we have $f'(x^*)=f'(1)=0$. We further find:

$\displaystyle g(x)=x-\frac{f(x)}{f'(x)}=x-\frac{(x-1)^2(x-2)}{(3x-5)(x-1)}
=x-\frac{(x-1)(x-2)}{3x-5}$ (86)

and
$\displaystyle g'(x)$ $\displaystyle =$ $\displaystyle \frac{f(x)f''(x)}{(f'(x))^2}
=\frac{(x-1)^2(x-2)(6x-8)}{(3x-5)^2(x-1)^2}$  
  $\displaystyle =$ $\displaystyle \frac{(x-2)(6x-8)}{(3x-5)^2}
=\left\{\begin{array}{ll}1/2 & x=1\\ 0 & x=2\end{array}\right.$  

We therefore have

$\displaystyle e_{n+1}=g'(x)e_n-\frac{1}{2}g''(x)e_n^2+O(e_n^3)
\stackrel{n\righ...
...}
\left\{\begin{array}{ll}e_n/2 & x=1 \\ -g''(x)e_n^2/2 & x=2\end{array}\right.$ (87)

We see the iteration converges quadratically to the single root $x=2$, but only linearly to the repeated root $x=1$.

We consider in general a function with a repeated root at $x=a$ of multiplicity $k$:

$\displaystyle f(x)=(x-a)^kh(x)$ (88)

its derivative is

$\displaystyle f'(x)=k(x-a)^{k-1}h(x)+(x-a)^kh'(x)=(x-a)^{k-1}[kh(x)+(x-a)h'(x)]$ (89)

As $f'(a)=0$, the convergence of the Newton-Raphson method to $x=a$ is linear, rather than quadratic.

In such case, we can accelerate the iteration by using a step size $\delta=k>1$:

$\displaystyle g(x)=x-k\;\frac{f(x)}{f'(x)}$ (90)

and

$\displaystyle g'(x)=1-k\;\frac{(f'(x))^2-f(x)f''(x)}{(f'(x))^2}
=1-k+k\,\frac{f(x)f''(x)}{(f'(x))^2}$ (91)

Now we show that this $g'(x)$ is zero at the repeated root $x=a$, therefore the convergence to this root is still quadratic.

We substitute $f(x)=(x-a)^kh(x)$, $f'(x)=k(x-a)^{k-1}[kh+(x-a)h']$, and

    $\displaystyle f''(x)=[(x-a)^{k-1}[kh+(x-a)h']]'$  
  $\displaystyle =$ $\displaystyle 1-k+k\,(k-1)(x-a)^{k-2}[kh+(x-a)h']$  
    $\displaystyle +(x-a)^{k-1}[(k+1)h'+(x-a)h'']$ (92)

into the expression for $g'(x)$ above to get (after some algebra):
$\displaystyle g'(x)$ $\displaystyle =$ $\displaystyle 1-k+k\frac{f(x)f''(x)}{(f'(x))^2}$  
  $\displaystyle =$ $\displaystyle 1-k+k\,\frac{h(k-1)(kh+(x-a)h')+h(x-a)((k+1)h'+(x-a)h')}{(kh+(x-a)h')^2}$ (93)

At $x=a$, we get

$\displaystyle g'(x)\bigg\vert _{x=a}=1-k+k\frac{(k-1)kh^2}{(kh)^2}=0$ (94)

i.e., the convergence to the repeated root at $x=a$ is no longer linear but quadratic.

The difficulty, however, is that the multiplicity $k$ of a root is unknown ahead of time. If $\delta>1$ is used blindly some root may be skipped, and the iteration may oscillate around the real root.

Example: Consider solving $f(x)=x^3-4x^2+5x-2=(x-1)^2(x-2)=0$, with a double root $x=1$ and a single root $x=2$. In the following, we compare the performance of both $g_1(x)=x-f(x)/f'(x)$ and $g_2(x)=x-2f(x)/f'(x)$.