Biological and Artificial Neural Networks

In machine learning, artificial neural networks are a category of algorithms inspired by the biological neural networks in the brain and designed to carry out both supervised and unsupervised learning tasks, such as classification and clustering. To understand how such neural network algorithms work, we first consider some basic concepts of biological neural systems.

The human brain consists of about $10^{11}$ neurons interconnected through about $10^{14}$ to $10^{15}$ synaptic junctions to form millions of neural networks. Hundreds of specialized cortical areas are formed based on these networks for different information processing tasks.

Functionally, a neuron consists of the following three parts:

The cell body or soma containing the nucleus of the cell;
The dendrites that receive electrochemical stimulations (input impulses) from other neurons and propagate them to the cell body;
The axon that conducts the impulses (output) away from the cell body to other cells.

The synapse is the point at which impulses pass from one cell to another.

The function of a neuron can be modeled mathematically. Each neuron, modeled as a node in the neural network, receives input signals (stimuli) from $d$ other neurons, and its activation, or net input, is the weighted sum of all such inputs:

$\displaystyle a=\sum_{j=1}^d w_j x_j+b$ (1)

where $b$ is the offset or bias, $x_j$ is the input signal from the $j$th node, and $w_j$ is the synaptic connectivity to the $j$th input node:

$\displaystyle w_j\;\left\{ \begin{array}{ll} > 0 & \mbox{excitatory input} \\ < 0 & \mbox{inhibitory input} \\ = 0 & \mbox{no connection} \end{array} \right.$ (2)

As in the case of linear regression, we define $x_0=1$ and $w_0=b$, so that both the weight and pattern vectors are augmented to become ${\bf x}=[x_0=1,\,x_1,\cdots,x_d]^T$ and ${\bf w}=[w_0=b,\,w_1,\cdots,w_d]^T$, and Eq. (1) above can now be conveniently written as

$\displaystyle a=\sum_{j=1}^d w_j x_j+b=\sum_{j=0}^d w_j x_j={\bf w}^T{\bf x}$ (3)
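The equivalence of Eqs. (1) and (3) can be checked numerically. The following is a minimal sketch in plain Python; the function names `net_input` and `net_input_augmented` and all numeric values are illustrative assumptions, not part of the original text.

```python
def net_input(w, x, b):
    """Eq. (1): weighted sum of the d inputs plus the bias b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def net_input_augmented(w_aug, x_aug):
    """Eq. (3): inner product of the augmented vectors (x_0 = 1, w_0 = b)."""
    return sum(wj * xj for wj, xj in zip(w_aug, x_aug))


# Made-up example with d = 3 inputs (values are assumptions for illustration).
w = [0.5, -1.0, 0.0]   # excitatory, inhibitory, no connection (see Eq. (2))
x = [2.0, 1.0, 4.0]
b = 0.25

a_explicit = net_input(w, x, b)                          # bias added separately
a_augmented = net_input_augmented([b] + w, [1.0] + x)    # bias folded into w_0
```

Both calls compute the same activation; the augmented form simply treats the bias as the weight of a constant input $x_0=1$.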

The output signal or response of the neuron is a function of its activation:

$\displaystyle y=g(a)=g\left(\sum_{j=1}^d w_j x_j+b\right)=g({\bf w}^T{\bf x})$ (4)

Here $g(x)$ is an activation function, which typically takes one of the following forms:

Logistic sigmoid function:

$\displaystyle g(x,a)=\frac{1}{1+e^{-ax}}=\frac{e^{ax}}{1+e^{ax}}, \;\;\;\;\;\; \frac{d\,g(x,a)}{dx}=\frac{a\,e^{-ax}}{(1+e^{-ax})^2}$ (5)
Tanh (hyperbolic tangent) function:

$\displaystyle g(x,a)=\frac{2}{1+e^{-ax}}-1=\frac{e^{ax}-1}{e^{ax}+1}, \;\;\;\;\;\; \frac{d\,g(x,a)}{dx}=\frac{2a\,e^{-ax}}{(1+e^{-ax})^2}$ (6)

where $a$ is a parameter that controls the slope of $g(x,a)$. Specifically, when $a\rightarrow 0$, $g(x,a)$ becomes approximately linear in $x$, but when $a\rightarrow \infty$, $g(x,a)$ becomes a threshold function:

$\displaystyle \lim_{a\rightarrow\infty} g(x,a)=\left\{ \begin{array}{cl} 0\mbox{ or }-1 & x<0\\ 1 & x>0 \end{array} \right.$ (7)
Rectified linear unit (ReLU):

$\displaystyle g(x)=\max(0,\,x)=\left\{\begin{array}{ll}0 & x\le 0\\ x & x>0\end{array}\right.$ (8)
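The activation functions above and their derivatives can be sketched in a few lines of Python. This is a minimal illustration using only the standard `math` module; the function names (`sigmoid`, `tanh_like`, `relu`, `numeric_derivative`) are assumptions made here. The name `tanh_like` reflects the fact that the expression in Eq. (6) equals $\tanh(ax/2)$.

```python
import math


def sigmoid(x, a=1.0):
    """Logistic sigmoid of Eq. (5), with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * x))


def sigmoid_prime(x, a=1.0):
    """Derivative of the sigmoid with respect to x, as in Eq. (5)."""
    e = math.exp(-a * x)
    return a * e / (1.0 + e) ** 2


def tanh_like(x, a=1.0):
    """Eq. (6): 2 / (1 + exp(-a x)) - 1, which equals tanh(a x / 2)."""
    return 2.0 / (1.0 + math.exp(-a * x)) - 1.0


def tanh_like_prime(x, a=1.0):
    """Derivative of Eq. (6) with respect to x."""
    e = math.exp(-a * x)
    return 2.0 * a * e / (1.0 + e) ** 2


def relu(x):
    """Rectified linear unit of Eq. (8)."""
    return max(0.0, x)


def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to sanity-check the formulas."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# A large slope parameter approximates the threshold behavior of Eq. (7).
near_zero = sigmoid(-1.0, a=50.0)
near_one = sigmoid(1.0, a=50.0)

# A small slope parameter gives a nearly linear response around x = 0:
# sigmoid(x, a) is close to 1/2 + a*x/4 when a*x is small.
almost_linear = sigmoid(1.0, a=0.01)
```

Comparing `sigmoid_prime` with `numeric_derivative` at a few points is a quick way to confirm the closed-form derivatives. Note that `math.exp` overflows for very large arguments, so this sketch is not suitable for extreme values of $a\,x$.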

A neural network can be modeled mathematically as a hierarchical structure containing multiple layers of neurons, called nodes in the context of artificial neural networks:

The input layer: receives inputs from external sources;
The output layer: generates output to the external world;
The hidden layer(s): between the input and output layers, not visible from outside the network.
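The layered structure above can be illustrated by a forward pass in which every node applies Eq. (4) to the outputs of the previous layer. The sketch below is only an illustration under stated assumptions: a fully connected network, a sigmoid activation in every layer, and made-up weights. The names `layer_forward` and `network_forward` are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic sigmoid with unit slope."""
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(W, b, x, g):
    """One layer: node i outputs g(sum_j W[i][j] * x[j] + b[i])."""
    return [g(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def network_forward(layers, x, g=sigmoid):
    """Propagate the input through each (W, b) pair in turn."""
    for W, b in layers:
        x = layer_forward(W, b, x, g)
    return x


# Hypothetical network: 2 input nodes, 3 hidden nodes, 1 output node.
layers = [
    # input -> hidden: 3 rows of weights, one per hidden node
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5]),
    # hidden -> output: 1 row of weights for the single output node
    ([[1.0, -2.0, 0.5]], [0.1]),
]
y = network_forward(layers, [1.0, 2.0])
```

The hidden-layer outputs exist only as the intermediate value of `x` inside `network_forward`, which mirrors the statement that hidden nodes are not visible from outside the network.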

The goal is to train the network according to certain mathematical rules, called learning rules or learning laws, by iteratively modifying its weights based on the inputs (and the desired outputs if the learning is supervised), so that, given an input as the stimulus, the network produces the desired output as the response.
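The text does not single out a particular learning rule. As one common illustration of iterative, supervised weight modification, the sketch below applies the delta (least-mean-squares) rule to a single linear neuron. The choice of rule, the function name `train_delta_rule`, the learning rate `eta`, and the training data are all assumptions made for this example.

```python
def train_delta_rule(samples, d, eta=0.1, epochs=500):
    """Iteratively adjust the augmented weights w = [b, w_1, ..., w_d].

    Each step changes w in proportion to the error between the desired
    output y and the actual output w^T x of a single linear neuron
    (the activation g is the identity in this sketch).
    """
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        for x, y in samples:
            x_aug = [1.0] + list(x)                         # x_0 = 1
            out = sum(wj * xj for wj, xj in zip(w, x_aug))  # actual response
            err = y - out                                   # desired minus actual
            w = [wj + eta * err * xj for wj, xj in zip(w, x_aug)]
    return w


# Made-up training set generated from y = 0.5 + 2*x1 - x2 (no noise).
samples = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5),
           ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.5)]
w = train_delta_rule(samples, d=2)
```

After training, `w` should be close to `[0.5, 2.0, -1.0]`, the bias and weights that generated the data. With noisy data or a nonlinear activation the update would need the usual adjustments, which are outside the scope of this sketch.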

The learning paradigms of neural networks are listed below, distinguished by how the input and output of the network are interpreted.

Pattern Associator
This is the most general form of neural network, which learns and stores the associative relationship between two sets of patterns represented by vectors.
- Training: A set of pairs of patterns $\{ ({\bf x}_n,{\bf y}_n),\;n=1,\cdots,N\}$ is presented to the network which then learns to establish the associative relationship between two sets of patterns:
  
  $\displaystyle f: {\bf x} \in {\cal R}^d \Longrightarrow {\bf y} \in {\cal R}^m$ (9)
- Testing: When a pattern ${\bf x}_n$ in a pair is presented as the input, the network produces the output pattern ${\bf y}_n$ associated with that input.
Human memory is associative in the sense that given one pattern, some associated pattern(s) may be produced. Examples include: (Evolution, Darwin), (Einstein, ), (food, sounding bell, salivation).
Auto-associator
As a special pattern associator, an auto-associator associates a stored pattern with an incomplete or noisy version of that pattern.
- Training: A set of patterns $\{{\bf x}_1,\cdots,{\bf x}_N\}$ is presented to the network for it to learn and remember, i.e., the patterns are stored in the network.
- Testing: When an incomplete or noisy version of one of the patterns stored in the network is presented as the input to the network, the original pattern is retrieved by the network as the output.
Regression
This is another special kind of pattern associator which takes a vector input ${\bf x}\in{\cal R}^d$ and produces a real value $y\in{\cal R}$ as a multivariable function $y=f({\bf x})$ at its only output node.
- Training: Given a set of observed data samples, the independent vectors and their corresponding function values $\{ ({\bf x}_n,\;y_n),\; n=1,\cdots,N\}$, the network learns to model the function.
- Testing: given any vector ${\bf x}$ , the output value produced by the single output node is an estimated function value $y=f({\bf x})$ .
Classification
This is a variation of the pattern associator whose output patterns are a set of categorical symbols representing different classes $\{C_1,\cdots,C_K\}$, i.e., each input pattern is classified by the network into one of the classes:

$\displaystyle f: {\bf x} \in {\cal R}^d \Longrightarrow y \in \{C_1,\cdots,C_K\}$ (10)
Regularity Detector
This is an unsupervised learning process. The network automatically discovers regularities in the inputs, so that similar patterns are detected and grouped together in the same cluster or class.
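As a rough illustration of the regularity detector, the sketch below uses winner-take-all competitive learning: each input pattern pulls the closest prototype vector toward itself, so that prototypes settle near groups of similar patterns without any desired outputs. The text does not prescribe this particular method. The function names `competitive_learning` and `nearest`, the learning rate, the initial prototypes, and the data are all assumptions for the example.

```python
def nearest(protos, x):
    """Index of the prototype with the smallest squared Euclidean distance to x."""
    dists = [sum((pk - xi) ** 2 for pk, xi in zip(p, x)) for p in protos]
    return dists.index(min(dists))


def competitive_learning(patterns, prototypes, eta=0.2, epochs=20):
    """Winner-take-all sketch: move the closest prototype toward each pattern."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in patterns:
            k = nearest(protos, x)   # the winning node for this pattern
            protos[k] = [pk + eta * (xi - pk) for pk, xi in zip(protos[k], x)]
    return protos


# Made-up 2-D patterns forming two groups, near (0, 0) and near (5, 5).
patterns = [(0.0, 0.2), (0.3, -0.1), (-0.2, 0.1),
            (5.0, 5.1), (4.8, 5.2), (5.3, 4.9)]
protos = competitive_learning(patterns, prototypes=[(1.0, 1.0), (4.0, 4.0)])

# Cluster label of each pattern = index of its nearest learned prototype.
labels = [nearest(protos, x) for x in patterns]
```

No class labels are supplied during training; the grouping in `labels` emerges only from the similarity of the input patterns.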