Radial-Basis Function (RBF) Networks

Radial basis function (RBF) networks are inspired by biological neural systems, in which neurons are organized hierarchically along various pathways for signal processing and are tuned to respond selectively to different features of the stimuli within their respective receptive fields. In general, neurons in higher layers have larger receptive fields and respond selectively to more global and complex patterns.

[Figure: schematic of the visual pathways.]

For example, neurons at different levels along the visual pathway respond selectively to different types of visual stimuli, such as orientation in the primary visual cortex (V1) and motion direction in area MT.

Moreover, neurons in the auditory cortex respond selectively to different frequencies.

The tuning curves (local response functions) of these neurons are typically Gaussian: the response decreases as the stimulus becomes less similar to the preferred stimulus, i.e., the one to which the cell is most sensitive and responsive.

These Gaussian-like functions can also be treated as a set of basis functions (not necessarily orthogonal, possibly over-complete) that span the space of all input patterns. Based on such local features represented by these nodes, a node in a higher layer can be trained to respond selectively to some specific pattern or object (e.g., a “grandmother cell”), using the outputs of the nodes in the lower layer.

Applications:

As seen in the examples above, an RBF network is typically composed of three layers: an input layer of $N$ nodes that receive the input signal ${\bf x}$; a hidden layer of $L$ nodes that simulate neurons selectively tuned to different local features in the input; and an output layer of $M$ nodes that simulate higher-level neurons responding to more global features, based on the outputs of the hidden layer. (This can be considered a model of visual signal processing along the pathway $retina \Rightarrow V1 \Rightarrow MT \Rightarrow MST$.)

Upon receiving an input pattern vector ${\bf x}$, the jth hidden node produces the activation:

$\displaystyle h_j({\bf x})=\exp\left[-({\bf x}-{\bf c}_j)^T {\bf\Sigma}_j^{-1}({\bf x}-{\bf c}_j)\right]$
where ${\bf c}_j$ and ${\bf\Sigma}_j$ are, respectively, the mean vector and covariance matrix associated with the jth hidden node. In particular, if the covariance matrix is a scaled identity matrix ${\bf\Sigma}_j=\sigma^2{\bf I}={\rm diag}(\sigma^2,\cdots,\sigma^2)$, then the Gaussian function becomes isotropic and we have
$\displaystyle h_j({\bf x})=\exp\left[-\Vert{\bf x}-{\bf c}_j\Vert^2/\sigma^2\right]$
We see that ${\bf c}_j$ represents the preferred feature (orientation, motion direction, frequency, etc.) of the jth neuron. When ${\bf x}={\bf c}_j$, the response of the neuron is maximal, reflecting the selectivity of the neuron.
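To make this concrete, here is a minimal NumPy sketch of the isotropic activation above; the function name rbf_hidden_activation and the sample values are illustrative, not part of the original notes.

```python
import numpy as np

def rbf_hidden_activation(x, c, sigma):
    """Isotropic Gaussian activation h_j(x) = exp(-||x - c_j||^2 / sigma^2)."""
    d = x - c
    return np.exp(-np.dot(d, d) / sigma**2)

# The activation peaks at 1 when x equals the preferred feature c,
# and decays as x moves away from c.
c = np.array([1.0, 2.0])
print(rbf_hidden_activation(np.array([1.0, 2.0]), c, sigma=0.5))  # 1.0
print(rbf_hidden_activation(np.array([2.0, 2.0]), c, sigma=0.5))  # ~0.018
```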

In the output layer, each node receives the outputs of all nodes in the hidden layer, and the output of the ith output node is a linear combination of the hidden-layer activations:

$\displaystyle f_i({\bf x})=\sum_{j=1}^L w_{ij}\, h_j({\bf x})
=\sum_{j=1}^L w_{ij}\,\exp\left[-({\bf x}-{\bf c}_j)^T {\bf\Sigma}_j^{-1}({\bf x}-{\bf c}_j)\right]$
Note that the computation at the hidden layer is nonlinear while that at the output layer is linear; this two-stage structure allows a hybrid training scheme.
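The full forward pass, combining the Gaussian hidden layer with the linear output layer, can be sketched as follows for the isotropic case; all names (rbf_forward, centers, sigmas, W) are illustrative assumptions, not from the original notes.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W):
    """Forward pass of an RBF network.

    x       : (N,)   input vector
    centers : (L, N) preferred features c_j of the L hidden nodes
    sigmas  : (L,)   width of each isotropic Gaussian
    W       : (M, L) output weights w_ij

    Returns f(x) of shape (M,): nonlinear hidden layer, linear output layer.
    """
    diffs = centers - x                    # (L, N) differences x - c_j
    sq_dists = np.sum(diffs**2, axis=1)    # ||x - c_j||^2 for each hidden node
    h = np.exp(-sq_dists / sigmas**2)      # hidden activations h_j(x)
    return W @ h                           # linear combination at the output

# Example: N=2 inputs, L=3 hidden nodes, M=1 output (random illustrative values)
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 2))
W = rng.normal(size=(1, 3))
print(rbf_forward(np.zeros(2), centers, np.ones(3), W))
```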

Learning Rules

During the training stage, the various system parameters of an RBF network are obtained, including the ${\bf c}_j$ and ${\bf\Sigma}_j$ ($j=1,\cdots,L$) of the $L$ hidden-layer nodes, as well as the weights $w_{ij}$ ($j=1,\cdots,L,\;\;i=1,\cdots,M$) of the $M$ output-layer nodes, each fully connected to all $L$ hidden nodes.
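Because the output layer is linear, one common hybrid scheme first fixes the hidden-layer parameters (e.g., by clustering the training data) and then solves for the output weights in closed form by least squares. The following is a hedged sketch of that second step only, under the assumption that centers and widths are already given; it is not necessarily the learning rule derived in these notes.

```python
import numpy as np

def fit_output_weights(X, Y, centers, sigmas):
    """Solve for the output weights by linear least squares,
    assuming the hidden-layer centers and widths are fixed.

    X : (K, N) training inputs,  Y : (K, M) training targets.
    Returns W of shape (M, L) minimizing ||H W^T - Y||.
    """
    diffs = X[:, None, :] - centers[None, :, :]        # (K, L, N)
    H = np.exp(-np.sum(diffs**2, axis=2) / sigmas**2)  # (K, L) design matrix of h_j(x_k)
    W_T, *_ = np.linalg.lstsq(H, Y, rcond=None)        # solve H W^T ~ Y
    return W_T.T

# Tiny demo: fit a 1-D function with L=5 hidden nodes (illustrative values).
X = np.linspace(-1, 1, 50).reshape(-1, 1)
Y = np.sin(3 * X)
centers = np.linspace(-1, 1, 5).reshape(-1, 1)
W = fit_output_weights(X, Y, centers, sigmas=np.full(5, 0.5))
```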