
Neural coding of information

Conditional probabilities

Let A and B be two random events and P(A) and P(B) be their probabilities. The probability of the joint event of both A and B, represented by P(A,B), can be obtained as

P(A,B)=P(A) P(B/A)=P(B) P(A/B)

where P(A/B) is the conditional probability of event A given that event B has occurred, and P(B/A) is defined similarly.

If the two events are independent of each other, i.e., how likely event A is to occur does not depend on whether event B occurs, and vice versa, then

\begin{displaymath}P(A/B)=P(A),\;\;\;\;\;\;P(B/A)=P(B) \end{displaymath}

and the joint probability becomes

P(A,B)=P(A) P(B)
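As a quick numerical check, the factorizations above can be verified directly; a minimal Python sketch with hypothetical probabilities (not taken from the text):

    # Hypothetical probabilities for two events A and B.
    P_A = 0.3            # P(A)
    P_B = 0.5            # P(B)
    P_B_given_A = 0.5    # P(B/A); equal to P(B), so A and B are independent here

    P_AB = P_A * P_B_given_A     # joint probability P(A,B) = P(A) P(B/A)
    print(P_AB)                  # 0.15
    print(P_AB == P_A * P_B)     # True: for independent events P(A,B) = P(A) P(B)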

Stimuli and responses as random variables

The response property of a neuron can be characterized by its tuning curve, which represents how the neuron's response varies with a stimulus parameter. When there are two varying parameters, a tuning surface is used instead. The shapes of one-dimensional tuning curves usually fall into one of two categories: sigmoid shaped, as in the tuning for contrast or the tuning of near and far stereo cells for distance, and bell shaped (Gaussian), as in the tuning for orientation or direction of motion.
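The two shapes can be sketched as follows; this is a minimal illustration in Python (assuming NumPy is available), with hypothetical parameter values rather than data from any particular neuron.

    import numpy as np

    def gaussian_tuning(s, s_pref=90.0, sigma=20.0, r_max=50.0):
        # bell-shaped (Gaussian) tuning, e.g. for orientation or direction of motion
        return r_max * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)

    def sigmoidal_tuning(s, s_half=0.5, slope=10.0, r_max=50.0):
        # sigmoid-shaped tuning, e.g. for stimulus contrast
        return r_max / (1.0 + np.exp(-slope * (s - s_half)))

    orientation = np.linspace(0, 180, 181)    # stimulus parameter: orientation (degrees)
    contrast = np.linspace(0, 1, 101)         # stimulus parameter: contrast
    r_bell = gaussian_tuning(orientation)     # bell-shaped tuning curve
    r_sigmoid = sigmoidal_tuning(contrast)    # sigmoid-shaped tuning curve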

A neuron is treated as a communication channel which receives the stimulus s(t) as the input and generates the spike train $x(t)=\sum_k \delta(t-t_k)$ as the output. Here both s(t) and x(t) are treated as random variables, and observing x(t) as the response to s(t) is considered a process in which certain information (e.g., about the external world from the sensory stimuli) is gained. To describe the relationship between them, that is, how the stimulus affects the response and how the response reflects the stimulus, we define the probabilities p(s(t)) and p(x(t)) of the stimulus and the response, as well as the conditional probabilities p(x(t)/s(t)) of the response given the stimulus and p(s(t)/x(t)) of the stimulus given the response.

These probabilities are all related to the joint probability of both s(t) and x(t)

\begin{displaymath}p[ s(t),\;x(t) ]=p(s(t)) p(x(t) / s(t)) =p(x(t)) p(s(t) / x(t)) \end{displaymath}
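For a discrete stimulus and response, these relations can be made concrete with a small numerical sketch (the distributions below are hypothetical): p(s) and p(x/s) determine the joint distribution, and p(x) and p(s/x) follow by marginalization and Bayes' rule.

    import numpy as np

    p_s = np.array([0.5, 0.3, 0.2])           # p(s): prior over three stimulus values
    p_x_given_s = np.array([[0.8, 0.2],       # each row is p(x/s) for one stimulus value
                            [0.5, 0.5],
                            [0.1, 0.9]])

    p_sx = p_s[:, None] * p_x_given_s         # p(s,x) = p(s) p(x/s)
    p_x = p_sx.sum(axis=0)                    # p(x) = sum over s of p(s,x)
    p_s_given_x = p_sx / p_x                  # p(s/x) = p(s,x) / p(x)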

Neuronal response as a communication channel

p(s) is also called the a priori probability of the input s, as it represents the likelihood of s before the neuron responds to it; and p(s/x) is called the a posteriori probability of s, as it represents the likelihood of s given that the neuron has responded to it with x. As observing x always provides some information about s, we have

\begin{displaymath}p(s/x) \geq p(s) \end{displaymath}

where the equal sign corresponds to the worst case, in which x is completely irrelevant to s and therefore nothing about s can be learned from x.

The total information gained from this process is quantitatively given by

\begin{displaymath}I=log\frac{p(s/x)}{p(s)} \geq 0 \end{displaymath}

If the base of the logarithm is 2, then the unit of information is the bit.
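For example (with hypothetical numbers), if the a priori probability of a stimulus is p(s)=0.25 and observing x identifies it with certainty, p(s/x)=1, then I = log2(1/0.25) = 2 bits:

    from math import log2

    p_s = 0.25            # a priori probability of the stimulus
    p_s_given_x = 1.0     # a posteriori probability after observing the response
    I = log2(p_s_given_x / p_s)
    print(I)              # 2.0 bits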

Information I - ideal case

In the ideal case without noise, complete information about the stimuli can be obtained in the sense that any given stimulus s is responded to with a specific x, i.e., p(s/x)=1, so that

\begin{displaymath}I=log\frac{1}{p(s)}=-log\;p(s) \end{displaymath}

Note that the less likely an event (small p(s)), the more information is gained by learning that it has occurred (e.g., snow vs. fine weather in Los Angeles). In particular, if p(s)=1, no information is gained (I=0), due to the log operation.

If the input is not a binary random event (which either happens or not, with probability P), but a random variable taking various values s with probability distribution p(s), then the information gained in an ideal communication channel is the weighted average of the information over all possible values of s:

\begin{displaymath}I=-\sum_i p(s_i) log\;p(s_i) \end{displaymath}

if s is discrete, or

\begin{displaymath}I=-\int p(s) log\;p(s) ds \end{displaymath}

if s is continuous.
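A minimal sketch of the discrete formula in Python (base-2 logarithm, so the result is in bits; the distributions are hypothetical):

    from math import log2

    def entropy(p):
        # H = -sum_i p_i log2 p_i, skipping zero-probability outcomes
        return -sum(pi * log2(pi) for pi in p if pi > 0)

    print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits
    print(entropy([0.125] * 8))         # 3.0 bits: 8 equally likely outcomes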

Information II - with noise

If the system is noisy, the stimulus-response relationship is no longer deterministic, as the same s may be responded to with different x due to noise, i.e., p(s/x) < 1. Here we first find the information about s gained when observing a particular x:


\begin{displaymath}I(x)=\int p(s/x) log \frac{p(s/x)}{p(s)} ds \end{displaymath}

Then we find the total information about s gained when observing all possible x:
\begin{eqnarray*}
I &=& \int p(x) I(x)\;dx \\
  &=& \int p(x) \left[ \int p(s/x)\;log \frac{p(s/x)}{p(s)}\;ds \right] dx \\
  &=& \int p(x) \left[ \int p(s/x)\;log\;p(s/x)\;ds \right] dx
      -\int p(x) \left[ \int p(s/x)\;log\;p(s)\;ds \right] dx \\
  &=& \int p(x) \left[ \int p(s/x)\;log\;p(s/x)\;ds \right] dx
      -\int \left[ \int p(x)\;p(s/x)\;dx \right] log\;p(s)\;ds \\
  &=& H(s)-H(s/x)
\end{eqnarray*}

where

\begin{displaymath}H(s)\stackrel{\triangle}{=}-\int p(s) log\;p(s) ds \end{displaymath}

and

\begin{displaymath}H(s/x)\stackrel{\triangle}{=}-\int p(x) [ \int p(s/x) log\;p(s/x) ds ] dx \end{displaymath}
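For a discrete stimulus and response, the same quantities can be computed directly from the joint distribution; a minimal sketch continuing the hypothetical p(s) and p(x/s) used in the earlier sketch:

    import numpy as np

    p_s = np.array([0.5, 0.3, 0.2])
    p_x_given_s = np.array([[0.8, 0.2],
                            [0.5, 0.5],
                            [0.1, 0.9]])
    p_sx = p_s[:, None] * p_x_given_s       # joint p(s,x)
    p_x = p_sx.sum(axis=0)                  # marginal p(x)
    p_s_given_x = p_sx / p_x                # posterior p(s/x)

    H_s = -np.sum(p_s * np.log2(p_s))       # H(s): uncertainty before observing x
    H_s_given_x = -np.sum(p_x * np.sum(p_s_given_x * np.log2(p_s_given_x), axis=0))  # H(s/x)
    I = H_s - H_s_given_x                   # mutual information, in bits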

Entropy

We define H(s) as the entropy of s. Note that since H(s) equals the complete information gained in the ideal case, it represents the maximum possible information about s that can be gained. As gaining information can be regarded as reducing uncertainty, H(s) also represents the uncertainty about s before observing x, and H(s/x) is the conditional entropy, representing the remaining uncertainty about s after observing x. If the base of the logarithm is 2, the unit of entropy is the bit.

The information gained I=H(s)-H(s/x) is thus called the mutual information; it represents the reduction of uncertainty from H(s) before observing x to H(s/x) after observing x. In the ideal case p(s/x)=1, the uncertainty after observing x becomes zero, H(s/x)=0, and the complete information about s, $I=H(s)=-\int p(s) log\;p(s) ds$, is gained.

As an example, consider an experiment with n equally likely possible outcomes $s_i$ ($i=1,\cdots,n$). The probability for a particular $s_i$ to occur is $p(s_i)=1/n$. The total uncertainty, the entropy, of this experiment is

\begin{displaymath}H(s)=-\sum_{i=1}^n p(s_i) log\;p(s_i) =log\;n \end{displaymath}

We see that under the condition of equal likelihood, the entropy is simply the logarithm of the total number of possible outcomes. If n=8, then the entropy, i.e., the maximum possible information available about this experiment, is $log_2 8=3$ bits.

Entropy under given conditions

Here we find the probability distribution p(x) that maximizes the entropy $H(x)=-\int p(x) log\;p(x) dx$ subject to some given conditions, as well as the normalization condition

\begin{displaymath}\int_{x=-\infty}^{\infty} p(x) dx=1 \end{displaymath}

or

\begin{displaymath}\sum_i p(x_i)=1 \end{displaymath}

This kind of constrained optimization problem can be solved by the method of Lagrange multipliers.
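As a sketch of the method in the simplest case, suppose p(x) is restricted to a finite interval [a,b] and only the normalization condition is imposed. The Lagrangian and its stationarity condition give

\begin{displaymath}J[p]=-\int_a^b p(x) log\;p(x) dx+\lambda \left[ \int_a^b p(x) dx -1 \right] \end{displaymath}

\begin{displaymath}\frac{\partial}{\partial p} \left[ -p\;log\;p+\lambda p \right] =-log\;p-1+\lambda=0 \;\;\;\Longrightarrow\;\;\; p(x)=e^{\lambda-1}=\mbox{constant} \end{displaymath}

so that, after normalization, p(x)=1/(b-a), the uniform distribution, and the maximum entropy is H(x)=log(b-a).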

Entropy of spike trains

We now know that the amount of information available about the external world is limited by the entropy of the input sensory signals. On the other hand, how much information about the sensory input the spike trains can provide is limited by the entropy of these spike trains, as considered below.


Ruye Wang
1999-09-12