
Training

To train the network, global motion patterns of different velocities (speeds and directions), each represented by $n \times n$ local motion velocities, are presented to the input V1 layer of the network. Each local velocity is represented by 32 nodes (4 speeds $\times$ 8 directions) in a patch at the corresponding location of the visual field. These nodes are either excited or inhibited, depending on their speed and direction tuning. Based on the discussion of V1 cells in the previous section (especially the spatiotemporal energy model), we assume the speed and direction tuning of a V1 cell can be modeled as illustrated in Fig. 2. This tuning surface is approximated by a 2D Gaussian function with a positive (excitatory) peak at the preferred velocity, here speed $s=2$ and direction $d=6$ (representing $270^{\circ}$); the responses to all other velocities are very low. The surface also has a negative (inhibitory) response at the null direction $d=2$ (representing $90^{\circ}$, opposite to the preferred direction).


  
Figure 2: The assumed velocity tuning surface of a V1 node. The horizontal axes represent, respectively, the 4 speeds (0 through 3) and the 8 directions (with $45^{\circ}$ increments). The direction axis is periodic. [Figure: surface plot omitted]
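To make the assumed tuning concrete, the surface of Fig. 2 can be written as a difference of two Gaussians over the (speed, direction) grid: an excitatory one centered on the preferred velocity and a negative one on the null direction. The following Python sketch is only an illustration; the widths sigma_s and sigma_d and the inhibition amplitude inhib are assumed free parameters, not values given in the text.

    import numpy as np

    def v1_tuning(speed, direction, pref_s=2, pref_d=6, null_d=2,
                  sigma_s=0.8, sigma_d=1.0, inhib=0.5):
        """2D Gaussian tuning: an excitatory peak at the preferred velocity
        (pref_s, pref_d) and an inhibitory lobe at the null direction null_d.
        Directions are indexed 0..7 in 45-degree steps; the axis is periodic."""
        def d_dist(d, d0):
            # circular distance on the 8-step periodic direction axis
            diff = np.abs(d - d0) % 8
            return np.minimum(diff, 8 - diff)
        excite = np.exp(-(speed - pref_s) ** 2 / (2 * sigma_s ** 2)
                        - d_dist(direction, pref_d) ** 2 / (2 * sigma_d ** 2))
        inhibit = inhib * np.exp(-(speed - pref_s) ** 2 / (2 * sigma_s ** 2)
                                 - d_dist(direction, null_d) ** 2 / (2 * sigma_d ** 2))
        return excite - inhibit

    # responses of one 32-node patch: 4 speeds x 8 directions
    s, d = np.meshgrid(np.arange(4), np.arange(8), indexing="ij")
    surface = v1_tuning(s, d)   # shape (4, 8), one response per node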

To account for the different types of noise present in real visual signals, random noise is added to the input layer of V1 nodes. First, between one third and one half of the input nodes, chosen at random, are set to zero to simulate areas of the visual field where no motion is detected due to homogeneous luminance. Next, the directions of over two thirds of the remaining input nodes, again chosen at random, are modified by a random amount in the range $-90^{\circ}$ to $90^{\circ}$. This simulates the aperture problem, caused by the random orientations of the local line features (textures, edges, boundary contours, etc.) of the global pattern. With this noise added, a global stimulus now elicits responses from more than one type of V1 node, but their overall average still favors the velocity of the global motion, so the response selectivity is preserved. The performance of the competitive learning degrades as a higher percentage of the input nodes is contaminated by the two types of noise introduced. However, with a longer training time, the learning can still reach a stable state in which each input motion pattern is represented by essentially the same output nodes.
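A minimal sketch of the two noise operations, assuming each local velocity is stored as an integer (speed, direction-index) pair with directions in 45-degree steps; the concrete fractions chosen inside the stated ranges, and the zero-velocity coding of a blanked patch, are assumptions for illustration:

    import numpy as np

    def add_noise(velocities, rng):
        """Add the two noise types to an n x n x 2 field of local
        (speed, direction_index) pairs.  A patch with no detected
        motion is coded here simply as a zero velocity (an assumption)."""
        noisy = velocities.copy()
        flat = noisy.reshape(-1, 2)
        n = len(flat)

        # 1) zero out 1/3 to 1/2 of the patches (homogeneous luminance,
        #    no motion detected)
        k = rng.integers(n // 3, n // 2 + 1)
        blank = rng.choice(n, size=k, replace=False)
        flat[blank] = 0

        # 2) jitter the direction of over two thirds of the rest by a
        #    random amount in [-90, +90] degrees, i.e. up to +/-2 steps
        #    (the aperture problem); 70% is an assumed concrete fraction
        remaining = np.setdiff1d(np.arange(n), blank)
        picked = rng.choice(remaining, size=int(0.7 * len(remaining)),
                            replace=False)
        jitter = rng.integers(-2, 3, size=len(picked))
        flat[picked, 1] = (flat[picked, 1] + jitter) % 8
        return noisy

    # example: a 4 x 4 field all moving at speed 2, direction 6 (270 deg)
    rng = np.random.default_rng(0)
    field = np.tile(np.array([2, 6]), (4, 4, 1))
    noisy_field = add_noise(field, rng)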

The competitive learning algorithm performs best when the input patterns form a set of well-separated clusters in the feature space, each containing a group of similar patterns (see Appendix). This is not the case for the MT model, however: here the input patterns, with the various types of noise added, form a continuum without natural boundaries. In this case, we must avoid the undesirable situation where most input patterns are represented by a few nodes, or even a single node, while other nodes never respond to any input and become dead nodes. A technique, also described in the Appendix, is used to balance the competition among all nodes and guarantee that every node has some chance to win, so that the continuum of input patterns is partitioned evenly.
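The balancing technique itself is given in the Appendix and is not reproduced here; as a stand-in, the sketch below uses frequency-sensitive ("conscience") competitive learning, a common balancing scheme in which each node's running win count penalizes its score, so frequent winners back off and no node starves:

    import numpy as np

    def competitive_learning(patterns, n_nodes, epochs=50, lr=0.05,
                             bias_gain=1.0, rng=None):
        """Competitive learning with a win-frequency penalty (a
        'conscience' mechanism), used as a stand-in for the balancing
        technique in the Appendix.  Penalizing frequent winners gives
        every node a chance to win, so a continuum of inputs is divided
        evenly and dead nodes are avoided."""
        rng = rng if rng is not None else np.random.default_rng(0)
        # initialize the weight vectors on randomly chosen inputs
        w = patterns[rng.choice(len(patterns), size=n_nodes,
                                replace=False)].astype(float)
        wins = np.zeros(n_nodes)

        for _ in range(epochs):
            for x in rng.permutation(patterns):
                dist = np.linalg.norm(w - x, axis=1)
                # penalize nodes that have won often, favor rare winners
                score = dist + bias_gain * (wins / (1.0 + wins.sum()))
                j = int(np.argmin(score))
                wins[j] += 1
                w[j] += lr * (x - w[j])   # move the winner toward the input
        return w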

The network is trained in two steps. First, the middle layer of component MT nodes is trained by repeatedly presenting a set of global motion patterns of different velocities to the input layer. In this process all groups in the middle layer are trained in parallel, each by the part of the global motion patterns falling inside its local region of $3 \times 3$ patches. Training terminates when all groups in the middle layer have reached a stable state in which each local input pattern with a given motion velocity is consistently responded to by a particular node in each group. The training of the output layer can then begin. The same input patterns are presented to the input layer and propagated to the middle layer through the weights between the input and middle layers obtained in the first step. The training of the output layer terminates when the output layer reaches a stable state. Owing to the balancing mechanism of the learning, almost every node responds to (becomes the winner for) exactly one input pattern, except for a few dead nodes which do not respond to any input. A one-to-one correspondence is thus established between the input motion patterns and the pattern MT nodes. In other words, the pattern MT nodes in the output layer of the network have now learned to recognize global motion patterns of different directions and speeds.
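Assembled, the two-step schedule might look like the following sketch, which reuses competitive_learning() from above. Here local_inputs is assumed to hold one pattern matrix per middle-layer group (the part of every global pattern inside that group's $3 \times 3$ region of patches), and the middle layer's responses are assumed to be one-hot winner codes; both are illustrative choices, not details from the text.

    import numpy as np

    def winner_onehot(patterns, w):
        """One-hot responses of a trained competitive layer
        (the winner is the node with the closest weight vector)."""
        idx = np.argmin(np.linalg.norm(patterns[:, None, :] - w[None, :, :],
                                       axis=2), axis=1)
        out = np.zeros((len(patterns), len(w)))
        out[np.arange(len(patterns)), idx] = 1.0
        return out

    def train_two_step(local_inputs, n_component, n_pattern, rng):
        """Step 1: train every component-MT group in parallel on its own
        local region.  Step 2: freeze those weights, propagate the same
        inputs through the middle layer, and train the pattern-MT
        (output) layer on the resulting middle-layer responses."""
        group_ws = [competitive_learning(x, n_component, rng=rng)
                    for x in local_inputs]                        # step 1
        hidden = np.concatenate([winner_onehot(x, w)
                                 for x, w in zip(local_inputs, group_ws)],
                                axis=1)                           # middle-layer code
        output_w = competitive_learning(hidden, n_pattern, rng=rng)  # step 2
        return group_ws, output_w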

