To account for the different types of noise present in real visual signals, random noise is added to the input layer of V1 nodes. First, between one third and one half of the input nodes, chosen at random, are set to zero to simulate areas of the visual field where no motion is detected due to homogeneous luminance.
Next, the directions of over two thirds of the remaining input nodes, again chosen at random, are modified by a random value in the range of −90° to +90°.
This simulates the aperture problem, caused by the random orientations of the local line features (textures, edges, boundary contours, etc.) of the global pattern. With this noise added, a global stimulus is now responded to by more than one type of V1 node; because the overall average of these responses still favors the velocity of the global motion, the response selectivity remains. The performance of the competitive learning degrades as a higher percentage of the input nodes is contaminated by the two types of noise. However, with a longer training time, the learning can still reach a stable state in which each input motion pattern is represented by essentially the same output nodes.
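The following is a minimal sketch of this noise-injection procedure, assuming each input node carries a local motion measurement stored as a direction (in degrees) and a speed; the function name add_input_noise and the array layout are illustrative and not taken from the model's actual implementation.

```python
import numpy as np

def add_input_noise(directions, speeds, rng):
    """Inject the two noise types into a field of local motion measurements.

    directions, speeds: 1-D arrays, one entry per V1 input node (assumed
    representation). Fractions and the +/-90 degree perturbation range
    follow the description in the text.
    """
    n = directions.size
    noisy_dir = directions.copy()
    noisy_spd = speeds.copy()

    # 1) Zero out between one third and one half of the nodes, chosen at
    #    random, mimicking regions of homogeneous luminance where no
    #    motion is detected.
    frac_zero = rng.uniform(1 / 3, 1 / 2)
    zero_idx = rng.choice(n, size=int(frac_zero * n), replace=False)
    noisy_spd[zero_idx] = 0.0          # no motion signal at these nodes

    # 2) Perturb the direction of over two thirds of the remaining nodes
    #    by a random angle in (-90, +90) degrees, mimicking the aperture
    #    problem caused by randomly oriented local line features.
    remaining = np.setdiff1d(np.arange(n), zero_idx)
    n_perturb = int(rng.uniform(2 / 3, 1.0) * remaining.size)
    perturb_idx = rng.choice(remaining, size=n_perturb, replace=False)
    noisy_dir[perturb_idx] += rng.uniform(-90.0, 90.0, size=n_perturb)
    noisy_dir %= 360.0                 # keep directions in [0, 360)

    return noisy_dir, noisy_spd
```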
The competitive learning algorithm performs best when the input patterns form a set of well-separated clusters in the feature space, each containing a group of similar patterns (see Appendix). However, this is not the case for the MT model: here the input patterns, with the various types of noise added, form a continuum without natural boundaries. In this case, we must avoid the undesirable situation in which most input patterns are represented by a few nodes, or even a single node, while other nodes never respond to any input and become dead nodes. The technique, also described in the Appendix, is used to balance the competition among all nodes and guarantee that every node has some chance to win, so that the continuum of input patterns is partitioned evenly.
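The Appendix technique itself is not reproduced here; as a stand-in, the sketch below uses frequency-sensitive competitive learning, a common balancing scheme in which each node's distance to the input is scaled by its win count, so that habitual winners are handicapped and dead nodes remain able to win.

```python
import numpy as np

def train_fscl(inputs, n_nodes, epochs=50, lr=0.05, rng=None):
    """Frequency-sensitive competitive learning: a stand-in for the
    balancing technique in the Appendix, which may differ in detail.

    inputs: array of shape (n_patterns, dim).
    Returns the learned weight vectors, shape (n_nodes, dim).
    """
    rng = rng or np.random.default_rng(0)
    dim = inputs.shape[1]
    weights = rng.normal(size=(n_nodes, dim))
    wins = np.ones(n_nodes)            # per-node win counts, start at 1

    for _ in range(epochs):
        for x in rng.permutation(inputs):
            # Biased distance: frequent winners look farther away,
            # giving rarely-winning nodes a chance to capture inputs.
            dist = wins * np.linalg.norm(weights - x, axis=1)
            winner = np.argmin(dist)
            # Winner-take-all update: move only the winner toward x.
            weights[winner] += lr * (x - weights[winner])
            wins[winner] += 1
    return weights
```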
The network is trained in two steps. First, the middle layer of component MT nodes is trained by repeatedly presenting a set of global motion patterns of different velocities to the input layer. In this process all groups in the middle layer are trained in parallel, each by the part of the global motion patterns falling inside the local region of 3 by 3 patches covered by the group. Training terminates when all groups in the middle layer have reached a stable state in which each local input pattern with a certain motion velocity is consistently responded to by a certain node in each group.

Now the training of the output layer can begin. The same input patterns are presented to the input layer and propagated to the middle layer through the weights between the input and middle layers obtained in the first step. The training of the output layer terminates when a stable state of the output layer is achieved. Owing to the balancing mechanism of the learning, almost all nodes respond to (become the winner of) exactly one input pattern, except for a few dead nodes which do not respond to any input. A one-to-one correspondence is thus established between the input motion patterns and the pattern MT nodes. In other words, the pattern MT nodes in the output layer of the network have now learned to recognize global motion patterns of different directions and speeds.
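A condensed sketch of this two-step schedule follows, reusing train_fscl from the previous sketch. Here extract_patch, which returns the portion of a global pattern inside a group's 3-by-3-patch region, and the flat vector encodings of patterns and responses are placeholders for structure the model defines elsewhere.

```python
import numpy as np

def train_two_steps(patterns, extract_patch, n_groups, n_mid, n_out, rng=None):
    """Two-step training sketch (assumed data layout, not the paper's code).

    patterns: list of global motion patterns, each encodable per group by
    extract_patch(pattern, group) -> 1-D feature vector.
    """
    rng = rng or np.random.default_rng(0)

    # Step 1: train each middle-layer group on its own local region of
    # the global patterns; the groups train independently (in parallel).
    mid_weights = []
    for g in range(n_groups):
        local_inputs = np.stack([extract_patch(p, g) for p in patterns])
        mid_weights.append(train_fscl(local_inputs, n_mid, rng=rng))

    # Step 2: freeze the middle layer, propagate each pattern through it,
    # and train the output layer on the resulting middle-layer activity.
    def mid_response(p):
        # One-hot winner per group, concatenated across all groups.
        resp = []
        for g, w in enumerate(mid_weights):
            dist = np.linalg.norm(w - extract_patch(p, g), axis=1)
            one_hot = np.zeros(len(w))
            one_hot[np.argmin(dist)] = 1.0
            resp.append(one_hot)
        return np.concatenate(resp)

    mid_acts = np.stack([mid_response(p) for p in patterns])
    out_weights = train_fscl(mid_acts, n_out, rng=rng)
    return mid_weights, out_weights
```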