The middle and output layers are trained in sequence. The middle layer is trained first, until it reaches a stable state in which each input motion pattern elicits a unique winner among the 30 nodes in each group of the middle layer (although a single node may win for more than one input pattern). The responses of one randomly chosen group of the middle layer are shown in Fig. 5, where the numbers 0 through 29 denote the 30 MST nodes in the group. The 8 winners that responded most strongly to translational motions in 8 different directions are listed at the top, followed by 8 arrays containing the winning nodes for the circular, radial, and spiral motions. Each entry of a k-by-k array (k = 7 in the simulation) is the winner for the motion pattern whose COM lies at the corresponding position in the visual field. Three types of MST nodes can be seen in the figure: single-component nodes (nodes 0, 1, 10, 12, 15, 16, 17, 22, 24, 25), double-component nodes (nodes 2, 3, 4, 6, 7, 8, 13, 14, 18, 19, 20, 27, 28, 29), and triple-component nodes (nodes 5, 9, 11, 21, 23, 26). The figure also shows that the single-component nodes are more position independent, as they respond over larger areas, whereas the multi-component nodes tend to be more position dependent, as they respond over smaller areas. This feature was consistently observed in additional examples. The mechanism underlying it is the balancing nature of the learning process: since all nodes respond to roughly the same number of input patterns, a node that responds to more types of motion can respond to fewer COM locations for each type, and is therefore more position dependent.
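The node classification and the balancing argument can be made concrete with a short Python sketch. It assumes a hypothetical `winners` dictionary mapping `(motion_type, position)` to the winning node, with `position = None` for the 8 translational patterns; the motion-type names in the toy example are illustrative only. The pattern count (8 + 8 × 49 = 400) follows from the figure layout described above, while the per-type location estimate assumes each node's share of patterns is split evenly across the motion types it wins, which is only a rough approximation.

```python
from collections import defaultdict

K = 7                 # k-by-k grid of COM positions (k = 7 in the simulation)
N_TRANSLATION = 8     # translational directions, listed without a COM position
N_COM_TYPES = 8       # circular, radial and spiral types, one k-by-k array each
N_NODES = 30          # MST nodes per middle-layer group

def components_per_node(winners):
    """winners maps (motion_type, position) -> winning node id (hypothetical format).
    Returns {node: {motion_type: number of patterns of that type the node wins}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for (motion_type, _position), node in winners.items():
        counts[node][motion_type] += 1
    return {node: dict(types) for node, types in counts.items()}

def component_label(types_won):
    """Single-, double- or triple-component, by the number of distinct motion types won."""
    n = len(types_won)
    return {1: "single", 2: "double", 3: "triple"}.get(n, f"{n}-component")

# Balancing argument: if every node wins about the same share of all patterns,
# a node splitting its share over m motion types covers about share / m
# COM locations per type, so more components means more position dependence.
n_patterns = N_TRANSLATION + N_COM_TYPES * K * K   # 8 + 8 * 49 = 400
share = n_patterns / N_NODES                        # about 13.3 patterns per node
locations_per_type = {m: share / m for m in (1, 2, 3)}

# Toy usage with hypothetical motion-type names.
toy = {("expansion", (3, 3)): 0, ("expansion", (3, 4)): 0,
       ("rotation_cw", (0, 0)): 4, ("translation_90", None): 4}
summary = components_per_node(toy)
print({node: component_label(types) for node, types in summary.items()})
print(locations_per_type)
```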
Fig. 6 shows the analog responses of six nodes (26, 9, 19, 0, 4, and 20), all of which respond favorably to expansion (motion type 8). For each node, its responses to expansions centered at different COM locations over the visual field are plotted as a response tuning surface for COM location. A node responds most strongly, and becomes the winner, when the COM of the stimulus lies within the local area preferred by that node, but it still responds, less strongly and without winning, to stimuli whose COMs lie elsewhere. In other words, the nodes have graded (sloping) response profiles.
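The link between sloping tuning surfaces and the local winning regions in Fig. 5 can be illustrated with a small NumPy sketch. Purely for illustration, each node's response surface is assumed to be a 2-D Gaussian over the 7 × 7 grid of COM positions; the centers and widths are hypothetical, not values from the simulation. Taking the argmax across nodes at each position gives the winner map, while each node's surface remains nonzero everywhere.

```python
import numpy as np

K = 7
rows, cols = np.mgrid[0:K, 0:K]

def tuning_surface(center, width, peak=1.0):
    """Hypothetical Gaussian-shaped response to expansions centered at each grid position."""
    r, c = center
    d2 = (rows - r) ** 2 + (cols - c) ** 2
    return peak * np.exp(-d2 / (2.0 * width ** 2))

# Three hypothetical nodes that all prefer expansion, each centered on a different COM region.
surfaces = np.stack([
    tuning_surface((1, 1), 2.0),
    tuning_surface((1, 5), 2.0),
    tuning_surface((5, 3), 2.0),
])

winner_map = surfaces.argmax(axis=0)   # winning node at each COM position
print(winner_map)
print(surfaces[0].round(2))            # node 0 still responds outside its winning region
```

Each node wins only where its surface exceeds all the others, so a winner-take-all readout yields local regions even though every underlying response profile is graded.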
As shown in Fig. 5, eight nodes (21, 23, 5, 20, 28, 11, 9, and 26) respond favorably to translational motions in different directions. The direction tuning of each node, that is, its responses to all translational directions, is plotted in Fig. 7. A node responds most strongly to its preferred direction, for which it is the winner, but it also responds, less strongly and without winning, to other directions. The direction tuning can be closely fitted with a Gaussian curve.
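The text does not specify how the Gaussian fit was performed. One simple option, sketched below with NumPy, fits a quadratic to the log responses as a function of the wrapped angular distance from the sampled peak. This assumes strictly positive responses with little noise and no baseline offset; for noisy data or tuning curves with a nonzero baseline, a nonlinear least-squares fit (for example `scipy.optimize.curve_fit`) would be the more robust choice.

```python
import numpy as np

directions = np.arange(8) * 45.0   # the 8 translational directions, in degrees

def wrap(deg):
    """Wrap angles or angular differences into [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def fit_gaussian_tuning(responses):
    """Fit r(theta) = A * exp(-d**2 / (2 * sigma**2)), d = wrapped distance to the peak.
    Uses a quadratic fit to log(responses), so every response must be > 0."""
    pref = directions[np.argmax(responses)]
    d = wrap(directions - pref)
    c2, c1, c0 = np.polyfit(d, np.log(responses), 2)
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    shift = -c1 / (2.0 * c2)                       # peak offset from the sampled maximum
    amplitude = np.exp(c0 - c1 ** 2 / (4.0 * c2))
    return wrap(pref + shift), sigma, amplitude

# Usage on a synthetic tuning curve (hypothetical preferred direction and width).
true_pref, true_sigma = 90.0, 40.0
r = np.exp(-wrap(directions - true_pref) ** 2 / (2 * true_sigma ** 2))
print(fit_gaussian_tuning(r))
```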
Next, the output layer of the network was trained and tested. The responses of the 200 nodes in one group of the output layer are shown in Fig. 8. Comparing this figure with Fig. 5 shows that a middle-layer node may respond strongly (become the winner) to motion patterns whose COMs lie within a relatively large area, whereas an output-layer node responds strongly only if the COMs of the input patterns lie within a much smaller area. In other words, the nodes in the output layer have much more sharply tuned selectivity for COM location than the nodes in the middle layer. This difference is further demonstrated by comparing the analog outputs of the two layers. Fig. 9 shows the responses of 9 output nodes, all favoring expansions with COMs located in the central area of the visual field. Their selectivity for COM location is much more sharply tuned than that of the middle-layer nodes shown in Fig. 6.
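One way to quantify the difference in COM selectivity between the two layers is to count the grid positions where a node's response is at least half its maximum. The sketch below uses hypothetical Gaussian-shaped surfaces as stand-ins for a middle-layer and an output-layer node; the half-maximum criterion and the widths are illustrative assumptions, not measures reported for the model.

```python
import numpy as np

K = 7
rows, cols = np.mgrid[0:K, 0:K]

def gaussian_surface(center, width):
    """Hypothetical response surface over the k-by-k grid of COM positions."""
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def half_max_area(surface):
    """Number of COM positions where the response is at least half its maximum."""
    return int(np.count_nonzero(surface >= 0.5 * surface.max()))

broad = gaussian_surface((3, 3), 2.0)   # stand-in for a middle-layer node
sharp = gaussian_surface((3, 3), 0.7)   # stand-in for an output-layer node
print(half_max_area(broad), half_max_area(sharp))
```

With these widths the broad surface covers 21 of the 49 positions and the sharp one covers only its center position.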
In summary, the simulation results show that the behavior of the MST nodes in both the middle and output layers of the network model closely resembles the properties of the real MSTd neurons listed earlier.
In addition, the model was tested with real image data. Presented with an optic flow pattern obtained from a sequence of images taken by a moving camera, the network model successfully detected the focus of expansion (FOE) of the optic flow as the COM of the expansion motion. This result suggests one possible way in which the biological visual system might detect heading direction.
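The network locates the FOE as the COM of the winning expansion response, which is not reproduced here. As a point of comparison, a conventional least-squares estimate (a different, standard technique, not the network model) finds the FOE directly from flow vectors, under the assumption of pure camera translation so that every flow vector points away from the FOE. The sketch below uses NumPy and synthetic flow; the FOE location and sampling are hypothetical.

```python
import numpy as np

def estimate_foe(points, flow):
    """Least-squares focus of expansion for a purely translational flow field.
    Each flow vector (u, v) at (x, y) should be parallel to (x - x0, y - y0), so
    (x - x0) * v - (y - y0) * u = 0, i.e. v * x0 - u * y0 = v * x - u * y."""
    x, y = points[:, 0], points[:, 1]
    u, v = flow[:, 0], flow[:, 1]
    A = np.column_stack([v, -u])   # coefficients of the unknowns (x0, y0)
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Usage on synthetic expansion flow from a hypothetical FOE at (3, -2).
rng = np.random.default_rng(0)
pts = rng.uniform(-10, 10, size=(50, 2))
true_foe = np.array([3.0, -2.0])
flw = 0.1 * (pts - true_foe)
print(estimate_foe(pts, flw))
```

Real flow fields contain noise and rotational components, which this baseline ignores; it is included only to make the geometric meaning of the FOE concrete.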