Many depth cues require only monocular vision and can be used by the visual system to acquire depth information.
The optics of image formation can be described by perspective projection
(rather than parallel orthographic projection), which dictates that the size
of the image of an object is inversely proportional to the distance from the
object to the lens.
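The inverse relation between image size and distance can be illustrated with a minimal sketch of an ideal pinhole projection. The function name `image_size` and the 17 mm focal length are assumptions chosen for illustration only; they do not come from the text.

```python
def image_size(object_size, distance, focal_length=0.017):
    """Image size (metres) under an ideal pinhole (perspective) projection.

    focal_length=0.017 m is an illustrative, assumed value.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return focal_length * object_size / distance


# The same 1.8 m object seen at 5 m and at 10 m.
near = image_size(object_size=1.8, distance=5.0)
far = image_size(object_size=1.8, distance=10.0)

# Doubling the distance halves the image size, so this ratio is about 2.
print(near / far)
```

Size is only a usable depth cue when the physical size of the object is known or can be assumed, since a small nearby object and a large distant one can produce the same image size.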
Occlusion is the simplest and most reliable means of determining the relative depths of objects: the occluding object is necessarily closer than the occluded one. However, some prior knowledge of the shapes of the objects may be needed to determine which object occludes which.
For the image-forming lens of a visual system, more focusing power is required to form a sharp image of a nearby object than of a faraway one (whose light arrives almost parallel). The effort of focusing the lens to obtain a clear image therefore reflects the depth of the object.
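As a rough illustration, the relation between focusing power and object distance can be sketched with the thin-lens equation 1/f = 1/d_o + 1/d_i. Treating the eye as a single thin lens with a fixed lens-to-image distance is a simplifying assumption, and the names `required_power` and `accommodation` as well as the 17 mm image distance are hypothetical.

```python
def required_power(object_distance, image_distance=0.017):
    """Lens power in diopters (1/metres) needed to focus an object at
    object_distance onto an image plane at image_distance.

    Uses the thin-lens equation 1/f = 1/d_o + 1/d_i; image_distance is an
    assumed, illustrative value.
    """
    return 1.0 / object_distance + 1.0 / image_distance


def accommodation(object_distance):
    """Extra power (diopters) relative to focusing at infinity."""
    return required_power(object_distance) - required_power(float("inf"))


for d in (0.25, 1.0, 10.0):
    print(f"object at {d:5.2f} m needs {accommodation(d):.2f} D of extra power")
```

Under this model the extra power is simply 1/d_o, so it changes quickly for nearby objects and hardly at all beyond a few metres, which suggests that accommodation is informative mainly at short range.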
The blurriness or sharpness of the image of an object is related to the object's distance for several reasons. Besides the blur caused by defocus, atmospheric blur may be caused by particles in the air (mist, fog, smog, etc.) that scatter the light coming from a distant object. The blurriness of the image can be represented by the amount of fine detail, or equivalently by the relative amount of energy in the high spatial frequency components. These spatial frequency changes could be detected by frequency-sensitive cells such as the simple (sensitive to phase) and complex (insensitive to phase) cells in primary visual cortex.
Many natural scenes have fine textures, such as sandy beaches, grasslands, leaves and twigs in the woods, and wavy water surfaces. As with the size of an object, the image size of the texture elements is inversely proportional to the distance, so the texture appears finer as the distance grows. A given texture pattern can be represented by the frequency band it occupies in the power spectrum of the image. This frequency band shifts toward the high-frequency end as the distance increases and toward the low-frequency end as it decreases.
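To make the frequency shift concrete, the following sketch computes the spatial frequency, in cycles per degree of visual angle, of a periodic texture viewed head-on. The function name `cycles_per_degree` and the 1 cm texture period are hypothetical, and a frontally viewed, perfectly periodic texture is an idealization.

```python
import math


def cycles_per_degree(texture_period, distance):
    """Spatial frequency of a frontally viewed periodic texture.

    texture_period and distance are in the same length unit; one period
    subtends 2 * atan(period / (2 * distance)) of visual angle.
    """
    angle_deg = math.degrees(2.0 * math.atan(texture_period / (2.0 * distance)))
    return 1.0 / angle_deg


# A texture repeating every 1 cm, seen from 1 m and from 4 m.
near = cycles_per_degree(0.01, 1.0)
far = cycles_per_degree(0.01, 4.0)
print(f"near: {near:.2f} c/deg  far: {far:.2f} c/deg")
```

For small angles the frequency grows almost linearly with distance, so moving the texture four times farther away raises its spatial frequency by a factor of about four.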
Many natural objects (river banks, tree trunks) and man-made objects (roads, railroad tracks) are characterized by two parallel lines. As long as the parallel lines do not lie in a plane of equal distance from the eye, they converge in the image plane under perspective projection, thereby giving a powerful cue for depth. The orientation difference that serves as the depth cue could easily be detected by the orientation-sensitive cells in striate cortex.
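The convergence of parallel lines can be sketched by projecting two rails that run straight ahead on the ground. The pinhole model with unit focal length, the 1.6 m eye height, the standard-gauge rail spacing, and all function names are illustrative assumptions.

```python
import math


def project(x, y, z, focal_length=1.0):
    """Pinhole projection of a 3-D point; the eye looks along +z."""
    return focal_length * x / z, focal_length * y / z


def rail_images(depth, gauge=1.435, eye_height=1.6):
    """Image positions of the left and right rail at a given depth."""
    left = project(-gauge / 2, -eye_height, depth)
    right = project(+gauge / 2, -eye_height, depth)
    return left, right


def image_angle_between_rails(gauge=1.435, eye_height=1.6):
    """Angle (degrees) between the two rail images, which are straight
    lines meeting at the vanishing point (0, 0)."""
    return math.degrees(2.0 * math.atan2(gauge / 2, eye_height))


for depth in (2.0, 10.0, 50.0, 1000.0):
    (lx, ly), (rx, ry) = rail_images(depth)
    print(f"depth {depth:7.1f} m  separation {rx - lx:.4f}  height {ly:.4f}")

print(f"angle between rail images: {image_angle_between_rails():.1f} degrees")
```

The image separation falls off as 1/depth and both rails approach the vanishing point at the image origin, while the two image lines keep a fixed, nonzero orientation difference even though the rails are parallel in the scene.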
Parallax is the apparent displacement, or the difference in apparent direction, of an object as seen from two different points not on a straight line with the object. It can be obtained by a binocular system (to be discussed later) or by a monocular system that moves. The difference between the views before and after the motion gives a strong indication of depth. This cue is sometimes used subconsciously by humans and by other animals such as monkeys. Because monkeys have a smaller separation between their two eyes, they have less binocular disparity and therefore less effective stereopsis. They rely more on motion parallax, moving their heads from side to side to determine the distance of an object of interest.
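The geometry behind motion parallax can be sketched for the simplest case: a point that starts straight ahead, viewed before and after a purely sideways translation of the eye. The function names and the two baseline values (a 20 cm head sweep and a 6.5 cm eye separation) are illustrative assumptions, not measurements from the text.

```python
import math


def parallax_angle(baseline, distance):
    """Change in apparent direction (radians) of a point that starts
    straight ahead at `distance`, after the eye moves sideways by `baseline`."""
    return math.atan2(baseline, distance)


def depth_from_parallax(baseline, angle):
    """Invert the relation above: distance = baseline / tan(angle)."""
    return baseline / math.tan(angle)


HEAD_SWEEP = 0.20       # metres, hypothetical side-to-side head movement
EYE_SEPARATION = 0.065  # metres, illustrative binocular baseline

for distance in (0.5, 2.0, 10.0):
    sweep = math.degrees(parallax_angle(HEAD_SWEEP, distance))
    eyes = math.degrees(parallax_angle(EYE_SEPARATION, distance))
    print(f"{distance:5.1f} m: head sweep {sweep:6.2f} deg, eye baseline {eyes:6.2f} deg")
```

A larger baseline produces a larger angular change at every distance, which is consistent with the observation that an animal with closely spaced eyes can gain depth information by moving its head.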