Depth Perception


Aims: To introduce the various cues to depth and to discuss how they might be combined.


Objectives: You should be able to describe pictorial, binocular and dynamic cues to depth perception and understand the issues involved in combining depth cues.



The three-dimensional world projects onto the curved surface at the back of the eye (the retina). Though curved, this surface is two-dimensional. Since we do not have direct access to the third dimension of visual space, the visual system has to use various sources of information in the projections of the 3D world onto the two eyes to recover depth, distance and the 3D shape of objects.


Optical cues


Convergence and accommodation


The eyes are separated laterally in the head (by about 6 cm) and face forwards in animals that place more emphasis on accurate depth judgement than on vigilance. One source of information comes from the convergence of the eyes. In principle the eyes could act like a rangefinder: from the (fixed) separation of the eyes and their angle of convergence we could work out the distance to the fixated point. The lens of the eye is made thicker by the action of the ciliary muscles so that near objects can be focussed on the retina at the back of the eye (accommodation). Hence the degree of activation of the ciliary muscles provides a cue to depth.
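The rangefinder idea can be made concrete with a little trigonometry. The sketch below is a minimal illustration, assuming symmetric fixation of a point straight ahead and the 6 cm eye separation mentioned above; the function names are hypothetical. It converts between vergence angle and distance, and also gives the accommodative demand in dioptres (the reciprocal of distance in metres) for an eye whose far point is at infinity.

```python
import math

# Assumed eye separation of about 6 cm, as stated in the text.
INTEROCULAR_M = 0.06


def distance_from_vergence(vergence_deg, interocular_m=INTEROCULAR_M):
    """Distance (m) to a fixated point straight ahead, given the total
    vergence angle between the two lines of sight (degrees).

    Symmetric fixation: each eye turns inwards by half the vergence angle,
    so tan(vergence / 2) = (interocular / 2) / distance.
    """
    half_angle = math.radians(vergence_deg) / 2.0
    return (interocular_m / 2.0) / math.tan(half_angle)


def vergence_for_distance(distance_m, interocular_m=INTEROCULAR_M):
    """Inverse of distance_from_vergence: vergence angle in degrees."""
    return math.degrees(2.0 * math.atan((interocular_m / 2.0) / distance_m))


def accommodation_diopters(distance_m):
    """Accommodative demand (dioptres) for a target at distance_m,
    relative to an eye focused at optical infinity."""
    return 1.0 / distance_m


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0, 6.0):
        print(f"{d:4.2f} m: vergence {vergence_for_distance(d):5.2f} deg, "
              f"accommodation {accommodation_diopters(d):4.2f} D")
```

With these assumptions the vergence angle falls from about 13.7° at 25 cm to about 3.4° at 1 m, but is only about 0.6° at 6 m, which illustrates why convergence (and, similarly, accommodation) is mainly informative at close range.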


Convergence and accommodation are only effective at close distances and can only tell you about the distance to a single (fixated) object in the visual field. Blur, on the other hand, varies with depth across the whole image and could in principle be used as a depth cue. One problem with this cue is that points both nearer and further than the plane of focus look blurred, so blur alone does not indicate the sign of a depth difference. Also, pupil size affects depth of field and hence the degree of blur.
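The two problems with blur can be illustrated with a common small-angle geometric-optics approximation (an assumption here, ignoring diffraction and aberrations): the angular diameter of the blur circle is roughly the pupil diameter multiplied by the defocus in dioptres. The hypothetical function below shows that points nearer and further than the focus distance can produce identical blur, and that blur scales with pupil size.

```python
import math


def blur_angle_deg(object_m, focus_m, pupil_mm):
    """Approximate angular diameter (degrees) of the geometric blur circle.

    Assumed small-angle approximation:
        blur (radians) ~= pupil diameter (m) * |1/object - 1/focus| (dioptres)
    Diffraction and optical aberrations are ignored.
    """
    defocus_d = abs(1.0 / object_m - 1.0 / focus_m)  # sign is discarded
    return math.degrees((pupil_mm / 1000.0) * defocus_d)


if __name__ == "__main__":
    # Eye focused at 1 m with a 4 mm pupil: one nearer and one further point.
    print(blur_angle_deg(2 / 3, 1.0, 4.0), blur_angle_deg(2.0, 1.0, 4.0))
    # Same far point with an 8 mm pupil.
    print(blur_angle_deg(2.0, 1.0, 8.0))
```

For an eye focused at 1 m, points at 2/3 m and at 2 m are both 0.5 D out of focus and so produce the same blur circle; doubling the pupil diameter doubles it.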


Convergence may play a greater role in the scaling of retinal disparity than as a direct cue to distance: the same disparity corresponds to different depth intervals at different viewing distances, so disparity must be scaled by distance. Evidence for the role of convergence comes from the "wallpaper illusion". Converging in front of (or behind) a repeating pattern, so that an element of the pattern in one eye binocularly overlaps the neighbouring element in the other eye, gives the appearance of a single surface at the convergence distance. The apparent size of the pattern elements changes accordingly: patterns that appear nearer appear smaller, since they still subtend the same visual angle.
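The geometry of the wallpaper illusion can be sketched with similar triangles. Assuming symmetric viewing of a frontal pattern with element spacing p at distance D, and eye separation I, fusing neighbouring elements with crossed vergence puts the convergence point at D·I/(I + p), and with uncrossed vergence at D·I/(I - p) (possible only when p < I). If perceived size scales with the convergence distance at a fixed visual angle, the apparent spacing scales by the same factor. The function name is hypothetical.

```python
def wallpaper_fusion(viewing_m, period_m, interocular_m=0.06, crossed=True):
    """Convergence distance and apparent element spacing when neighbouring
    elements of a repeating frontal pattern are fused.

    Assumes symmetric viewing and that perceived size is proportional to
    the convergence distance at a fixed visual angle.
    """
    if crossed:
        # Lines of sight cross in front of the pattern.
        fused_m = viewing_m * interocular_m / (interocular_m + period_m)
    else:
        # Lines of sight meet behind the pattern; requires period < I.
        if period_m >= interocular_m:
            raise ValueError("uncrossed fusion needs period < eye separation")
        fused_m = viewing_m * interocular_m / (interocular_m - period_m)
    # Same visual angle at a different perceived distance -> rescaled size.
    apparent_period_m = period_m * fused_m / viewing_m
    return fused_m, apparent_period_m


if __name__ == "__main__":
    print(wallpaper_fusion(1.0, 0.03, crossed=True))
    print(wallpaper_fusion(1.0, 0.03, crossed=False))
```

For example, a pattern with 3 cm spacing viewed from 1 m would, on these assumptions, appear at about 0.67 m with a 2 cm spacing under crossed fusion, or at 2 m with a 6 cm spacing under uncrossed fusion.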


Pictorial cues to depth and shape


Remarkably, we can get a good impression of depth from 2D pictures even though other depth cues (such as binocular disparity and accommodation) indicate that the surface is flat.


Atmospheric perspective: particles and vapour in the atmosphere scatter light, which makes very distant surfaces appear hazy.

Occlusion: near surfaces overlap far surfaces. Occlusion gives information about depth order rather than distance.

Relative height and size: objects placed further from the horizon in the image are seen as nearer. Of two similar objects, the one with the larger image is seen as closer.

Linear perspective: parallel lines receding in depth converge in the image, providing a strong cue to distance that can also affect apparent size. Gibson pointed to the role of texture gradients as a cue to distance along surfaces.

Shading: a cue to shape rather than distance.
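Relative height, relative size and texture gradients all follow from perspective projection. The sketch below assumes a pinhole eye with level gaze, unit focal length and a hypothetical eye height of 1.6 m: a ground point at distance Z projects h/Z below the horizon, and a frontal object of size S projects to S/Z. Equally spaced ground markings therefore crowd together towards the horizon.

```python
def height_below_horizon(distance_m, eye_height_m=1.6, focal=1.0):
    """Image height below the horizon of a ground point distance_m ahead
    (pinhole eye, level gaze: assumptions of this sketch)."""
    return focal * eye_height_m / distance_m


def image_size(object_size_m, distance_m, focal=1.0):
    """Projected size of a frontal object; halves when distance doubles."""
    return focal * object_size_m / distance_m


if __name__ == "__main__":
    # Ground texture elements spaced 1 m apart, from 2 m to 10 m away.
    distances = [2.0 + i for i in range(9)]
    heights = [height_below_horizon(z) for z in distances]
    # Farther elements sit closer to the horizon (relative height), and the
    # image gap between successive elements shrinks (texture gradient).
    gaps = [near - far for near, far in zip(heights, heights[1:])]
    for z, h, g in zip(distances, heights, gaps):
        print(f"{z:4.1f} m: {h:.3f} below horizon, gap to next {g:.3f}")
```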


Motion-based monocular cues


Helmholtz pointed to the role of motion in depth perception. As the observer moves relative to the environment, nearer surfaces move further and faster in the retinal image than distant surfaces do, and this motion parallax cue provides information about distance. Rogers & Graham (1979) showed that human observers do indeed use this cue. They isolated motion parallax by constructing a random dot display whose motion was yoked to the movement of the observer's head: the dots moved in a way that was consistent with a sinusoidally corrugated surface viewed from the current head position. Peak sensitivity for surfaces defined by motion parallax was at 0.2-0.4 cycles/degree, much lower than the peak sensitivity for luminance contrast (about 6 cycles/degree).
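The parallax geometry can be sketched for a point close to the line of sight, using a small-angle approximation (an assumption of this sketch). With lateral observer speed v and a non-rotating eye, a point at distance Z moves across the retina at about v/Z radians per second. If the eye instead tracks a fixation point at distance F, the retinal velocity becomes v(1/F - 1/Z), so points nearer than fixation move against the observer's motion and points beyond it move with it. The function name and sign convention are hypothetical.

```python
import math


def retinal_angular_velocity(distance_m, observer_speed_mps, fixation_m=None):
    """Approximate retinal angular velocity (deg/s) of a point straight ahead
    during lateral observer translation.

    Positive = image moves in the same direction as the observer.
    fixation_m=None: the eye does not rotate.
    Otherwise the eye is assumed to track a point straight ahead at fixation_m.
    """
    omega = -observer_speed_mps / distance_m  # rad/s, non-rotating eye
    if fixation_m is not None:
        # Tracking eye rotation removes the fixation point's own motion.
        omega += observer_speed_mps / fixation_m
    return math.degrees(omega)


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0, 4.0):
        print(z, retinal_angular_velocity(z, 0.1),
              retinal_angular_velocity(z, 0.1, fixation_m=1.0))
```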


Gibson pointed to the global patterns of retinal motion generated by an observer's movement through the environment: optic flow. Motion towards a surface generates an expanding pattern, whereas motion away from a surface generates a contracting pattern. The focus of expansion indicates our direction of movement (provided the eye is not rotating). Lee and Aronson (1974) placed subjects in a "swinging room" suspended from the ceiling. They found that subjects swayed in phase with the room, indicating that they were using optic flow to control balance.
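For pure observer translation (no eye rotation), the optic flow field has a simple form. Assuming a pinhole eye with unit focal length, a point at image position (x, y) and depth Z moves with velocity ((x·Tz - Tx)/Z, (y·Tz - Ty)/Z) for translation (Tx, Ty, Tz), so all flow vectors radiate from the focus of expansion at (Tx/Tz, Ty/Tz). The sketch below, with hypothetical function names, computes both.

```python
def translational_flow(x, y, depth, t):
    """Image velocity (u, v) of a point at image position (x, y), focal
    length 1, at the given depth, for observer translation t = (tx, ty, tz)
    and no eye rotation."""
    tx, ty, tz = t
    return ((-tx + x * tz) / depth, (-ty + y * tz) / depth)


def focus_of_expansion(t):
    """Image point with zero flow: the direction of heading."""
    tx, ty, tz = t
    if tz == 0:
        raise ValueError("no focus of expansion for purely lateral motion")
    return (tx / tz, ty / tz)


if __name__ == "__main__":
    heading = (0.1, 0.0, 1.0)  # mostly forwards, slightly to the right
    print("focus of expansion:", focus_of_expansion(heading))
    for x in (-0.5, 0.1, 0.5):
        print(x, translational_flow(x, 0.0, 5.0, heading))
```

When Tz is negative (moving backwards) the same point becomes a focus of contraction and the flow vectors point towards it.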


Motion also provides information about 3D shape. Wallach and O'Connell (1953) showed that subjects could see the 3D shape of a wire-frame object from its moving 2D projection: the kinetic depth effect. This suggests that the visual system employs a rigidity assumption: we see a rigid 3D rotation rather than a non-rigid 2D distortion.
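The kinetic depth effect can be illustrated by projecting a rotating wire-frame object. The sketch below uses a cube as a hypothetical stand-in for Wallach and O'Connell's wire figures and parallel (orthographic) projection, like a shadow cast by a distant light. The 3D edge lengths never change during rotation, but the projected 2D lengths do, so the image sequence is consistent both with a rigid 3D rotation and with a 2D figure that deforms non-rigidly; the rigidity assumption selects the former.

```python
import math

# Hypothetical wire-frame object: the 12 edges of a unit cube.
VERTICES = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(VERTICES[i], VERTICES[j])) == 1]


def rotate_y(p, angle):
    """Rotate a 3D point about the vertical (y) axis by angle radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(p):
    """Orthographic projection: discard depth, like a shadow."""
    return (p[0], p[1])


def edge_lengths(points):
    """Length of every edge, for 3D points or their 2D projections."""
    return [math.dist(points[i], points[j]) for i, j in EDGES]


if __name__ == "__main__":
    for deg in (0, 30, 60):
        pts3d = [rotate_y(p, math.radians(deg)) for p in VERTICES]
        pts2d = [project(p) for p in pts3d]
        print(deg,
              sorted(round(L, 2) for L in edge_lengths(pts3d)),  # constant
              sorted(round(L, 2) for L in edge_lengths(pts2d)))  # changes
```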


Disparity-based binocular cues


Each eye gets a slightly different view of the world, and combining the two views allows stereoscopic vision. Points on the horopter project onto corresponding locations on the two retinae. Retinal disparity increases with an object's distance in depth from the horopter. Unlike blur, retinal disparity is a signed quantity: objects nearer than the horopter give rise to crossed disparity, whereas objects beyond the horopter give rise to uncrossed disparity. For a given depth difference, retinal disparity is approximately inversely proportional to the square of the viewing distance, so disparity is not an effective cue for small depth differences at large distances.
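The dependence of disparity on distance can be checked numerically. Assuming points on the midline, symmetric fixation and a 6 cm eye separation, the disparity of a point is the difference between the vergence angle it would require and the vergence angle at fixation; for a small depth interval ΔZ at viewing distance D this is approximately I·ΔZ/D² radians. The function names are hypothetical, and the sign convention (positive = crossed) is chosen for this sketch.

```python
import math


def vergence_rad(distance_m, interocular_m=0.06):
    """Vergence angle (radians) for symmetric fixation at distance_m."""
    return 2.0 * math.atan((interocular_m / 2.0) / distance_m)


def disparity_arcmin(point_m, fixation_m, interocular_m=0.06):
    """Exact midline disparity in arcmin.
    Positive = crossed (nearer than fixation), negative = uncrossed."""
    eta = vergence_rad(point_m, interocular_m) - vergence_rad(fixation_m, interocular_m)
    return math.degrees(eta) * 60.0


def approx_disparity_arcmin(depth_diff_m, fixation_m, interocular_m=0.06):
    """Small-depth approximation |eta| ~= I * dZ / D**2, in arcmin."""
    return math.degrees(interocular_m * depth_diff_m / fixation_m ** 2) * 60.0


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 5.0):
        print(d, approx_disparity_arcmin(0.01, d))  # 1 cm depth step
```

On these assumptions a 1 cm depth step produces about 8 arcmin of disparity at 50 cm but only about 0.08 arcmin (roughly 5 arcsec) at 5 m: ten times the distance gives about one hundredth of the disparity.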


Evidence that retinal disparity gives rise to the experience of depth was provided by Wheatstone, who built the first stereoscope: slightly different images presented to the two eyes result in a depth percept. To recover depth we need to measure retinal disparity. One way of doing this is to identify corresponding features in the two images and then work out the difference in their retinal positions. However, Julesz (1971) showed, using random dot stereograms (RDSs), that this is an unlikely strategy. RDSs isolate the disparity cue: neither eye's image on its own contains any visible shape, yet the pair gives a clear depth percept. The difficulty of matching features between the two eyes is called the stereo correspondence problem. For random dot stereograms there are too many possible matches to consider. The idea that we might match small micro-patterns of dots can also be rejected, because the dot patterns can be made dynamic (a new random pattern on each frame) and one of the two patterns can be filtered without loss of stereopsis.
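A random dot stereogram of the kind Julesz used can be generated in a few lines. In this sketch (hypothetical function names, plain Python lists as images) a central square in the right image is shifted sideways and the uncovered strip is filled with fresh dots, so each image on its own is just random noise. The second function illustrates the correspondence problem for a single dot: besides the true match, about half of the other candidate positions within the search range match it in colour by chance, so isolated dots cannot be matched unambiguously.

```python
import random


def make_rds(size=64, square=24, shift=4, seed=1):
    """Return (left, right) binary random-dot images as lists of rows.

    The central square is displaced `shift` pixels to the left in the right
    image (crossed disparity). The strip it uncovers is refilled with new
    random dots, so neither image alone contains a visible square.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    x0 = y0 = (size - square) // 2
    for y in range(y0, y0 + square):
        for x in range(x0, x0 + square):
            right[y][x - shift] = left[y][x]
        # Background uncovered at the square's right edge: fresh dots.
        for x in range(x0 + square - shift, x0 + square):
            right[y][x] = rng.randint(0, 1)
    return left, right


def candidate_matches(left, right, row, x, max_shift=8):
    """Shifts d for which the single dot left[row][x] has the same colour
    as right[row][x - d]."""
    return [d for d in range(max_shift + 1)
            if x - d >= 0 and left[row][x] == right[row][x - d]]


if __name__ == "__main__":
    left, right = make_rds()
    print("candidate shifts for a dot inside the square:",
          candidate_matches(left, right, 32, 32))
```

The true disparity (4 pixels for a dot inside the square) is always among the candidates, but several false ones typically are too.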


Occlusion-based binocular cues


When one looks at a raised 3D square, parts of the background next to the left edge of the square are visible to the left eye but not to the right eye, and the mirror-image relationship holds at the right edge. Because the object occludes the background differently for the two eyes, no binocular correspondence can be made in these regions, yet this binocular occlusion cue can itself give rise to a depth percept, as shown by Shimojo and Nakayama (1990). Unlike ordinary stereograms, in which switching the two eyes' views reverses the perceived depth, switching the views here does not give rise to a depth percept, because the images at the eyes no longer mirror real occlusion geometry.
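The size of the monocular regions follows from the occlusion geometry. Assuming a frontal surface at distance Ds in front of a frontal background at distance Db, and eye separation I, the strip of background beside each vertical edge that only one eye can see has width I·(Db - Ds)/Ds, independent of where the edge lies. The function below is a hypothetical sketch of that relationship.

```python
def half_occluded_width(surface_m, background_m, interocular_m=0.06):
    """Width (m, measured on the background) of the strip beside each
    vertical edge of a frontal surface that only one eye can see.

    Derived from the two lines of sight grazing the edge; assumes both the
    surface and the background are frontal planes.
    """
    if not 0 < surface_m < background_m:
        raise ValueError("surface must lie in front of the background")
    return interocular_m * (background_m - surface_m) / surface_m


if __name__ == "__main__":
    for ds in (0.5, 0.75, 0.9):
        print(ds, half_occluded_width(ds, 1.0))
```

For example, a surface at 50 cm in front of a background at 1 m leaves a 6 cm monocular strip at each edge; moving the surface to 90 cm shrinks it to under 7 mm.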


Cue combination


We have considered a range of depth cues, but we do not see multiple copies of the world, so the cues must be combined in some way. Marr (1982) proposed that we represent the world using what he termed the 2.5D sketch (a set of intrinsic images representing distance, surface orientation and curvature), in which information from multiple sources is combined. But what are the combination rules? These can be investigated by putting cues into conflict, so that they signal different depths, and measuring the resulting combined depth percept. Young, Landy and Maloney (1993) provide evidence for a weighted linear combination rule.
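A weighted linear combination rule can be written in a few lines. The sketch below is illustrative only: the combined estimate is a weighted average of the individual cue estimates. Choosing weights in proportion to each cue's reliability (inverse variance) is one widely used assumption, not something stated here, and the numbers are made up.

```python
def combine_cues(estimates, weights):
    """Weighted linear combination of depth estimates; weights are normalised."""
    if len(estimates) != len(weights):
        raise ValueError("need one weight per cue")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(estimates, weights)) / total


def reliability_weights(variances):
    """Assumed weighting rule: weight proportional to 1 / variance."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [r / total for r in inverse]


if __name__ == "__main__":
    # Made-up example: disparity signals 10 cm of depth, motion 14 cm,
    # and the disparity estimate is assumed four times less variable.
    w = reliability_weights([1.0, 4.0])
    print(w, combine_cues([10.0, 14.0], w))
```

With these made-up numbers the weights are 0.8 and 0.2 and the combined depth is 10.8 cm, closer to the more reliable cue.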


