Theories of Visual Perception: Problems and Perspectives
Aims: The aim of this lecture is to introduce you to problems in visual
perception and general theoretical approaches to these problems.
Objectives: You should be able to understand, comment on and compare
the key concepts that have led to various schools of thought in the study of
visual perception.
Greek theories of visual perception
The Greeks had two clearly opposing views on the way visual perception works -
intromission theories and extramission theories. Intromission theorists, such
as Democritus (c. 425 B.C.) and Epicurus (342-270 B.C.), believed that objects
cast off resemblances of themselves, called eidola, rather in the way that
snakes cast off their skins. These eidola are captured by the eye. It is the
entry of eidola into the eye that allows us to see the shapes of objects. They
took as evidence the fact that objects can be seen mirrored in the cornea of
the observer. However, this approach leaves questions unanswered - How do eidola
pass through one another without interference? How do eidola of large objects
shrink to enter the eye? How do eidola from a single object reach many people
simultaneously? Extramission theorists, such as Plato (c. 427-347 B.C.),
believed that visual fire emanated from the eye and coalesced with light to
form a conduit that allows "motions" of the object to pass to the
sensorium. However, as Aristotle (384-322 B.C.) pointed out, it is unreasonable
to think that a ray from the eye could reach as far as the stars.
These theories demonstrate a lack of a modern understanding of physics and
optics, but the idea that perception involves the presence of a copy of the
object in the eye or brain survives in modern theories of template matching.
Johannes Kepler and the retinal image
Modern theories of vision start with Johannes Kepler, who in Ad Vitellionem
paralipomena (1604) first correctly described the formation of the retinal
image in the eye. A few years later Christoph Scheiner (1619) observed the
retinal image by scraping away the sclera of the eye of an ox, which was placed
in a hole in a shutter (reported by Descartes, 1637). However, there was a
problem - the retinal image was upside down. Why do we not see the world upside
down? The answer to this problem is that the retinal image is not itself
observed. If there existed a small man in the brain (a homunculus) looking at
the retinal image, then we would still need to explain how he sees the world,
and so on to an infinite regress.
Kepler's theory of the retinal image is pivotal. Old problems are not solved;
they are explained away, and new problems arise which still set the agenda
today. Since the retinal image is two-dimensional, how do we see a
three-dimensional world? How do we work out the real size of objects from their
retinal size? How do we recognise that an object is the same from different
views? How can we see features that are not present in the retinal image?
Perspective ambiguity
Perspective drawing in art was developed by the 15th-century Italian
artists/architects Brunelleschi and Alberti. A convenient way of thinking about
perspective derives from Leonardo's window. This is a technique for perspective
drawing in which the artist views a scene through a pane of glass from a fixed
vantage point. The artist then simply copies what he sees in the window onto
canvas. However, there are many possible three-dimensional scenes that can give
rise to the same two-dimensional image.
This ambiguity was forcibly brought home by Adelbert Ames's demonstrations in
the 1940s. The Ames chair demonstration involves a collection of rods and
shapes in 3D space, which looks like a chair from one vantage point. The point
of the demonstration is that the visual input to a single eye is ambiguous. We
cannot know the true 3D layout of surfaces in a scene from a single viewpoint.
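The ambiguity described above can be made concrete with a small sketch. It assumes an idealised pinhole eye at the origin looking along the z axis, with Leonardo's window as an image plane at distance `focal_length`; the function names and the example coordinates are illustrative choices, not part of the lecture. Any point moved along its line of sight lands on the same place in the window.

```python
def project(point, focal_length=1.0):
    """Perspective projection of a 3D point onto the image plane.

    Assumes an idealised pinhole eye at the origin looking along +z,
    with the image plane (the "window") at z = focal_length.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def scale_along_ray(point, factor):
    """Slide a point along the line of sight that passes through the eye."""
    x, y, z = point
    return (factor * x, factor * y, factor * z)


# Two different 3D points: one near, one four times further away
# along the same line of sight (hypothetical example coordinates).
near_corner = (0.5, 0.25, 2.0)
far_corner = scale_along_ray(near_corner, 4.0)

image_near = project(near_corner)
image_far = project(far_corner)

print("near point", near_corner, "projects to", image_near)
print("far point ", far_corner, "projects to", image_far)
```

Both points project to the same image location, so the image alone cannot say which of them is present. The Ames chair exploits this by placing each rod somewhere along the line of sight that the corresponding part of a real chair would occupy.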
Perceptual hypotheses
Constructivists such as Hermann von Helmholtz and Richard Gregory start from
the position that the external world cannot be directly perceived because of
the poverty of the information in the retinal images. Since information is not
directly given, we have to interpret the sensory data in order to construct
percepts. Images are interpreted on the basis of stored knowledge acquired
through learning.
Helmholtz believed the visual system drew "unconscious inferences", which he
later referred to as "inductive conclusions". Induction is the
process of drawing a general conclusion from individual instances - if all the
swans we ever see are white we draw the conclusion that "all swans are
white". This is the same process as is used in the formation of scientific
hypotheses. Gregory takes this further and argues that perception is a
collection of hypotheses about the world. Evidence for this view comes from
analysis of many visual illusions that can be attributed to calibration errors
(e.g. the tilt illusion) or misplaced assumptions (e.g. Kanizsa's triangle) and
to the top-down influence of knowledge and expectation.
The ecological approach to perception
In the 1950s James Gibson challenged this view of visual processing. He
referred to his theory as an ecological approach because, rather than
emphasising the poverty of the retinal image, he emphasised the information
available in the visual environment to an active observer. He believed that
perception was direct, by which he meant that perception is not mediated by a
process of inference, and percepts are not constructed from sensations. Gibson
emphasised relations in the environment. Whereas the constructivists argue that
size constancy requires us to scale the retinal image by the viewing distance,
Gibson argues that we judge size in relation to the amount of background
texture covered by the object. Motion of the observer gives rise to optic flow,
which specifies how the observer is moving in relation to the environment.
Theories of direct perception, however, do not provide very satisfactory
explanations of visual illusions.
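The two accounts of size constancy contrasted above can be compared in a short sketch. It is a deliberate simplification: it assumes a pinhole eye, an upright object, and texture elements lying at the same distance as the object, and all names and numbers (`OBJECT_HEIGHT`, `TEXTURE_SPACING`, the viewing distances) are hypothetical values chosen for illustration.

```python
def image_extent(physical_extent, distance, focal_length=1.0):
    """Size of the retinal image of an extent viewed at a given distance
    (pinhole approximation: image size falls off as 1 / distance)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return focal_length * physical_extent / distance


def size_by_distance_scaling(retinal_extent, assumed_distance, focal_length=1.0):
    """Constructivist-style account: scale the retinal image by an
    estimate of viewing distance to recover physical size."""
    return retinal_extent * assumed_distance / focal_length


def texture_elements_covered(object_image_extent, texture_image_extent):
    """Gibson-style account: how many background texture elements the
    object covers, computed from image quantities alone."""
    return object_image_extent / texture_image_extent


OBJECT_HEIGHT = 2.0      # hypothetical object size (arbitrary units)
TEXTURE_SPACING = 0.5    # hypothetical size of one texture element

results = []
for distance in (2.0, 4.0, 16.0):
    retinal = image_extent(OBJECT_HEIGHT, distance)
    texture = image_extent(TEXTURE_SPACING, distance)
    scaled = size_by_distance_scaling(retinal, distance)
    covered = texture_elements_covered(retinal, texture)
    results.append((distance, retinal, scaled, covered))
    print(f"distance={distance:5.1f}  retinal={retinal:.4f}  "
          f"distance-scaled size={scaled:.2f}  texture elements covered={covered:.2f}")
```

The retinal extent shrinks as distance grows, while both the distance-scaled size and the number of texture elements covered stay constant. The difference is in what each account needs: the first requires an estimate of distance, whereas the second uses only a ratio that is already present in the image.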
Gestalt psychologists, such as Wertheimer, Koffka and Köhler, also rejected the
structuralist idea that perceptions are constructed from sensations. They
addressed the question "Why do things look as they do?" (Koffka).
They noted the spontaneous tendency to split scenes into figure and
ground. They also studied the rules by which material is grouped and
segmented. The so-called laws of grouping include good continuation, proximity,
symmetry, similarity and common fate. These laws may simply reflect the
statistical regularities of the natural visual environment - similar patterns
normally arise from the same surface. The core Gestalt idea, that the whole is
greater than the sum of the parts, emphasises relations between parts. For
example, the melody of a tune is still recognisable when it is played on
different instruments. Köhler attempted to explain perception through neural
isomorphism, i.e. what we see reflects isomorphic patterns in the brain. A good
example of this kind of theorising is Köhler's explanation of phi motion. If
two spatially separated lights flash on and off in sequence, one experiences
continuous motion from the first position to the second position. Köhler
supposed that each flash sets up an electric field in the brain and that the
interaction of these fields causes the perception of motion. Recently, there
has been a resurgence of interest in the difficult problems of grouping,
segmentation and perceptual constancy studied by the Gestalt school.
The computational approach
Computational psychologists, whose approach is well illustrated by the work of
David Marr, aim to understand visual processes by building computer models of
these processes. Vision is seen as the process of forming a description of what
is in the scene from the retinal images. This process is sometimes referred to
as inverse graphics. From the starting point of a description of the geometry
of a scene, the reflectances of surfaces, the position of light sources and the
position of a viewer, it is possible to construct a realistic image of a scene.
The task of the visual system is to reverse this process and recover the scene
properties that gave rise to the images on the retina. Computational vision
aims to specify mathematically how this is done and to assign a functional role
to the neural components involved in this computation.
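A toy example may help to show why reversing the graphics process is hard. The sketch below is not Marr's model; it assumes a single matte surface patch shaded by the standard Lambertian rule (intensity = reflectance x light strength x cosine of the angle of incidence), and the function names and numbers are hypothetical. The forward step is a simple multiplication, but the inverse step has no unique answer unless something about the scene is assumed.

```python
import math


def render_intensity(reflectance, light_strength, incidence_deg):
    """Forward ("graphics") step: image intensity of one matte patch,
    using Lambertian shading as an assumed, simplified image model."""
    cos_incidence = max(0.0, math.cos(math.radians(incidence_deg)))
    return reflectance * light_strength * cos_incidence


def recover_reflectance(intensity, assumed_light_strength, assumed_incidence_deg):
    """Inverse step: reflectance can be recovered only after the light
    strength and surface orientation have been assumed."""
    cos_incidence = math.cos(math.radians(assumed_incidence_deg))
    if assumed_light_strength <= 0 or cos_incidence <= 0:
        raise ValueError("the patch must be lit to recover its reflectance")
    return intensity / (assumed_light_strength * cos_incidence)


# Two different scenes: dark paper under bright light,
# and light paper under dim light (hypothetical values).
scene_a = render_intensity(reflectance=0.2, light_strength=4.0, incidence_deg=0.0)
scene_b = render_intensity(reflectance=0.8, light_strength=1.0, incidence_deg=0.0)
print("image intensities:", scene_a, scene_b)

# The same intensity yields different reflectances under different assumptions.
print("assuming bright light:", recover_reflectance(scene_a, 4.0, 0.0))
print("assuming dim light:   ", recover_reflectance(scene_a, 1.0, 0.0))
```

The two scenes produce the same image intensity, so the image alone does not determine the surface. A computational theory therefore has to state which constraints or assumptions the visual system uses to choose among the possible causes.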
Reading:
Gordon, I.E. (1997) Theories of Visual Perception, John Wiley, Chichester.
Lindberg, D.C. (1976) Theories of Vision from Al-Kindi to Kepler, University of Chicago Press.
Prof. Alan Johnston