Several theoretical problems are raised by multimodal perception. After all, the world is a “blooming, buzzing confusion” that constantly bombards our perceptual system with light, sound, heat, pressure, and so forth. To make matters more complicated, these stimuli come from multiple events spread out over both space and time. To return to our example: Let’s say the car crash you observed happened on Main Street in your town. Your perception during the car crash might include a lot of stimulation that was not relevant to the car crash. For example, you might also overhear the conversation of a nearby couple, see a bird flying into a tree, or smell the delicious scent of freshly baked bread from a nearby bakery (or all three!). However, you would most likely not make the mistake of associating any of these stimuli with the car crash. In fact, we rarely combine the auditory stimuli associated with one event with the visual stimuli associated with another (although under some unique circumstances, such as ventriloquism, we do). How is the brain able to take the information from separate sensory modalities and match it appropriately, so that stimuli that belong together stay together, while stimuli that do not belong together get treated separately? In other words, how does the perceptual system determine which unimodal stimuli must be integrated, and which must not?
Once unimodal stimuli have been appropriately integrated, we can further ask about the consequences of this integration: What are the effects of multimodal perception that would not be present if perceptual processing were only unimodal? Perhaps the most robust finding in the study of multimodal perception concerns this last question. Whether you look at the activity of neurons or the behavior of individuals, responses to multimodal stimuli are typically greater than the sum of the responses to each modality presented on its own. In other words, if you presented the stimulus in one modality at a time and measured the response to each of these unimodal stimuli, you would find that adding them together would still not equal the response to the multimodal stimulus. This superadditive effect of multisensory integration indicates that there are consequences resulting from the integrated processing of multimodal stimuli.
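The comparison described above can be written out as a small check. The following is only an illustrative sketch: the function name `is_superadditive` and the response values are hypothetical and are not taken from any study. It assumes each response has been reduced to a single number (for example, a spike count or an accuracy score).

```python
def is_superadditive(resp_auditory, resp_visual, resp_multimodal):
    """Return True if the multimodal response exceeds the sum of the
    two unimodal responses (the superadditive pattern)."""
    return resp_multimodal > resp_auditory + resp_visual


# Hypothetical response magnitudes, invented for illustration only.
auditory_only = 3    # response to the sound alone
visual_only = 4      # response to the sight alone
both_together = 12   # response to sound and sight presented together

print(is_superadditive(auditory_only, visual_only, both_together))  # True: 12 > 3 + 4
print(is_superadditive(auditory_only, visual_only, 6))              # False: 6 < 3 + 4
```

In the second call the multimodal response is larger than either unimodal response but smaller than their sum, so it would not count as superadditive under this definition.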
The extent of the superadditive effect (sometimes referred to as multisensory enhancement) is determined by the strength of the response to the single stimulus modality with the biggest effect. To understand this concept, imagine someone speaking to you in a noisy environment (such as a crowded party). When discussing this type of multimodal stimulus, it is often useful to describe it in terms of its unimodal components: In this case, there is an auditory component (the sounds generated by the speech of the person speaking to you) and a visual component (the visual form of the face movements as the person speaks to you). In the crowded party, the auditory component of the person’s speech might be difficult to process (because of the surrounding party noise). The potential for visual information about speech—lipreading—to help in understanding the speaker’s message is, in this situation, quite large. However, if you were listening to that same person speak in a quiet library, the auditory portion would probably be sufficient for receiving the message, and the visual portion would help very little, if at all (Sumby & Pollack, 1954). In general, for a stimulus with multimodal components, if the response to each component (on its own) is weak, then the opportunity for multisensory enhancement is very large. However, if one component—by itself—is sufficient to evoke a strong response, then the opportunity for multisensory enhancement is relatively small. This finding is called the Principle of Inverse Effectiveness (Stein & Meredith, 1993) because the effectiveness of multisensory enhancement is inversely related to the unimodal response with the greatest effect.
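To make the Principle of Inverse Effectiveness concrete, the sketch below uses one index that is commonly used in this literature to quantify multisensory enhancement: the percentage gain of the multimodal response over the strongest unimodal response. The index is not given in this text, so treat it as an assumption made for illustration; the function name `enhancement_percent` and all response values are hypothetical.

```python
def enhancement_percent(resp_auditory, resp_visual, resp_multimodal):
    """Percentage gain of the multimodal response over the strongest
    unimodal response (an assumed index, used here for illustration)."""
    best_unimodal = max(resp_auditory, resp_visual)
    if best_unimodal <= 0:
        raise ValueError("the strongest unimodal response must be positive")
    return 100.0 * (resp_multimodal - best_unimodal) / best_unimodal


# Hypothetical values, invented for illustration only.
# Noisy party: both unimodal responses are weak.
print(enhancement_percent(2, 3, 12))   # 300.0

# Quiet library: the auditory response is already strong on its own.
print(enhancement_percent(20, 3, 24))  # 20.0
```

With these made-up numbers, the weak unimodal responses leave room for a large relative gain, while the strong auditory response leaves little, which is the pattern the principle describes.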
Another important theoretical question about multimodal perception concerns the neurobiology that supports it. After all, at some point, the information from each sensory modality is definitely separated (e.g., light comes in through the eyes, and sound comes in through the ears). How does the brain take information from different neural systems (optic, auditory, etc.) and combine it? If our experience of the world is multimodal, then it must be the case that at some point during perceptual processing, the unimodal information coming from separate sensory organs (such as the eyes, ears, and skin) is combined. A related question asks where in the brain this integration takes place. We turn to these questions in the next section.