Bioacoustics and Statistics
The perception and identification of sound is organized along three major percepts: rhythm, pitch, and timbre. We believe that the nature of these distinct percepts is closely related to the statistics of natural sounds: these acoustical perceptual dimensions encode information-bearing features in sound that are invariant across sound sources or particular environmental degradations. For example, the pitch of a note, and the resulting melody when notes are strung together, is constant across musical instruments and across human or animal singers. Similarly, the differential attenuation of frequencies as sound travels long distances, or the masking of particular frequencies by other sources, will have a relatively minor effect on the rhythmic qualities of the sound.
The auditory system that mediates these percepts must therefore be sensitive to particular classes of sound features, organized along dimensions that are closely related to both the major perceptual dimensions and the dimensions that characterize the statistics of natural sounds. Such a neural representation must be the result of computations along the auditory processing stream because it is absent at the auditory periphery, where the sound pressure waveform is decomposed into frequency channels by the inner ear. This decomposition yields an efficient representation for transmitting a reliable copy of the sound pressure waveform but, by itself, does not provide any of the synthesis required for extracting the information-bearing features of complex sounds.
The major hypotheses that we are investigating can therefore be framed around a central idea: the three separate dimensional spaces characterizing the statistics of natural sounds, the major auditory perceptual features, and the neural representation at higher levels of the auditory system are related in the strong sense of being similar and interdependent. This central idea led us to the following more specific hypotheses:
i) the role of auditory computations is to extract information-bearing features from natural sounds;
ii) these information-bearing features are organized along the major acoustical percepts;
iii) the information-bearing features and the corresponding percepts correspond to invariant characteristics of natural sounds.
In addition, we are interested in investigating the ontogeny of these auditory representations. We believe that, to a large extent, the higher-level representations of sounds are learned and that experience during development plays a particularly important role in shaping the auditory circuitry.
Discoveries: natural sound in the brain
Environmental sounds and animal vocalizations occupy only a small subset of all physically possible sounds. Within this subset, natural sounds can be categorized into three coarse functional groups: noisy, tonal, and click-like sounds, the nature of each sound being a direct consequence of the physical properties of the sound emitter. In speech, by combining voiced and unvoiced sounds with fast and slower lip and tongue motions, humans make use of all three sound types. The combination of the properties of the sound emitter and the effects of the environment determines what we call the statistics of natural sounds. These statistics also determine the sound features that can carry information in communication signals. Acoustical structure that is invariant to degradation from sound propagation, or to corruption by environmental noise or other sound sources, will be particularly informative.
We are interested in characterizing this informative acoustical structure. The relationship between the tuning of neurons in different functional classes and the acoustical features present in natural sounds is most clearly analyzed in the spectro-temporal modulation space (Singh and Theunissen, JASA 2003). The modulation space describes the oscillations in power across frequency and time. Spectral modulations (measured in cycles/kHz or cycles/octave) are oscillations in power across a short-time frequency spectrum at particular times, such as those found in harmonic stacks. Temporal modulations (measured in Hz) are oscillations in power (amplitude) over time.
We found that the overall gain of the STRFs in the modulation space (called the ensemble modulation transfer function) is maximal for features in natural sounds that are maximally informative, in the sense of being the most variable across different types of natural sounds (Woolley et al., Nature Neurosci., 2006). In addition, across the different functional neuronal types, we found specializations for representing sound features that are essential for mediating different percepts. The faster broadband neurons have high gain for fast temporal modulations, a sound feature that is critical for rhythm perception. The slower spectral narrowband neurons are sensitive to fine spectral shape, a sound feature important for pitch. Slower broadband neurons and wideband neurons are sensitive to lower spectral and temporal modulations, a sound feature important for timbre (Woolley et al., J. Neurosci., 2009). This specialization of the neural representation for information-bearing sound features in natural sounds, and its relationship with perception, is some of the strongest support that we have obtained so far for our core hypotheses. In previous work, we had shown neural selectivity in the avian auditory cortex for natural sounds over matched synthetic sounds, both in terms of overall neural activity, measured in spike rates (Grace et al., J. Neurophys., 2003), and in terms of information transmitted, measured in bits/s (Hsu et al., J. Neurosci., 2006).
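To make the modulation-space analysis concrete, the sketch below computes a modulation power spectrum as the two-dimensional Fourier transform of a (time × frequency) spectrogram, with temporal modulations along one axis (in Hz) and spectral modulations along the other (in cycles/kHz). This is a minimal, self-contained illustration, not our published analysis pipeline; the function name and the spectrogram parameters are hypothetical.

```python
import numpy as np

def modulation_power_spectrum(spectrogram, dt, df):
    """2D power spectrum of a (time x frequency) spectrogram.

    dt: time step of the spectrogram in seconds; df: frequency step in kHz.
    Returns (mps, temporal_mod_hz, spectral_mod_cyc_per_khz).
    """
    # Remove the mean so the DC component does not dominate the spectrum.
    centered = spectrogram - spectrogram.mean()
    # 2D FFT: rows index time, columns index frequency.
    mps = np.abs(np.fft.fftshift(np.fft.fft2(centered))) ** 2
    n_t, n_f = spectrogram.shape
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(n_t, d=dt))   # Hz
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(n_f, d=df))   # cycles/kHz
    return mps, temporal_mod, spectral_mod

# Toy example: a stationary harmonic stack is periodic across frequency,
# so its power concentrates at a nonzero spectral modulation and at
# zero temporal modulation.
dt, df = 0.001, 0.05                 # 1 ms time bins, 0.05 kHz frequency bins
t = np.arange(200) * dt
f = np.arange(100) * df              # 0 to 5 kHz
spec = np.cos(2 * np.pi * 2.0 * f)[None, :] * np.ones((t.size, 1))  # 2 cyc/kHz
mps, tmod, smod = modulation_power_spectrum(spec, dt, df)
peak_t, peak_f = np.unravel_index(np.argmax(mps), mps.shape)
# Peak lands at 0 Hz temporal modulation and |2| cycles/kHz spectral modulation.
```

Real spectrograms are usually analyzed on a log-amplitude scale and averaged over many sound segments to form an ensemble modulation spectrum; the toy above only illustrates the axes and the transform.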
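The information rates quoted in bits/s can be understood through a standard word-entropy construction: discretize spike trains into binary words, and subtract the noise entropy (variability across repeated trials of the same stimulus) from the total entropy of the word distribution. The sketch below is a minimal illustration of that idea, without the bias corrections a real estimate needs, and is not the specific estimator used in Hsu et al.; all names and parameters are hypothetical.

```python
import numpy as np
from collections import Counter

def entropy_bits(words):
    """Shannon entropy (bits) of the empirical distribution of words."""
    counts = np.array(list(Counter(map(tuple, words)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_rate(spikes, word_len, dt):
    """spikes: (n_trials, n_bins) binary responses to the SAME stimulus.

    Returns a (bias-uncorrected) information rate estimate in bits/s.
    """
    n_trials, n_bins = spikes.shape
    n_words = n_bins // word_len
    # Total entropy: pool words across all trials and all times.
    all_words = [spikes[tr, i * word_len:(i + 1) * word_len]
                 for tr in range(n_trials) for i in range(n_words)]
    h_total = entropy_bits(all_words)
    # Noise entropy: entropy across trials at each fixed time, averaged.
    h_noise = np.mean([entropy_bits([spikes[tr, i * word_len:(i + 1) * word_len]
                                     for tr in range(n_trials)])
                       for i in range(n_words)])
    return (h_total - h_noise) / (word_len * dt)

# Toy check: identical responses on every trial carry zero noise entropy,
# so the information rate equals the total word-entropy rate.
pattern = np.array([1, 0, 0, 1, 1, 0, 0, 0])
spikes = np.tile(pattern, (10, 1))               # 10 "trials", all identical
rate = info_rate(spikes, word_len=2, dt=0.001)   # 1.5 bits per 2 ms word
```

With limited data this plug-in estimator is upward biased; published estimates extrapolate over word length and data fraction to control for that.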