Human Speech and Music
We have recently published a series of studies on human speech and music perception with the primary goal of relating our neurophysiological studies in songbirds to human perception. We have shown that avian auditory cortical neurons tile acoustical space in such a way that particularly informative invariant features are emphasized, and we have argued that these neurons are functionally organized into subsets specialized for extracting the acoustical structure that is most important for distinct percepts. In our human studies, we tested whether we could obtain psychophysical tuning curves (or transfer functions) that explicitly show which acoustical features are important for human perception of speech and music. To relate these studies to the transfer functions obtained in our neurophysiological experiments, we manipulated sounds in modulation space. To do so, we designed a novel filtering method that allows us to selectively remove particular spectral or temporal modulations from sound signals.
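As a rough illustration of this kind of modulation filtering, the Python sketch below filters the log-amplitude spectrogram of a sound in the modulation domain and either keeps or removes modulations up to chosen temporal and spectral cutoffs. It is a minimal sketch under simplifying assumptions: the function name, window length, and the reuse of the original spectrogram phases for resynthesis are illustrative choices, not the filter design of the published method.

```python
# Minimal sketch of spectrotemporal modulation filtering.
# All parameter choices here are illustrative, not those of the published filter.
import numpy as np
from scipy.signal import stft, istft

def modulation_filter(wave, fs, tm_cutoff_hz, sm_cutoff_cyc_per_khz, mode="lowpass"):
    """Keep (mode="lowpass") or remove (mode="notch") the low-modulation region
    below the given temporal (Hz) and spectral (cyc/kHz) cutoffs."""
    nperseg = 1024                                 # ~46 ms windows at fs = 22050 Hz
    f, t, Z = stft(wave, fs=fs, nperseg=nperseg)
    log_amp = np.log(np.abs(Z) + 1e-10)            # filter the log-amplitude envelope
    phase = np.angle(Z)                            # original phases reused for resynthesis

    # Axes of the modulation spectrum: temporal modulations in Hz (along time),
    # spectral modulations in cycles/kHz (along frequency).
    wt = np.fft.fftfreq(len(t), d=t[1] - t[0])                # Hz
    wf = np.fft.fftfreq(len(f), d=(f[1] - f[0]) / 1000.0)     # cyc/kHz

    M = np.fft.fft2(log_amp)                       # 2D FFT -> modulation spectrum
    keep = (np.abs(wf)[:, None] <= sm_cutoff_cyc_per_khz) & \
           (np.abs(wt)[None, :] <= tm_cutoff_hz)
    if mode == "notch":                            # remove the low-modulation core instead
        keep = ~keep
    log_amp_filt = np.real(np.fft.ifft2(M * keep))

    Z_filt = np.exp(log_amp_filt) * np.exp(1j * phase)
    _, wave_filt = istft(Z_filt, fs=fs, nperseg=nperseg)
    return wave_filt
```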
When we applied this filtering technique to speech to obtain degraded speech signals, we found that comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cyc/kHz were removed. More specifically, the modulation transfer function was band-pass in temporal modulations and low-pass in spectral modulations (Elliott and Theunissen, PLoS Computational Biology, 2009). This speech transfer function showed similarities with the gain function of the subset of neurons tuned to sound features important for timbre. This is not surprising, since speech formants and formant transitions, which are critical for speech intelligibility, are timbral qualities. This research could provide insights into the neural pathways that are critical for speech perception in humans. In addition, the determination of the human speech modulation transfer function furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used in audio applications such as file compression or noise removal, and in clinical applications such as signal processing for cochlear implants. We are currently extending these studies by determining whether we can improve intelligibility in noisy environments with matched modulation filtering.
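Reusing the modulation_filter sketch above, the short example below illustrates the compression idea by keeping only the modulation "core" found critical for intelligibility (temporal modulations below about 12 Hz, spectral modulations below about 4 cyc/kHz). The soundfile I/O and file names are assumptions for illustration.

```python
import numpy as np
import soundfile as sf   # assumed I/O choice; any mono WAV reader would do

wave, fs = sf.read("sentence.wav")        # hypothetical mono speech recording
core = modulation_filter(wave, fs,
                         tm_cutoff_hz=12.0,          # keep temporal mods < ~12 Hz
                         sm_cutoff_cyc_per_khz=4.0,  # keep spectral mods < ~4 cyc/kHz
                         mode="lowpass")
core = core / np.max(np.abs(core))        # normalize before writing
sf.write("sentence_core.wav", core, fs)
```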
In music perception experiments, we have investigated the dimensionality of the perceptual timbre space and the dependence of these timbral dimensions on specific acoustical features. We found that the timbre of Western orchestral instruments has five dimensions. Our percept of timbre depends on both the spectral and the temporal envelope of the physical sound, but the perceptually relevant distinction was not between temporal and spectral features; rather, it was between slower and faster dynamics and between pitch-dependent and pitch-independent features (Elliott et al., 2013).
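As a sketch of how the dimensionality of a perceptual timbre space might be estimated, the example below runs metric multidimensional scaling on a matrix of pairwise dissimilarity ratings and tracks the residual stress as the number of dimensions grows; an elbow in that curve suggests how many dimensions are needed. The dissimilarity matrix here is a random placeholder, not data from the study, and the analysis in the paper may differ in its details.

```python
# Minimal sketch: dimensionality of a timbre space from pairwise dissimilarities.
import numpy as np
from sklearn.manifold import MDS

n_instruments = 10
rng = np.random.default_rng(0)
ratings = rng.uniform(0.0, 1.0, size=(n_instruments, n_instruments))  # placeholder ratings
dissim = (ratings + ratings.T) / 2.0           # symmetrize
np.fill_diagonal(dissim, 0.0)                  # zero self-dissimilarity

# Fit MDS solutions of increasing dimensionality and track the residual stress.
for n_dim in range(1, 8):
    mds = MDS(n_components=n_dim, dissimilarity="precomputed", random_state=0)
    mds.fit(dissim)
    print(f"{n_dim} dimensions: stress = {mds.stress_:.3f}")
```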