


The long-term goal of the research in the laboratory is to understand the neural computations that underlie auditory perception, with a focus on how the brains of animals and humans decipher behaviorally relevant sounds and, in particular, the vocalizations used in communication.  We believe that the neural representation of sound, the major auditory perceptual dimensions (pitch, timbre, rhythm), and the statistics of natural sounds are intertwined, and that understanding this three-way relationship is critical for explaining audition.  Our research approach is multifaceted and interdisciplinary: we combine animal behavior, human psychophysics, sensory neurophysiology, and theoretical and computational neuroscience.


Summary of recent research activities.

  1. Statistics of natural sounds and perception.
  2. Neural representation of complex sounds.
  3. Development of the auditory system.
  4. Human psychophysics and neuroscience.
  5. Auditory and vocal behavior in animals.
  6. Computational neuroscience.

Statistics of natural sounds and perception

Natural stimuli (visual or auditory scenes) have a particular statistical signature that is a consequence of the physical properties of the objects composing the scene (visual objects or sound sources). Some aspects of this structure carry information about the identity and state of the object. In audition, the informative structure is found in particular temporal patterns (the rhythm of a song, the chirp rate of a cricket) or spectral patterns (the melody of a song, the formants in speech). Acoustical structure that is invariant to degradations caused by sound propagation, environmental noise, or competing sound sources is particularly informative, and we are interested in characterizing it. The relationship between the tuning of neurons in different functional classes and the acoustical features present in natural sounds is most clearly analyzed in the spectro-temporal modulation space (Singh and Theunissen, JASA, 2003).  The modulation space describes the oscillations in power across frequency and time.  Spectral modulations (measured in cycles/kHz or cycles/octave) are oscillations in power across a short-time frequency spectrum at a particular time, such as those found in harmonic stacks. Temporal modulations (measured in Hz) are oscillations in power (amplitude) over time.  We found that the overall gain of the STRFs in the modulation space (called the ensemble modulation transfer function) is maximal for features in natural sounds that are maximally informative, in the sense of being the most variable across different types of natural sounds (Woolley et al., Nature Neurosci., 2006).  In addition, across the different functional neuronal types, we found specializations for representing sound features that are essential for mediating different percepts.
The faster broadband neurons have high gain for high temporal modulations, a sound feature that is critical for rhythm perception.  The slower spectral narrowband neurons are sensitive to fine spectral shape, a sound feature important for pitch.  Slower broadband neurons and wideband neurons are sensitive to lower spectral and temporal modulations, sound features important for timbre (Woolley et al., J. Neurosci., 2009).  This specialization of the neural representation for information-bearing sound features in natural sounds, and its relationship with perception, is some of the strongest support we have obtained so far for our core hypotheses. In previous work, we had shown neural selectivity in the avian auditory cortex for natural sounds over matched synthetic sounds, both in terms of overall neural activity, measured in spike rates (Grace et al., J. Neurophys., 2003), and in terms of information transmitted, measured in bits/s (Hsu et al., J. Neurosci., 2006).
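As an illustration of this analysis space, the modulation power spectrum of a sound can be computed as the squared magnitude of the two-dimensional Fourier transform of its log-amplitude spectrogram. The sketch below is our own minimal version; a full analysis, as in Singh and Theunissen (2003), would window the spectrogram and average over many sound segments.

```python
import numpy as np

def modulation_spectrum(log_spectrogram, dt, df):
    """Modulation power spectrum of a log-amplitude spectrogram.

    log_spectrogram : array of shape (n_freq, n_time)
    dt : time step between spectrogram frames, in seconds
    df : frequency step between spectrogram rows, in Hz
    Returns (mps, wf, wt): power, spectral modulation axis (cycles/Hz;
    multiply by 1000 for cycles/kHz), and temporal modulation axis (Hz).
    """
    # Subtract the mean so the DC term does not dominate the power
    spec = log_spectrogram - log_spectrogram.mean()
    mps = np.abs(np.fft.fftshift(np.fft.fft2(spec))) ** 2
    wf = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=df))
    wt = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=dt))
    return mps, wf, wt
```

A purely temporal modulation, such as a 10 Hz amplitude ripple that is constant across frequency, shows up as a peak at wt = ±10 Hz and wf = 0; a harmonic stack shows up at wt = 0 and a non-zero spectral modulation.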


Neural representation of complex sounds

Our neurophysiological studies are currently carried out in songbirds.  We have chosen this avian animal model for two reasons.  First, songbirds use a rich repertoire of communication calls in complex social behaviors, such as territorial defense and long-term pair bonding.  Second, songbirds are one of the few families of vertebrates that are capable of vocal learning.  In part because of this unique trait, birdsong research, the neuroethological study of vocal learning in songbirds, has been a very active and productive area.  Many laboratories are studying the song system, a specialized set of avian forebrain areas that are involved in vocal production and learning.  For our research goals, the rich vocal repertoire allows us to investigate how behaviorally relevant complex communication sounds are represented in the auditory system. In addition, we can also study how the auditory system and the vocal system interact; more specifically, we can examine the input to the song system from auditory areas when the bird hears its tutor's song or its own vocalizations.
In recent years, we have made significant progress in deciphering the neural representation of complex sounds in the higher auditory areas of songbirds.  To do so, we have estimated the stimulus-response function of a large number of neurons in the auditory midbrain and in the primary and secondary avian auditory cortex.  In most of our studies, the stimulus-response function has been described in terms of the spectro-temporal receptive field (STRF), the linear gain between the sound stimulus in its spectrographic representation and the time-varying mean firing rate of a particular neuron. In the avian auditory cortex, we found that, based on their STRFs, neurons could be classified into three large functional groups (narrowband, wideband and broadband) as well as two other smaller groups (offset and hybrid). We showed how these distinct functional classes encode features of sounds that are essential for different percepts (Woolley et al., J. Neurosci., 2009).
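In this linear description, the predicted mean firing rate at each instant is simply the inner product of the STRF with the spectrogram patch that immediately precedes that instant. A minimal sketch of this forward model (the function name and toy dimensions are illustrative, not the lab's code):

```python
import numpy as np

def strf_predict(spec, strf):
    """Predict a time-varying firing rate from a spectrogram with a linear STRF.

    spec : stimulus spectrogram, shape (n_freq, n_time)
    strf : receptive field, shape (n_freq, n_lags)
    """
    n_freq, n_lags = strf.shape
    n_time = spec.shape[1]
    rate = np.zeros(n_time)
    for t in range(n_lags, n_time):
        # Inner product of the STRF with the preceding spectrogram patch
        rate[t] = np.sum(strf * spec[:, t - n_lags:t])
    return rate
```

In practice the linear prediction is usually passed through a static output non-linearity, since firing rates cannot be negative; the sketch omits that step.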
We have also examined the coding of natural sounds at different levels of auditory processing. We have recorded and analyzed the neural activity in the auditory midbrain (area MLd, analogous to the mammalian inferior colliculus), in the primary auditory pallium (area Field L, analogous to primary auditory cortical areas A1 and AAF), and in a secondary auditory cortical area that has been implicated in the processing of familiar conspecific vocalizations (area CM).  We found clear evidence for hierarchical processing.  STRFs in Field L form a more heterogeneous set and are more complex than in MLd (Woolley et al., J. Neurosci., 2009).  At the next level of processing, in CM, the neural responses show further complexity and the simple STRF description fails to capture the stimulus-response function (Gill et al., J. Neurophys., 2008).  On one hand, this analysis revealed how neural representations in higher levels could be obtained by step-wise computations.  On the other hand, a simple feed-forward model did not explain all the observed differences in these inter-connected areas.  Neurons in MLd appeared to be particularly sensitive to the temporal structure of the sound (more so than higher-level neurons), and this sensitivity increased when processing conspecific vocalizations versus matched noise (Woolley et al., J. Neurosci., 2006).  In CM, neurons were selective for complex sound features in an unusual way.  In lower areas, feature-selective neurons increase their firing rates as the intensity (or contrast) of the feature increases.  But in CM, neurons increased their firing rates in proportion to the degree to which this intensity is surprising given the past stimulus (Gill et al., J. Neurophys., 2008).  We believe that this neural representation for detecting unexpected complex features is a result of experience with behaviorally relevant sounds and yields an efficient code for incorporating new auditory memories.
In summary, by comparing the neural coding at three levels of the auditory processing stream that are known to feed into each other (MLd to Field L, and Field L to CM), we found evidence for hierarchical processing in which we can identify intermediate steps toward building neurons that are sensitive to distinct subsets of complex acoustical features.  These subsets tile a region of the acoustical space that is highly informative when processing natural sounds, with each subset extracting invariant features that are important for different percepts.  But we also found evidence for unique coding properties in each area, suggesting that parallel channels with specialized functions are already in place starting at the level of the auditory midbrain.



Development of the auditory system

Our research has shown that the auditory system appears to be selective, or tuned, for behaviorally informative sound features in natural sounds.  We are investigating to what extent this specialized processing depends on sensory exposure during early development.  We first performed, in young birds at an age where their audiogram is normal and where they are beginning to produce song, the same characterization that we had done in the auditory forebrain of adult birds.  We found that, although these young birds could hear perfectly well, the responses of higher-level neurons in the young animal lacked the specificity found in the adult (Amin et al., J. Neurophys., 2007).  This study did not exclude the possibility that this difference was due to maturation rather than experience, but it showed that neural responses in the auditory cortex change significantly during development.
We then began a series of experiments in which we manipulated the early acoustical environment of developing birds.  In this research we used both males and females (female zebra finches do not sing) and changed the environment in three different ways: we isolated birds from their singing father, we cross-fostered zebra finches with Bengalese finches, and we raised birds in continuous white noise.
We failed to observe any differences in auditory cortical responses between adult male and female birds, suggesting that the experience of producing a song is not necessary for generating a highly selective auditory cortex (Hauber et al., J. Comp. Phys., 2007).  Moreover, neurons in father-absent females preserved the selectivity for complex social song over matched synthetic sounds that was observed in control females (Hauber et al., J. of Ornithology, 2007).  From those studies, we concluded that experience with song per se is not necessary as long as birds experience an otherwise normal acoustical environment: non-singing young birds and female birds produce many other complex vocalizations used in communication.
Our results with cross-fostered birds and birds raised in white noise were more revealing.  Cross-fostered zebra finches showed a depressed neural response to both zebra finch and Bengalese finch song, and we were able to show that these depressed responses lead to a decrease in neural discriminability for songs (Woolley et al., Dev. Neurobiology, 2010).  The results from the white noise treatment were even more drastic.  Birds raised in white noise fail to develop the selectivity for conspecific song that is observed in socially raised animals. An information theoretic analysis also shows an increase in redundancy in the ensemble neural code for the representation of song, but not of complex synthetic sounds (Amin et al., submitted).  We can therefore conclude that both social and acoustical experience are critical for the development of the avian auditory cortex.  In future work, we plan to further investigate potential mechanisms of this developmental plasticity using a combination of modeling and pharmacological studies.


Human psychophysics and neuroscience

We have recently begun a series of studies in human speech and music perception with the primary goal of relating our neurophysiological studies in songbirds to human perception.  We have shown that avian auditory cortical neurons tile the acoustical space in such a way that invariant features that are particularly informative are emphasized, and we argued that the neurons are functionally organized into subsets specialized at extracting acoustical structure that is particularly important for distinct percepts.  In our human studies, we tested whether we could obtain psychophysical tuning curves (or transfer functions) that would explicitly show which acoustical features are important for human perception of speech and music.  To relate these studies to the transfer functions obtained in our neurophysiological experiments, we used manipulations of sound in the modulation space.  To do so, we designed a novel filtering method that allowed us to selectively remove particular spectral or temporal modulations from sound signals.
When we applied this filtering technique to speech to obtain degraded speech signals, we found that comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cyc/kHz were removed.  More specifically, the modulation transfer function was band-pass in temporal modulations and low-pass in spectral modulations (Elliott and Theunissen, PLoS Comp. Bio., 2009).  This speech transfer function showed similarities with the gain function of the subset of neurons tuned to sound features important for timbre.  This is not surprising, since speech formants and formant transitions, which are critical for speech intelligibility, are timbral qualities.  This research could provide insights into the neural pathways that are critical for speech perception in humans.  In addition, the determination of the human speech modulation transfer function furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility.  Such compression could be used for audio applications such as file compression or noise removal, and for clinical applications such as signal processing for cochlear implants.  We are currently continuing these studies by determining whether we can improve intelligibility in noisy environments with matched modulation filtering.
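The core of such a filter can be sketched as zeroing a band of temporal modulations in the two-dimensional Fourier domain of the log spectrogram and inverting the transform. This is only the spectrogram-domain step of the idea; the actual method must also invert the filtered spectrogram back into a sound waveform, which the sketch omits (the function name and parameters are our own):

```python
import numpy as np

def notch_temporal_modulations(log_spectrogram, dt, lo, hi):
    """Zero temporal modulations with rates between lo and hi (Hz).

    log_spectrogram : array of shape (n_freq, n_time)
    dt : time step between spectrogram frames, in seconds
    """
    F = np.fft.fft2(log_spectrogram)
    wt = np.fft.fftfreq(log_spectrogram.shape[1], d=dt)  # temporal modulations (Hz)
    keep = ~((np.abs(wt) >= lo) & (np.abs(wt) <= hi))
    # Apply the mask along the temporal-modulation axis and invert
    return np.real(np.fft.ifft2(F * keep[None, :]))
```

An analogous mask along the other axis of the transform removes a band of spectral modulations.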

In music perception experiments, we have investigated the dimensionality of the perceptual timbre space and the dependence of these timbral dimensions on specific acoustical features.  We found that the timbre of Western orchestral instruments has five dimensions.  Our percept of timbre depends on both the spectral and the temporal envelope of the physical sound, but the perceptually relevant distinction was not between temporal and spectral features; rather, it was between slower and faster dynamics and between pitch-dependent and pitch-independent features (Elliott et al., submitted).

Auditory and vocal behavior in animals

In our description of behaviorally relevant features of environmental sounds and vocalizations, we have relied extensively on the statistical analysis of natural sounds.  In this line of research, we use a comparative approach to refine this statistical analysis and to more directly assess the validity of our statistical analysis with behavioral experiments.
For our studies with songbirds and human speech, we performed statistical analyses of zebra finch song, Bengalese finch song, and human speech spoken by adult American speakers.  In collaboration with Y. Cohen (U. Penn), we did an initial study of the acoustical structure of macaque vocalizations.  In an approach similar to the one used in our laboratory, Cohen and his collaborators related this acoustical structure to the response properties of auditory neurons in the ventral prefrontal cortex of the macaque, an area that has been implicated in the semantic processing of communication sounds in both humans and macaques (Cohen et al., J. Neurophys., 2007).
In collaboration with S. Glickman (UCB, Psychology) and N. Mathevon (Univ. St. Etienne, France), we are also studying the rich repertoire of the spotted hyena.  We have recorded a large set of vocalizations produced in different contexts: groans and growls produced in non-antagonistic approach behavior, loud whoops produced during separations and displacements, and the giggle sounds produced when the animals are frustrated.  We found that the acoustical structure of the groan changed when the animals were approaching an inanimate object (a bone) versus an animate one (a cub), and that the structure of hyena giggle bouts carried information not only about the identity of the sender but also about its social rank (Mathevon et al., BMC Ecology, 2010).
We have also begun playback experiments in both songbirds and hyenas.  In these experiments, we are systematically manipulating the communication calls and testing behavioral responses in various playback and operant conditioning experiments.  Our statistical analyses of the sounds, our neurophysiological data in songbirds and our psychophysical data in humans suggest that birds will be particularly sensitive to certain degradations in song in recognition, discrimination and preference tasks.  

Computational neuroscience

In collaboration with J. Gallant (UC Berkeley), we are developing a suite of algorithms that we are also implementing for release to the neuroscience community at large.  We are working on two software packages, STRFPAK and STRFLAB.  The core routine of STRFPAK is based on a ridge regression algorithm that was customized for natural sounds and images (Theunissen et al., 2001).  STRFLAB is a much more extensive set of routines that includes additional regularization methods and solutions for the generalized linear model.
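The ridge regression at the heart of such STRF estimation can be sketched in a few lines: stack lagged spectrogram patches into a design matrix and solve the penalized normal equations. This toy version, with illustrative names, omits the cross-validated choice of the ridge parameter and the stimulus-statistics corrections that a package like STRFPAK applies for natural sounds:

```python
import numpy as np

def ridge_strf(spec, rate, n_lags, lam):
    """Estimate an STRF by ridge regression.

    spec   : stimulus spectrogram, shape (n_freq, n_time)
    rate   : observed mean firing rate, shape (n_time,)
    n_lags : number of time lags in the STRF
    lam    : ridge penalty (larger = smoother but more biased estimate)
    """
    n_freq, n_time = spec.shape
    # Each design-matrix row is the spectrogram patch preceding one time bin
    X = np.zeros((n_time - n_lags, n_freq * n_lags))
    for i, t in enumerate(range(n_lags, n_time)):
        X[i] = spec[:, t - n_lags:t].ravel()
    y = rate[n_lags:]
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return w.reshape(n_freq, n_lags)
```

For white-noise stimuli the penalty mainly stabilizes the matrix inversion; for natural sounds, whose strong correlations make the stimulus autocorrelation matrix badly conditioned, the choice of the penalty matters much more.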
In parallel, we have examined the effect of different types of stimulus pre-processing on the estimation of STRFs in audition.  We found that pre-processing steps that included compressive non-linearities and automatic gain control increased the quality of the STRF fit, but that the use of a wavelet decomposition was not superior to a regular spectrographic decomposition (Gill et al., J. Comp. Neuro., 2006).  We also tested a much more complex representation of the stimulus, based on the probability of finding particular features given prior expectations and the recent past.  This stimulus representation allowed us to describe the response properties of secondary auditory neurons that failed to be described with the more classical STRF approach (Gill et al., J. Neurophys., 2008).  Finally, we developed methods that allow us to compare the STRFs obtained from different stimulus ensembles.  To do so, one finds a common sub-space for the normalization and regularization steps in the multiple linear regression.  We used this method to compare the STRFs obtained when the auditory system is processing song to those obtained when it is processing matched synthetic sounds that lack behavioral meaning (Woolley et al., J. Neurosci., 2006).

Also as part of this effort, we are interested in information theoretic approaches for studying the neural code (Borst and Theunissen, Nat. Neurosci., 1999).  The use of information theory has the advantage of being model independent: one can quantify the amount of information transmitted about certain stimulus parameters without having to extract the stimulus-response function of the neuron.  Information theoretic approaches are therefore complementary to the system analysis techniques described above.  Information theory can also be used to analyze the nature of the neural code; for example, to assess the importance of temporal patterns in spike trains or the potential synergy in a population code.  One of the difficulties with information theoretic measures is that they require very large data sets, which often makes them impractical in sensory physiology, in particular when the space of stimuli being explored is large.  Our major contribution to the field has been to develop estimation methods that address this data limitation issue.  In previous work, we had shown that single neurons could be modeled as inhomogeneous gamma processes and that information values could then be estimated from these models (Hsu et al., J. Neurosci., 2004).  More recently, we have developed an approximation to deal with under-sampling of the stimulus space (Gastpar et al., IEEE Trans. Inf. Theory, 2010).
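The data-limitation problem is easy to demonstrate with the simplest "plug-in" estimator, which computes mutual information directly from empirical histograms; with limited data it systematically over-estimates information, which is the kind of bias the estimation methods above are designed to correct. This sketch is a generic textbook estimator, not the lab's method:

```python
import numpy as np

def plugin_mutual_information(stim, resp):
    """Naive histogram ('plug-in') estimate of I(stim; resp) in bits.

    stim, resp : integer-coded arrays of paired observations
    """
    joint = np.zeros((int(stim.max()) + 1, int(resp.max()) + 1))
    for s, r in zip(stim, resp):
        joint[s, r] += 1
    p = joint / joint.sum()
    ps = p.sum(axis=1, keepdims=True)   # marginal over responses
    pr = p.sum(axis=0, keepdims=True)   # marginal over stimuli
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (ps @ pr)[nz])))
```

With few trials per stimulus, this estimate comes out positive even for statistically independent variables, which is why bias corrections or model-based estimates (such as the inhomogeneous gamma model mentioned above) are needed in practice.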



Last Updated Jan 2012