PSI vectors of 173 active units in layer C6.
(a) PSI vectors of phonemes. Each column corresponds to a unit. (b) Hierarchical clustering across phonemes. (c) Hierarchical clustering across units. (d) PSI vectors of six phonetic features.
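Panels (b) and (c) reorder the PSI matrix by clustering its rows (phonemes) and its columns (units). The sketch below shows one way to obtain such orderings with SciPy, assuming the PSI matrix has already been computed with shape (phonemes × units). The Euclidean metric, average linkage, and the helper name `cluster_order` are assumptions for illustration, not the settings reported for this figure.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(psi, metric="euclidean", method="average"):
    """Return row and column orderings of a PSI matrix via hierarchical clustering.

    psi: array of shape (n_phonemes, n_units); rows are phonemes, columns are units.
    The metric and linkage method are assumed defaults, not the paper's settings.
    """
    # Cluster phonemes (rows) by their PSI profiles across units.
    row_order = leaves_list(linkage(psi, method=method, metric=metric))
    # Cluster units (columns) by their PSI profiles across phonemes.
    col_order = leaves_list(linkage(psi.T, method=method, metric=metric))
    return row_order, col_order


# Usage on a toy matrix: reorder rows and columns for display.
psi_toy = np.random.default_rng(0).random((10, 20))
rows, cols = cluster_order(psi_toy)
psi_sorted = psi_toy[np.ix_(rows, cols)]
```

The leaf order of each dendrogram places similar phonemes (or units) next to each other, which is what makes block structure visible in the reordered matrix.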
Average decoding accuracies of the acoustic parameters in layers S5, S6, and S7 with different kernel sizes.
Note that layers S5 and S6 with kernel size 10×10 are layers of the original network. Layer S5 with kernel size 20×20 was obtained by fixing layers S1 to C4 of the original network; layers S6 and S7, each with kernel size 5×5, were obtained by fixing layers S1 to C5 of the original network. The STRF sizes in these layers are indicated in parentheses.
Influence of the pooling method used in the model.
(a, b) STRFs of all units in layers S2 and S3 without pooling. (c) PSI vectors of 77 active units in layer S6 without pooling. (d) PSI vectors of 96 active units in layer C6 with average pooling.
Visualization of the representative bases in layer S2 along with typical STRFs of the inferior colliculus neurons in animals.
(a–d) STRFs of several typical layer S2 units. Curves denote the spectral and temporal profiles obtained by SVD. (a) Two ON-type units. (b) Two OFF-type units. (c) Two localized checkerboard units. (d) Two spectral motion units. Similar STRFs have been observed for typical inferior colliculus neurons in physiological experiments: compare (a) with Fig 3E in [34], (b) with Fig 6A in [23], (c) with Fig 7A in [23], and (d) with Fig 6C in [29].
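The spectral and temporal profiles drawn alongside each STRF come from an SVD. A minimal sketch, assuming the STRF is stored as a (frequency × time) array and that the profiles are the first left and right singular vectors (the best rank-1 approximation); the sign convention is an assumption made only for display.

```python
import numpy as np


def svd_profiles(strf):
    """Split an STRF (frequency x time) into spectral and temporal profiles.

    Uses the first singular vectors, i.e. the best rank-1 approximation
    strf ≈ s0 * outer(spectral, temporal).
    """
    u, s, vt = np.linalg.svd(strf, full_matrices=False)
    spectral, temporal = u[:, 0], vt[0, :]
    # Singular vectors have an arbitrary joint sign; flip both (product unchanged)
    # so the largest-magnitude spectral entry is positive. Display convention only.
    if spectral[np.argmax(np.abs(spectral))] < 0:
        spectral, temporal = -spectral, -temporal
    return spectral, temporal, s[0]
```

Flipping both vectors together leaves their outer product unchanged, so the rank-1 reconstruction is the same regardless of the convention chosen.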
Encoding of the acoustic parameters F0, F1, F2, VOT, and spectral peak in higher layers.
The mean and standard deviation of decoding accuracies in 20-fold training and testing experiments are shown. (a) Decoding accuracies of F0, F1, and F2 based on the response amplitudes of all active layer C6 units. These accuracies are significantly higher than that of a random decoder (p < 10⁻⁵). (b) Decoding accuracies of VOT and spectral peak based on the response amplitudes of all active layer C6 units. These accuracies are significantly higher than that of a random decoder (p < 10⁻⁵). (c) Decoding accuracies of the acoustic parameters based on the response amplitudes of active layer C6 units in six different groups. These accuracies are significantly higher than that of a random decoder (p < 10⁻⁵). (d) Average decoding accuracies of the acoustic parameters in layers S4, C4, S5, C5, S6, and C6. The six groups of units are presented in the same order as in (c) (from left to right: plosive, fricative, nasal, low back, low front, and high front). In (a–c), error bars indicate the standard deviation over the 20 accuracies; to avoid clutter, they are not shown in (d).
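The caption reports the mean and standard deviation of 20 accuracies from 20-fold training and testing. The sketch below illustrates that evaluation loop under stated assumptions: each acoustic parameter is taken to be discretized into integer class labels, and a nearest-centroid decoder stands in for whatever decoder the study actually used. The function name `kfold_decoding_accuracy` is hypothetical.

```python
import numpy as np


def kfold_decoding_accuracy(responses, labels, n_folds=20, seed=0):
    """Per-fold accuracies of a nearest-centroid decoder (stand-in decoder).

    responses: (n_samples, n_units) response amplitudes of the units.
    labels: (n_samples,) integer class of the acoustic parameter (assumed to be
        discretized into bins). Requires n_samples >= n_folds.
    Returns an array of n_folds accuracies; take .mean() and .std() for the bars.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(labels)), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        classes = np.unique(labels[train])
        # One centroid per class, estimated on the training folds only.
        centroids = np.stack(
            [responses[train][labels[train] == c].mean(axis=0) for c in classes]
        )
        # Squared Euclidean distance from each test sample to each centroid.
        d = ((responses[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[np.argmin(d, axis=1)]
        accs.append(np.mean(pred == labels[test]))
    return np.array(accs)
```

For balanced classes, a random decoder's expected accuracy is 1 / (number of classes), which gives a simple chance baseline to compare the 20 accuracies against.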
Encoding of dynamic properties of phonemes in higher layers.
(a) Spectrograms of two phonemes, with two instances of each phoneme shown. The contours of the first two formants (F1 and F2) of these instances are denoted by red and yellow curves, respectively. The formant contours of a phoneme were defined as the average of the contours of its different instances. (b) Principal components (PCs) of the F1 and F2 contours calculated over 33 phonemes. The F1 (or F2) TVI of a phoneme is defined as the projection of the phoneme's F1 (or F2) contour onto the F1 (or F2) PC. (c) Encoding of the dynamic properties of phonemes in different layers. Each dot indicates the correlations between a unit's responses to the phonemes and the phonemes' F1 TVIs (horizontal axis) and F2 TVIs (vertical axis). In each layer, 200 units were randomly selected.
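The TVI computation in (b) and the correlations in (c) can be sketched as follows. Assumptions: each phoneme's averaged contour is resampled to a common number of time points, the PC is the first principal component of the mean-centered contours, and unit-to-TVI correlation is Pearson's r. Function names are hypothetical.

```python
import numpy as np


def tvi(contours):
    """Project each phoneme's formant contour onto the first principal component.

    contours: (n_phonemes, n_timepoints); each row is the averaged F1 (or F2)
    contour of one phoneme, resampled to a common length (an assumption).
    """
    centered = contours - contours.mean(axis=0)
    # Rows of vt are principal directions over time; vt[0] is the first PC.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Project the raw contours; centering would only shift all TVIs by a constant.
    return contours @ vt[0]


def unit_tvi_correlation(unit_responses, tvi_values):
    """Pearson correlation between one unit's responses to the phonemes and their TVIs."""
    return np.corrcoef(unit_responses, tvi_values)[0, 1]
```

Because the sign of a principal component is arbitrary, the sign of each correlation depends on that choice; its magnitude does not, and neither does it depend on whether the contours are centered before projection.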
Distributions of STRF parameters of layer S2 units.
(a) Best temporal modulation frequency. (b) Response duration. (c) Center frequency. (d) Spectral bandwidth. These four parameters correspond, respectively, to the peak and the 90%-power bandwidth of the temporal profile and the peak and the 90%-power bandwidth of the spectral profile shown in Fig 3. (e) Trade-off between temporal modulation (Best T) and spectral modulation. (f–i) Probability distributions of the STRF parameters, obtained by normalizing the corresponding histograms. For comparison, the normalized probability distributions in layers S1 and S3 and the reference distributions of inferior colliculus neurons in cats [30] are also plotted. The horizontal axis in each panel is normalized to [0, 1] by dividing all values by the maximum value.
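The two quantities in this caption, a peak with its 90%-power bandwidth and a histogram normalized into a probability distribution on a [0, 1] axis, can be sketched as below. The bandwidth rule (grow a contiguous window from the peak toward the stronger neighbor until it holds 90% of the squared-amplitude power) is an assumption, since the caption does not spell out the exact definition; the bin count and function names are also hypothetical.

```python
import numpy as np


def peak_and_bandwidth(profile, axis_values, power_fraction=0.9):
    """Peak location and power bandwidth of a 1-D profile.

    Bandwidth (assumed definition): span of the contiguous window around the
    peak, grown greedily toward the stronger neighbor, that first holds at
    least `power_fraction` of the total power (squared amplitude).
    """
    power = np.asarray(profile, dtype=float) ** 2
    peak = int(np.argmax(power))
    lo = hi = peak
    acc, total = power[peak], power.sum()
    while acc < power_fraction * total:
        left = power[lo - 1] if lo > 0 else -1.0
        right = power[hi + 1] if hi < len(power) - 1 else -1.0
        if left >= right:
            lo -= 1
            acc += left
        else:
            hi += 1
            acc += right
    return axis_values[peak], axis_values[hi] - axis_values[lo]


def normalized_distribution(values, bins=20):
    """Histogram of non-negative values rescaled to [0, 1] by their maximum,
    normalized so the bin probabilities sum to 1."""
    scaled = np.asarray(values, dtype=float) / np.max(values)
    counts, edges = np.histogram(scaled, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum(), edges
```

Dividing by the maximum maps values into [0, 1] only when they are non-negative, which holds for frequencies, durations, and bandwidths.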
Calculation of STRFs and example STRFs.
(a) Illustration of the visualization of an S2 unit whose basis has size 2 × 2 × u1, where u1 denotes the total number of S1 bases. Each S1 basis has size 3 × 3. Suppose that there is a down-sampling operation with ratio 2 between layers S1 and S2, which could be a convolution with stride 2 in layer S2 (the case in this study) or a max pooling with ratio 2 and stride 2. In that case, each slice of the S2 unit, a 2 × 2 matrix, must first be expanded to a 4 × 4 matrix. Because there is also a max pooling layer with pooling ratio 2 and stride 1 between layers S1 and S2, the first two dimensions of the S2 feature maps are 1 smaller than those of the S1 feature maps. To account for this effect, we pad the 4 × 4 matrices with zeros to obtain 5 × 5 matrices. Each 5 × 5 slice can be viewed as a filter learned on the feature map obtained by convolving one S1 basis with the input image, the layer preceding S1. The effect of this 5 × 5 slice, traced back through layer S1, is then roughly equivalent to that of a 7 × 7 matrix (shown on the right in the dashed box) formed by summing copies of the same S1 basis centered at 25 locations, each weighted by the corresponding element of the slice. For illustration, the left side of the dashed box shows the sum of the S1 basis weighted by two elements (red and green) of each slice. The STRF of the example S2 unit is the sum of all u1 of these 7 × 7 matrices. (b) Example STRFs in layer S1. (c) Example STRFs in layer S2. (d) Example STRFs in layer S3.
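The back-projection in (a) can be sketched in a few lines. Summing copies of a 3 × 3 S1 basis placed at the 25 positions of a 5 × 5 slice, each weighted by the slice entry, is exactly a full 2-D convolution, which yields the 7 × 7 result. Two details are assumptions, because the caption leaves them open: the 2 × 2 → 4 × 4 expansion repeats each element over a 2 × 2 block (`np.kron`), and the extra row and column of zeros are added on the bottom and right. The name `s2_strf` is hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d


def s2_strf(s2_weights, s1_bases):
    """Approximate STRF of one S2 unit following the procedure in panel (a).

    s2_weights: (2, 2, u1) weights of the S2 unit, one 2x2 slice per S1 basis.
    s1_bases:   (u1, 3, 3) S1 bases.
    """
    u1 = s1_bases.shape[0]
    strf = np.zeros((7, 7))
    for i in range(u1):
        # Undo the ratio-2 down-sampling: 2x2 -> 4x4 (block repetition is assumed).
        expanded = np.kron(s2_weights[:, :, i], np.ones((2, 2)))
        # Account for the size-2, stride-1 max pooling: 4x4 -> 5x5 (padding side assumed).
        padded = np.pad(expanded, ((0, 1), (0, 1)))
        # Weighted sum of S1-basis copies at 25 positions == full convolution -> 7x7.
        strf += convolve2d(padded, s1_bases[i], mode="full")
    return strf
```

Using `mode="full"` keeps every partial overlap, which is what grows a 5 × 5 slice and a 3 × 3 basis into a 7 × 7 matrix (5 + 3 − 1 = 7).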
STRF sizes of the units in different layers.