Articulatory feature modeling in Automatic Speech Recognition (ASR), while not (yet) mainstream, has received significant attention in recent research ([1, 2, 3, 4] inter alia). One study in particular has provided evidence that hierarchical articulatory feature models have the potential to significantly outperform their non-hierarchical counterparts. In such a system, the probability of an articulatory feature is conditioned on some other feature; for example, the classifier for place of articulation may depend on the manner of articulation. In this work, we extend that study by relaxing its assumption that the conditioning class is recognized perfectly. Without that assumption, the gains over non-hierarchical classification are greatly reduced; our analysis shows that this is in part because the errors in different acoustic feature streams are in fact correlated. We conclude by observing that joint acoustic feature modeling, rather than conditional modeling, may provide better gains.
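As an illustrative sketch only, the contrast between the classification schemes described above can be written as follows. The notation is assumed here for illustration and is not taken from the paper: x is the acoustic observation, p a place-of-articulation class, m a manner-of-articulation class, m* the true manner, and hats mark classifier outputs.

```latex
% Non-hierarchical: place is classified from the acoustics alone.
\hat{p}_{\text{flat}} = \arg\max_{p} P(p \mid x)

% Hierarchical, with perfect recognition of the conditioning class assumed:
\hat{p}_{\text{oracle}} = \arg\max_{p} P(p \mid m^{*}, x)

% Hierarchical, with the conditioning class itself recognized from x:
\hat{m} = \arg\max_{m} P(m \mid x), \qquad
\hat{p}_{\text{hier}} = \arg\max_{p} P(p \mid \hat{m}, x)

% Joint modeling: both features are decided together.
(\hat{p}, \hat{m})_{\text{joint}} = \arg\max_{p,\, m} P(p, m \mid x)
```

Under this reading, an error in the recognized manner class selects the wrong conditional distribution for place, so correlated errors across feature streams would be expected to erode the advantage of the hierarchical scheme over the non-hierarchical one.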