Intermediate representations between the speech signal and phones may be used to improve discrimination
among phones that are often confused. These representations are usually based on broad phonetic
classes defined by a phonetician. This article proposes an alternative data-driven method to generate
these classes. Phone confusion information, obtained by analyzing the output of a phone recognition system, is used
to find clusters of phones at high risk of mutual confusion. A metric is defined to compute the distance between phones. The
results, using TIMIT data, show that the proposed confusion-driven phone clustering method is an attractive
alternative to the approaches based on human knowledge. A hierarchical classification structure that uses a
discriminative weight training method is also proposed to improve phone recognition. Experiments show improvements in
phone recognition on the TIMIT database compared to a baseline system.
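
The idea of confusion-driven clustering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the confusion counts are invented toy values (not TIMIT results), the distance is taken as one minus the symmetrized confusion rate (the article's actual metric is not given here), and average-linkage agglomerative clustering stands in for whatever clustering procedure the article uses.

```python
from itertools import combinations

# Toy confusion counts (hypothetical): CONF[ref][hyp] is how often the
# reference phone `ref` was recognized as `hyp`.
PHONES = ["m", "n", "s", "z"]
CONF = {
    "m": {"m": 70, "n": 25, "s": 3, "z": 2},
    "n": {"m": 20, "n": 75, "s": 2, "z": 3},
    "s": {"m": 1, "n": 2, "s": 80, "z": 17},
    "z": {"m": 2, "n": 1, "s": 22, "z": 75},
}


def confusion_distance(conf, phones):
    """Assumed metric: 1 - mean of the two directed confusion rates."""
    rates = {}
    for p in phones:
        total = sum(conf[p].values())  # assumes every phone occurs at least once
        rates[p] = {q: conf[p].get(q, 0) / total for q in phones}
    dist = {}
    for a, b in combinations(phones, 2):
        similarity = 0.5 * (rates[a][b] + rates[b][a])
        dist[frozenset((a, b))] = 1.0 - similarity
    return dist


def cluster(phones, dist, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [frozenset([p]) for p in phones]

    def linkage(c1, c2):
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two clusters whose members are confused most often.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


classes = cluster(PHONES, confusion_distance(CONF, PHONES), n_clusters=2)
print([sorted(c) for c in classes])
```

With these toy counts, the nasals and the fricatives end up in separate clusters, because phones within each pair are confused with each other far more often than with phones outside it.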