Partitioning algorithms play a key role in many scientific and engineering
disciplines. A partitioning algorithm divides a set into a number of disjoint
subsets or partitions. Often, the quality of the resulting partitions is
measured by the amount of impurity in each partition: the smaller the impurity,
the higher the quality of the partitions. In general, for a given impurity measure
specified by a function of the partitions, finding the minimum impurity
partitions is an NP-hard problem. Let M be the number of N-dimensional
elements in a set and K be the number of desired partitions. An exhaustive
search over all possible partitions to find a minimum-impurity partition then
has complexity O(K^M), which quickly becomes impractical for many
applications even with modest values of K and M. Thus, many approximate
algorithms with polynomial time complexity have been proposed, but few provide
a bounded guarantee. In this paper, an upper bound and a lower bound for a class
of impurity functions are constructed. Based on these bounds, we propose a
low-complexity partitioning algorithm with a bounded guarantee based on the
maximum likelihood principle. A theoretical analysis of the bounded guarantee
of the algorithm is given for two well-known impurity functions, the Gini index
and entropy. When K ≥ N, the proposed algorithm achieves state-of-the-art
results in terms of lowest approximations and polynomial time complexity
O(NM). In addition, a heuristic greedy-merge algorithm with time
complexity O((N−K)N^2 + NM) is proposed for K < N. Although the greedy-merge
algorithm does not provide a bounded guarantee, its performance is comparable
to that of the state-of-the-art methods. Our results also generalize some
well-known information-theoretic bounds such as Fano's inequality and
Boyd-Chiang's bound.

Comment: 13 pages, 6 figures
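
To make the impurity measures and the cost of exhaustive search concrete, the
following is a minimal Python sketch, not the algorithm proposed in the paper.
It assumes each element is a non-negative N-dimensional weight vector (for
example, class counts or joint probabilities), and that the impurity of a
partition is its total weight times the Gini index or entropy of its normalized
sum. These modelling choices and all function names (gini, entropy,
partition_impurity, exhaustive_search) are illustrative assumptions.

```python
import itertools
import math


def gini(p):
    """Gini index of a probability vector: 1 - sum_i p_i^2."""
    return 1.0 - sum(x * x for x in p)


def entropy(p):
    """Shannon entropy in nats: -sum_i p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def partition_impurity(elements, labels, K, measure):
    """Total weighted impurity of the K partitions induced by `labels`.

    Assumption: each partition contributes (its total weight) times the
    impurity of its normalized component-wise sum. Empty partitions
    contribute nothing.
    """
    N = len(elements[0])
    total = 0.0
    for k in range(K):
        s = [0.0] * N
        for e, lab in zip(elements, labels):
            if lab == k:
                for i in range(N):
                    s[i] += e[i]
        w = sum(s)
        if w > 0:
            total += w * measure([x / w for x in s])
    return total


def exhaustive_search(elements, K, measure):
    """Try all K^M label assignments; return (best impurity, best labels)."""
    best = None
    for labels in itertools.product(range(K), repeat=len(elements)):
        value = partition_impurity(elements, labels, K, measure)
        if best is None or value < best[0]:
            best = (value, labels)
    return best


# Hypothetical toy data: M = 4 elements, N = 2 dimensions, K = 2 partitions.
elements = [(1, 0), (0, 1), (1, 0), (0, 1)]
best_value, best_labels = exhaustive_search(elements, 2, gini)
```

The loop in exhaustive_search visits K^M assignments (16 for the toy data
above), which is why this brute-force approach is only usable for very small
M and why bounded-guarantee approximations are of interest.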