PROBLEMI DI CLUSTERING CON VINCOLI: ALGORITMI E COMPLESSITÀ

Abstract

This thesis introduces and studies the problem of 1-dimensional bounded clustering: for any fixed $p \ge 1$, given reals $x_1, x_2, \ldots, x_n$ and integers $k_1, k_2, \ldots, k_m$, determine the partition $(A_1, A_2, \ldots, A_m)$ of $\{1, 2, \ldots, n\}$ with $|A_1| = k_1, |A_2| = k_2, \ldots, |A_m| = k_m$ which minimizes $\sum_{k} \sum_{i \in A_k} |x_i - \mu_k|^p$, where $\mu_k$ is the $p$-centroid of $A_k$. First, we prove that the optimum partition is contiguous (String Property): if $i, j \in A_k$ and $x_i < x_s < x_j$, then $s \in A_k$. As a consequence, we obtain an efficient algorithm for bi-clustering (if $p$ is an integer); however, we show that the general problem is NP-complete, while a relaxed version of it admits a polynomial-time algorithm. When $p$ is not an integer, we prove that the problem of deciding whether the centroid $\mu$ is less than a given integer is in the Counting Hierarchy CH. As an application, the relaxed clustering algorithm is used as a step in solving a problem in the field of Bioinformatics: the localization of promoter regions in genomic sequences. The results are compared with those obtained through another methodology (MADAP).
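To make the String Property concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It assumes $p = 2$, where the $p$-centroid of a cluster is its arithmetic mean, and it assumes every size $k_j \ge 1$. Because an optimal partition is contiguous, it is enough to sort the points and try every order in which the blocks of sizes $k_1, \ldots, k_m$ can be laid out along the line. The function names `cost_p2` and `best_contiguous_partition` are hypothetical. The search is exponential in $m$, so it is only practical for small $m$; for $m = 2$ it compares just two candidate partitions.

```python
from itertools import permutations


def cost_p2(block):
    """Sum of squared deviations from the mean (the p-centroid for p = 2)."""
    mu = sum(block) / len(block)
    return sum((x - mu) ** 2 for x in block)


def best_contiguous_partition(xs, sizes):
    """Brute-force search over contiguous partitions with prescribed sizes.

    Assumptions (not from the thesis): p = 2 and every size is >= 1.
    Returns (cost, clusters), where clusters[k] lists the 0-based indices
    of the points assigned to the cluster of size sizes[k].
    """
    if sum(sizes) != len(xs):
        raise ValueError("cluster sizes must sum to the number of points")

    # Indices of the points in increasing order of value.
    order = sorted(range(len(xs)), key=lambda i: xs[i])

    best = None
    # Each permutation is one left-to-right arrangement of the blocks.
    for perm in permutations(range(len(sizes))):
        start = 0
        total = 0.0
        clusters = [None] * len(sizes)
        for k in perm:
            idx = order[start:start + sizes[k]]
            clusters[k] = idx
            total += cost_p2([xs[i] for i in idx])
            start += sizes[k]
        if best is None or total < best[0]:
            best = (total, clusters)
    return best


if __name__ == "__main__":
    cost, clusters = best_contiguous_partition([0.0, 10.0, 1.0, 11.0, 12.0], [3, 2])
    print(cost, clusters)
```

In the usage example, the cluster of size 3 receives the points 10, 11, 12 and the cluster of size 2 receives 0 and 1; the other contiguous arrangement, which groups 0, 1, 10 together, is far more expensive.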
