2 research outputs found

    MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data

    A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data, and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. We have developed for this purpose a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in preparation for large-scale applications, which require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/

    PROBLEMI DI CLUSTERING CON VINCOLI: ALGORITMI E COMPLESSITÀ (Constrained Clustering Problems: Algorithms and Complexity)

    This thesis introduces and studies the problem of 1-dimensional bounded clustering: for any fixed $p \geq 1$, given reals $x_1, x_2, \ldots, x_n$ and integers $k_1, k_2, \ldots, k_m$, determine the partition $(A_1, A_2, \ldots, A_m)$ of $\{1, 2, \ldots, n\}$ with $|A_1| = k_1, |A_2| = k_2, \ldots, |A_m| = k_m$ that minimizes $\sum_{k} \sum_{i \in A_k} |x_i - \mu_k|^p$, where $\mu_k$ is the p-centroid of $A_k$. First, we prove that the optimum partition is contiguous (String Property), that is, if $i, j \in A_k$ and $x_i < x_s < x_j$, then $s \in A_k$. As a consequence, we determine an efficient algorithm for bi-clustering (if p is an integer); however, we show that the general problem is NP-complete, while a relaxed version of it admits a polynomial-time algorithm. When p is not an integer, we prove that the problem of deciding if the centroid $\mu$ is less than a given integer is in the Counting Hierarchy CH. As an application, the relaxed clustering algorithm is used as a step for solving a problem in the field of Bioinformatics: the localization of promoter regions in genomic sequences. The results are compared with those obtained through another methodology (MADAP).
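
The String Property stated in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, only an illustration under stated assumptions: there are two clusters (m = 2) with prescribed sizes, the points are distinct, and p is 1 or 2, so the p-centroid is the median or the mean respectively. Under those assumptions a contiguous partition must give the first cluster either the smallest or the largest values, so only two candidates need comparing. The function names `cluster_cost` and `best_contiguous_bipartition` are hypothetical, and the sketch returns the clustered values rather than index sets.

```python
from statistics import mean, median


def cluster_cost(values, p):
    """Sum of |x - c|^p over one cluster, with c its p-centroid.

    Assumption of this sketch: only p = 1 (centroid = median) and
    p = 2 (centroid = mean) are handled.
    """
    if p == 1:
        c = median(values)
    elif p == 2:
        c = mean(values)
    else:
        raise ValueError("this sketch only handles p = 1 or p = 2")
    return sum(abs(x - c) ** p for x in values)


def best_contiguous_bipartition(xs, k1, k2, p=2):
    """Return (A1, A2) with |A1| = k1 and |A2| = k2 minimizing the total cost.

    Relies on contiguity: after sorting, A1 is either the k1 smallest
    or the k1 largest values, so two candidates cover every case.
    """
    if k1 < 1 or k2 < 1 or k1 + k2 != len(xs):
        raise ValueError("sizes must be positive and sum to len(xs)")
    s = sorted(xs)
    candidates = [
        (s[:k1], s[k1:]),  # A1 takes the k1 smallest values
        (s[k2:], s[:k2]),  # A1 takes the k1 largest values
    ]
    return min(
        candidates,
        key=lambda ab: cluster_cost(ab[0], p) + cluster_cost(ab[1], p),
    )


if __name__ == "__main__":
    a1, a2 = best_contiguous_bipartition([10.0, 1.0, 2.0, 11.0, 12.0], 2, 3)
    print(a1, a2)
```

Sorting dominates the running time, so this special case costs O(n log n). The general problem with m size-constrained clusters is the one the abstract reports as NP-complete, and this sketch does not address it.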