2 research outputs found

    A multiple-instance scoring method to predict tissue-specific cis-regulatory motifs and regions

    Transcription is the central process of gene regulation. In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). In different tissues, different transcription factors bind to their cis-regulatory motifs in these CRRs to drive tissue-specific expression patterns of their target genes. By combining genome-wide gene expression data with genomic sequence data, we propose a multiple-instance scoring (MIS) method to predict tissue-specific motifs and the corresponding CRRs. The method rests mainly on the assumption that only a subset of the CRRs of an expressed gene function in the studied tissue. In tests on simulated datasets and the fly muscle dataset, MIS identifies true motifs when noise is high and shows higher specificity for predicting the tissue-specific functions of CRRs.

    Motif Discovery as a Multiple-Instance Problem

    Motif discovery from biosequences, a challenging task both experimentally and computationally, has been studied intensively in recent years. In this paper, we formulate the motif discovery problem as a multiple-instance problem and employ a multiple-instance learning method, the MILES method, to identify motifs from biological sequences. Each sequence is mapped into a feature space defined by instances in training sequences with a novel instance-bag similarity measure. We employ a 1-norm SVM to select important features and construct classifiers simultaneously. These high-ranked features correspond to discovered motifs. We apply this method to discover transcription factor binding sites in promoters, a typical motif finding problem in biology, and show that the method is at least comparable to existing methods.
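
The mapping of each sequence into an instance-defined feature space can be illustrated with a minimal sketch. It is not the paper's method: the abstract mentions a novel instance-bag similarity measure that is not specified here, so the sketch assumes a Gaussian similarity over Hamming distance, with k-mers as instances. The function names, the toy sequences, and the values of `k` and `sigma` are all hypothetical.

```python
import math

def kmers(seq, k):
    """Return the overlapping k-mers of a sequence (the instances of its bag)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def similarity(instance, bag, sigma=1.0):
    """Assumed instance-bag similarity: Gaussian of the distance to the
    closest instance in the bag (stand-in for the paper's novel measure)."""
    return max(math.exp(-hamming(instance, x) ** 2 / sigma ** 2) for x in bag)

def embed(bags, concepts, sigma=1.0):
    """Map each bag to one feature per candidate instance (concept)."""
    return [[similarity(c, bag, sigma) for c in concepts] for bag in bags]

# Toy data (hypothetical): each sequence is a bag of 4-mers.
sequences = ["ACGTACGT", "TTTTACGA", "GGGGCCCC"]
k = 4
bags = [kmers(s, k) for s in sequences]

# Every instance seen in training defines one feature dimension.
concepts = sorted({inst for bag in bags for inst in bag})
features = embed(bags, concepts)
```

Each row of `features` describes one sequence, and each column corresponds to one candidate k-mer. A sparse linear classifier such as a 1-norm SVM trained on these rows would keep only a few columns, and the k-mers behind those columns would be the candidate motifs.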