20 research outputs found

    Scaling success: Linking public breeding with private enterprise

    The known Downstream Promoter Element and Initiator site motifs are shown in boldface.

    PeakRegressor Identifies Composite Sequence Motifs Responsible for STAT1 Binding Sites and Their Potential rSNPs

    Identifying true transcription factor binding sites from sequence motif information (e.g., motif pattern, location, and combination) is an important problem in bioinformatics. We present “PeakRegressor,” a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log-linear regression to predict peak values from binding motif candidates. Our approach predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we identified composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in regulating the transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log-linear regression achieves the best performance with respect to binding motif identification, biological interpretability, and computational efficiency.

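    Apprentissage automatique pour l'extraction de caractéristiques (application au partitionnement de documents, au résumé automatique et au filtrage collaboratif) [Machine learning for feature extraction (application to document clustering, automatic summarization, and collaborative filtering)]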

    PARIS-BIUSJ-Thèses (751052125) / Sudoc; AVIGNON-Bibl. IUP-IUT (840072201) / Sudoc; PARIS-BIUSJ-Mathématiques rech (751052111) / Sudoc; Sudoc; France; F

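    Une extension du modèle sémantique latent probabiliste pour le partitionnement non-supervisé de documents textuels [An extension of the probabilistic latent semantic model for unsupervised clustering of text documents]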

    In this article, we propose an extension of the probabilistic latent semantic model (PLSA) for the task of document clustering. We show that this extended model is equivalent to a linear combination of non-negative matrix factorization models under the KL-divergence objective function. We validate our model on three document collections and show empirically that our approach performs statistically significantly better than the basic PLSA model on the clustering task.

    An Extension of PLSA for Document Clustering

    In this paper, we propose an extension of the PLSA model in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on three datasets that our extended model produces statistically significant improvements with respect to two clustering measures over the original PLSA and the multinomial mixture (MM) models.