Scaling success: Linking public breeding with private enterprise
The known Downstream Promoter Element and Initiator site motifs are shown in boldface.
PeakRegressor Identifies Composite Sequence Motifs Responsible for STAT1 Binding Sites and Their Potential rSNPs
How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, and combination) is an important question in bioinformatics. We present “PeakRegressor,” a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log-linear regression to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in regulating the transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log-linear regression achieves the best performance with respect to binding motif identification, biological interpretability, and computational efficiency.
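To make the regression idea in this abstract concrete, here is a minimal sketch of an L1-penalized linear fit to log peak values, solved by coordinate descent. It is not the PeakRegressor implementation: the squared-error loss on log peaks, the function names (`fit_l1_log_linear`, `make_toy_data`), the penalty value, and the synthetic motif counts are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_log_linear(features, peaks, penalty=0.05, sweeps=100):
    """Minimize (1/2n) * sum (log(peak) - b - x.w)^2 + penalty * sum |w_j|.

    features: one row of motif counts per peak (hypothetical input format).
    peaks: strictly positive peak values.
    """
    n, d = len(features), len(features[0])
    targets = [math.log(p) for p in peaks]
    target_mean = sum(targets) / n
    # Center each feature column so the unpenalized intercept decouples.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    columns = [[row[j] - means[j] for row in features] for j in range(d)]
    weights = [0.0] * d
    residual = [t - target_mean for t in targets]
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(column, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            step = updated - weights[j]
            residual = [r - step * x for r, x in zip(residual, column)]
            weights[j] = updated
    intercept = target_mean - sum(w * m for w, m in zip(weights, means))
    return intercept, weights


def make_toy_data(n=200, seed=0):
    """Synthetic data: only the first of three hypothetical motifs affects the peak."""
    rng = random.Random(seed)
    rows, peaks = [], []
    for _ in range(n):
        counts = [rng.randint(0, 3) for _ in range(3)]
        log_peak = 1.0 + 0.8 * counts[0] + rng.gauss(0.0, 0.05)
        rows.append(counts)
        peaks.append(math.exp(log_peak))
    return rows, peaks


if __name__ == "__main__":
    rows, peaks = make_toy_data()
    intercept, weights = fit_l1_log_linear(rows, peaks)
    print("intercept:", intercept)
    print("motif weights:", weights)
```

The L1 penalty shrinks the weights of uninformative motif candidates toward zero, which is the property that makes this kind of model useful for selecting a small set of motifs.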
Machine learning for feature extraction (application to document clustering, automatic summarization, and collaborative filtering)
An extension of the probabilistic latent semantic model for unsupervised clustering of text documents
In this article, we propose an extension of the probabilistic latent semantic analysis (PLSA) model for the task of document clustering. We show that this extended model is equivalent to a linear combination of non-negative matrix factorization models in the sense of the KL-divergence objective function. We validate our model on three document collections and show empirically that our approach performs statistically better than the basic PLSA model on the clustering task.
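As background for the equivalence mentioned in this abstract, the following sketch shows plain non-negative matrix factorization under the generalized KL-divergence objective, using the standard multiplicative updates. It illustrates only the building block, not the extended model the authors propose; the function names, the small `EPS` guard, and the toy document-term matrix are assumptions for illustration.

```python
import math
import random

EPS = 1e-12  # small guard against division by zero (an implementation assumption)


def matmul(W, H):
    """Plain-Python product of an n x r matrix W and an r x m matrix H."""
    return [[sum(w * H[k][j] for k, w in enumerate(row)) for j in range(len(H[0]))]
            for row in W]


def kl_divergence(V, W, H):
    """Generalized KL divergence D(V || WH) = sum v*log(v/a) - v + a."""
    WH = matmul(W, H)
    total = 0.0
    for v_row, a_row in zip(V, WH):
        for v, a in zip(v_row, a_row):
            if v > 0:
                total += v * math.log(v / (a + EPS))
            total += a - v
    return total


def nmf_kl(V, rank, iterations=50, seed=0):
    """Factor V (documents x terms) as W @ H with multiplicative KL updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    history = [kl_divergence(V, W, H)]
    for _ in range(iterations):
        # Update H with W fixed.
        WH = matmul(W, H)
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n)) + EPS
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / (WH[i][j] + EPS) for i in range(n))
                H[k][j] *= num / col_sum
        # Update W with the new H fixed.
        WH = matmul(W, H)
        for k in range(rank):
            row_sum = sum(H[k][j] for j in range(m)) + EPS
            for i in range(n):
                num = sum(H[k][j] * V[i][j] / (WH[i][j] + EPS) for j in range(m))
                W[i][k] *= num / row_sum
        history.append(kl_divergence(V, W, H))
    return W, H, history


if __name__ == "__main__":
    # Toy document-term counts: rows are documents, columns are terms.
    V = [[3, 2, 0, 0], [4, 3, 0, 0], [0, 0, 2, 5], [0, 0, 1, 4]]
    W, H, history = nmf_kl(V, rank=2)
    # One simple clustering rule: assign each document to its largest factor.
    clusters = [max(range(2), key=lambda k: row[k]) for row in W]
    print("divergence before and after:", history[0], history[-1])
    print("cluster assignments:", clusters)
```

Assigning each document to its largest factor in `W` is one simple way to read clusters off the factorization; other assignment rules are possible.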
An Extension of PLSA for Document Clustering
In this paper, we propose an extension of the PLSA model in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on three datasets that our extended model produces statistically significant improvements with respect to two clustering measures over the original PLSA and the multinomial mixture (MM) models.