MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets

Badis; Bailey; Ben-Gal; Berger; Brown; Carlson; Chen; Cock; David Gfeller; Dennler; Doyle; Elkan; Ernst; Farnham; Feng; Fowler; Frederick; Garvie; Gary D. Bader; Gfeller; Haiming Huang; Harris; Holland; Hua; Hutti; Katoh; Lam; Marc S. Tyndel; Mayer; Miller; Mitchell; Mukherjee; Newburger; Noguchi; Obenauer; Olsen; Pavesi; Pawson; Philip M. Kim; Portales-Casamar; Sachdev S. Sidhu; Salzberg; Schneider; Schwarz; Sinha; Stiffler; TaeHyung Kim; Thijs; Tonikian; Tonikian; Valouev; Wei; Wiedemann; Zhang; Zhao

Publisher: Oxford University Press

Abstract

Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.
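
The abstract describes binding preferences summarized as sequence logos, with separate logos for co-existing specificity classes. As a rough illustration of why that separation matters, the sketch below computes per-position residue frequencies and information content (the quantity a logo's column height is usually based on) for a set of aligned peptides. It is not MUSI's algorithm: the function names, the assumption that peptides are already aligned and of equal length, and the uniform 20-residue background are all choices made for this example only.

```python
import math
from collections import Counter

# Standard 20 amino acids; a uniform background is assumed for simplicity.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(peptides):
    """Return one dict per column mapping residue -> relative frequency.

    Assumes the peptides are already aligned and have equal length.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must all have the same length")
    columns = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        columns.append({aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS})
    return columns


def information_content(column):
    """Bits of information at one position: log2(20) minus Shannon entropy."""
    entropy = -sum(f * math.log2(f) for f in column.values() if f > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


# Hypothetical toy ligands for two specificity classes (not real data).
class_one = ["RALPPLP", "RPLPALP", "RSLPPLP"]
class_two = ["PPLPARP", "APLPPRP", "PALPSRP"]

separate = position_frequencies(class_one)
pooled = position_frequencies(class_one + class_two)

# Compare the first column of the class-specific and pooled profiles.
print(information_content(separate[0]), information_content(pooled[0]))
```

In this toy case the first position is fully conserved within the first class, so pooling both classes into a single profile lowers its information content. This is the kind of signal that a single logo built from mixed ligands can blur.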

Available Versions

Crossref

info:doi/10.1093%2Fnar%2Fgkr12...

Last time updated on 03/01/2020