
    Discovery of stable and significant binding motif pairs from PDB complexes and protein interaction datasets

    Motivation: Discovery of binding sites is important in the study of protein-protein interactions. In this paper, we introduce stable and significant motif pairs to model protein-binding sites. Stability is a pattern's resistance to a given transformation. Significance is the unexpectedly high frequency with which the pattern occurs in a sequence dataset of known interacting protein pairs. Discovering stable motif pairs is an iterative process that passes through a chain of changing but converging patterns, so determining the starting point of such a chain is an interesting problem. We use a protein complex dataset extracted from the Protein Data Bank to identify these starting points, which greatly reduces the computational complexity of the problem. Results: We found 913 stable motif pairs, of which 765 are significant. We evaluated these motif pairs through comprehensive comparisons against random patterns. Motifs discovered in wet-lab experiments and reported in the literature were also used to confirm the effectiveness of our method. © Oxford University Press 2004; all rights reserved
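The significance criterion above (a motif pair co-occurring unexpectedly often across known interacting proteins) can be illustrated with a minimal sketch. It assumes that a motif is a plain substring and that the two partners of a random pair are drawn independently from the proteins in the dataset. Under that assumption, the chance that a pair carries p in one partner and q in the other is 2*f_p*f_q - f_pq**2, and a binomial upper tail gives a p-value. This is a hypothetical stand-in, not the authors' statistic, and the function names are illustrative.

```python
from math import comb


def contains(seq: str, motif: str) -> bool:
    """Hypothetical motif model: a motif is a plain substring of the sequence."""
    return motif in seq


def pair_significance(pairs, p, q):
    """Compare observed vs. expected occurrences of the motif pair (p, q).

    pairs: list of (seq_a, seq_b) tuples for known interacting proteins.
    Returns (observed, expected, one-sided binomial p-value).
    """
    n = len(pairs)
    # Every protein slot in the dataset, used to estimate marginal frequencies.
    proteins = [s for pair in pairs for s in pair]
    m = len(proteins)
    f_p = sum(contains(s, p) for s in proteins) / m
    f_q = sum(contains(s, q) for s in proteins) / m
    f_pq = sum(contains(s, p) and contains(s, q) for s in proteins) / m
    # P(p in a and q in b, or q in a and p in b) for independently drawn a, b
    # (inclusion-exclusion: the overlap requires both motifs in both partners).
    prob = 2 * f_p * f_q - f_pq ** 2
    observed = sum(
        (contains(a, p) and contains(b, q)) or (contains(a, q) and contains(b, p))
        for a, b in pairs
    )
    expected = n * prob
    # Upper tail P(X >= observed) with X ~ Binomial(n, prob).
    p_value = sum(
        comb(n, k) * prob ** k * (1 - prob) ** (n - k)
        for k in range(observed, n + 1)
    )
    return observed, expected, p_value


pairs = [("AKLM", "GGW"), ("AKQ", "TGGW"), ("MMM", "PPP"), ("AKR", "LLL")]
print(pair_significance(pairs, "AK", "GG"))
```

On this toy dataset the pair ("AK", "GG") occurs in 2 of 4 interacting pairs against an expected 0.75, giving a p-value of about 0.16. A real analysis would also need a richer motif model and a correction for testing many motif pairs.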

    Innovative Algorithms and Evaluation Methods for Biological Motif Finding

    Biological motifs are defined as recurring sub-patterns that are over-represented in biological systems. Sequence motifs and network motifs are examples of biological motifs. Because of their wide range of applications, many algorithms and computational tools have been developed to search for biological motifs efficiently. As a result, computationally derived motifs outnumber experimentally validated ones, and validating the biological significance of these ‘candidate motifs’ has become an important question. Some sequence motifs have been verified through their structural similarities or their functional roles in DNA or protein sequences, and are stored in databases. However, the biological role of network motifs remains unvalidated, and currently no database exists for this purpose. In this thesis, we focus not only on the computational efficiency but also on the biological meaning of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods: for example, a sparse nonnegative matrix factorization (SNMF) method is combined with Chou-Fasman parameters for protein motif finding, and biological network motifs are searched for by various clustering algorithms using Gene Ontology (GO) information. Experimental results show that these algorithms outperform existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as the minimal set of proteins that are vital for an organism's development into a fertile adult and for its cellular life. We design a new centrality algorithm based on biological network motifs, named MCGO, and use it to score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques.
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated into the analysis; 2) we focus more on the biological meaning of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; and 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.
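The abstract does not spell out how MCGO scores proteins, so the following is only a hypothetical illustration of motif-based centrality in a PPI network. Each protein is scored by the number of triangle motifs (three mutually interacting proteins) it belongs to, optionally counting only triangles whose members share a GO term. The choice of the triangle motif, the GO filter, and the name motif_centrality are all assumptions, not the thesis's algorithm.

```python
from collections import defaultdict
from itertools import combinations


def motif_centrality(edges, go_terms=None):
    """Hypothetical motif centrality: count triangle motifs per protein.

    edges: iterable of undirected (u, v) protein-protein interactions.
    go_terms: optional dict protein -> set of GO term IDs; if given, only
    triangles whose three proteins share at least one GO term are counted.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    score = {node: 0 for node in adj}
    for u in adj:
        # Enumerate each triangle exactly once by requiring u < v < w.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w not in adj[v]:
                continue  # u-v and u-w exist, but v-w does not
            if go_terms is not None:
                shared = (go_terms.get(u, set())
                          & go_terms.get(v, set())
                          & go_terms.get(w, set()))
                if not shared:
                    continue  # skip triangles without a common annotation
            for node in (u, v, w):
                score[node] += 1
    return score


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
scores = motif_centrality(edges)
print(sorted(scores, key=scores.get, reverse=True))
```

Ranking proteins by such a score gives a candidate list of essential proteins; in the thesis, motif-based scores are further combined with other centrality measures through machine learning.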

    The Mathematics of Phylogenomics

    The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics. Comment: 41 pages, 4 figures

    Non-coding RNA annotation of the genome of Trichoplax adhaerens

    A detailed annotation of non-protein-coding RNAs (ncRNAs) is typically missing from the initial releases of newly sequenced genomes. Here we report a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, presumably the most basal metazoan whose genome has been published to date. Since BLAST identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, and several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
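GotohScan's internals are not given in this abstract. The minimal sketch below only shows the general technique that its name and description point to: semi-global dynamic programming with Gotoh's affine gap penalties. The whole query (e.g. a known ncRNA) must be aligned, but the alignment may start and end anywhere in the genomic target. The scoring values and the function name semiglobal_affine are assumptions, and a real search tool would also report the alignment itself and assess its significance.

```python
NEG = float("-inf")


def semiglobal_affine(query, target, match=2, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Best score for aligning ALL of `query` to ANY substring of `target`.

    Affine gaps (Gotoh): a gap of length k scores gap_open + (k - 1) * gap_extend.
    Returns (score, end): the aligned target substring ends just before index `end`.
    """
    n, m = len(query), len(target)
    # M: query[i-1] aligned to target[j-1]
    # X: query[i-1] aligned to a gap (gap in the target)
    # Y: target[j-1] aligned to a gap (gap in the query)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        M[0][j] = 0  # free leading target: the alignment may start at any j
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend  # query prefix before target start
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    # Free trailing target: take the best cell in the last row.
    # Y is skipped there because it only adds penalised target characters.
    end = max(range(m + 1), key=lambda j: max(M[n][j], X[n][j]))
    return max(M[n][end], X[n][end]), end


print(semiglobal_affine("GATTACA", "CCCGATTACACCC"))
```

Here all seven query bases match target[3:10], so the call returns (14, 10). The affine model makes one long gap cheaper than several short ones, which is why it suits homology searches where insertions tend to cluster.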

    Biochemical and Structural Analysis of the Nucleoporin Nup214 and its Involvement in mRNA Export

    To gain a deeper understanding of the role of nucleoporins (nups) in leukemogenesis, and to make sense of the architecture and regulation of the mRNA export machinery at the nuclear pore complex (NPC), I set out to characterize Nup214 biochemically and structurally. In this thesis, I present the crystal structure of the human Nup214 N-terminal domain (NTD) at 1.65 Å resolution. The structure reveals a seven-bladed β-propeller fold followed by a 30-residue C-terminal extended peptide segment (CTE). The CTE folds back onto the β-propeller and binds to its bottom face. Conserved surface patches on the Nup214 NTD reveal putative protein-interaction sites, one of which is crucial for the interaction with Ddx19. The interaction between the Nup214 NTD and Ddx19 is dissected through a comprehensive mutational and biochemical analysis. The structure of the Nup214 NTD•Ddx19 complex in its ADP-bound state, at 2.5 Å resolution, reveals the molecular basis for the interaction between the two proteins. A conserved residue of Ddx19 is shown to be crucial for complex formation in vitro and in vivo. Strikingly, the interaction surfaces exhibit strongly opposing surface potentials: the helicase surface is positively charged and the Nup214 surface negatively charged. Ddx19 is shown to bind RNA only in its ATP-bound state, and the binding of RNA and of the Nup214 NTD is mutually exclusive. Finally, I speculate that Nup214 is the ATP-exchange factor for Ddx19, and propose the Ddx19 ATPase cycle as the terminal step in mRNA export.