236 research outputs found

    Identification of Trace Element-Containing Proteins in Genomic Databases

    Development of bioinformatics tools provided researchers with the ability to identify full sets of trace element–containing proteins in organisms for which complete genomic sequences are available. Recently, independent bioinformatics methods were used to identify all, or almost all, genes encoding selenocysteine-containing proteins in human, mouse, and Drosophila genomes, characterizing entire selenoproteomes in these organisms. It also should be possible to search for entire sets of other trace element–associated proteins, such as metal-containing proteins, although methods for their identification are still in development

    STING Report: convenient web-based application for graphic and tabular presentations of protein sequence, structure and function descriptors from the STING database

    STING Report is a versatile web-based application for extracting and presenting detailed information about any individual amino acid of a protein structure stored in the STING Database. The extracted information is presented as a series of GIF images and tables containing the values of up to 125 sequence/structure/function descriptors/parameters. The GIF images are generated by the Gold STING modules. The HTML page resulting from a STING Report query can be printed and, most importantly, it can be composed and visualized on a computer platform with an elementary configuration. Using STING Report, a user can generate a collection of customized reports for amino acids of specific interest. Such a collection meets the demand for rapid and detailed consultation and documentation of data about structure/function. Including information generated with STING Report in a research report or even a textbook increases the density of its contents. STING Report is freely accessible within the Gold STING Suite at http://www.cbi.cnptia.embrapa.br, http://www.es.embnet.org/SMS/, http://gibk26.bse.kyutech.ac.jp/SMS/ and http://trantor.bioc.columbia.edu/SMS (option: STING Report)

    String Matching with Variable Length Gaps

    We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n \log k + m + \alpha)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $\alpha$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $\alpha$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern. Comment: draft of full version, extended abstract at SPIRE 201
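
    To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the abstract and does not achieve its time or space bounds; it only illustrates what a match with variable length gaps means. The function name `vlg_match_ends`, the representation of the pattern as a list of strings plus a list of `(lo, hi)` gap ranges, and the convention of reporting end positions as exclusive indices are all assumptions made for this illustration.

```python
def vlg_match_ends(text, strings, gaps):
    """Return sorted end positions (exclusive indices into `text`) of
    substrings matching strings[0] gap strings[1] gap ... strings[-1].

    gaps[i] = (lo, hi) means that between strings[i] and strings[i + 1]
    there may be any run of lo..hi arbitrary characters (inclusive).
    Naive illustration only; hypothetical helper, not the paper's method.
    """
    if len(gaps) != len(strings) - 1:
        raise ValueError("need exactly one gap between consecutive strings")

    def occurrences(s):
        # All start positions of s in text, overlapping occurrences included.
        return [i for i in range(len(text) - len(s) + 1)
                if text.startswith(s, i)]

    # Positions just past a match of the pattern prefix processed so far.
    ends = {i + len(strings[0]) for i in occurrences(strings[0])}

    for s, (lo, hi) in zip(strings[1:], gaps):
        # An occurrence of s extends a partial match if some allowed gap
        # length separates it from an earlier end position.
        ends = {start + len(s)
                for start in occurrences(s)
                if any(start - g in ends for g in range(lo, hi + 1))}

    return sorted(ends)


if __name__ == "__main__":
    # "ab", then a gap of 1 or 2 characters, then "cd".
    print(vlg_match_ends("abxxcdabxcd", ["ab", "cd"], [(1, 2)]))
```

    The sketch rescans the text for every string in the pattern, so its running time is far from the bound stated above; it is meant only as a reference implementation for small inputs.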

    String Indexing for Patterns with Wildcards

    We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
    - A linear space index with query time $O(m + \sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
    - An index with query time $O(m + j + occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
    - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
    We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest
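
    For orientation, the query that such an index answers can be stated as a naive scan without any index. The Python sketch below is an illustration only: it is not one of the indexes described in the abstract, it runs in time proportional to the product of the text and pattern lengths, and the function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions introduced here.

```python
def wildcard_occurrences(t, p, wildcard="*"):
    """Return all start positions i such that p matches t[i:i + len(p)],
    where each wildcard character in p matches any single character of t.

    Index-free baseline for illustration; hypothetical helper.
    """
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if all(pc == wildcard or pc == t[i + k]
                   for k, pc in enumerate(p))]


if __name__ == "__main__":
    # The pattern "a*a" contains two characters and one wildcard.
    print(wildcard_occurrences("banana", "a*a"))
```

    An index trades preprocessing time and space for faster queries than this scan; the bounds listed in the abstract quantify that trade-off.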

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are now crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape-readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web-interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein-structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way using a variation of Felsenstein's tree-pruning algorithm, applied in combination with an independent-pair assumption, to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance. 2021-09-2

    bloated tubules (blot) Encodes a Drosophila Member of the Neurotransmitter Transporter Family Required for Organisation of the Apical Cytocortex

    We have identified a novel member of the vertebrate sodium- and chloride-dependent neurotransmitter symporter family from Drosophila melanogaster. This gene, named bloated tubules (blot), shows significant sequence similarity to a subgroup of vertebrate orphan transporters. blot transcripts are maternally supplied and during embryogenesis exhibit a complex and dynamic pattern in a subset of ectodermally derived epithelia, notably in the Malpighian tubules, and in the nervous system. Animals mutant for this gene are larval lethals, in which the Malpighian tubule cells are distended with an enlarged and disorganised apical surface. Embryos lacking the maternal component of blot expression die during early stages of development. They show an inability to form actin filaments in the apical cortex, resulting in impaired syncytial nuclear divisions, severe defects in the organisation of the cortical cytoskeleton, and a failure to cellularise. For the first time, a neurotransmitter transporter-like protein has been implicated in a function outside the nervous system. The isolation of blot thus provides the basis for an analysis of the relationship between the function of this putative transporter and epithelial morphogenesis

    Acoustic sequences in non-human animals: a tutorial review and prospectus.

    Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise - let alone understand - the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, 'Analysing vocal sequences in animals'. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality. This review was developed at an investigative workshop, “Analyzing Animal Vocal Communication Sequences” that took place on October 21–23 2013 in Knoxville, Tennessee, sponsored by the National Institute for Mathematical and Biological Synthesis (NIMBioS). NIMBioS is an Institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security, and the U.S. Department of Agriculture through NSF Awards #EF-0832858 and #DBI-1300426, with additional support from The University of Tennessee, Knoxville. In addition to the authors, Vincent Janik participated in the workshop. D.T.B.’s research is currently supported by NSF DEB-1119660. M.A.B.’s research is currently supported by NSF IOS-0842759 and NIH R01DC009582. M.A.R.’s research is supported by ONR N0001411IP20086 and NOPP (ONR/BOEM) N00014-11-1-0697. S.L.DeR.’s research is supported by the U.S. Office of Naval Research. R.F.-i-C.’s research was supported by the grant BASMATI (TIN2011-27479-C04-03) from the Spanish Ministry of Science and Innovation. E.C.G.’s research is currently supported by a National Research Council postdoctoral fellowship. E.E.V.’s research is supported by CONACYT, Mexico, award number I010/214/2012. This is the accepted manuscript. The final version is available at http://dx.doi.org/10.1111/brv.1216

    Innovative Algorithms and Evaluation Methods for Biological Motif Finding

    Biological motifs are defined as over-represented, recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Because of their wide range of applications, many algorithms and computational tools have been developed to search for biological motifs efficiently. As a result, there are more computationally derived motifs than experimentally validated motifs, and how to validate the biological significance of the ‘candidate motifs’ becomes an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose. In this thesis, we focus not only on the computational efficiency but also on the biological meanings of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods: for example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched for by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins which are vital for development to a fertile adult and for cellular life in an organism. We design a new centrality algorithm with biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques. This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently in this work and biological information is easily integrated with the analysis; 2) we focus more on the biological meanings of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to a practical application, the prediction of essential proteins

    Bioinformatics

    This book is divided into different research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here