2,054 research outputs found

Finding motifs from short peptides

Author: Kruup Mari-Liis
Publication venue: Tartu Ülikool
Publication date: 01/01/2013

The goal of this thesis is to develop a workflow that finds groups of similar peptides in a set of short peptides and represents these groups as motifs. The workflow could later be used to discover motifs in peptides from different individuals and so reveal similarities between individuals with the same disease. Commonly known methods, bioinformatics tools and additional scripts are combined to assemble the workflow. It is based on hierarchical clustering, which divides the input peptides into groups according to their similarity. The resulting groups are then modified so that each contains only one unique motif. Motifs of the final groups are extracted and represented as sequence logos and regular expressions. Each motif is scored to indicate how well it describes its own peptide group specifically. The workflow was assembled and tested on one individual. The test was successful: 71.19% of the 277 166 input peptides were divided into 46 motif groups, of which 43 received very good scores. In the future, this workflow can be used to analyze different individuals in order to find shared motifs among individuals with the same disease.
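
As a rough illustration of the step that turns a peptide group into a regular expression, here is a minimal Python sketch. It is not the thesis's implementation: the function name `motif_regex`, the `min_fraction` threshold and the assumption that the peptides in a group are already aligned and of equal length are all hypothetical choices made for this example.

```python
from collections import Counter


def motif_regex(peptides, min_fraction=0.2):
    """Build a regular expression from equal-length, pre-aligned peptides.

    Hypothetical helper: a residue is kept at a position if it occurs in at
    least `min_fraction` of the peptides; positions with no such residue
    become the wildcard '.'.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must have equal length")

    parts = []
    for column in zip(*peptides):  # iterate over alignment columns
        counts = Counter(column)
        kept = sorted(
            residue
            for residue, count in counts.items()
            if count / len(peptides) >= min_fraction
        )
        if len(kept) == 1:
            parts.append(kept[0])                   # conserved position
        elif kept:
            parts.append("[" + "".join(kept) + "]")  # several allowed residues
        else:
            parts.append(".")                       # no dominant residue
    return "".join(parts)
```

For example, `motif_regex(["ACDE", "ACDF", "GCDE"], min_fraction=0.3)` yields a pattern that `re.fullmatch` accepts for every peptide in the group; raising the threshold makes the pattern stricter.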

DSpace at Tartu University Library

ViennaRNA Package 2.0

Author: Ronny Lorenz
Stephan H Bernhart
Christian Höner zu Siederdissen
Hakim Tafer
Christoph Flamm
Peter F Stadler
Ivo L Hofacker
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. Results: The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. Conclusions: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.
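
To give a flavour of the dynamic programming mentioned in this abstract, the following Python sketch implements a Nussinov-style recursion that maximizes the number of base pairs. This is a deliberately simplified stand-in: it does not use the Turner 2004 energy parameters and is not the ViennaRNA algorithm or API. The function name `max_base_pairs` and the `min_loop` parameter are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Simplified Nussinov-style recursion (pair counting, no energy model).
    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case: j stays unpaired
            for k in range(i, j - min_loop):     # case: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The table is filled in order of increasing subsequence length, so every entry depends only on shorter intervals that are already computed; the cubic running time is the same order as that of the basic energy-based folding recursions.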

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Fraunhofer-ePrints

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

The Proteomic Code: a molecular recognition code for proteins

Author: Jan C Biro
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code. Review: The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding. Methods and conclusions: A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Automated Genome-Wide Protein Domain Exploration

Author: Rekepalli Bhanu Prasad
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2007

Exploiting the exponentially growing genomics and proteomics data requires high-quality, automated analysis. Protein domain modeling is a key area of molecular biology as it unravels the mysteries of evolution, protein structures, and protein functions. A plethora of sequences exist in protein databases with incomplete domain knowledge. Hence, this research explores automated bioinformatics tools for faster protein domain analysis. The automated tool chains described in this dissertation generate new protein domain models, thus enabling more effective genome-wide protein domain analysis. To validate the new tool chains, the Shewanella oneidensis and Escherichia coli genomes were processed, resulting in a new peptide domain database, detection of poor domain models, and identification of likely new domains. The automated tool chains would require months or years to model a small genome when executing on a single workstation. Therefore, the dissertation investigates grid computing and parallel processing approaches to significantly accelerate these bioinformatics tool chains.

University of Tennessee, Knoxville: Trace

New Methods to Improve Protein Structure Modeling

Author: Abdelrasoul Maha
Publication venue: ODU Digital Commons
Publication date: 01/07/2018

Proteins are considered the central compound necessary for life, as they play a crucial role in governing several life processes by performing the most essential biological and chemical functions in every living cell. Understanding protein structures and functions will lead to a significant advance in life science and biology. Such knowledge is vital for various fields such as drug development and synthetic biofuel production. Most proteins fold into definite shapes, which are the most stable states they can adopt. Because protein structure provides important insight into function, many research efforts have aimed to determine a protein's 3-dimensional structure from its sequence. Experimental methods for 3-dimensional structure determination are often time-consuming, costly, and not even feasible for some proteins. Accordingly, recent research efforts focus more and more on computational approaches to predict protein 3-dimensional structures. Template-based modeling is considered one of the most accurate protein structure prediction methods. Its success relies on correctly identifying one or a few experimentally determined protein structures as structural templates that are likely to resemble the structure of the target sequence, and on accurately producing a sequence alignment that maps the residues in the target sequence to those in the template. In this work, we aim to improve template-based protein structure modeling by identifying the most appropriate templates more reliably and aligning the target and template sequences more precisely. Firstly, we investigate employing an inter-residue contact score to measure how favorably a target sequence fits the folding topology of a given template. Secondly, we design a multi-objective alignment algorithm that extends the well-known Needleman-Wunsch algorithm to obtain a complete set of Pareto-optimal alignments. Then, we use protein sequence and structural information as objectives and generate the complete Pareto-optimal front of alignments between target sequence and template. The alignments obtained enable one to analyze the trade-offs between the potentially conflicting objectives. These approaches improve the accuracy of template-based protein structure modeling.
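
For readers unfamiliar with the Needleman-Wunsch algorithm that this dissertation extends, a minimal Python sketch of the classic single-objective version follows. It is not the multi-objective Pareto variant described in the abstract, and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions rather than parameters taken from the source.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Classic single-objective
    version; scoring values are illustrative defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

A multi-objective extension would, roughly speaking, keep a set of non-dominated score vectors in each cell instead of a single maximum; the sketch above keeps only one scalar score per cell.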

Old Dominion University

Procrustes Analysis of Truncated Least Squares Multidimensional Scaling

Author: Boryczko Krzysztof
Dzwinel Witold
Kurdziel Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Multidimensional Scaling (MDS) is an important class of techniques for embedding sets of patterns in Euclidean space. Most often it is used to visualize in $\mathbb{R}^3$ multidimensional data sets or data sets given by dissimilarity measures that are not distance metrics. Unfortunately, embedding $n$ patterns with MDS involves processing $O(n^2)$ pairwise pattern dissimilarities, making MDS computationally demanding for large data sets. Especially in Least Squares MDS (LS-MDS) methods, which proceed by finding a minimum of a multimodal stress function, computational cost is a limiting factor. Several works therefore explored approximate MDS techniques that are less computationally expensive. These approximate methods were evaluated in terms of correlation between Euclidean distances in the embedding and the pattern dissimilarities or value of the stress function. We employ Procrustes Analysis to directly quantify differences between embeddings constructed with an approximate LS-MDS method and embeddings constructed with exact LS-MDS. We then compare our findings to the results of classical analysis, i.e. that based on stress value and correlation between Euclidean distances and pattern dissimilarities. Our results demonstrate that small changes in stress value or correlation coefficient can translate to large differences between embeddings. The differences can be attributed not only to the inevitable variability resulting from the multimodality of the stress function but also to the approximation errors. These results show that approximation may have a larger impact on MDS than what was thus far revealed by analyses of stress value and correlation between Euclidean distances and pattern dissimilarities.
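
The quadratic cost mentioned in this abstract is easy to see in a direct evaluation of a least-squares stress function. The Python sketch below computes raw stress, the sum of squared differences between embedding distances and target dissimilarities over all pairs. The paper may use a weighted or normalized variant; the function name `raw_stress` and the unweighted form are assumptions made for this example.

```python
import math


def raw_stress(points, dissimilarities):
    """Unweighted raw stress of an embedding.

    points:           list of coordinate tuples, one per pattern
    dissimilarities:  square matrix, dissimilarities[i][j] for patterns i, j
    """
    n = len(points)
    total = 0.0
    # The double loop visits n * (n - 1) / 2 pairs, hence the O(n^2) cost.
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(points[i], points[j])  # Euclidean distance
            total += (distance - dissimilarities[i][j]) ** 2
    return total
```

An embedding that reproduces every dissimilarity exactly has a stress of zero; LS-MDS methods search for coordinates that minimize this quantity.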

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Statistical Physics and Representations in Real and Artificial Neural Networks

Author: Cocco Simona
Monasson Rémi
Posani Lorenzo
Rosay Sophie
Tubiana Jérôme
Publication venue: Elsevier BV
Publication date: 07/09/2017

This document presents the material of two lectures on statistical physics and neural representations, delivered by one of us (R.M.) at the Fundamental Problems in Statistical Physics XIV summer school in July 2017. In the first part, we consider the neural representations of space (maps) in the hippocampus. We introduce an extension of the Hopfield model, able to store multiple spatial maps as continuous, finite-dimensional attractors. The phase diagram and dynamical properties of the model are analyzed. We then show how spatial representations can be dynamically decoded using an effective Ising model capturing the correlation structure in the neural data, and compare applications to data obtained from hippocampal multi-electrode recordings and by (sub)sampling our attractor model. In the second part, we focus on the problem of learning data representations in machine learning, in particular with artificial neural networks. We start by introducing data representations through some illustrations. We then analyze two important algorithms, Principal Component Analysis and Restricted Boltzmann Machines, with tools from statistical physics.

arXiv.org e-Print Archive

Hal-Diderot

Support vector machines with profile-based kernels for remote protein homology detection

Author: Abela John
Busuttil Steven
Pace Gordon J.
Publication venue: The 15th International Conference on Genome Informatics (GIW'04)
Publication date: 01/01/2004

Two new techniques for remote protein homology detection, particularly suited for sparse data, are introduced. These methods are based on position-specific scoring matrices, or profiles, and use a support vector machine (SVM) for discrimination. On standard benchmarks, the methods outperform previous non-discriminative techniques and are comparable to other SVM-based methods while offering distinct advantages.

OAR@UM

Bioinformatics: Basics, Development, and Future

Author: Abdurakhmonov Ibrokhim Y.
Publication venue: IntechOpen
Publication date: 27/07/2016

Bioinformatics is an interdisciplinary scientific field of the life sciences. Bioinformatics research and application include the analysis of molecular sequence and genomics data; genome annotation, gene/protein prediction, and expression profiling; molecular folding, modeling, and design; building biological networks; development of databases and data management systems; development of software and analysis tools; bioinformatics services and workflow; mining of biomedical literature and text; and bioinformatics education and training. The astronomical accumulation of genomics, proteomics, and metabolomics data, together with the need for their storage, analysis, annotation, organization, systematization, and integration into biological networks and database systems, was the main driving force for the emergence and development of bioinformatics. Among the current critical needs highlighted in this chapter, however, are understanding the basics and specifics of bioinformatics and preparing a new generation of scientists and specialists with integrated, interdisciplinary, and multilingual knowledge who can use modern bioinformatics resources powered by sophisticated operating systems, software, and database/networking technologies. In this introductory chapter, I aim to give readers an overall picture of the basics and developments of the bioinformatics field, with some future perspectives, highlighting the chapters published in this book.

IntechOpen