Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

Author: Pellegrini Marco
Publication venue: Frontiers
Publication date: 01/01/2015

Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging because of the inherently low (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools were key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.

PUblication MAnagement

Editorial: Repetitive Structures in Biological Sequences: Algorithms and Applications

Author: Iliopoulos Costas S.
Magi Alberto
Pellegrini Marco
Publication venue: Frontiers Media SA
Publication date: 01/01/2016

Repetitive structures in biological sequences are emerging as an active focus of research, and the unifying concept of "repeatome" (the ensemble of knowledge associated with repeating structures in genomic/proteomic data) has recently been proposed in order to highlight several converging trends.

Florence Research

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

King's Research Portal

PUblication MAnagement

Directory of Open Access Books (DOAB)

TRStalker: an Efficient Heuristic for Finding NP-Complete Tandem Repeats

Author: Pellegrini Marco
Renda Maria Elena
Vecchio Alessio

Genomic sequences in higher eukaryotic organisms contain a substantial amount of (almost) repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage, are characterized by close spatial contiguity, and play an important role in several molecular regulatory mechanisms. Certain types of tandem repeats are highly polymorphic and constitute a fingerprint feature of individuals. Abnormal TRs are known to be linked to several diseases. Over the last 20 years, researchers in bioinformatics have proposed many formal definitions for the rather loose notion of a Tandem Repeat and have proposed exact or heuristic algorithms to detect TRs in genomic sequences. The general trend has been to use formal (implicit or explicit) definitions of TR for which verification of the solution was easy (with complexity linear or polynomial in the TR's length and substitution+indel rates), while the effort was directed towards efficiently identifying the substrings of the input to submit to the verification phase (either implicitly or explicitly). In this paper we take a step forward: we use a definition of TR for which the verification step is also difficult (in effect, NP-complete), and we develop new filtering techniques for coping with high error levels. The resulting heuristic algorithm, christened TRStalker, is approximate since it cannot guarantee that all NP-complete Tandem Repeats satisfying the target definition in the input string will be found. However, in synthetic experiments with 30% of errors allowed, TRStalker has demonstrated a very high recall (ranging from 100% to 60%, depending on motif length and repetition number) for the NP-complete TRs. TRStalker has consistently better performance than some state-of-the-art methods for a large range of parameters on the class of NP-complete Tandem Repeats. TRStalker aims at improving the capability of TR detection for classes of TRs for which existing methods do not perform well.

PUblication MAnagement

Detecting microsatellites within genomes: significant variation among algorithms

Author: Sébastien Leclercq
Eric Rivals
Philippe Jarne
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Microsatellites are short, tandemly repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results: Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion: Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

TRStalker: an efficient heuristic for finding fuzzy tandem repeats

Author: Marco Pellegrini
M. Elena Renda
Alessio Vecchio
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is a prerequisite for enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.

CiteSeerX

Crossref

PubMed Central

Archivio della Ricerca - Università di Pisa

Integrated multiple sequence alignment

Author: Sammeth Michael
Publication venue: Bielefeld University
Publication date: 01/01/2005

Sammeth M. Integrated multiple sequence alignment. Bielefeld (Germany): Bielefeld University; 2005. The thesis presents enhancements for automated and manual multiple sequence alignment: existing alignment algorithms are made more easily accessible and new algorithms are designed for difficult cases. Firstly, we introduce the QAlign framework, a graphical user interface for multiple sequence alignment. It comprises several state-of-the-art algorithms and exposes their parameters through convenient dialogs. An alignment viewer with guided editing functionality can also highlight or print regions of the alignment. Phylogenetic features are also provided, e.g., distance-based tree reconstruction methods, corrections for multiple substitutions and a tree viewer. The modular concept and the platform-independent implementation guarantee easy extensibility. Further, we develop a constrained version of the divide-and-conquer alignment such that it can be restricted by anchors found earlier with local alignments. It can be shown that this method shares attributes of both local and global aligners, in the quality of results as well as in the computation time. We further modify the local alignment step to work on bipartite (or even multipartite) sets for sequences where repeats overshadow valuable sequence information. In the end, a technique is established that can accurately align sequences containing possibly repeated motifs. Finally, another algorithm is presented that allows tandem repeat sequences to be compared by aligning them with respect to their possible repeat histories. We describe an evolutionary model including tandem duplications and excisions, and give an exact algorithm to compare two sequences under this model.

Publications at Bielefeld University

A Fast and Specific Alignment Method for Minisatellite Maps

Author: Buard Jérôme
Bérard Sèverine
Gascuel Olivier
Nicolas François
Rivals Eric
Publication venue: Libertas Academica
Publication date: 01/01/2006

Background: Variable minisatellites count among the most polymorphic markers of eukaryotic and prokaryotic genomes. This variability can affect gene coding regions, as in the prion protein gene, or gene regulation regions, as for the cystatin B gene, and be associated with or implicated in diseases: Creutzfeldt-Jakob disease and myoclonus epilepsy type 1, for our examples. When it affects neutrally evolving regions, the polymorphism in length (i.e. in number of copies) of minisatellites has proved useful in population genetics. Motivation: In these tandem repeat sequences, different mutational mechanisms let the number of copies, as well as the copies themselves, vary. In particular, the interspersion of tandem duplication/contraction events and point mutations makes the succession of variant repeats much more informative than the allele length alone. Exploiting this information requires the ability to align minisatellite alleles by accounting for both point mutations and tandem duplications. Results: We propose a minisatellite map alignment program that improves on previous solutions. Our new program is faster, simpler, considers an extended evolutionary model, and is available to the community. We test it on the data set of 609 alleles of the MSY1 (DYF155S1) human minisatellite and confirm its ability to recover known evolutionary signals. Our experiments highlight that the informativeness of minisatellites resides in their length and composition polymorphisms. Exploiting both simultaneously is critical to unravel the implications of variable minisatellites in the control of gene expression and diseases. Availability: Software is available at http://atgc.lirmm.fr/ms_align/ Keywords: VNTR, tandem repeat, tandem duplication, variable costs, dynamic programming, sequence comparison

CiteSeerX

Directory of Open Access Journals

PubMed Central

ProdInra

Genetic abnormalities in premature ovarian failure patients

Author: Teearu Katre
Publication venue: Tartu Ülikool
Publication date: 01/01/2016

Somatic mosaicism, defined as the presence of different cell populations with distinct genotypes within one individual, caused by post-zygotic errors, has long been considered a source of human genetic variation within and between individuals. It is also plausible that the presence of large structural mosaic events could have important implications for human diseases. In this study, we provide a genome-wide survey of genetic variation in premature ovarian failure (POF) patients by analyzing SNP array data with a novel algorithmic method that deciphers mosaic structural alterations. We found mosaic aberrations in 8.2% of samples, including 23 mosaic copy number variation (CNV) regions, one mosaic X monosomy and 24 (larger than 1 Mb) mosaic uniparental disomy (UPD) events. In addition, we were able to investigate 23 novel CNVs among patients.

DSpace at Tartu University Library

The evolution of the tape measure protein: units, duplications and losses

Author: Belcaid Mahdi
Bergeron Anne
Poisson Guylaine
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A large family of viruses that infect bacteria, called phages, is characterized by long tails used to inject DNA into their victims' cells. The tape measure protein got its name because the length of the corresponding gene is proportional to the length of the phage's tail: a fact shown by actually copying or splicing out parts of DNA in exemplar species. A natural question is whether there exist units for these tape measures, and if different tape measures have different units and lengths. Such units would allow us to retrace the evolution of tape measure proteins using their duplication/loss history. The vast number of sequenced phage genomes allows us to attack this problem with a comparative genomics approach. Results: Here we describe a subset of phages whose tape measure proteins contain variable numbers of an 11-amino-acid sequence repeat, aligned with sequence similarity, structural properties, and simple arithmetic. This subset provides a unique opportunity for the combinatorial study of phage evolution, without the added uncertainties of multiple alignments, which are trivial in this case, or of protein functions, which are well established. We give a heuristic that reconstructs the duplication history of these sequences, using divergent strains to discriminate between mutations that occurred before and after speciation, or lineage divergence. The heuristic is based on an efficient algorithm that gives an exhaustive enumeration of all possible parsimonious reconstructions of the duplication/speciation history of a single nucleotide. Finally, we present a method that makes it possible, in some cases, to discriminate between duplication and loss events. Conclusions: Establishing the evolutionary history of viruses is difficult, in part due to extensive recombinations and gene transfers, and high mutation rates that often erase detectable similarity between homologous genes. In this paper, we introduce new tools to address this problem.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central