Is protein folding problem really a NP-complete one ? First investigations

Author: Bahi Jacques M.
Bienia Wojciech
Côté Nathalie
Guyeux Christophe
Publication date: 01/01/2013

Determining the 3D conformation of proteins is necessary to understand their functions and their interactions with other molecules. It is commonly accepted that, when proteins fold from their primary linear structures to their final 3D conformations, they tend to choose the ones that minimize their free energy. To find the 3D conformation of a protein from its amino acid sequence, bioinformaticians use various models of different resolutions and artificial intelligence tools, as the protein folding prediction problem is NP-complete. More precisely, determining the backbone structure of the protein with the low-resolution models (2D HP square and 3D HP cubic), by finding the conformation that minimizes free energy, is intractable to solve exactly. Both the proof of NP-completeness and the 2D prediction assume that acceptable conformations satisfy a self-avoiding walk (SAW) requirement, as two different amino acids cannot occupy the same position in the lattice. It is shown in this document that the SAW requirement considered when proving NP-completeness is different from the SAW requirement used in various prediction programs, and that both are different from the real biological requirement. Indeed, the proof of NP-completeness and the in silico predictions consider conformations that are not possible in practice. The consequences of this fact are investigated in this research work. Comment: Submitted to Journal of Bioinformatics and Computational Biology, under review
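
As a rough illustration of the 2D HP square model and the SAW requirement mentioned in the abstract above, the following minimal sketch checks whether a fold is self-avoiding and scores it. It is not taken from the paper: the move encoding (`U`, `D`, `L`, `R`), the function names, and the scoring convention (energy of -1 per pair of H residues that are lattice neighbours but not consecutive in the chain) are assumptions chosen for illustration, and the sketch does not reproduce any of the distinct SAW variants the authors compare.

```python
from typing import Dict, List, Tuple

# Hypothetical move encoding for a fold on the 2D square lattice.
MOVES: Dict[str, Tuple[int, int]] = {
    "U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0),
}

def walk_positions(moves: str) -> List[Tuple[int, int]]:
    """Return the lattice positions visited, starting at the origin."""
    x, y = 0, 0
    positions = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions

def is_self_avoiding(moves: str) -> bool:
    """A walk is self-avoiding if no lattice site is visited twice."""
    positions = walk_positions(moves)
    return len(set(positions)) == len(positions)

def hp_energy(sequence: str, moves: str) -> int:
    """Energy under the assumed convention: -1 per non-consecutive H-H contact."""
    positions = walk_positions(moves)
    if len(positions) != len(sequence):
        raise ValueError("a chain of n residues needs exactly n - 1 moves")
    if len(set(positions)) != len(positions):
        raise ValueError("conformation violates the SAW requirement")
    index_at = {pos: i for i, pos in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, under these assumptions `hp_energy("HPPH", "RUL")` folds the four residues around a unit square, which brings the two H residues into contact.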

arXiv.org e-Print Archive

HAL - Université de Franche-Comté

Hal - Université Grenoble Alpes

HAL Descartes

Hal-Diderot

Reference based annotation with GeneMapper

Author: Chatterji Sourav
Pachter Lior
Publication venue: BioMed Central
Publication date: 01/01/2006

We introduce GeneMapper, a program for transferring annotations from a well-annotated genome to other genomes. Drawing on high-quality curated annotations, GeneMapper enables rapid and accurate annotation of newly sequenced genomes and is suitable for both finished and draft genomes. GeneMapper uses a profile-based approach for mapping genes into multiple species, improving upon the standard pairwise approach. GeneMapper is freely available for academic use.

Springer - Publisher Connector

PubMed Central

Caltech Authors

Department of Computer Science Activity 1998-2004

Author: Kotz David
Publication venue: Dartmouth Digital Commons
Publication date: 20/03/2005

This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department's research activity during that period.

Dartmouth Digital Commons (Dartmouth College)

TRStalker: an efficient heuristic for finding fuzzy tandem repeats

Author: Alessio Vecchio
M. Elena Renda
Marco Pellegrini
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.

CiteSeerX

Crossref

PubMed Central

Archivio della Ricerca - Università di Pisa

Approximation algorithms for speeding up dynamic programming and denoising aCGH data

Author: Barry D.
Charalampos E. Tsourakakis
Ding J.
Gary L. Miller
Jagadish H. V.
Lejeune J.
Lejeune J.
Maria A. Tsiarli
Richard Peng
Russell Schwartz
Shi Y.
Viti F.
Publication venue: Association for Computing Machinery (ACM)

Crossref

International Evaluation of Research and Doctoral Training at the University of Helsinki 2005-2010 : RC-Specific Evaluation of ALKO - Algorithms and Data Analysis

Publication date: 01/01/2012

Helsingin yliopiston digitaalinen arkisto

Parallel evolution strategy for protein threading.

Author: Islam Md. Rafiqul
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2005

A protein sequence folds into a specific shape in order to function in its aqueous state. If the primary sequence of a protein is given, what is its three-dimensional structure? This is a long-standing problem in the field of molecular biology and it has large implications for drug design and cures. Among several proposed approaches, protein threading represents one of the most promising techniques. The protein threading problem (PTP) is the problem of determining the three-dimensional structure of a given but arbitrary protein sequence from a set of known structures of other proteins. This problem is known to be NP-hard, and current computational approaches to threading are time-consuming and data-intensive. In this thesis, we propose an evolution strategy (ES) based approach for protein threading (EST). We also develop two parallel approaches for the PTP, both of which are parallelizations of our novel EST. The first method, which we call SQST-PEST (Single Query Single Template Parallel EST), threads a single query against a single template. We use ES to find the best alignment between the query and the template, and the ES is parallelized. The second method, which we call SQMT-PEST (Single Query Multiple Templates Parallel EST), allows threading a single query against multiple templates within reasonable time. We obtained better results than current comparable approaches, as well as a significant reduction in execution time. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .I85. Source: Masters Abstracts International, Volume: 44-03, page: 1403. Thesis (M.Sc.)--University of Windsor (Canada), 2005

Scholarship at UWindsor

TRStalker: an Efficient Heuristic for Finding NP-Complete Tandem Repeats

Author: Pellegrini Marco
Renda Maria Elena
Vecchio Alessio

Genomic sequences in higher eucaryotic organisms contain a substantial amount of (almost) repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage, are characterized by close spatial contiguity, and play an important role in several molecular regulatory mechanisms. Certain types of tandem repeats are highly polymorphic and constitute a fingerprint feature of individuals. Abnormal TRs are known to be linked to several diseases. Over the last 20 years, researchers in bioinformatics have proposed many formal definitions for the rather loose notion of a Tandem Repeat and have proposed exact or heuristic algorithms to detect TRs in genomic sequences. The general trend has been to use formal (implicit or explicit) definitions of TR for which verification of the solution is easy (with complexity linear or polynomial in the TR's length and substitution+indel rates), while the effort has been directed towards efficiently identifying the substrings of the input to submit to the verification phase (either implicitly or explicitly). In this paper we take a step forward: we use a definition of TR for which the verification step is also difficult (in effect, NP-complete), and we develop new filtering techniques for coping with high error levels. The resulting heuristic algorithm, christened TRStalker, is approximate since it cannot guarantee that all NP-Complete Tandem Repeats satisfying the target definition in the input string will be found. However, in synthetic experiments with 30% of errors allowed, TRStalker has demonstrated a very high recall (ranging from 100% to 60%, depending on motif length and repetition number) for the NP-complete TRs. TRStalker has consistently better performance than some state-of-the-art methods for a large range of parameters on the class of NP-complete Tandem Repeats.
TRStalker aims at improving the capability of TR detection for classes of TRs for which existing methods do not perform well.
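
To make concrete the "easy verification" setting that the abstract above contrasts with its own NP-complete definition, here is a minimal sketch of a polynomial-time check under a deliberately simplified definition. It is not TRStalker and not the definition used in the paper: it assumes substitutions only (no indels), a fixed period, and a consensus motif built by per-column majority vote. The function names and the `max_error_rate` parameter are hypothetical.

```python
from collections import Counter
from typing import List

def consensus_motif(copies: List[str]) -> str:
    """Per-column majority vote over equal-length copies."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))

def is_hamming_tandem_repeat(text: str, start: int, period: int,
                             n_copies: int, max_error_rate: float) -> bool:
    """Check a candidate tandem repeat under the simplified definition.

    The candidate is text[start : start + period * n_copies], cut into
    n_copies adjacent blocks of length `period`. It is accepted if the
    total number of mismatches against the consensus motif is at most
    max_error_rate times the candidate's length.
    """
    end = start + period * n_copies
    if start < 0 or period <= 0 or n_copies < 2 or end > len(text):
        return False
    copies = [text[start + k * period: start + (k + 1) * period]
              for k in range(n_copies)]
    motif = consensus_motif(copies)
    mismatches = sum(a != b for copy in copies for a, b in zip(copy, motif))
    return mismatches <= max_error_rate * period * n_copies
```

For instance, `is_hamming_tandem_repeat("ACGTACGAACGT", 0, 4, 3, 0.1)` accepts three copies of `ACGT` with a single substitution. The check runs in time linear in the candidate's length, which is the kind of cheap verification step the abstract says earlier definitions were designed around.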

PUblication MAnagement

Fifth Biennial Report : June 1999 - August 2001

Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2001

MPG.PuRe