Mutation model for nucleotide sequences based on crystal basis

Author: Minichini C.
Sciarrino A.
Publication date: 08/06/2005

A nucleotide sequence is identified, in the two- (four-) letter alphabet, by the labels of a vector state of an irreducible representation of U_q(sl(2)) (U_q(sl(2) + sl(2))), in the limit q -> 0. A master equation for the distribution function is written, where the intensity of the one-spin flip is assumed to depend on the variation of the labels of the state. In the two-letter approximation, the numerically computed equilibrium distribution for short sequences is well fitted by a Yule distribution, which is the observed distribution of the ranked frequencies of short oligonucleotides in DNA. The four-letter alphabet description, applied to the codons, is able to reproduce the form of the fitted rank-ordered usage frequency distribution. Comment: 27 pages, 9 figures

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

An Efficient Rank Based Approach for Closest String and Closest Substring

Author: A Ben-Dor
A Dinu
AS Fraser
AWC Liew
C de la Higuera
Chuhsing Kate Hsiao
DJ States
EV Koonin
F Nicolas
J Gramm
J Palmer
JC Wooley
K Lanctot
L Schmitt
L Wang
Liviu P. Dinu
LP Dinu
M Chimani
M Frances
M Karpovsky
M Li
P Diaconis
R Holmquist
Radu Ionescu
S Roman
VI Levenshtein
VY Popov
W Banzhaf
X Deng
X Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

This paper presents a new genetic approach that uses rank distance to solve two known NP-hard problems, and compares rank distance with other distance measures for strings. The two NP-hard problems are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance give the best results.
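
As a rough illustration of the distance measures named in this abstract, the sketch below computes a rank distance and a Hamming distance between two strings, plus a closest-string cost that a fitness function could build on. It assumes the usual formulation of rank distance, in which each character is tagged with its occurrence number and 1-based positions are compared; the function names and the max-distance cost are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def annotate(s):
    """Map each (character, occurrence number) pair to its 1-based position.

    For example, 'aba' becomes {('a', 1): 1, ('b', 1): 2, ('a', 2): 3}.
    """
    seen = defaultdict(int)
    positions = {}
    for i, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = i
    return positions


def rank_distance(x, y):
    """Rank distance (assumed formulation): position offsets for shared
    annotated characters, plus positions of characters present in one string only."""
    px, py = annotate(x), annotate(y)
    total = 0
    for key in px.keys() | py.keys():
        if key in px and key in py:
            total += abs(px[key] - py[key])
        else:
            total += px.get(key, 0) + py.get(key, 0)
    return total


def hamming_distance(x, y):
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def closest_string_cost(candidate, strings, dist):
    """Hypothetical cost for closest string: the largest distance from the
    candidate to any input string (lower is better)."""
    return max(dist(candidate, s) for s in strings)
```

A genetic algorithm could, for instance, rank candidates by `closest_string_cost(candidate, strings, rank_distance)`; swapping in `hamming_distance` gives the kind of baseline the abstract compares against.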

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Test Set Diameter: Quantifying the Diversity of Sets of Test Cases

Author: Clark David
Feldt Robert
Poulding Simon
Yoo Shin
Publication date: 10/06/2015

A common and natural intuition among software testers is that test cases need to differ if a software system is to be tested properly and its quality ensured. Consequently, much research has gone into formulating distance measures for how test cases, their inputs and/or their outputs differ. However, common to these proposals is that they are data type specific and/or calculate the diversity only between pairs of test inputs, traces or outputs. We propose a new metric to measure the diversity of sets of tests: the test set diameter (TSDm). It extends our earlier, pairwise test diversity metrics based on recent advances in information theory regarding the calculation of the normalized compression distance (NCD) for multisets. An advantage is that TSDm can be applied regardless of data type and on any test-related information, not only the test inputs. A downside is the increased computational time compared to competing approaches. Our experiments on four different systems show that the test set diameter can help select test sets with higher structural and fault coverage than random selection even when only applied to test inputs. This can enable early test design and selection, prior to even having a software system to test, and complement other types of test automation and analysis. We argue that this quantification of test set diversity creates a number of opportunities to better understand software quality and provides practical ways to increase it. Comment: In submission
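
To make the idea of a compression-based diversity score for a whole set concrete, here is a minimal sketch. It assumes the first-level multiset NCD, (C(X) - min C(x)) / max C(X without x), with `zlib` standing in for the compressor C and plain concatenation standing in for the multiset; the function names and sample inputs are hypothetical, and the paper's actual TSDm computation may differ.

```python
import zlib


def compressed_len(items):
    """Approximate C(X): zlib-compressed size of the concatenated items."""
    return len(zlib.compress(b"".join(items), 9))


def ncd1(items):
    """First-level multiset NCD (assumed form) for a list of byte strings."""
    items = list(items)
    if len(items) < 2:
        raise ValueError("need at least two items")
    whole = compressed_len(items)
    min_single = min(compressed_len([x]) for x in items)
    # Largest compressed size over all leave-one-out subsets.
    max_leave_one_out = max(
        compressed_len(items[:i] + items[i + 1:]) for i in range(len(items))
    )
    return (whole - min_single) / max_leave_one_out


# Hypothetical test inputs: four copies of one input vs. four unrelated inputs.
similar = [b"GET /index.html"] * 4
diverse = [
    b"GET /index.html",
    b'{"id": 42, "ok": true}',
    b"SELECT * FROM users;",
    b"<a href='x'>link</a>",
]
```

Under these assumptions, `ncd1(diverse)` comes out larger than `ncd1(similar)`, which is the property a diversity-driven test selection would rely on. Real compressors only approximate Kolmogorov complexity, so scores on very short inputs are noisy.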

arXiv.org e-Print Archive

Crossref

A GRASP-based memetic algorithm with path relinking for the far from most string problem.

Author: Cotta-Porras Carlos
Gallardo-Ruiz José Enrique
Publication venue: Elsevier
Publication date: 01/01/2015

Open access policy taken from: https://www.elsevier.com/about/policies-and-standards/copyright. The FAR FROM MOST STRING PROBLEM (FFMSP) is a string selection problem. The objective is to find a string whose distance to other strings in a certain input set is above a given threshold for as many of those strings as possible. This problem has links with some tasks in computational biology and its resolution has been shown to be very hard. We propose a memetic algorithm (MA) to tackle the FFMSP. This MA exploits a heuristic objective function for the problem and features initialization of the population via a Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristic, intensive recombination via path relinking and local improvement via hill climbing. An extensive empirical evaluation using problem instances of both random and biological origin is done to assess parameter sensitivity and draw performance comparisons with other state-of-the-art techniques. The MA is shown to perform better than these latter techniques with statistical significance. ANYSELF (TIN2011-28627-C04-01) of MICINN and DNEMESIS (TIC-6083) of Junta de Andalucía
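
The FFMSP objective described in this abstract can be sketched in a few lines. The code below assumes Hamming distance and treats "above a given threshold" as "at least the threshold"; it also includes a simple hill-climbing pass over single-character changes. It uses the plain objective rather than the heuristic objective function the abstract mentions, and it is not the authors' GRASP or path-relinking procedure; all names are illustrative.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ffmsp_objective(candidate, strings, threshold):
    """Count the input strings that are 'far' from the candidate
    (assumption: far means Hamming distance >= threshold)."""
    return sum(hamming(candidate, s) >= threshold for s in strings)


def hill_climb(candidate, strings, threshold, alphabet="ACGT"):
    """Repeatedly accept single-character changes that strictly improve
    the objective; stop when a full pass finds no improvement."""
    best = list(candidate)
    best_score = ffmsp_objective(best, strings, threshold)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            keep = best[i]
            for ch in alphabet:
                if ch == keep:
                    continue
                best[i] = ch
                score = ffmsp_objective(best, strings, threshold)
                if score > best_score:
                    best_score, keep, improved = score, ch, True
            best[i] = keep  # restore the best character found for position i
    return "".join(best), best_score
```

Because the plain objective only counts strings past the threshold, many neighboring candidates tie, and a climber like this one stalls on plateaus. That is one plausible reason to use a finer-grained heuristic objective, as the abstract says the authors do.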

Crossref

Repositorio Institucional Universidad de Málaga

Network Analysis of Differential Expression for the Identification of Disease-Causing Genes

Author: AM Yip
Bernard Thienpont
C Moehle
C von Mering
Daniela Nitsch
DB Mount
DN Cox
EH Rosenberg
EK Malmberg
F Fouss
FJ Probst
FR Bach
Gustavo Goldman
H Parkinson
Hilde Van Esch
HY Chuang
J Johnson
JM Wright
JR Riordan
K Kyo
K Lage
Koenraad Devriendt
L Bubendorf
L Franke
Lieven Thorrez
Léon-Charles Tranchevent
M Bakay
M Cortón
M Simoni
M Urbanek
MR Jones
N Kotaja
P Moretti
PE Becker
RI Kondor
S Aerts
S Draghici
S Fine
S Franks
S Ina
S Kuramochi-Miyagawa
S Köhler
SS Tanaka
T Barrett
T Noce
T Watanabe
TK Gandhi
Y Nishimura
Yves Moreau
Z Yao
Publication venue: Public Library of Science
Publication date: 01/05/2009

Genetic studies (in particular linkage and association studies) identify chromosomal regions involved in a disease or phenotype of interest, but those regions often contain many candidate genes, only a few of which can be followed up for biological validation. Recently, computational methods to identify (prioritize) the most promising candidates within a region have been proposed, but they are usually not applicable to cases where little is known about the phenotype (no or few confirmed disease genes, fragmentary understanding of the biological cascades involved). We seek to overcome this limitation by replacing knowledge about the biological process with experimental data on differential gene expression between affected and healthy individuals. Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood, under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors. We define a notion of soft neighborhood where each gene is given a contributing weight, which decreases with the distance from the candidate gene on the protein network. To account for multiple paths between genes, we define the distance using the Laplacian exponential diffusion kernel. We score candidates by aggregating the differential expression of neighbors weighted as a function of distance. Through a randomization procedure, we rank candidates by p-values. We illustrate our approach on four monogenic diseases and successfully prioritize the known disease-causing genes.
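
The scoring idea in this abstract can be written out as a short set of formulas. This is a sketch under stated assumptions: A is the adjacency matrix of the protein network, D its diagonal degree matrix, beta a diffusion parameter, e_g the differential expression of gene g, and R the number of randomizations. The exact weighting, aggregation and p-value estimator used in the paper may differ; the permutation estimator in the last line is a common choice, not one confirmed by the abstract.

```latex
% Graph Laplacian and Laplacian exponential diffusion kernel
L = D - A, \qquad K = \exp(-\beta L) = \sum_{k=0}^{\infty} \frac{(-\beta L)^k}{k!}

% Soft-neighborhood score of candidate c: kernel-weighted differential expression
S(c) = \sum_{g} K_{cg} \, \lvert e_g \rvert

% Empirical p-value from R randomizations of the expression values (assumed estimator)
p(c) = \frac{1 + \#\{\, r : S^{(r)}(c) \ge S(c) \,\}}{1 + R}
```

Because K sums contributions over all paths between two genes, a neighbor reachable through several short paths receives more weight than one reachable through a single long path, which matches the abstract's motivation for using a diffusion kernel instead of shortest-path distance.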

Lirias

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central