Search CORE

699 research outputs found

Searching a bitstream in linear time for the longest substring of any given density

Author: Benjamin A. Burton
D.E. Knuth
D.R. Musser
G. Bernardi
G. Marsaglia
K. Chen
L. Duret
M.H. Goldwasser
P. Erdős
P.M. Sharp
R. Arratia
R. Hardison
R.I. Greenberg
S. Boztaş
S. Zoubak
S.M. Fullerton
T.H. Cormen
Y.-H. Hsieh
Y.-L. Lin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/06/2010

Given an arbitrary bitstream, we consider the problem of finding the longest substring whose ratio of ones to zeroes equals a given value. The central result of this paper is an algorithm that solves this problem in linear time. The method involves (i) reformulating the problem as a constrained walk through a sparse matrix, and then (ii) developing a data structure for this sparse matrix that allows us to perform each step of the walk in amortised constant time. We also give a linear-time algorithm to find the longest substring whose ratio of ones to zeroes is bounded below by a given value. Both problems have practical relevance to cryptography and bioinformatics. Comment: 22 pages, 19 figures; v2: minor edits and enhancement
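A minimal Python sketch of the problem statement follows. It is not the paper's method: it swaps in a different, well-known prefix-sum technique. Writing the target ratio as non-negative integers p:q (an assumption; the abstract only says "a given value"), each '1' contributes +q and each '0' contributes -p, so a substring qualifies exactly when the prefix sums at its two ends coincide. A hash map of first occurrences then finds the longest such substring in expected linear time, whereas the paper's sparse-matrix walk is worst-case linear. The name `longest_with_ratio` is hypothetical.

```python
def longest_with_ratio(bits: str, p: int, q: int) -> tuple[int, int]:
    """Return (start, end) such that bits[start:end] is the longest substring
    with ones:zeros == p:q, i.e. ones * q == zeros * p.

    Returns (0, 0) when no non-empty substring qualifies.
    """
    if p < 0 or q < 0 or (p == 0 and q == 0):
        raise ValueError("need p, q >= 0, not both zero")
    # Each '1' adds +q and each '0' adds -p, so a substring qualifies
    # exactly when its weighted sum is zero, i.e. when the prefix sums
    # at its two ends are equal.
    first_seen = {0: 0}  # prefix sum -> shortest prefix length reaching it
    best_start, best_end = 0, 0
    total = 0
    for i, bit in enumerate(bits, start=1):
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
        total += q if bit == "1" else -p
        if total in first_seen:
            start = first_seen[total]
            if i - start > best_end - best_start:
                best_start, best_end = start, i
        else:
            first_seen[total] = i
    return best_start, best_end


if __name__ == "__main__":
    print(longest_with_ratio("1101001100", 1, 1))  # (0, 10)
```

Here the whole string qualifies, since it holds five ones and five zeros. Keeping only the first occurrence of each prefix sum is what makes the match for each end position the longest possible one.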

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

Author: Alexandrov
Bentley
Bernardi
Charlesworth
Chung
Duret
Eyre-Walker
Fields
Filipski
Francino
Fullerton
Greenberg
Guldberg
Hardison
Henke
Holmquist
Hsueh-I Lu
Huang
Ikehara
Inman
Jin
Kim
Lin
Macaya
Madsen
Michael H. Goldwasser
Ming-Yang Kao
Murata
Nekrutenko
Rice
Scotto
Sellers
Sharp
Soriano
Stojanovic
Sueoka
Wang
Wolfe
Wu
Zoubak
Publication venue: 'Elsevier BV'
Publication date: 04/11/2002

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A$ of pairs $(a_i, w_i)$ for $i = 1, \ldots, n$ with $w_i > 0$, a segment $A(i,j)$ is a consecutive subsequence of $A$ starting with index $i$ and ending with index $j$. The width of $A(i,j)$ is $w(i,j) = \sum_{i \le k \le j} w_k$, and its density is $\left(\sum_{i \le k \le j} a_k\right) / w(i,j)$. The maximum-density segment problem takes $A$ and two values $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. When $U$ is unbounded, we provide a relatively simple $O(n)$-time algorithm, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao. When both $L$ and $U$ are specified, there are no previous nontrivial results. We solve the problem in $O(n)$ time if $w_i = 1$ for all $i$, and more generally in $O(n + n \log(U - L + 1))$ time when $w_i \ge 1$ for all $i$. Comment: 23 pages, 13 figures. A significant portion of these results appeared under the title "Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics," in Proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield editors, 2002, pp. 157--17
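The special case with all widths equal to 1 and no upper bound U can be sketched in Python by bisection on the answer. This is a simpler technique than either algorithm named in the abstract: each bisection step costs O(n), so the total is O(n) per step times the number of steps, not O(n) overall. The key fact is that a segment of length at least L with density at least d exists exactly when some difference prefix[j] - prefix[i] with j - i >= L is non-negative, where prefix holds running sums of a_k - d. The name `max_density_at_least` and the fixed iteration count are assumptions for illustration.

```python
def max_density_at_least(a: list[float], L: int, iters: int = 60) -> float:
    """Approximate the largest mean of a[i:j] over segments with j - i >= L.

    Special case of the maximum-density segment problem: all widths are 1
    and U is unbounded. Uses bisection on the answer, not an O(n) method.
    """
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")

    def feasible(d: float) -> bool:
        # Prefix sums of (a_k - d); a segment of length >= L has mean >= d
        # iff prefix[j] - prefix[i] >= 0 for some i <= j - L.
        prefix = [0.0] * (n + 1)
        for k, x in enumerate(a):
            prefix[k + 1] = prefix[k] + x - d
        smallest = float("inf")  # min of prefix[0 .. j - L]
        for j in range(L, n + 1):
            smallest = min(smallest, prefix[j - L])
            if prefix[j] - smallest >= 0:
                return True
        return False

    lo, hi = float(min(a)), float(max(a))  # the optimum lies in this range
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid  # some segment reaches density mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(round(max_density_at_least([1, 12, -5, -6, 50, 3], 4), 6))
```

On `[1, 12, -5, -6, 50, 3]` with `L = 4`, the result converges to 12.75, the density of the segment `[12, -5, -6, 50]`.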

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

National Taiwan University Repository

The role of mutation rate variation and genetic diversity in the architecture of human disease

Author: A Hodgkinson
A Kong
Adam Eyre-Walker
B Charlesworth
C Park
CJ Pink
CL Chen
FA Kondrashov
G McVicker
H Ellegren
H Huang
I Hellmann
I. King Jordan
J Maynard Smith
JA Stamatoyannopoulos
JJ Michaelson
KH Wolfe
L Duret
MJ Lercher
MT Maurano
MW Nachman
NG Smith
R Blekhman
RS Hansen
S Tyekucheva
TI Gossmann
Ying Chen Eyre-Walker
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 29/08/2013

Background: We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate and genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.

Results: Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have rates of non-synonymous mutation similar to those of other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. Nevertheless, the effect is small.

Conclusions: Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

FigShare

Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans

Author: A Coghlan
A Couthier
AJ Wolstenholme
Anna V. Protasio
C Liu
Clotilde K. S. Carlow
DB Guiliano
DL Laughton
DL Redmond
DP Knox
E Ghedin
E Redman
F Jackson
Frank Jackson
Gary Saunders
H Li
J Parkinson
J Spieth
JC Abbott
JH Graber
JL Bessereau
JM Ranz
John S. Gilleard
JR Vanfleteren
JS Gilleard
K Rutherford
Karen Mungall
L Duret
L Rufener
LD Stein
LF LeJambre
LW Hillier
M Caceres
M Deutsch
Martin Hunt
Matthew Berriman
Michael Quail
MJ Callaghan
PS Chain
R Hoekstra
R Kaminsky
R Prichard
Robin Beech
Roz Laing
S Chen
S Leroy
Steven Laing
T Blumenthal
T Carver
TJ Carver
V Grillo
W Qian
Y Tanizawa
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 15/08/2011

The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform, and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans, of which 60% had putative orthologues, significantly higher than in previous analyses based on ESTs. Gene density appears to be lower in H. contortus than in C. elegans, with annotated H. contortus genes being on average two to three times larger than their putative C. elegans orthologues owing to greater intron number and size. Synteny appears high, but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Enlighten

Why highly expressed proteins evolve slowly

Author: Akashi
Bloom
Bucciantini
C. Adami
C. O. Wilke
Cho
Coghlan
D. A. Drummond
Dong
Duret
Ellis
F. H. Arnold
Fraser
Ghaemmaghami
Goldberg
Greenbaum
Gu
Herbeck
Hirsh
Holstege
Hurst
J. D. Bloom
Kellis
Kurtzman
Marais
Pal
Parker
Precup
Pál
Rokas
Seoighe
Sharp
Spreitzer
Subramanian
Wall
Yang
Zuckerkandl
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 12/08/2005

Much recent work has explored molecular and population-genetic constraints on the rate of protein sequence evolution. The best predictor of evolutionary rate is expression level, for reasons that have remained unexplained. Here, we hypothesize that selection to reduce the burden of protein misfolding will favor protein sequences with increased robustness to translational missense errors. Pressure for translational robustness increases with expression level and constrains sequence evolution. Using several sequenced yeast genomes, global expression and protein abundance data, and sets of paralogs traceable to an ancient whole-genome duplication in yeast, we rule out several confounding effects and show that expression level explains roughly half the variation in Saccharomyces cerevisiae protein evolutionary rates. We examine causes for expression's dominant role and find that genome-wide tests favor the translational robustness explanation over existing hypotheses that invoke constraints on function or translational efficiency. Our results suggest that proteins evolve at rates largely unrelated to their functions, and can explain why highly expressed proteins evolve slowly across the tree of life. Comment: 40 pages, 3 figures, with supporting informatio

arXiv.org e-Print Archive

Crossref

PubMed Central

Caltech Authors

General Rules for Optimal Codon Choice

Author: D Graur
DC Shields
Dmitri A. Petrov
EP Rocha
F Wright
F Yamao
FH Crick
H Akashi
JA Novembre
L Duret
M Bulmer
M Gouy
Michael W. Nachman
NA Moran
PM Sharp
R Hershberg
RD Knight
RM Goetz
Ruth Hershberg
S Ghaemmaghami
S Kanaya
S Vicario
SL Chen
T Ikemura
Publication venue: Public Library of Science
Publication date: 01/07/2009

Different synonymous codons are favored by natural selection for translation efficiency and accuracy in different organisms. The rules governing the identities of favored codons in different organisms remain obscure. In fact, it is not known whether such rules exist or whether favored codons are chosen randomly in evolution in a process akin to a series of frozen accidents. Here, we study this question by identifying, for the first time, the favored codons in 675 bacteria, 52 archaea, and 10 fungi. We use a number of tests to show that the identified codons are indeed likely to be favored, and find that across all studied organisms the identity of favored codons tracks the GC content of the genomes. Once the effect of the genomic GC content on selectively favored codon choice is taken into account, additional universal amino-acid-specific rules governing the identity of favored codons become apparent. Our results provide, for the first time, a clear set of rules governing the evolution of selectively favored codon usage. Based on these results, we describe a putative scenario for how evolutionary shifts in the identity of selectively favored codons can occur without even temporary weakening of natural selection for codon bias.
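The abstract reports that favored-codon identity tracks genomic GC content. As a hedged illustration of that quantity, the Python sketch below computes GC content and, as an assumption going beyond the abstract, GC content at third codon positions (GC3), a common companion measure in codon-usage work. Both function names are hypothetical, and ambiguous symbols such as N are simply skipped.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases of seq (case-insensitive).

    Other symbols, such as N or gaps, are skipped.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence.

    Assumes cds starts at the first base of a codon; a trailing partial
    codon contributes nothing, because cds[2::3] only reaches complete codons.
    """
    return gc_content(cds[2::3])


if __name__ == "__main__":
    print(gc_content("ATGGCCAAA"), gc3_content("ATGGCCAAA"))
```

For `ATGGCCAAA` the third positions are G, C and A, so GC3 is 2/3, while overall GC content is 4/9.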

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Magnetic fields in noncommutative quantum mechanics

Author: Acatrinei C
Arnol'd V I
Bastos C
Bichl A A
Blank J
Bytsko A G
Cohen-Tannoudji C
Correa D H
Dadic I
Delduc F
Doplicher S
Duval C
F Delduc
F Gieres
Gazeau J-P
Gerstenhaber M
Grümm H R
Hancock J
Heisenberg W
Jackiw R
Klauder J R
M Lefrancois
Macris N
Madore J
Magro G
Marsden J E
Mezincescu L
Moyal J E
Pascual P
Pauli W
Q Duret
Reed M
Riccardi M
Rivasseau V
Rivelles V O
Scholtz F G
Seiberg N
Smailagic A
Sochichiu C
Sternheimer D
Suo B
Zachos C K
Publication venue: 'IOP Publishing'
Publication date: 23/04/2007

We discuss various descriptions of a quantum particle on noncommutative space in a (possibly non-constant) magnetic field. We have tried to present the basic facts in a unified and synthetic manner, and to clarify the relationship between various approaches and results that are scattered in the literature. Comment: Dedicated to the memory of Julius Wess. Work presented by F. Gieres at the conference "Non-commutative Geometry and Physics" (Orsay, April 2007)

arXiv.org e-Print Archive

Recombination dynamics of a human Y-chromosomal palindrome: rapid GC-biased gene conversion, multi-kilobase conversion tracts, and rare inversions

Author: A Geraldes
A Scally
AJ Jeffreys
B Trombetta
C Batini
E Bosch
E Mancera
FO Losch
G Marais
GA Marais
Georgina R. Bowden
H Guillon
H Skaletsky
HM Cann
IL Berg
J Lange
J Pecon Slattery
J Wang
JA Armour
JF Hughes
JK Davis
JM Chen
JZ Li
K Holloway
KM Sullivan
L Duret
M Iwase
M Mendez-Lago
Mark A. Jobling
Mikkel H. Schierup
N Backström
N Bouzekri
Patricia Balaresque
PE Warburton
Pille Hallast
R Assis
RD Bagnall
S Repping
S Rozen
S Sarbajna
S Sengupta
SM Adams
Stéphane Ballereau
T Connallon
TM Karafet
W Shi
W Wei
Y Xue
ZH Rosser
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

The male-specific region of the human Y chromosome (MSY) includes eight large inverted repeats (palindromes) in which arm-to-arm similarity exceeds 99.9%, due to gene conversion activity. Here, we studied one of these palindromes, P6, in order to illuminate the dynamics of the gene conversion process. We genotyped ten paralogous sequence variants (PSVs) within the arms of P6 in 378 Y chromosomes whose evolutionary relationships within the SNP-defined Y phylogeny are known. This allowed the identification of 146 historical gene conversion events involving individual PSVs, occurring at a rate of $2.9 \times 10^{-4}$ to $8.4 \times 10^{-4}$ events per generation. A consideration of the nature of nucleotide change and the ancestral state of each PSV showed that the conversion process was significantly biased towards the fixation of G or C nucleotides (GC-biased), and also towards the ancestral state. Determination of haplotypes by long-PCR allowed likely co-conversion of PSVs to be identified, and suggested that conversion tract lengths are large, with a mean of 2068 bp and a maximum in excess of 9 kb. Despite the frequent formation of recombination intermediates implied by the rapid observed gene conversion activity, resolution via crossover is rare: only three inversions within P6 were detected in the sample. An analysis of chimpanzee and gorilla P6 orthologs showed that the ancestral state bias has existed in all three species, and comparison of human and chimpanzee sequences with the gorilla outgroup confirmed that GC bias of the conversion process has apparently been active in both the human and chimpanzee lineages.

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

FigShare

Leicester Research Archive

The Hanoi Omega-Automata Format

Author: A Duret-Lutz
C Löding
GJ Holzmann
H Tauriainen
J Klein
J Křetínský
K Chatterjee
M Kwiatkowska
M-H Tsai
MY Vardi
SC Krishnan
T Babiak
Z Komárková
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

We propose a flexible exchange format for ω-automata, as typically used in formal verification, and implement support for it in a range of established tools. Our aim is to simplify the interaction of tools, helping the research community to build upon other people’s work. A key feature of the format is the use of very generic acceptance conditions, specified by Boolean combinations of acceptance primitives, rather than being limited to common cases such as Büchi, Streett, or Rabin. Such flexibility in the choice of acceptance conditions can be exploited in applications, for example in probabilistic model checking, and furthermore encourages the development of acceptance-agnostic tools for automata manipulations. The format allows acceptance conditions that are either state-based or transition-based, and also supports alternating automata.
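To make the format concrete, here is a minimal Python sketch that serializes a small automaton into HOA text. It covers only a subset of the header (`HOA: v1`, `States:`, `Start:`, `AP:`, `Acceptance:`) plus transition-based acceptance marks in braces; the `Edge` class and `to_hoa` function are hypothetical helpers, not part of any tool mentioned in the abstract. The acceptance condition is passed through as a string, so Boolean combinations such as `Fin(0) & Inf(1)` can be written directly, mirroring the format's generic acceptance conditions.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: int
    label: str                 # Boolean formula over AP indices, e.g. "0 & !1"
    dst: int
    acc: tuple[int, ...] = ()  # transition-based acceptance sets, if any


def to_hoa(n_states: int, start: int, aps: list[str], num_sets: int,
           acceptance: str, edges: list[Edge]) -> str:
    """Render a small automaton in a minimal subset of HOA v1 syntax."""
    lines = [
        "HOA: v1",
        f"States: {n_states}",
        f"Start: {start}",
        "AP: " + " ".join([str(len(aps))] + [f'"{p}"' for p in aps]),
        f"Acceptance: {num_sets} {acceptance}",
        "--BODY--",
    ]
    for s in range(n_states):
        lines.append(f"State: {s}")
        for e in edges:
            if e.src == s:
                # Acceptance marks follow the destination, e.g. "[0] 0 {0}".
                acc = " {" + " ".join(map(str, e.acc)) + "}" if e.acc else ""
                lines.append(f"[{e.label}] {e.dst}{acc}")
    lines.append("--END--")
    return "\n".join(lines)


if __name__ == "__main__":
    # One-state Büchi automaton for "infinitely often a".
    print(to_hoa(1, 0, ["a"], 1, "Inf(0)",
                 [Edge(0, "0", 0, (0,)), Edge(0, "!0", 0)]))
```

A real exporter would also emit optional headers such as `acc-name:` or `properties:` and handle state-based acceptance; this sketch deliberately omits them.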

Crossref

University of Birmingham Research Portal

IST Austria: PubRep (Institute of Science and Technology)

Translational selection on SHH genes

Author: Akashi H
Atefeh Khoshnevisan
Behnaz Saffar
Bulmer M
Debry RW
Ditmar KA
Dong H
Duret L
Ikemura T
Kanaya S
Konu O
Lavner Y
Levy JP
Mohammadreza Hajjari
Moriyama EN
Musto H
Powell JR
Rocha EP
Sharp PM
Sørensen MA
Wolfe KH
Zuckerkandl E
Publication venue: Sociedade Brasileira de Genética
Publication date: 01/01/2010

Codon usage bias has been observed in various organisms. In this study, the correlation between SHH gene expression in some tissues and codon usage features was analyzed using bioinformatic methods. We found that translational selection may act on the compositional features of this set of genes.
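One widely used codon-usage measure is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The abstract does not say which measures the study used, so the Python sketch below is only an assumed illustration. To stay short, its `SYNONYMS` table is a hypothetical two-amino-acid excerpt of the standard genetic code rather than the full table, and `rscu` is a hypothetical name.

```python
from collections import Counter

# Hypothetical excerpt of the standard genetic code (two amino acids only);
# a real analysis would list every sense codon.
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for the codons listed in SYNONYMS.

    RSCU = observed count / (amino-acid total / number of synonymous codons),
    so 1.0 means no bias. Amino acids absent from cds are omitted.
    """
    cds = cds.upper()
    # Count in-frame codons; a trailing partial codon is ignored.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    result: dict[str, float] = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count under uniform usage
        for c in codons:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    print(rscu("TTTTTTTTCGGG"))
```

In that example, TTT appears twice among three Phe codons, giving an RSCU of 4/3, and GGG is the only Gly codon used, giving 4.0.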

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

PubMed Central