Searching a bitstream in linear time for the longest substring of any given density
Given an arbitrary bitstream, we consider the problem of finding the longest
substring whose ratio of ones to zeroes equals a given value. The central
result of this paper is an algorithm that solves this problem in linear time.
The method involves (i) reformulating the problem as a constrained walk through
a sparse matrix, and then (ii) developing a data structure for this sparse
matrix that allows us to perform each step of the walk in amortised constant
time. We also give a linear time algorithm to find the longest substring whose
ratio of ones to zeroes is bounded below by a given value. Both problems have
practical relevance to cryptography and bioinformatics.
Comment: 22 pages, 19 figures; v2: minor edits and enhancement
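To make the exact-ratio problem above concrete, here is a minimal sketch in Python. It is not the sparse-matrix walk described in the abstract; it swaps in a simpler prefix-sum and hash-map technique that runs in expected linear time. The function name `longest_ratio_substring`, the string-of-characters input, and the requirement that both ratio terms are positive integers are assumptions made for illustration. The idea: for a target ratio p:q (in lowest terms), score each one as +q and each zero as -p, so a substring has the target ratio exactly when its scores sum to zero.

```python
from math import gcd


def longest_ratio_substring(bits: str, ones: int, zeroes: int) -> tuple[int, int]:
    """Return (start, length) of the longest substring of `bits` whose
    ones:zeroes ratio equals ones:zeroes, or (0, 0) if none exists.

    Assumes `ones` and `zeroes` are positive integers (illustrative choice).
    """
    g = gcd(ones, zeroes)
    p, q = ones // g, zeroes // g  # target ratio in lowest terms

    first_seen = {0: 0}  # weighted prefix sum -> shortest prefix reaching it
    best_start, best_len = 0, 0
    total = 0
    for end, bit in enumerate(bits, start=1):
        # A one scores +q and a zero scores -p, so k ones and m zeroes
        # sum to k*q - m*p, which is zero exactly when k:m equals p:q.
        total += q if bit == "1" else -p
        if total in first_seen:
            start = first_seen[total]
            if end - start > best_len:
                best_start, best_len = start, end - start
        else:
            first_seen[total] = end
    return best_start, best_len
```

For example, `longest_ratio_substring("1101001", 1, 1)` selects the six-bit prefix `110100`, which holds three ones and three zeroes. The hash map gives expected rather than worst-case linear time, which is one reason a dedicated data structure such as the one the abstract describes may be preferable.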
Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
We study an abstract optimization problem arising from biomolecular sequence
analysis. For a sequence A of pairs (a_i,w_i) for i = 1,..,n and w_i>0, a
segment A(i,j) is a consecutive subsequence of A starting with index i and
ending with index j. The width of A(i,j) is w(i,j) = sum_{i <= k <= j} w_k, and
the density is (sum_{i<= k <= j} a_k)/ w(i,j). The maximum-density segment
problem takes A and two values L and U as input and asks for a segment of A
with the largest possible density among those of width at least L and at most
U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm,
improving upon the O(n \log L)-time algorithm by Lin, Jiang and Chao. When both
L and U are specified, there are no previous nontrivial results. We solve the
problem in O(n) time if w_i=1 for all i, and more generally in
O(n+n\log(U-L+1)) time when w_i>=1 for all i.
Comment: 23 pages, 13 figures. A significant portion of these results appeared
under the title, "Fast Algorithms for Finding Maximum-Density Segments of a
Sequence with Applications to Bioinformatics," in Proceedings of the Second
Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes
in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield
editors, 2002, pp. 157--17
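The definitions of width and density in the abstract above can be checked against a brute-force reference, sketched below in Python. This is a quadratic-time baseline for testing on small inputs, not the linear-time algorithm the paper reports. The function name `max_density_segment`, the 0-based inclusive indices, and the choice to return `None` when no segment is feasible are assumptions made for illustration.

```python
from itertools import accumulate


def max_density_segment(a, w, L, U=float("inf")):
    """Return (density, i, j) for the densest segment A(i, j) whose width
    lies in [L, U], using 0-based inclusive indices, or None if no segment
    qualifies. Assumes every w[k] > 0, as in the problem statement.
    """
    # Prefix sums let each segment's value sum and width be read in O(1).
    prefix_a = [0, *accumulate(a)]
    prefix_w = [0, *accumulate(w)]
    n = len(a)
    best = None
    for i in range(n):
        for j in range(i, n):
            width = prefix_w[j + 1] - prefix_w[i]
            if width > U:
                break  # widths are positive, so extending j only widens
            if width >= L:
                density = (prefix_a[j + 1] - prefix_a[i]) / width
                if best is None or density > best[0]:
                    best = (density, i, j)
    return best
```

With `a = [1, 4, 2, 6, 3]`, unit widths, `L = 2` and `U = 3`, the densest feasible segment is the last two elements, with density 4.5. Ties are resolved in favour of the first segment found, which is an arbitrary choice of this sketch.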
Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans
The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans, of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be lower in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.
The role of mutation rate variation and genetic diversity in the architecture of human disease
Background
We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.
Results
Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have rates of non-synonymous mutation similar to those of other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. Nevertheless, the effect is small.
Conclusions
Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.
Why highly expressed proteins evolve slowly
Much recent work has explored molecular and population-genetic constraints on
the rate of protein sequence evolution. The best predictor of evolutionary rate
is expression level, for reasons which have remained unexplained. Here, we
hypothesize that selection to reduce the burden of protein misfolding will
favor protein sequences with increased robustness to translational missense
errors. Pressure for translational robustness increases with expression level
and constrains sequence evolution. Using several sequenced yeast genomes,
global expression and protein abundance data, and sets of paralogs traceable to
an ancient whole-genome duplication in yeast, we rule out several confounding
effects and show that expression level explains roughly half the variation in
Saccharomyces cerevisiae protein evolutionary rates. We examine causes for
expression's dominant role and find that genome-wide tests favor the
translational robustness explanation over existing hypotheses that invoke
constraints on function or translational efficiency. Our results suggest that
proteins evolve at rates largely unrelated to their functions, and can explain
why highly expressed proteins evolve slowly across the tree of life.
Comment: 40 pages, 3 figures, with supporting information
General Rules for Optimal Codon Choice
Different synonymous codons are favored by natural selection for translation efficiency and accuracy in different organisms. The rules governing the identities of favored codons in different organisms remain obscure. In fact, it is not known whether such rules exist or whether favored codons are chosen randomly in evolution in a process akin to a series of frozen accidents. Here, we study this question by identifying for the first time the favored codons in 675 bacteria, 52 archaea, and 10 fungi. We use a number of tests to show that the identified codons are indeed likely to be favored and find that across all studied organisms the identity of favored codons tracks the GC content of the genomes. Once the effect of the genomic GC content on selectively favored codon choice is taken into account, additional universal amino acid specific rules governing the identity of favored codons become apparent. Our results provide for the first time a clear set of rules governing the evolution of selectively favored codon usage. Based on these results, we describe a putative scenario for how evolutionary shifts in the identity of selectively favored codons can occur without even temporary weakening of natural selection for codon bias.
Magnetic fields in noncommutative quantum mechanics
We discuss various descriptions of a quantum particle on noncommutative space
in a (possibly non-constant) magnetic field. We have tried to present the basic
facts in a unified and synthetic manner, and to clarify the relationship
between various approaches and results that are scattered in the literature.
Comment: Dedicated to the memory of Julius Wess. Work presented by F. Gieres
at the conference `Non-commutative Geometry and Physics' (Orsay, April 2007)
Recombination dynamics of a human Y-chromosomal palindrome: rapid GC-biased gene conversion, multi-kilobase conversion tracts, and rare inversions
The male-specific region of the human Y chromosome (MSY) includes eight large inverted repeats (palindromes) in which arm-to-arm similarity exceeds 99.9%, due to gene conversion activity. Here, we studied one of these palindromes, P6, in order to illuminate the dynamics of the gene conversion process. We genotyped ten paralogous sequence variants (PSVs) within the arms of P6 in 378 Y chromosomes whose evolutionary relationships within the SNP-defined Y phylogeny are known. This allowed the identification of 146 historical gene conversion events involving individual PSVs, occurring at a rate of 2.9-8.4 × 10^-4 events per generation. A consideration of the nature of nucleotide change and the ancestral state of each PSV showed that the conversion process was significantly biased towards the fixation of G or C nucleotides (GC-biased), and also towards the ancestral state. Determination of haplotypes by long-PCR allowed likely co-conversion of PSVs to be identified, and suggested that conversion tract lengths are large, with a mean of 2068 bp, and a maximum in excess of 9 kb. Despite the frequent formation of recombination intermediates implied by the rapid observed gene conversion activity, resolution via crossover is rare: only three inversions within P6 were detected in the sample. An analysis of chimpanzee and gorilla P6 orthologs showed that the ancestral state bias has existed in all three species, and comparison of human and chimpanzee sequences with the gorilla outgroup confirmed that GC bias of the conversion process has apparently been active in both the human and chimpanzee lineages.
The Hanoi Omega-Automata Format
We propose a flexible exchange format for ω-automata, as typically used in formal verification, and implement support for it in a range of established tools. Our aim is to simplify the interaction of tools, helping the research community to build upon other people’s work. A key feature of the format is the use of very generic acceptance conditions, specified by Boolean combinations of acceptance primitives, rather than being limited to common cases such as Büchi, Streett, or Rabin. Such flexibility in the choice of acceptance conditions can be exploited in applications, for example in probabilistic model checking, and furthermore encourages the development of acceptance-agnostic tools for automata manipulations. The format allows acceptance conditions that are either state-based or transition-based, and also supports alternating automata.
Translational selection on SHH genes
Codon usage bias has been observed in various organisms. In this study, the correlation between SHH gene expression in some tissues and codon usage features was analyzed using bioinformatics methods. We found that translational selection may act on the compositional features of this set of genes.