Encodings of Range Maximum-Sum Segment Queries and Applications
Given an array A containing arbitrary (positive and negative) numbers, we
consider the problem of supporting range maximum-sum segment queries on A:
i.e., given an arbitrary range [i,j], return the subrange [i',j'] ⊆
[i,j] such that the sum of the numbers in A[i'..j'] is maximized. Chen and Chao
[Disc. App. Math. 2007] presented a data structure for this problem that
occupies Θ(n) words, can be constructed in Θ(n) time, and
supports queries in Θ(1) time. Our first result is that if only the
indices [i',j'] are desired (rather than the maximum sum achieved in that
subrange), then it is possible to reduce the space to Θ(n) bits,
regardless of the numbers stored in A, while retaining the same construction and
query time. We also improve the best known space lower bound for any data
structure that supports range maximum-sum segment queries from n bits to
1.89113n - Θ(lg n) bits, for sufficiently large values of n. Finally, we
provide a new application of this data structure which simplifies a previously
known linear time algorithm for finding k-covers: i.e., given an array A of n
numbers and a number k, find k disjoint subranges [i_1,j_1],...,[i_k,j_k],
such that the total sum of all the numbers in the subranges is maximized.
Comment: 19 pages + 2 page appendix, 4 figures. A shortened version of this
paper will appear in CPM 201
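To make the query in the abstract above concrete, here is a minimal Python sketch that answers a single range maximum-sum segment query by scanning the range with Kadane's algorithm. It is a naive baseline that takes time linear in the range length per query, not the constant-time structure of Chen and Chao or the Θ(n)-bit encoding the abstract describes. The function name `max_sum_segment`, the 0-based inclusive indices, and the restriction to nonempty segments are assumptions made for illustration.

```python
def max_sum_segment(a, i, j):
    """Return (i2, j2, best_sum) for a maximum-sum nonempty segment of a[i..j].

    Indices are 0-based and inclusive (an assumption of this sketch).
    Runs in O(j - i + 1) time per query using Kadane's algorithm.
    """
    best_sum = a[i]
    best_lo = best_hi = i
    cur_sum = a[i]   # best sum of a segment ending at the current position
    cur_lo = i       # start index of that segment
    for k in range(i + 1, j + 1):
        if cur_sum < 0:
            # A negative running sum can only hurt; restart at k.
            cur_sum = a[k]
            cur_lo = k
        else:
            cur_sum += a[k]
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_lo, best_hi = cur_lo, k
    return best_lo, best_hi, best_sum


if __name__ == "__main__":
    A = [2, -3, 4, -1, 2, -5, 3]
    print(max_sum_segment(A, 0, 6))
    print(max_sum_segment(A, 4, 6))
```

The returned indices are the part of the answer that the abstract's first result encodes in Θ(n) bits; the sum itself would require access to the array.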
Intron Dynamics in Ribosomal Protein Genes
The role of spliceosomal introns in eukaryotic genomes remains obscure. A large-scale analysis of intron presence/absence patterns in many gene families and species is a necessary step to clarify the role of these introns. In this analysis, we used a maximum likelihood method to reconstruct the evolution of 2,961 introns in a dataset of 76 ribosomal protein genes from 22 eukaryotes and validated the results by a maximum parsimony method. Our results show that the trends of intron gain and loss differed across species in a given kingdom but appeared to be consistent within subphyla. Most subphyla in the dataset diverged around 1 billion years ago, when the “Big Bang” radiation occurred. We speculate that spliceosomal introns may play a role in the explosion of many eukaryotes at the Big Bang radiation.
ProteinHistorian: Tools for the Comparative Analysis of Eukaryote Protein Origin
The evolutionary history of a protein reflects the functional history of its ancestors. Recent phylogenetic studies identified distinct evolutionary signatures that characterize proteins involved in cancer, Mendelian disease, and different ontogenic stages. Despite the potential to yield insight into the cellular functions and interactions of proteins, such comparative phylogenetic analyses are rarely performed, because they require custom algorithms. We developed ProteinHistorian to make tools for performing analyses of protein origins widely available. Given a list of proteins of interest, ProteinHistorian estimates the phylogenetic age of each protein, quantifies enrichment for proteins of specific ages, and compares variation in protein age with other protein attributes. ProteinHistorian allows flexibility in the definition of protein age by including several algorithms for estimating ages from different databases of evolutionary relationships. We illustrate the use of ProteinHistorian with three example analyses. First, we demonstrate that proteins with high expression in human, compared to chimpanzee and rhesus macaque, are significantly younger than those with human-specific low expression. Next, we show that human proteins with annotated regulatory functions are significantly younger than proteins with catalytic functions. Finally, we compare protein length and age in many eukaryotic species and, as expected from previous studies, find a positive, though often weak, correlation between protein age and length. ProteinHistorian is available through a web server with an intuitive interface and as a set of command line tools; this allows biologists and bioinformaticians alike to integrate these approaches into their analysis pipelines. ProteinHistorian's modular, extensible design facilitates the integration of new datasets and algorithms. 
The ProteinHistorian web server, source code, and pre-computed ages for 32 eukaryotic genomes are freely available under the GNU public license at http://lighthouse.ucsf.edu/ProteinHistorian/
Phylogenetic Distribution of Intron Positions in Alpha-Amylase Genes of Bilateria Suggests Numerous Gains and Losses
Most eukaryotes have at least some genes interrupted by introns. While it is well
accepted that introns were already present at moderate density in the last
eukaryote common ancestor, the conspicuous diversity of intron density among
genomes suggests a complex evolutionary history, with marked differences between
phyla. The question of the rates of intron gain and loss in the course of
evolution and the factors influencing them remains controversial. We have
investigated a single gene family, alpha-amylase, in 55 species covering a
variety of animal phyla. Comparison of intron positions across phyla suggests a
complex history, with a likely ancestral intronless gene undergoing frequent
intron loss and gain, leading to extant intron/exon structures that are highly
variable, even among species from the same phylum. Because introns are known to
play no regulatory role in this gene and there is no alternative splicing, the
structural differences may be interpreted more easily: intron positions, sizes,
losses or gains may be more likely related to factors linked to splicing
mechanisms and requirements, and to recognition of introns and exons, or to more
extrinsic factors, such as life cycle and population size. We have shown that
intron losses outnumbered gains in recent periods, but that “resets”
of intron positions occurred at the origin of several phyla, including
vertebrates. Rates of gain and loss appear to be positively correlated. No phase
preference was found. We also found evidence for parallel gains and for intron
sliding. Presence of introns at given positions was correlated with a strong
protosplice consensus sequence AG/G, which was much weaker in the absence of
an intron. In contrast, recent intron insertions were not associated with a
specific sequence. In animal Amy genes, population size and generation time
seem to have played only minor roles in shaping gene structures.
The evolutionary and functional diversity of classical and lesser-known cytoplasmic and organellar translational GTPases across the tree of life
Improved algorithms for the k-maximum subarray problem for small k
Abstract. The maximum subarray problem for a one- or two-dimensional array is to find the array portion that maximizes the sum of array elements in it. The K-maximum subarray problem is to find the K subarrays with largest sums. We improve the time complexity for the one-dimensional case from O(min{K + n log^2 n, n√K}) for 0 ≤ K ≤ n(n − 1)/2 to O(n log K + K^2) for K ≤ n. The latter is better when K ≤ √n log n. If we simply extend this result to the two-dimensional case, we will have the complexity of O(n^3 log K + K^2 n^2). We improve this complexity to O(n^3) for K ≤ √n.
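As an illustration of the one-dimensional K-maximum subarray problem stated in the abstract above, the following Python sketch enumerates every contiguous segment with prefix sums and keeps the K largest sums. It is a brute-force baseline that inspects all O(n^2) segments, not the improved algorithm the abstract refers to. The function name `k_maximum_subarrays`, the 0-based inclusive indices, and the choice to let the reported subarrays overlap are assumptions of this sketch.

```python
import heapq


def k_maximum_subarrays(a, k):
    """Return the k largest (sum, lo, hi) triples over all nonempty segments a[lo..hi].

    Brute force: all O(n^2) segment sums are generated from prefix sums,
    and heapq.nlargest keeps the k largest. Segments may overlap.
    """
    # prefix[t] holds a[0] + ... + a[t-1], so sum(a[lo..hi]) = prefix[hi+1] - prefix[lo].
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    segment_sums = (
        (prefix[hi + 1] - prefix[lo], lo, hi)
        for lo in range(len(a))
        for hi in range(lo, len(a))
    )
    return heapq.nlargest(k, segment_sums)


if __name__ == "__main__":
    for total, lo, hi in k_maximum_subarrays([3, -1, 2], 3):
        print(total, lo, hi)
```

With K = 1 this reduces to the ordinary maximum subarray problem, which makes the sketch a convenient reference for checking a faster implementation on small inputs.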