
    Weighted ancestors in suffix trees

    The classical, ubiquitous predecessor problem is to construct a data structure for a set of integers that supports fast predecessor queries. Its generalization to weighted trees, a.k.a. the weighted ancestor problem, has been extensively explored and successfully reduced to the predecessor problem. It is known that any solution to either problem, with input from a polynomially bounded universe, that uses O(n polylog(n)) space requires \Omega(\log \log n) query time. Perhaps the most important and frequent application of the weighted ancestor problem is to suffix trees. It has been a long-standing open question whether the weighted ancestor problem has better bounds for suffix trees. We answer this question positively: we show that a suffix tree built for a text w[1..n] can be preprocessed using O(n) extra space, so that queries can be answered in O(1) time. Thus we improve the running times of several applications. Our improvement is based on a number of data structure tools and a periodicity-based insight into the combinatorial structure of a suffix tree. Comment: 27 pages, LNCS format. A condensed version will appear in ESA 201
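
    For intuition, the sketch below implements the classical baseline for weighted ancestor queries: binary lifting over a tree whose node weights strictly increase along root-to-leaf paths (e.g. string depths in a suffix tree). It answers queries in O(log n) time with O(n log n) space, so it is not the O(n)-space, O(1)-time structure of the paper; all names and conventions here are illustrative assumptions.

    class WeightedAncestors:
        # Baseline weighted-ancestor structure via binary lifting.
        # parent[v] is the parent of node v (parent[root] = -1);
        # weight[v] is strictly increasing along every root-to-leaf path,
        # with weight[root] = 0 (e.g. string depth in a suffix tree).
        def __init__(self, parent, weight):
            n = len(parent)
            self.weight = weight
            levels = max(1, n.bit_length())
            self.up = [parent[:]]                 # up[k][v] = 2^k-th ancestor of v, or -1
            for k in range(1, levels):
                prev = self.up[k - 1]
                self.up.append([prev[prev[v]] if prev[v] != -1 else -1 for v in range(n)])

        def query(self, v, d):
            # Shallowest ancestor u of v (possibly v itself) with weight[u] >= d,
            # or None if even v is too shallow.
            if self.weight[v] < d:
                return None
            for k in range(len(self.up) - 1, -1, -1):
                u = self.up[k][v]
                if u != -1 and self.weight[u] >= d:
                    v = u                         # keep jumping while the target weight is met
            return v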

    Clustering Time Series from Mixture Polynomial Models with Discretised Data

    Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly affect the quality of the clusters formed. This paper evaluates a method of overcoming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation-based distance metric. For data derived from this class of model we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show that, firstly, discretisation does not significantly affect the accuracy of the clusters when there are no outliers and, secondly, it significantly increases the accuracy in the presence of outliers, even when the probability of an outlier is very low.
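
    A minimal Python sketch of the discretisation step described above. The paper's exact distance and k-means implementation are not given here; the z-normalisation trick that makes Euclidean k-means behave like a correlation-based distance is our stand-in and should be read as an assumption (scikit-learn is assumed available).

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_binary_median(series, k, random_state=0):
        # series: array of shape (n_series, n_timepoints)
        series = np.asarray(series, dtype=float)

        # Step 1: discretise each series to binary -- 1 where the value lies
        # above that series' own median, 0 otherwise.
        medians = np.median(series, axis=1, keepdims=True)
        binary = (series > medians).astype(float)

        # Step 2: z-normalise each row; for rows with zero mean and unit
        # variance, squared Euclidean distance equals 2m(1 - r), with r the
        # Pearson correlation and m the length, so plain k-means on these
        # rows orders pairs exactly as a correlation distance would.
        z = binary - binary.mean(axis=1, keepdims=True)
        z /= z.std(axis=1, keepdims=True) + 1e-12

        return KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(z)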

    Testing Generalised Freeness of Words

    Pseudo-repetitions are a natural generalisation of the classical notion of repetitions in sequences: they are the repeated concatenation of a word and its encoding under a certain morphism or antimorphism (anti-/morphism, for short). We approach the problem of deciding efficiently, for a word w and a literal anti-/morphism f, whether w contains an instance of a given pattern involving a variable x and its image under f, i.e., f(x). Our results generalise both the problem of finding fixed repetitive structures (e.g., squares, cubes) inside a word and the problem of finding palindromic structures inside a word. For instance, we can detect efficiently a factor of the form x x^R x x x^R, or any other pattern of this type. We also address the problem of testing efficiently, in the same setting, whether the word w contains an arbitrary pseudo-repetition of a given exponent.
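
    To make the pattern type concrete, here is a brute-force Python check for the example factor x x^R x x x^R mentioned above, where x^R is the reversal of x. It only illustrates the problem; it is not the efficient algorithm of the paper.

    def find_x_xR_x_x_xR(w):
        # Return (start, |x|) of the first factor of the form x x^R x x x^R,
        # or None.  Brute force over the length of x and the start position.
        n = len(w)
        for L in range(1, n // 5 + 1):          # the whole factor has length 5*L
            for i in range(n - 5 * L + 1):
                x = w[i:i + L]
                xr = x[::-1]
                if (w[i + L:i + 2 * L] == xr and
                        w[i + 2 * L:i + 3 * L] == x and
                        w[i + 3 * L:i + 4 * L] == x and
                        w[i + 4 * L:i + 5 * L] == xr):
                    return i, L
        return None

    # e.g. find_x_xR_x_x_xR("zzabbaababbazz") == (2, 2), i.e. x = "ab" starting at index 2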

    Analysing Astronomy Algorithms for GPUs and Beyond

    Astronomy depends on ever-increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the astronomy software community. Graphics Processing Units (GPUs), now capable of general-purpose computation, exemplify both the difficult learning-curve and the significant speedups exhibited by massively-parallel hardware architectures. We present a generalised approach to tackling this paradigm shift, based on the analysis of algorithms. We describe a small collection of foundation algorithms relevant to astronomy and explain how they may be used to ease the transition to massively-parallel computing architectures. We demonstrate the effectiveness of our approach by applying it to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for gravitational lensing, pulsar dedispersion and volume rendering. Algorithms with well-defined memory access patterns and high arithmetic intensity stand to receive the greatest performance boost from massively-parallel architectures, while those that involve a significant amount of decision-making may struggle to take advantage of the available processing power. Comment: 10 pages, 3 figures, accepted for publication in MNRA
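
    As an illustration of the kind of foundation algorithm discussed above, here is a minimal NumPy sketch of incoherent pulsar dedispersion: every frequency channel is shifted by a dispersion delay and the channels are summed. The function and variable names are ours, and the wrap-around at the array edges is a simplification; the point is that the per-channel, regular-access structure is what makes such kernels map well onto massively-parallel hardware.

    import numpy as np

    def dedisperse(dynspec, freqs_mhz, dm, tsamp):
        # dynspec   : 2-D array, shape (n_chan, n_samp), intensity per channel and sample
        # freqs_mhz : centre frequency of each channel in MHz
        # dm        : trial dispersion measure in pc cm^-3
        # tsamp     : sampling time in seconds
        dynspec = np.asarray(dynspec, dtype=float)
        freqs_mhz = np.asarray(freqs_mhz, dtype=float)

        k_dm = 4.148808e3                        # cold-plasma dispersion constant, MHz^2 pc^-1 cm^3 s
        f_ref = freqs_mhz.max()                  # dedisperse relative to the highest frequency
        delays = k_dm * dm * (freqs_mhz**-2 - f_ref**-2)
        shifts = np.round(delays / tsamp).astype(int)

        out = np.zeros(dynspec.shape[1])
        for ch, s in enumerate(shifts):
            out += np.roll(dynspec[ch], -s)      # align channel ch to the reference (edges wrap)
        return out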

    Detecting One-variable Patterns

    Given a pattern p = s_1 x_1 s_2 x_2 ... s_{r-1} x_{r-1} s_r such that x_1, x_2, ..., x_{r-1} ∈ {x, x^R}, where x is a variable and x^R its reversal, and s_1, s_2, ..., s_r are strings that contain no variables, we describe an algorithm that constructs in O(rn) time a compact representation of all P instances of p in an input string of length n over a polynomially bounded integer alphabet, so that one can report those instances in O(P) time. Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
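
    A brute-force Python reference for this matching problem, with illustrative names and conventions of our own. It runs in roughly O(n^2 r) time, nothing like the O(rn) algorithm of the paper, but it pins down what an instance of such a pattern is; it assumes a non-empty variable x and at least one variable slot.

    def one_variable_matches(text, seps, slots):
        # seps  = [s_1, ..., s_r]       constant strings of the pattern
        # slots = [t_1, ..., t_{r-1}]   each entry 'x' or 'xR' (reversal of x)
        n, k = len(text), len(slots)
        base = sum(len(s) for s in seps)
        hits = []
        for xlen in range(1, (n - base) // k + 1):
            total = base + k * xlen
            for i in range(n - total + 1):
                pos, x, ok = i, None, True
                for j, s in enumerate(seps):
                    if text[pos:pos + len(s)] != s:
                        ok = False
                        break
                    pos += len(s)
                    if j < k:
                        seg = text[pos:pos + xlen]
                        val = seg if slots[j] == 'x' else seg[::-1]   # recover x from the segment
                        if x is None:
                            x = val
                        elif val != x:
                            ok = False
                            break
                        pos += xlen
                if ok:
                    hits.append((i, x))
        return hits

    # e.g. one_variable_matches("zabXYbaz", ["z", "XY", "z"], ["x", "xR"]) == [(0, "ab")]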

    A Linear-Time n^0.4-Approximation for Longest Common Subsequence

    We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an extensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Künnemann [FOCS'15], who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem, for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive O(n^{ε/2})-approximation algorithm with running time Õ(n^{2-ε}) has been known, for any constant 0 < ε ≤ 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields an O(n^{0.497956})-approximation in expectation, improving upon the naive O(√n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n^{2-ε}) computes an Õ(n^{2ε/5})-approximation with high probability, for any 0 < ε ≤ 1. Our result (1) gives an Õ(n^{0.4})-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n^{2-ε}), improving upon the naive bound of O(n^{ε/2}) for any ε, and (3) instead of only in expectation, succeeds with high probability.
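
    For reference, the "simple quadratic algorithm" mentioned above is the textbook dynamic program; a minimal Python sketch (space-saving two-row variant) follows.

    def lcs_length(a: str, b: str) -> int:
        # Textbook O(|a|*|b|) dynamic program.  prev[j] / curr[j] hold the LCS
        # length of a[:i-1] / a[:i] against b[:j].
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            curr = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    curr[j] = prev[j - 1] + 1
                else:
                    curr[j] = max(prev[j], curr[j - 1])
            prev = curr
        return prev[len(b)]

    The hardness results cited in the abstract say this quadratic behaviour cannot be beaten by any polynomial factor unless SETH fails, which is why the paper aims to approximate the LCS length in subquadratic time instead.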

    Multifractal analysis of complex networks

    Complex networks have recently attracted much attention in diverse areas of science and technology. Many networks such as the WWW and biological networks are known to display spatial heterogeneity which can be characterized by their fractal dimensions. Multifractal analysis is a useful way to systematically describe the spatial heterogeneity of both theoretical and experimental fractal patterns. In this paper, we introduce a new box-covering algorithm for multifractal analysis of complex networks. This algorithm is used to calculate the generalized fractal dimensions D_q of some theoretical networks, namely scale-free networks, small-world networks and random networks, and one kind of real networks, namely protein-protein interaction networks of different species. Our numerical results indicate the existence of multifractality in scale-free networks and protein-protein interaction networks, while the multifractal behavior is not clear-cut for small-world networks and random networks. The possible variation of D_q due to changes in the parameters of the theoretical network models is also discussed. Comment: 18 pages, 7 figures, 4 table
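
    For context, the generalized dimensions D_q obtained from a box-covering analysis are usually defined as follows; the abstract does not reproduce the formula, so this standard textbook normalisation is stated here as an assumption. With boxes of size ε and p_i(ε) the fraction of the network's nodes falling into box i,

    D_q = \lim_{\varepsilon \to 0} \frac{1}{q-1} \, \frac{\ln \sum_i p_i(\varepsilon)^q}{\ln \varepsilon} \quad (q \neq 1),
    \qquad
    D_1 = \lim_{\varepsilon \to 0} \frac{\sum_i p_i(\varepsilon) \ln p_i(\varepsilon)}{\ln \varepsilon}.

    A single (mono)fractal corresponds to D_q being constant in q; a nontrivial dependence of D_q on q is the multifractality the paper tests for.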

    Ancilla-based quantum simulation

    We consider simulating the BCS Hamiltonian, a model of low-temperature superconductivity, on a quantum computer. In particular we consider conducting the simulation on the qubus quantum computer, which uses a continuous-variable ancilla to generate interactions between qubits. We demonstrate an O(N^3) improvement over previous work conducted on an NMR computer [PRL 89 057904 (2002) & PRL 97 050504 (2006)] for the nearest-neighbour and completely general cases. We then go on to show methods to minimise the number of operations needed per time step using the qubus in three cases: a completely general case, a case of exponentially decaying interactions, and the case of fixed-range interactions. We make these results controlled on an ancilla qubit so that we can apply the phase estimation algorithm, and hence show that when N ≥ 5, our qubus simulation requires significantly fewer operations than a similar simulation conducted on an NMR computer. Comment: 20 pages, 10 figures: V2 added section on phase estimation and performing controlled unitaries, V3 corrected minor typo
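
    For orientation, the qubit encoding of the (reduced) BCS pairing Hamiltonian typically simulated in this line of work, including the cited NMR proposals, has roughly the form below; the abstract does not state it, so the exact form and normalisation here are taken from the general literature and should be read as an assumption:

    H_{BCS} = \sum_{m=1}^{N} \frac{\epsilon_m}{2} \, \sigma_z^{(m)}
            + \sum_{l < m} \frac{V_{lm}}{2} \left( \sigma_x^{(l)} \sigma_x^{(m)} + \sigma_y^{(l)} \sigma_y^{(m)} \right),

    where qubit m represents a pair mode with energy \epsilon_m and V_{lm} is the pairing interaction. The \sigma_x\sigma_x + \sigma_y\sigma_y couplings between pairs of qubits are the interactions that, per the abstract, the continuous-variable ancilla of the qubus is used to generate.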