19 research outputs found

    Compact q-gram Profiling of Compressed Strings

    Get PDF
    We consider the problem of computing the q-gram profile of a string \str of size NN compressed by a context-free grammar with nn production rules. We present an algorithm that runs in O(Nα)O(N-\alpha) expected time and uses O(n+q+\kq) space, where NαqnN-\alpha\leq qn is the exact number of characters decompressed by the algorithm and \kq\leq N-\alpha is the number of distinct q-grams in \str. This simultaneously matches the current best known time bound and improves the best known space bound. Our space bound is asymptotically optimal in the sense that any algorithm storing the grammar and the q-gram profile must use \Omega(n+q+\kq) space. To achieve this we introduce the q-gram graph that space-efficiently captures the structure of a string with respect to its q-grams, and show how to construct it from a grammar

    Forecasting with Big Data: A Review

    Get PDF
    Big Data is a revolutionary phenomenon which is one of the most frequently discussed topics in the modern age, and is expected to remain so in the foreseeable future. In this paper we present a comprehensive review on the use of Big Data for forecasting by identifying and reviewing the problems, potential, challenges and most importantly the related applications. Skills, hardware and software, algorithm architecture, statistical significance, the signal to noise ratio and the nature of Big Data itself are identified as the major challenges which are hindering the process of obtaining meaningful forecasts from Big Data. The review finds that at present, the fields of Economics, Energy and Population Dynamics have been the major exploiters of Big Data forecasting whilst Factor models, Bayesian models and Neural Networks are the most common tools adopted for forecasting with Big Data

    From recombination of genes to the estimation of distributions I. binary parameters

    No full text
    The Breeder Genetic Algorithm (BGA) is based on the equation for the response to selection. In order to use this equation for prediction, the variance of the fitness of the population has to be estimated. For the usual sexual recombination the computation can be difficult. In this paper we shortly state the problem and investigate several modifications of sexual recombination. The first method is gene pool recombination, which leads to marginal distribution algorithms. In the last part of the paper we discuss more sophisticated methods, based on estimating the distribution of promising points

    Semantic relation extraction with kernels over typed dependency trees

    No full text
    An important step for understanding the semantic content of text is the extraction of semantic relations between entities in natural language documents. Automatic extraction techniques have to be able to identify different versions of the same relation which usually may be expressed in a great variety of ways. Therefore these techniques benefit from taking into account many syntactic and semantic features, especially parse trees generated by automatic sentence parsers. Typed dependency parse trees are edge and node labeled parse trees whose labels and topology contains valuable semantic clues. This information can be exploited for relation extraction by the use of kernels over structured data for classification. In this paper we present new tree kernels for relation extraction over typed dependency parse trees. On a public benchmark data set we are able to demonstrate a significant improvement in terms of relation extraction quality of our new kernels over other state-of-the-art kernels
    corecore