19 research outputs found
Compact q-gram Profiling of Compressed Strings
We consider the problem of computing the q-gram profile of a string \str of
size compressed by a context-free grammar with production rules. We
present an algorithm that runs in expected time and uses
O(n+q+\kq) space, where is the exact number of characters
decompressed by the algorithm and \kq\leq N-\alpha is the number of distinct
q-grams in \str. This simultaneously matches the current best known time
bound and improves the best known space bound. Our space bound is
asymptotically optimal in the sense that any algorithm storing the grammar and
the q-gram profile must use \Omega(n+q+\kq) space. To achieve this we
introduce the q-gram graph that space-efficiently captures the structure of a
string with respect to its q-grams, and show how to construct it from a
grammar
Forecasting with Big Data: A Review
Big Data is a revolutionary phenomenon which is one of the most frequently discussed topics in the modern age, and is expected to remain so in the foreseeable future. In this paper we present a comprehensive review on the use of Big Data for forecasting by identifying and reviewing the problems, potential, challenges and most importantly the related applications. Skills, hardware and software, algorithm architecture, statistical significance, the signal to noise ratio and the nature of Big Data itself are identified as the major challenges which are hindering the process of obtaining meaningful forecasts from Big Data. The review finds that at present, the fields of Economics, Energy and Population Dynamics have been the major exploiters of Big Data forecasting whilst Factor models, Bayesian models and Neural Networks are the most common tools adopted for forecasting with Big Data
From recombination of genes to the estimation of distributions I. binary parameters
The Breeder Genetic Algorithm (BGA) is based on the equation for the response to selection. In order to use this equation for prediction, the variance of the fitness of the population has to be estimated. For the usual sexual recombination the computation can be difficult. In this paper we shortly state the problem and investigate several modifications of sexual recombination. The first method is gene pool recombination, which leads to marginal distribution algorithms. In the last part of the paper we discuss more sophisticated methods, based on estimating the distribution of promising points
Semantic relation extraction with kernels over typed dependency trees
An important step for understanding the semantic content of text is the extraction of semantic relations between entities in natural language documents. Automatic extraction techniques have to be able to identify different versions of the same relation which usually may be expressed in a great variety of ways. Therefore these techniques benefit from taking into account many syntactic and semantic features, especially parse trees generated by automatic sentence parsers. Typed dependency parse trees are edge and node labeled parse trees whose labels and topology contains valuable semantic clues. This information can be exploited for relation extraction by the use of kernels over structured data for classification. In this paper we present new tree kernels for relation extraction over typed dependency parse trees. On a public benchmark data set we are able to demonstrate a significant improvement in terms of relation extraction quality of our new kernels over other state-of-the-art kernels