52,728 research outputs found
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code.
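The co-grouping operation mentioned above can be made concrete with a small sketch. The following is not Jaql (nor the paper's calculus) but a hypothetical Python analogue of grouping two collections of JSON-like records by a shared key; all names are illustrative:

```python
from collections import defaultdict

def cogroup(left, right, key):
    """Group two collections of JSON-like records by a shared key,
    returning sorted (key, left_matches, right_matches) entries."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)   # records from the first input
    for rec in right:
        groups[rec[key]][1].append(rec)   # records from the second input
    return sorted(groups.items())

# Toy JSON-like data: co-group orders and clicks by user.
orders = [{"id": 1, "user": "ada"}, {"id": 2, "user": "bob"}]
clicks = [{"page": "/a", "user": "ada"}, {"page": "/b", "user": "ada"}]
result = cogroup(orders, clicks, "user")
```

Unlike an SQL join, co-grouping keeps the two match lists separate per key (here, "bob" keeps his order even though he has no clicks), which is the behavior the encoding has to type precisely.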
Multivariate Hawkes Processes for Large-scale Inference
In this paper, we present a framework for fitting multivariate Hawkes
processes for large-scale problems both in the number of events in the observed
history and the number of event types (i.e. dimensions). The proposed
Low-Rank Hawkes Process (LRHP) framework introduces a low-rank approximation of
the kernel matrix that makes it possible to perform the nonparametric learning
of the triggering kernels in a number of operations that grows with the rank r
of the approximation rather than with the full number of dimensions (r much
smaller than the number of event types). This comes as a major improvement over
the complexity of the existing state-of-the-art inference algorithms.
Furthermore, the low-rank approximation allows LRHP to learn representative
patterns of interaction between event types, which may be valuable for the
analysis of such complex processes in real-world datasets. The efficiency and
scalability of our approach are illustrated with numerical experiments on
simulated as well as real datasets. Comment: 16 pages, 5 figures
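The efficiency idea behind a low-rank kernel approximation can be sketched generically. The following NumPy example shows why rank-r factors cut storage from O(d^2) to O(dr) and matrix-vector cost likewise; it is an illustration of the factorization principle, not the paper's LRHP learning algorithm, and all settings are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3  # number of event types, and approximation rank (r << d)

# A d x d matrix of pairwise triggering amplitudes that happens to have
# rank r (illustration only).
A_full = rng.random((d, r)) @ rng.random((r, d))

# Rank-r factors via truncated SVD: storing U_r, V_r costs O(d*r)
# instead of O(d^2), and A @ x costs O(d*r) via U_r @ (V_r @ x).
U, s, Vt = np.linalg.svd(A_full)
U_r = U[:, :r] * s[:r]
V_r = Vt[:r, :]

x = rng.random(d)
direct = A_full @ x            # O(d^2) dense product
lowrank = U_r @ (V_r @ x)      # O(d*r) factored product
err = np.abs(direct - lowrank).max()
```

Because the toy matrix is exactly rank r, the factored product matches the dense one up to floating-point error; in the nonparametric Hawkes setting the approximation additionally exposes the interaction patterns between event types via the learned factors.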
Brownian distance covariance
Distance correlation is a new class of multivariate dependence coefficients
applicable to random vectors of arbitrary and not necessarily equal dimension.
Distance covariance and distance correlation are analogous to product-moment
covariance and correlation, but generalize and extend these classical bivariate
measures of dependence. Distance correlation characterizes independence: it is
zero if and only if the random vectors are independent. The notion of
covariance with respect to a stochastic process is introduced, and it is shown
that population distance covariance coincides with the covariance with respect
to Brownian motion; thus, both can be called Brownian distance covariance. In
the bivariate case, Brownian covariance is the natural extension of
product-moment covariance, as we obtain Pearson product-moment covariance by
replacing the Brownian motion in the definition with identity. The
corresponding statistic has an elegantly simple computing formula. Advantages
of applying Brownian covariance and correlation vs the classical Pearson
covariance and correlation are discussed and illustrated. Comment: This paper
is discussed in [arXiv:0912.3295], [arXiv:1010.0822], [arXiv:1010.0825],
[arXiv:1010.0828], [arXiv:1010.0836], [arXiv:1010.0838] and [arXiv:1010.0839];
rejoinder at [arXiv:1010.0844]. Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS312
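The "elegantly simple computing formula" for the sample statistic is the average entrywise product of double-centered pairwise distance matrices. A minimal Python sketch for the univariate case (function and variable names are our own; the general definition covers vectors of arbitrary dimension):

```python
import numpy as np

def dcov(x, y):
    """Sample distance covariance of two scalar samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # Double centering: subtract row and column means, add grand mean.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt((A * B).mean())

def dcor(x, y):
    """Sample distance correlation: near zero for independent samples."""
    denom = np.sqrt(dcov(x, x) * dcov(y, y))
    return dcov(x, y) / denom if denom > 0 else 0.0

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y_dep = x ** 2                    # dependent on x, yet Pearson corr is ~0
y_ind = rng.normal(size=500)      # independent of x
```

The y = x^2 example is the classic case where distance correlation detects a dependence that product-moment correlation misses entirely.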
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian Processes for the fast-moving
consumer goods industry have not received enough discussion in the field. Yet
this technique can be important: it can, for example, provide automatic feature
relevance determination, and the posterior mean can unlock insights on the data.
Significant challenges are the large size and high dimensionality of commercial
data at a point of sale. The study reviews approaches to Gaussian Process
modeling for large data sets, evaluates their performance on commercial sales
and shows the value of this type of model as a decision-making tool for
management. Comment: 10 pages, 5 figures
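Automatic feature relevance determination, one of the benefits named above, can be sketched with a standard ARD kernel: one lengthscale per input feature, where a large learned lengthscale flags an irrelevant input. This is a minimal scikit-learn illustration on synthetic data (feature meanings and settings are invented), not the paper's large-scale method:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
n = 200
# Pretend features: e.g. price, promo flag, weekday index (all synthetic).
X = rng.uniform(0, 1, size=(n, 3))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=n)  # only feature 0 matters

# ARD: one lengthscale per feature; fitting inflates the lengthscales of
# irrelevant inputs, which is the automatic relevance determination effect.
kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

lengthscales = gp.kernel_.k1.length_scale
mean, std = gp.predict(X[:5], return_std=True)
```

Reading off the fitted lengthscales gives the relevance ranking directly, and the posterior mean and standard deviation give point forecasts with calibrated uncertainty, which is what makes the GP attractive as a decision-making tool.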
Statistical inference framework for source detection of contagion processes on arbitrary network structures
In this paper we introduce a statistical inference framework for estimating
the contagion source from a partially observed contagion spreading process on
an arbitrary network structure. The framework is based on a maximum likelihood
estimation of a partial epidemic realization and involves large scale
simulation of contagion spreading processes from the set of potential source
locations. We present a number of different likelihood estimators used to
determine the conditional probabilities associated with observing a partial
epidemic realization under particular source location candidates. This
statistical inference framework is also applicable for arbitrary compartment
contagion spreading processes on networks. We compare estimation accuracy of
these approaches in a number of computational experiments performed with the
SIR (susceptible-infected-recovered), SI (susceptible-infected) and ISS
(ignorant-spreading-stifler) contagion spreading models on synthetic and
real-world complex networks.
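The simulation-based estimation loop described above can be sketched on a toy example: simulate spreading from each candidate source and score each candidate by how often the simulation reproduces the observed infection set. This is a deliberately simplified Monte Carlo analogue (SI model on a five-node path graph, all parameters invented), not the paper's estimators:

```python
import random

# Path graph 0-1-2-3-4 as an adjacency list.
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes = list(edges)

def simulate_si(source, rng, steps=2, beta=0.8):
    """One SI realization: each step, infected nodes infect each
    susceptible neighbor with probability beta."""
    infected = {source}
    for _ in range(steps):
        new = {v for u in infected for v in edges[u]
               if v not in infected and rng.random() < beta}
        infected |= new
    return frozenset(infected)

def likelihood(source, observed, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(realization == observed | source)."""
    rng = random.Random(seed)
    hits = sum(simulate_si(source, rng) == observed for _ in range(n_sim))
    return hits / n_sim

observed = frozenset({1, 2, 3})   # the partially observed epidemic
scores = {s: likelihood(s, observed) for s in nodes}
best = max(scores, key=scores.get)
```

Candidates 0 and 4 score exactly zero (the source is always infected, so they cannot produce an observation that excludes them), and the middle node is the maximum likelihood source, which is the shape of inference the framework performs at scale.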