52,728 research outputs found
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code.
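The co-grouping operation mentioned above can be made concrete with a small sketch. The following is not Jaql (nor the paper's calculus) but a hypothetical Python analogue of grouping two collections of JSON-like records by a shared key; all names are illustrative:

```python
from collections import defaultdict

def cogroup(left, right, key):
    """Group two collections of JSON-like records by a shared key,
    returning sorted (key, left_matches, right_matches) entries."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)   # records from the first input
    for rec in right:
        groups[rec[key]][1].append(rec)   # records from the second input
    return sorted(groups.items())

# Toy JSON-like data: co-group orders and clicks by user.
orders = [{"id": 1, "user": "ada"}, {"id": 2, "user": "bob"}]
clicks = [{"page": "/a", "user": "ada"}, {"page": "/b", "user": "ada"}]
result = cogroup(orders, clicks, "user")
```

Unlike an SQL join, co-grouping keeps the two match lists separate per key (here, "bob" keeps his order even though he has no clicks), which is the behavior the encoding has to type precisely.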
Multivariate Hawkes Processes for Large-scale Inference
In this paper, we present a framework for fitting multivariate Hawkes
processes for large-scale problems both in the number of events in the observed
history and the number of event types (i.e. dimensions). The proposed
Low-Rank Hawkes Process (LRHP) framework introduces a low-rank approximation of
the kernel matrix that makes it possible to perform the nonparametric learning
of the triggering kernels in a number of operations that grows with the rank r
of the approximation rather than with the full number of dimensions (r much
smaller than the number of event types). This comes as a major improvement over
the complexity of the existing state-of-the-art inference algorithms.
Furthermore, the low-rank approximation allows LRHP to learn representative
patterns of interaction between event types, which may be valuable for the
analysis of such complex processes in real-world datasets. The efficiency and
scalability of our approach are illustrated with numerical experiments on
simulated as well as real datasets. Comment: 16 pages, 5 figures
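The efficiency idea behind a low-rank kernel approximation can be sketched generically. The following NumPy example shows why rank-r factors cut storage from O(d^2) to O(dr) and matrix-vector cost likewise; it is an illustration of the factorization principle, not the paper's LRHP learning algorithm, and all settings are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3  # number of event types, and approximation rank (r << d)

# A d x d matrix of pairwise triggering amplitudes that happens to have
# rank r (illustration only).
A_full = rng.random((d, r)) @ rng.random((r, d))

# Rank-r factors via truncated SVD: storing U_r, V_r costs O(d*r)
# instead of O(d^2), and A @ x costs O(d*r) via U_r @ (V_r @ x).
U, s, Vt = np.linalg.svd(A_full)
U_r = U[:, :r] * s[:r]
V_r = Vt[:r, :]

x = rng.random(d)
direct = A_full @ x            # O(d^2) dense product
lowrank = U_r @ (V_r @ x)      # O(d*r) factored product
err = np.abs(direct - lowrank).max()
```

Because the toy matrix is exactly rank r, the factored product matches the dense one up to floating-point error; in the nonparametric Hawkes setting the approximation additionally exposes the interaction patterns between event types via the learned factors.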
Brownian distance covariance
Distance correlation is a new class of multivariate dependence coefficients
applicable to random vectors of arbitrary and not necessarily equal dimension.
Distance covariance and distance correlation are analogous to product-moment
covariance and correlation, but generalize and extend these classical bivariate
measures of dependence. Distance correlation characterizes independence: it is
zero if and only if the random vectors are independent. The notion of
covariance with respect to a stochastic process is introduced, and it is shown
that population distance covariance coincides with the covariance with respect
to Brownian motion; thus, both can be called Brownian distance covariance. In
the bivariate case, Brownian covariance is the natural extension of
product-moment covariance, as we obtain Pearson product-moment covariance by
replacing the Brownian motion in the definition with identity. The
corresponding statistic has an elegantly simple computing formula. Advantages
of applying Brownian covariance and correlation vs the classical Pearson
covariance and correlation are discussed and illustrated. Comment: This paper
is discussed in [arXiv:0912.3295], [arXiv:1010.0822], [arXiv:1010.0825],
[arXiv:1010.0828], [arXiv:1010.0836], [arXiv:1010.0838] and [arXiv:1010.0839];
rejoinder at [arXiv:1010.0844]. Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS312
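The "elegantly simple computing formula" for the sample statistic is the average entrywise product of double-centered pairwise distance matrices. A minimal Python sketch for the univariate case (function and variable names are our own; the general definition covers vectors of arbitrary dimension):

```python
import numpy as np

def dcov(x, y):
    """Sample distance covariance of two scalar samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # Double centering: subtract row and column means, add grand mean.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt((A * B).mean())

def dcor(x, y):
    """Sample distance correlation: near zero for independent samples."""
    denom = np.sqrt(dcov(x, x) * dcov(y, y))
    return dcov(x, y) / denom if denom > 0 else 0.0

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y_dep = x ** 2                    # dependent on x, yet Pearson corr is ~0
y_ind = rng.normal(size=500)      # independent of x
```

The y = x^2 example is the classic case where distance correlation detects a dependence that product-moment correlation misses entirely.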
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian Processes for the fast-moving
consumer goods industry have not received enough discussion in the field. Yet
this technique can be important: it can, for example, provide automatic feature
relevance determination, and the posterior mean can unlock insights on the data.
Significant challenges are the large size and high dimensionality of commercial
data at a point of sale. The study reviews approaches to Gaussian Process
modeling for large data sets, evaluates their performance on commercial sales
and shows the value of this type of model as a decision-making tool for
management. Comment: 10 pages, 5 figures
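Automatic feature relevance determination, one of the benefits named above, can be sketched with a standard ARD kernel: one lengthscale per input feature, where a large learned lengthscale flags an irrelevant input. This is a minimal scikit-learn illustration on synthetic data (feature meanings and settings are invented), not the paper's large-scale method:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
n = 200
# Pretend features: e.g. price, promo flag, weekday index (all synthetic).
X = rng.uniform(0, 1, size=(n, 3))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=n)  # only feature 0 matters

# ARD: one lengthscale per feature; fitting inflates the lengthscales of
# irrelevant inputs, which is the automatic relevance determination effect.
kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

lengthscales = gp.kernel_.k1.length_scale
mean, std = gp.predict(X[:5], return_std=True)
```

Reading off the fitted lengthscales gives the relevance ranking directly, and the posterior mean and standard deviation give point forecasts with calibrated uncertainty, which is what makes the GP attractive as a decision-making tool.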
Statistical inference framework for source detection of contagion processes on arbitrary network structures
In this paper we introduce a statistical inference framework for estimating
the contagion source from a partially observed contagion spreading process on
an arbitrary network structure. The framework is based on a maximum likelihood
estimation of a partial epidemic realization and involves large scale
simulation of contagion spreading processes from the set of potential source
locations. We present a number of different likelihood estimators used to
determine the conditional probabilities associated with observing a partial
epidemic realization under particular source location candidates. This
statistical inference framework is also applicable for arbitrary compartment
contagion spreading processes on networks. We compare estimation accuracy of
these approaches in a number of computational experiments performed with the
SIR (susceptible-infected-recovered), SI (susceptible-infected) and ISS
(ignorant-spreading-stifler) contagion spreading models on synthetic and
real-world complex networks.
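The simulation-based estimation loop described above can be sketched on a toy example: simulate spreading from each candidate source and score each candidate by how often the simulation reproduces the observed infection set. This is a deliberately simplified Monte Carlo analogue (SI model on a five-node path graph, all parameters invented), not the paper's estimators:

```python
import random

# Path graph 0-1-2-3-4 as an adjacency list.
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes = list(edges)

def simulate_si(source, rng, steps=2, beta=0.8):
    """One SI realization: each step, infected nodes infect each
    susceptible neighbor with probability beta."""
    infected = {source}
    for _ in range(steps):
        new = {v for u in infected for v in edges[u]
               if v not in infected and rng.random() < beta}
        infected |= new
    return frozenset(infected)

def likelihood(source, observed, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(realization == observed | source)."""
    rng = random.Random(seed)
    hits = sum(simulate_si(source, rng) == observed for _ in range(n_sim))
    return hits / n_sim

observed = frozenset({1, 2, 3})   # the partially observed epidemic
scores = {s: likelihood(s, observed) for s in nodes}
best = max(scores, key=scores.get)
```

Candidates 0 and 4 score exactly zero (the source is always infected, so they cannot produce an observation that excludes them), and the middle node is the maximum likelihood source, which is the shape of inference the framework performs at scale.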