2,870,051 research outputs found

    Structured count data regression

    Get PDF
    Overdispersion in count data regression is often caused by neglection or inappropriate modelling of individual heterogeneity, temporal or spatial correlation, and nonlinear covariate effects. In this paper, we develop and study semiparametric count data models which can deal with these issues by incorporating corresponding components in structured additive form into the predictor. The models are fully Bayesian and inference is carried out by computationally efficient MCMC techniques. In a simulation study, we investigate how well the different components can be identified with the data at hand. The approach is applied to a large data set of claim frequencies from car insurance

    Licensing structured data with ease

    Get PDF
    In response to the need of a rights expression language (REL), we have proposed LicenseScript, an REL based on multiset rewriting and Prolog. LicenseScript has advantage over existing RELs, in the sense that it has a well-defined semantics. In fact besides semantics, LicenseScript has a lot of other advantages over other RELs. The mission of this paper is twofold: (1) to put a spotlight on these advantages, (2) at the same time justifying some of our design rationales in LicenseScript. We accomplish this by giving examples of licensing models that are greatly facilitated by the use of Prolog as a component of LicenseScript. At the same time showing\ud how LicenseScript makes these non-trivial models viable, we also make LicenseScript a stronger case than it previously might have occurred to be

    Improving Entity Retrieval on Structured Data

    Full text link
    The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which make entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the \emph{x--means} and \emph{spectral} clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state of the art approaches
    corecore