Structured count data regression
Overdispersion in count data regression is often caused by neglect or inappropriate modelling of individual heterogeneity, temporal or spatial correlation, and nonlinear covariate effects. In this paper, we develop and study semiparametric count data models which can deal with these issues by incorporating corresponding components in structured additive form into the predictor. The models are fully Bayesian and inference is carried out by computationally efficient MCMC techniques. In a simulation study, we investigate how well the different components can be identified with the data at hand. The approach is applied to a large data set of claim frequencies from car insurance.
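The link between neglected individual heterogeneity and overdispersion can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes, purely for illustration, a Poisson count whose rate is multiplied by an unobserved gamma-distributed factor with mean 1 and variance 0.5. All names (`sample_poisson`, `dispersion_index`, `base_rate`) are hypothetical. Under this assumption the variance-to-mean ratio is 1 for the plain Poisson counts and 1 + 3.0 * 0.5 = 2.5 in theory for the mixed counts.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 indicate overdispersion."""
    return pvariance(counts) / mean(counts)


rng = random.Random(0)
n, base_rate = 20000, 3.0

# Every individual shares the same rate: plain Poisson counts.
homogeneous = [sample_poisson(base_rate, rng) for _ in range(n)]

# Assumed heterogeneity: each individual's rate is scaled by a gamma factor
# with shape 2 and scale 0.5 (mean 1, variance 0.5) that the model ignores.
heterogeneous = [
    sample_poisson(base_rate * rng.gammavariate(2.0, 0.5), rng) for _ in range(n)
]

print("dispersion index, homogeneous:  ", round(dispersion_index(homogeneous), 2))
print("dispersion index, heterogeneous:", round(dispersion_index(heterogeneous), 2))
```

Fitting a plain Poisson regression to the second sample would understate the variance, which is the kind of misspecification the structured additive predictor is meant to address.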
Licensing structured data with ease
In response to the need for a rights expression language (REL), we have proposed LicenseScript, an REL based on multiset rewriting and Prolog. LicenseScript has an advantage over existing RELs in that it has a well-defined semantics. Beyond semantics, LicenseScript has many other advantages over other RELs. The aim of this paper is twofold: (1) to highlight these advantages, and (2) to justify some of the design rationale behind LicenseScript. We accomplish this by giving examples of licensing models that are greatly facilitated by the use of Prolog as a component of LicenseScript. By showing how LicenseScript makes these non-trivial models viable, we also make a stronger case for LicenseScript than may previously have been apparent.
Improving Entity Retrieval on Structured Data
The increasing amount of data on the Web, in particular of Linked Data, has
led to a diverse landscape of datasets, which makes entity retrieval a
challenging task. Explicit cross-dataset links, for instance to indicate
co-references or related entities, can significantly improve entity retrieval.
However, only a small fraction of entities are interlinked through explicit
statements. In this paper, we propose a two-fold entity retrieval approach. In
a first, offline preprocessing step, we cluster entities based on the
x-means and spectral clustering algorithms. In the second step,
we propose an optimized retrieval model which takes advantage of our
precomputed clusters. For a given set of entities retrieved by the BM25F
retrieval approach and a given user query, we further expand the result set
with relevant entities by considering features of the queries, entities and the
precomputed clusters. Finally, we re-rank the expanded result set with respect
to the relevance to the query. We perform a thorough experimental evaluation on
the Billion Triple Challenge (BTC12) dataset. The proposed approach shows
significant improvements compared to the baseline and state-of-the-art
approaches.
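The expand-then-re-rank step described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' retrieval model: the initial hits stand in for BM25F results, `cluster_of` stands in for the precomputed x-means or spectral clusters, and a simple term-overlap score replaces the paper's query, entity and cluster features. Every name and data value here is hypothetical.

```python
def term_overlap(query, text):
    """Fraction of query terms that also occur in the entity description."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def expand_and_rerank(query, initial_hits, cluster_of, descriptions, top_k=5):
    """Add cluster neighbours of the initial hits, then re-rank by term overlap."""
    # Invert the entity -> cluster mapping once (the offline step in the abstract).
    members = {}
    for entity, cluster in cluster_of.items():
        members.setdefault(cluster, set()).add(entity)

    # Expansion: every entity sharing a cluster with an initial hit is a candidate.
    candidates = set(initial_hits)
    for entity in initial_hits:
        candidates |= members.get(cluster_of.get(entity), set())

    # Re-ranking: highest overlap first, entity id as a deterministic tie-breaker.
    ranked = sorted(
        candidates,
        key=lambda e: (-term_overlap(query, descriptions.get(e, "")), e),
    )
    return ranked[:top_k]


# Hypothetical toy data.
descriptions = {
    "e1": "Berlin capital city of Germany",
    "e2": "Berlin city in Germany",
    "e3": "Paris capital city of France",
    "e4": "Berlin town in New Hampshire",
}
cluster_of = {"e1": 0, "e2": 0, "e3": 1, "e4": 2}

result = expand_and_rerank("berlin city in germany", ["e1"], cluster_of, descriptions)
print(result)
```

In this toy run, `e2` enters the result set only through the cluster it shares with `e1`, and the re-ranking then places it first because it matches more query terms.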
