Infinite Probabilistic Databases
Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009).
We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics.
It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.
Generalized h-index for Disclosing Latent Facts in Citation Networks
What is the value of a scientist and their impact on scientific thinking?
How can we measure the prestige of a journal or of a conference? The evaluation
of the scientific work of a scientist and the estimation of the quality of a
journal or conference have long attracted significant interest, due to the
benefits from obtaining an unbiased and fair criterion. Although it appears to
be simple, defining a quality metric is not an easy task. To overcome the
disadvantages of the present metrics used for ranking scientists and journals,
J.E. Hirsch proposed a pioneering metric, the now famous h-index. In this
article, we demonstrate several inefficiencies of this index and develop a pair
of generalizations and effective variants of it to deal with scientist ranking
and with publication forum ranking. The new citation indices are able to
disclose trendsetters in scientific research, as well as researchers that
constantly shape their field with their influential work, no matter how old
they are. We exhibit the effectiveness and the benefits of the new indices to
unfold the full potential of the h-index, with extensive experimental results
obtained from DBLP, a widely known on-line digital library.
Comment: 19 pages, 17 tables, 27 figures
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of their main development, and study the main current
systems that implement them.
Big Data Visualization Tools
Data visualization is the presentation of data in a pictorial or graphical
format, and a data visualization tool is the software that generates this
presentation. Data visualization provides users with intuitive means to
interactively explore and analyze data, enabling them to effectively identify
interesting patterns and infer correlations and causalities, and it supports
sense-making activities.
Comment: This article appears in Encyclopedia of Big Data Technologies,
Springer, 201