1,137,266 research outputs found

Believe It or Not: Adding Belief Annotations to Databases

Authors: Balazinska Magdalena; Gatterbauer Wolfgang; Khoussainova Nodira; Suciu Dan
Publication date: 01/01/2008

We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree on the information it should store. For example, Alice may believe that a tuple should be in the database, whereas Bob disagrees. He may also insert the reason why he thinks Alice believes the tuple should be in the database, and explain what he thinks the correct tuple should be instead. We propose a formal model for Belief Databases that interprets users' annotations as belief statements. These annotations can refer both to the base data and to other annotations. We give a formal semantics based on a fragment of multi-agent epistemic logic and define a query language over belief databases. We then prove a key technical result, stating that every belief database can be encoded as a canonical Kripke structure. We use this structure to describe a relational representation of belief databases, and give an algorithm for translating queries over the belief database into standard relational queries. Finally, we report early experimental results with our prototype implementation on synthetic data.

Comment: 17 pages, 10 figure

arXiv.org e-Print Archive

CiteSeerX
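
The Alice and Bob example in the abstract above can be made concrete with a toy relational table. This is only a minimal sketch under stated assumptions: the table name `belief`, the `path` and `sign` columns, the `/`-separated belief path, and the sample tuple are all hypothetical, and the sketch does not reproduce the paper's canonical Kripke structure or its query translation algorithm. It only shows that annotations on base data and annotations on other annotations can be stored and queried as ordinary rows.

```python
import sqlite3


def build_belief_db():
    """Create an in-memory table holding base tuples and belief annotations.

    Assumed (hypothetical) layout: `path` is the chain of users holding the
    belief ("" means base data, "Bob/Alice" means "Bob believes that Alice
    believes"), and `sign` is "+" for "should be in the database" or "-" for
    "should not be".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE belief (path TEXT, sign TEXT, tuple_key TEXT, value TEXT)"
    )
    rows = [
        ("", "+", "s1", "species=crow"),           # base data, no annotation
        ("Alice", "+", "s1", "species=crow"),      # Alice believes the tuple
        ("Bob", "-", "s1", "species=crow"),        # Bob disagrees
        ("Bob", "+", "s1", "species=raven"),       # Bob's proposed correction
        ("Bob/Alice", "+", "s1", "species=crow"),  # Bob's view of Alice's belief
    ]
    conn.executemany("INSERT INTO belief VALUES (?, ?, ?, ?)", rows)
    return conn


def beliefs(conn, path, sign="+"):
    """Return the (tuple_key, value) pairs recorded for one belief path."""
    cur = conn.execute(
        "SELECT tuple_key, value FROM belief "
        "WHERE path = ? AND sign = ? ORDER BY tuple_key, value",
        (path, sign),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_belief_db()
    print(beliefs(db, "Bob"))
    print(beliefs(db, "Bob", sign="-"))
    print(beliefs(db, "Bob/Alice"))
```

A flat path column keeps the lookup a plain equality filter; it deliberately ignores any rules for beliefs a user has not stated explicitly, which the formal semantics in the paper would have to define.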

Where do we go from here? Recording and analysing Roman coins from archaeological excavations

Author: Lockyear K.
Publication date: 01/11/2007

The publication of English Heritage's guidelines for the analysis and publication of coins from excavations has not met with acceptance by the relevant specialists. This paper takes the opportunity to look back over what we have been doing, consider what the guidelines suggest, and make recommendations as to where we could be going. In particular, it argues that we should be making more of existing database technologies and the internet, and that the analysis of coins should be integrated with other aspects of the archaeological record. The paper is not a new set of guidelines, but is intended to stimulate debate.

UCL Discovery

Providing a Realist Perspective on the eyeGENE Database System

Author: Werner Ceusters
Publication date: 01/01/2009

One of the achievements of the eyeGENE Network is a repository of DNA samples of patients with inherited eye diseases and an associated database that tracks key elements of phenotype and genotype information for each patient. Although its database structure serves its direct research needs, eyeGENE has set a goal of enhancing this structure to become increasingly well integrated with medical information standards over time. This goal should be achieved by ensuring semantic interoperability with other information systems but without adopting the incoherencies and inconsistencies found in available biomedical standards. Therefore, eyeGENE’s current pragmatic perspective, which focuses on data and information rather than on what the information is about, should shift to a realism-based perspective that also includes the portion of reality described and the competing opinions that clinicians may hold about it. An analysis of eyeGENE’s database structure and user interfaces suggests that such a transition is indeed possible.

CiteSeerX

Crossref

Nature Precedings

Representation Independent Analytics Over Structured Data

Authors: Chodpathumwan Yodsawalai; Fern Alan; Picado Jose; Sun Yizhou; Termehchy Arash
Publication date: 08/09/2014

Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures, and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, there is no guarantee that current database analytics algorithms will still provide the correct insights, no matter what structures are chosen to organize the database. Because these algorithms tend to be highly effective over some choices of structure, such as that of the databases used to validate them, but not so effective with others, database analytics has largely remained the province of experts who can find the desired forms for these algorithms. We argue that in order to make database analytics usable, we should use or develop algorithms that are effective over a wide range of choices of structural organizations. We introduce the notion of representation independence, study its fundamental properties for a wide range of data analytics algorithms, and empirically analyze the amount of representation independence of some popular database analytics algorithms. Our results indicate that most algorithms are not generally representation independent, and we find the characteristics of more representation independent heuristics under certain representational shifts.

arXiv.org e-Print Archive

CiteSeerX

uFLIP: Understanding Flash IO Patterns

Authors: Bonnet Philippe; Bouganim Luc; Jónsson Björn
Publication date: 01/01/2009

Does the advent of flash devices constitute a radical change for secondary storage? How should database systems adapt to this new form of secondary storage? Before we can answer these questions, we need to fully understand the performance characteristics of flash devices. More specifically, we want to establish what kind of IOs should be favored (or avoided) when designing algorithms and architectures for flash-based systems. In this paper, we focus on flash IO patterns, which capture the relevant distribution of IOs in time and space, and our goal is to quantify their performance. We define uFLIP, a benchmark for measuring the response time of flash IO patterns. We also present a benchmarking methodology which takes into account the particular characteristics of flash devices. Finally, we present the results obtained by measuring eleven flash devices, and derive a set of design hints that should drive the development of flash-based systems on current devices.

Comment: CIDR 200

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Copenhagen University Research Information System

HAL Descartes

The IT University of Copenhagen's Repository

Hal-Diderot

HAL UVSQ

IEAD: A Novel One-Line Interface to Query Astronomical Science Archives

Authors: Delmotte N.; Marco Lombardi
Publication venue: University of Chicago Press
Publication date: 01/01/2012

In this article I present IEAD, a new interface for astronomical science databases. It is based on a powerful, yet simple, syntax designed to completely abstract the user from the structure of the underlying database. The programming language chosen for its implementation, JavaScript, makes it possible to interact directly with the user and to provide real-time information on the parsing process, error messages, and the name resolution of targets; additionally, the same parsing engine is used for context-sensitive autocompletion. Ultimately, this product should significantly simplify the use of astronomical archives, inspire more advanced uses of them, and allow the user to focus on what scientific research to perform, instead of on how to instruct the computer to do it.

Comment: 13 pages, PASP in pres

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

The EU-Directive on the Legal Protection of Databases and the Incentives to Update: An Economic Analysis

Author: Koboldt Christian

The database directive, initiated by the European Commission in 1992 and due to be finalised in the near future, establishes a two-tiered system of protection, amending copyright with a sui generis rule that grants protection against unfair extraction. The terms of protection are extended if the producer makes "substantial changes" to update the database. This paper analyses the incentive to update created by the database directive. In contrast to the usual findings of the literature on the incentive effects of intellectual property rights, we find that, although in most cases the incentives to update a database are insufficient from society's point of view, the possibility of extending the term of protection by making "substantial changes" in the database may create an incentive for excessive updating. This leads to conclusions about what should be considered a substantial change.

German abstract (translated): The database directive, whose final version will be available shortly, guarantees database producers two-tiered protection: alongside copyright there is a sui generis right that protects against unfair extraction and whose term of protection is extended when the producer updates the database through substantial changes. This paper deals with the incentives to update. In contrast to the usual incentive effects of intellectual property rights, an incentive for excessive investment in updating databases arises here. Producers carry out updates even when this is not desirable for society as a whole. This finding leads to conclusions about what should count as a substantial change.

Keywords: copyright, databases, updating

Research Papers in Economics

MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

Authors: Goñi Menoyo José Miguel; Lana Serrano Sara; Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2007

This paper describes the participation of the MIRACLE research consortium in the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier performs geo-entity identification and tagging, i.e., it extracts the “where” component of the geographical query, should there be any. This module is based on a gazetteer built from the Geonames geographical database and carries out a sequential process in three steps: geo-entity recognition, geo-entity selection, and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, if so, determines the query type according to the type of information that the user is supposed to be looking for: map, yellow page or information. Under a strict evaluation criterion in which a match must have all fields correct, our system reaches a precision of 42.8% and a recall of 56.6%, and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrices reveals that extra effort must be invested in “user-oriented” disambiguation techniques to improve the first-level binary classifier for detecting geographical queries, as it is a key component for eliminating many false positives.

Archivo Digital UPM
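
The strict evaluation criterion mentioned in the MIRACLE abstract above (a match counts only if all fields are correct) can be illustrated with a short sketch. The field names, the dictionary layout, and the toy queries below are assumptions made for illustration; the official GeoCLEF scoring procedure is not described in the abstract and may differ in its details.

```python
# Hypothetical field names for one parsed query; the abstract names the
# "where", "what" and "geo-relation" components and the query type.
FIELDS = ("is_geo", "what", "geo_relation", "where", "query_type")


def strict_match(pred, gold):
    """True only if every field of the prediction equals the gold value."""
    return all(pred.get(f) == gold.get(f) for f in FIELDS)


def precision_recall(preds, golds):
    """Strict precision and recall over queries keyed by query id.

    Assumption: precision is taken over queries the system labels as
    geographical, recall over queries that are geographical in the gold data.
    """
    pred_geo = [qid for qid, p in preds.items() if p["is_geo"]]
    gold_geo = [qid for qid, g in golds.items() if g["is_geo"]]
    correct = sum(
        1
        for qid in pred_geo
        if qid in golds
        and golds[qid]["is_geo"]
        and strict_match(preds[qid], golds[qid])
    )
    precision = correct / len(pred_geo) if pred_geo else 0.0
    recall = correct / len(gold_geo) if gold_geo else 0.0
    return precision, recall


def make(is_geo, what=None, rel=None, where=None, qtype=None):
    """Small helper to build one toy record."""
    return {"is_geo": is_geo, "what": what, "geo_relation": rel,
            "where": where, "query_type": qtype}


# Toy data, invented for this example only.
GOLD = {
    "q1": make(True, "hotels", "in", "Madrid", "yellow page"),
    "q2": make(True, "map", "of", "Lisbon", "map"),
    "q3": make(False),
}
PRED = {
    "q1": make(True, "hotels", "in", "Madrid", "yellow page"),  # all correct
    "q2": make(True, "map", "of", "Porto", "map"),              # wrong "where"
    "q3": make(True, "news", "in", "Paris", "information"),     # false positive
}

if __name__ == "__main__":
    print(precision_recall(PRED, GOLD))
```

In the toy data, one wrong "where" value is enough to make the second query count as a miss, which is why a strict criterion of this kind yields lower scores than per-field scoring would.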