Search CORE

211 research outputs found

On the need for explicit confidence assessments of flexible query answers

Author: D Dubois
EF Codd
M Richardson
S de F. Mendes Sampaio
S Destercke
V Lancker Van
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Ghent University Academic Bibliography

5-State Rotation-Symmetric Number-Conserving Cellular Automata are not Strongly Universal

Author: A Moreira
B Durand
EF Codd
J Neumann von
K Nagel
K Zuse
N Boccara
N Tanimoto
T Serizawa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/07/2014

We study two-dimensional rotation-symmetric number-conserving cellular automata working on the von Neumann neighborhood (RNCA). It is known that such automata with 4 states or fewer are trivial, so we investigate the possible rules with 5 states. We give a full characterization of these automata and show that they cannot be strongly Turing universal. However, we give examples of constructions that allow some Boolean circuit elements to be embedded in a 5-state RNCA.

arXiv.org e-Print Archive

Crossref

HAL Descartes

Hal-Diderot

The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent Regrouping by Clinical Specialty

Author: Asba Tasneem
Brian J. McCourt
CD DeAngelis
DA Zarin
EF Codd
FA Thiers
Hari Ananth
Joel Joseph Gagnier
Karen Chiswell
Laura Aberle
Ricardo Pietrobon
Swati Chakraborty
Publication venue: Public Library of Science
Publication date: 16/03/2012

BACKGROUND: The ClinicalTrials.gov registry provides information regarding characteristics of past, current, and planned clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However, issues related to data structure, nomenclature, and changes in data collection over time present challenges to the aggregate analysis and interpretation of these data in general and to the analysis of trials according to clinical specialty in particular. Improving usability of these data could enhance the utility of ClinicalTrials.gov as a research resource. METHODS/PRINCIPAL RESULTS: The purpose of our project was twofold. First, we sought to extend the usability of ClinicalTrials.gov for research purposes by developing a database for aggregate analysis of ClinicalTrials.gov (AACT) that contains data from the 96,346 clinical trials registered as of September 27, 2010. Second, we developed and validated a methodology for annotating studies by clinical specialty, using a custom taxonomy employing Medical Subject Heading (MeSH) terms applied by an NLM algorithm, as well as MeSH terms and other disease condition terms provided by study sponsors. Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false negatives were evaluated by comparing algorithmic classification with manual classification for three specialties. CONCLUSIONS/SIGNIFICANCE: The resulting AACT database features study design attributes parsed into discrete fields, integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (.dmp file and text format). This publicly-accessible dataset will facilitate analysis of studies and permit detailed characterization and analysis of the U.S. clinical trials enterprise as a whole. 
In addition, the methodology we present for creating specialty datasets may facilitate other efforts to analyze studies by specialty groups.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Keeping Data Inter-related in a Blockchain

Author: D Ioannidis
EF Codd
PP-S Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/05/2019

Crossref

Explore Bristol Research

Relational lattices via duality

Author: A Kurucz
André Joyal
BA Davey
EF Codd
FW Lawvere
JB Nation
N Ackerman
R Freese
R Hirsch
S Priess-Crampe
Tadeusz Litak
Publication date: 06/01/2016

The natural join and the inner union combine tables of a relational database in different ways. Tropashko [18] observed that these two operations are the meet and join in a class of lattices, called the relational lattices, and proposed lattice theory as an alternative algebraic approach to databases. Aiming at query optimization, Litak et al. [12] initiated the study of the equational theory of these lattices. We carry on with this project, making use of the duality theory developed in [16]. The contributions of this paper are as follows. Let A be a set of column names and D be a set of cell values; we characterize the dual space of the relational lattice R(D, A) by means of a generalized ultrametric space, whose elements are the functions from A to D, with the P(A)-valued distance being the Hamming one but lifted to subsets of A. We use the dual space to present an equational axiomatization of these lattices that reflects the combinatorial properties of these generalized ultrametric spaces: symmetry and pairwise completeness. Finally, we argue that these equations correspond to combinatorial properties of the dual spaces of lattices, in a technical sense analogous to correspondence theory in modal logic. In particular, this leads to an exact characterization of the finite lattices satisfying these equations.
Comment: Coalgebraic Methods in Computer Science 2016, Apr 2016, Eindhoven, Netherlands
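
The two operations named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the table encoding (a header plus a set of rows) and the names `row`, `natural_join`, `inner_union`, and `hamming_set` are assumptions made for illustration, and `hamming_set` is only one plausible reading of "the Hamming distance lifted to subsets of A", namely the set of columns on which two rows disagree.

```python
def row(**cells):
    """Hypothetical helper: a row as a hashable set of (column, value) pairs."""
    return frozenset(cells.items())


def natural_join(r, s):
    """Header is the union of the columns; keep row pairs agreeing on shared columns."""
    (hr, tr), (hs, ts) = r, s
    shared = hr & hs
    rows = set()
    for t in tr:
        for u in ts:
            dt, du = dict(t), dict(u)
            if all(dt[c] == du[c] for c in shared):
                rows.add(frozenset({**dt, **du}.items()))
    return (hr | hs, rows)


def inner_union(r, s):
    """Header is the intersection of the columns; union of the projected rows."""
    (hr, tr), (hs, ts) = r, s
    shared = hr & hs

    def project(t):
        return frozenset((c, v) for c, v in t if c in shared)

    return (shared, {project(t) for t in tr | ts})


def hamming_set(f, g):
    """Set-valued distance (assumed reading): columns where rows f and g differ."""
    return {a for a in f if f[a] != g[a]}


# Illustrative tables: a table is a pair (header, set of rows).
people = (frozenset({"id", "name"}), {row(id=1, name="ann"), row(id=2, name="bob")})
cities = (frozenset({"id", "city"}), {row(id=1, city="Rome")})

joined = natural_join(people, cities)   # columns id, name, city
merged = inner_union(people, cities)    # column id only
```

On these example tables the two absorption laws of a lattice hold, for instance `natural_join(people, inner_union(people, cities)) == people`, which is consistent with reading the natural join as the meet and the inner union as the join.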

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL AMU

Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

Author: B Mons
BA Bohm
EF Codd
J Giles
Kazuhiro Suwa
Masanori Arita
R Ierusalimschy
SL Salzberg
T Tokimatsu
Y Shinbo
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results: To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches to realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion: This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Extraction of fact tables from a relational database: an effort to establish rules in denormalization

Author: A Darwiche
A Kingdon
AF Cardenas
CJ Date
EF Codd
J Pearl
JA Wald
JH Ter-Bekke
PN Ramos
R Kimball
SJ Russell
X Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Relational databases are supported by very well established models. However, some neglected problems can occur with the join operator: semantic mistakes caused by the multiple access path problem and faults when connection traps arise. In this paper we intend to identify and overcome those problems and to establish rules for relational data denormalization. Two denormalization forms are proposed and a case study is presented.

Crossref

Repositório Aberto da Universidade Aberta

Organizing research data

Author: C Churcher
D Kleppner
EF Codd
International Nucleotide Sequence Database Collaboration
J Gray
National Academy of Sciences
Peter Sestoft
R Snodgrass
R Wilcke
S Brunak
S Lippert
Publication venue: BioMed Central
Publication date: 01/01/2011

Research relies on ever larger amounts of data from experiments, automated production equipment, questionnaires, time series such as weather records, and so on. A major task in science is to combine, process and analyse such data to obtain evidence of patterns and correlations.

Crossref

Springer - Publisher Connector

PubMed Central

The IT University of Copenhagen's Repository

Implementing a transcription factor interaction prediction system using the genometric query language

Author: A Jankowski
A Kaitoua
EF Codd
M Masseroli
N Geisel
R Batuwita
SG Landt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Novel technologies and growing interest have resulted in a large increase in the amount of data available for genomics and transcriptomics studies, both in terms of volume and contents. Biology is relying more and more on computational methods to process, investigate, and extract knowledge from this huge amount of data. In this work, we present the TICA web server (available at http://www.gmql.eu/tica/), a fast and compact tool developed to support data-driven knowledge discovery in the realm of transcription factor interaction prediction. TICA leverages both the GenoMetric Query Language, a novel query tool (based on the Apache Hadoop and Spark technologies) specialized in the integration and management of heterogeneous, large genomic datasets, and a statistical method for robust detection of co-locations across interval-based data, in order to infer physically interacting transcription factors. Notably, TICA allows investigators to upload and analyze their own ChIP-seq experiment datasets, comparing them either against ENCODE data or against each other, with a computation time that increases linearly with respect to dataset size and density. Using ENCODE data from three well-studied cell lines as reference, we show that TICA predictions are supported by existing biological knowledge, making the web server a reliable and efficient tool for interaction screening and data-driven hypothesis generation.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Model inference for spreadsheets

Author: D Maier
E Visser
EF Codd
J Cunha
JD Ullman
Jorge Mendes
João Saraiva
Jácome Cunha
M Erwig
M Höst
Martin Erwig
R Alhajj
SG Powell
T Cheng
T Connolly
T Isakowitz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: First, we have demonstrated its applicability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful to model data, they do not fit spreadsheets well, as they do not allow layout to be expressed. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.
We would like to thank Orlando Belo for his help on running and analyzing the empirical study. We would also like to thank Paulo Azevedo for his help in conducting the statistical analysis of our empirical study. We would also like to thank the anonymous reviewers for their suggestions which helped us to improve the paper. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010.

Universidade do Minho: RepositoriUM

Crossref