211 research outputs found

    5-State Rotation-Symmetric Number-Conserving Cellular Automata are not Strongly Universal

    Full text link
    We study two-dimensional rotation-symmetric number-conserving cellular automata working on the von Neumann neighborhood (RNCA). It is known that such automata with 4 states or less are trivial, so we investigate the possible rules with 5 states. We give a full characterization of these automata and show that they cannot be strongly Turing universal. However, we give example of constructions that allow to embed some boolean circuit elements in a 5-states RNCA

    The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent Regrouping by Clinical Specialty

    Get PDF
    BACKGROUND: The ClinicalTrials.gov registry provides information regarding characteristics of past, current, and planned clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However, issues related to data structure, nomenclature, and changes in data collection over time present challenges to the aggregate analysis and interpretation of these data in general and to the analysis of trials according to clinical specialty in particular. Improving usability of these data could enhance the utility of ClinicalTrials.gov as a research resource. METHODS/PRINCIPAL RESULTS: The purpose of our project was twofold. First, we sought to extend the usability of ClinicalTrials.gov for research purposes by developing a database for aggregate analysis of ClinicalTrials.gov (AACT) that contains data from the 96,346 clinical trials registered as of September 27, 2010. Second, we developed and validated a methodology for annotating studies by clinical specialty, using a custom taxonomy employing Medical Subject Heading (MeSH) terms applied by an NLM algorithm, as well as MeSH terms and other disease condition terms provided by study sponsors. Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false negatives were evaluated by comparing algorithmic classification with manual classification for three specialties. CONCLUSIONS/SIGNIFICANCE: The resulting AACT database features study design attributes parsed into discrete fields, integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (.dmp file and text format). This publicly-accessible dataset will facilitate analysis of studies and permit detailed characterization and analysis of the U.S. clinical trials enterprise as a whole. In addition, the methodology we present for creating specialty datasets may facilitate other efforts to analyze studies by specialty groups

    Keeping Data Inter-related in a Blockchain

    Get PDF

    Relational lattices via duality

    Full text link
    The natural join and the inner union combine in different ways tables of a relational database. Tropashko [18] observed that these two operations are the meet and join in a class of lattices-called the relational lattices- and proposed lattice theory as an alternative algebraic approach to databases. Aiming at query optimization, Litak et al. [12] initiated the study of the equational theory of these lattices. We carry on with this project, making use of the duality theory developed in [16]. The contributions of this paper are as follows. Let A be a set of column's names and D be a set of cell values; we characterize the dual space of the relational lattice R(D, A) by means of a generalized ultrametric space, whose elements are the functions from A to D, with the P (A)-valued distance being the Hamming one but lifted to subsets of A. We use the dual space to present an equational axiomatization of these lattices that reflects the combinatorial properties of these generalized ultrametric spaces: symmetry and pairwise completeness. Finally, we argue that these equations correspond to combinatorial properties of the dual spaces of lattices, in a technical sense analogous of correspondence theory in modal logic. In particular, this leads to an exact characterization of the finite lattices satisfying these equations.Comment: Coalgebraic Methods in Computer Science 2016, Apr 2016, Eindhoven, Netherland

    Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL.</p> <p>Results</p> <p>To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available.</p> <p>Conclusion</p> <p>This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.</p

    Extraction of fact tables from a relational database: an effort to establish rules in denormalization

    Get PDF
    Relational databases are supported by very well established models. However, some neglected problems can occur with the join operator: semantic mistakes caused by the multiple access path problem and faults when connection traps arise. In this paper we intend to identify and overcome those problems and to establish rules for relational data denormalization. Two denormalization forms are proposed and a case study is presented.info:eu-repo/semantics/publishedVersio

    Organizing research data

    Get PDF
    Research relies on ever larger amounts of data from experiments, automated production equipment, questionnaries, times series such as weather records, and so on. A major task in science is to combine, process and analyse such data to obtain evidence of patterns and correlations

    Implementing a transcription factor interaction prediction system using the genometric query language

    Get PDF
    Novel technologies and growing interest have resulted in a large increase in the amount of data available for genomics and transcriptomics studies, both in terms of volume and contents. Biology is relying more and more on computational methods to process, investigate, and extract knowledge from this huge amount of data. In this work, we present the TICA web server (available at http://www.gmql.eu/tica/), a fast and compact tool developed to support data-driven knowledge discovery in the realm of transcription factor interaction prediction. TICA leverages both the GenoMetric Query Language, a novel query tool (based on the Apache Hadoop and Spark technologies) specialized in the integration and management of heterogeneous, large genomic datasets, and a statistical method for robust detection of co-locations across interval-based data, in order to infer physically interacting transcription factors. Notably, TICA allows investigators to upload and analyze their own ChIP-seq experiments datasets, comparing them both against ENCODE data or between themselves, achieving computation time which increases linearly with respect to dataset size and density. Using ENCODE data from three well-studied cell lines as reference, we show that TICA predictions are supported by existing biological knowledge, making the web server a reliable and efficient tool for interaction screening and data-driven hypothesis generation

    Model inference for spreadsheets

    Get PDF
    Many errors in spreadsheet formulas can be avoided if spreadsheets are built automati- cally from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is dificult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: First, we have demonstrated its appli- cability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful to model data, they do not t well spreadsheets as they do not allow to express layout. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it lls an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.We would like to thank Orlando Belo for his help on running and analyzing the empirical study. We would also like to thank Paulo Azevedo for his help in conducting the statistical analysis of our empirical study. We would also like to thank the anonymous reviewers for their suggestions which helped us to improve the paper. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundacao para a Ciencia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010
    corecore