147,766 research outputs found
Making an Embedded DBMS JIT-friendly
While database management systems (DBMSs) are highly optimized, interactions
across the boundary between the programming language (PL) and the DBMS are
costly, even for in-process embedded DBMSs. In this paper, we show that
programs that interact with the popular embedded DBMS SQLite can be
significantly optimized - by a factor of 3.4 in our benchmarks - by inlining
across the PL / DBMS boundary. We achieved this speed-up by replacing parts of
SQLite's C interpreter with RPython code and composing the resulting
meta-tracing virtual machine (VM) - called SQPyte - with the PyPy VM. SQPyte
does not compromise stand-alone SQL performance and is 2.2% faster than SQLite
on the widely used TPC-H benchmark suite.Comment: 24 pages, 18 figure
Using Links to prototype a Database Wiki
Both relational databases and wikis have strengths that make them attractive for use in collaborative applications. In the last decade, database-backed Web applications have been used extensively to develop valuable shared biological references called curated databases. Databases offer many advantages such as scalability, query optimization and concurrency control, but are not easy to use and lack other features needed for collaboration. Wikis have become very popular for early-stage biocuration projects because they are easy to use, encourage sharing and collaboration, and provide built-in support for archiving, history-tracking and annotation. However, curation projects often outgrow the limited capabilities of wikis for structuring and efficiently querying data at scale, necessitating a painful phase transition to a database-backed Web application. We perceive a need for a new class of general-purpose system, which we call a Database Wiki, that combines flexible wiki-like support for collaboration with robust database-like capabilities for structuring and querying data. This paper presents DBWiki, a design prototype for such a system written in the Web programming language Links. We present the architecture, typical use, and wiki markup language design for DBWiki and discuss features of Links that provided unique advantages for rapid Web/database application prototyping
Large-scale event extraction from literature with multi-level gene normalization
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons -Attribution - Share Alike (CC BY-SA) license
Modeling of Traceability Information System for Material Flow Control Data.
This paper focuses on data modeling for traceability of material/work flow in information
layer of manufacturing control system. The model is able to trace all associated data throughout the
product manufacturing from order to final product. Dynamic data processing of Quality and Purchase
activities are considered in data modeling as well as Order and Operation base on lots particulars. The
modeling consisted of four steps and integrated as one final model. Entity-Relationships Modeling as
data modeling methodology is proposed. The model is reengineered with Toad Data Modeler software
in physical modeling step. The developed model promises to handle fundamental issues of a
traceability system effectively. It supports for customization and real-time control of material in flow
in all levels of manufacturing processes. Through enhanced visibility and dynamic store/retrieval of
data, all traceability usages and applications is responded. Designed solution is initially applicable as
reference data model in identical lot-base traceability system
Clinical data wrangling using Ontological Realism and Referent Tracking
Ontological realism aims at the development of high quality ontologies that faithfully represent what is general in reality and to use these ontologies to render heterogeneous data collections comparable. To achieve this second goal for clinical research datasets presupposes not merely (1) that the requisite ontologies already exist, but also (2) that the datasets in question are faithful to reality in the dual sense that (a) they denote only particulars and relationships between particulars that do in fact exist and (b) they do this in terms of the types and type-level relationships described in these ontologies. While much attention has been devoted to (1), work on (2), which is the topic of this paper, is comparatively rare. Using Referent Tracking as basis, we describe a technical data wrangling strategy which consists in creating for each dataset a template that, when applied to each particular record in the
dataset, leads to the generation of a collection of Referent Tracking Tuples (RTT) built out of unique identifiers for the entities described by means of the data items in the record. The proposed strategy is based on (i) the distinction between data and what data are about, and (ii) the explicit descriptions of portions of reality which RTTs provide and which range not only over the particulars described by data items in a dataset, but also over these data items themselves. This last feature allows us to describe particulars that are only implicitly referred to by the dataset; to provide information about correspondences between data items in a dataset; and to assert
which data items are unjustifiably or redundantly present in or absent from the dataset. The approach has been tested on a dataset collected from patients seeking treatment for orofacial pain at two German universities and made available for the
NIDCR-funded OPMQoL project
Towards a Semantic Perceptual Image Metric
We present a full reference, perceptual image metric based on VGG-16, an
artificial neural network trained on object classification. We fit the metric
to a new database based on 140k unique images annotated with ground truth by
human raters who received minimal instruction. The resulting metric shows
competitive performance on TID 2013, a database widely used to assess image
quality assessments methods. More interestingly, it shows strong responses to
objects potentially carrying semantic relevance such as faces and text, which
we demonstrate using a visualization technique and ablation experiments. In
effect, the metric appears to model a higher influence of semantic context on
judgments, which we observe particularly in untrained raters. As the vast
majority of users of image processing systems are unfamiliar with Image Quality
Assessment (IQA) tasks, these findings may have significant impact on
real-world applications of perceptual metrics
- …