616 research outputs found
Challenging Ubiquitous Inverted Files
Stand-alone ranking systems based on highly optimized inverted file structures are generally considered "the" solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on "the database approach": mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.
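As a rough illustration of the idea of stating retrieval declaratively and leaving the execution plan to a database engine, the sketch below stores postings in a relational table and ranks documents with a single SQL query. It is not the Mirror DBMS: the table name `postings`, the functions `build_index` and `search`, and the summed term-frequency score are all assumptions made for this example, and SQLite merely stands in for a general-purpose DBMS.

```python
import sqlite3


def build_index(docs):
    """Load (term, doc_id, tf) rows into a relational table.

    `docs` maps a document id to its text. The table plays the role an
    inverted file would play in a stand-alone ranking system.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    conn.execute("CREATE INDEX idx_postings_term ON postings (term)")
    for doc_id, text in docs.items():
        counts = {}
        for term in text.lower().split():
            counts[term] = counts.get(term, 0) + 1
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )
    return conn


def search(conn, query):
    """Rank documents by summed term frequency over the query terms.

    The ranking is written as a declarative query; how it is evaluated
    (index lookup, grouping, sorting) is decided by the query planner.
    """
    terms = sorted(set(query.lower().split()))
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id, SUM(tf) AS score FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC"
    )
    return conn.execute(sql, terms).fetchall()
```

Changing the ranking function here means editing the SQL rather than the storage layout, which is the kind of flexibility the abstract argues for. Whether such a plan can match a hand-tuned inverted file in speed is exactly the engineering question the paper raises.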
SoK: Cryptographically Protected Database Search
Protected database search systems cryptographically isolate the roles of
reading from, writing to, and administering the database. This separation
limits unnecessary administrator access and protects data in the case of system
breaches. Since protected search was introduced in 2000, the area has grown
rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques.
Design of such systems is a balancing act between security, functionality,
performance, and usability. This challenge is made more difficult by ongoing
database specialization, as some users will want the functionality of SQL,
NoSQL, or NewSQL databases. This database evolution will continue, and the
protected search community should be able to quickly provide functionality
consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the
tradeoffs between different approaches. To address these challenges, we provide
the following contributions:
1) An identification of the important primitive operations across database
paradigms. We find there are a small number of base operations that can be used
and combined to support a large number of database paradigms.
2) An evaluation of the current state of protected search systems in
implementing these base operations. This evaluation describes the main
approaches and tradeoffs for each base operation. Furthermore, it puts
protected search in the context of unprotected search, identifying key gaps in
functionality.
3) An analysis of attacks against protected search for different base
queries.
4) A roadmap and tools for transforming a protected search system into a
protected database, including an open-source performance evaluation platform
and initial user opinions of protected search.
Comment: 20 pages, to appear in IEEE Security and Privac
BioWarehouse: a bioinformatics database warehouse toolkit
BACKGROUND: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. RESULTS: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. 
CONCLUSION: BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
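The application example above (the fraction of enzyme activities with no known sequence) is the kind of question a single SQL query can answer once the source databases share one schema. The sketch below shows the shape of such a query on toy data. The tables `enzyme_activity` and `protein_sequence`, their columns, and the sample rows are hypothetical; they are not the BioWarehouse schema, and the toy result is unrelated to the 36% reported in the abstract.

```python
import sqlite3


def make_toy_warehouse():
    """Build a tiny in-memory stand-in for an integrated warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE protein_sequence (id INTEGER PRIMARY KEY, ec_number TEXT)")
    conn.executemany(
        "INSERT INTO enzyme_activity VALUES (?)",
        [("1.1.1.1",), ("2.7.1.1",), ("3.4.21.4",), ("4.2.1.1",)],
    )
    # Two of the four activities have at least one sequence in the toy data.
    conn.executemany(
        "INSERT INTO protein_sequence (ec_number) VALUES (?)",
        [("1.1.1.1",), ("2.7.1.1",), ("2.7.1.1",)],
    )
    return conn


def fraction_without_sequence(conn):
    """Return the share of enzyme activities that have no associated sequence."""
    total, missing = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN NOT EXISTS (
                       SELECT 1 FROM protein_sequence s
                       WHERE s.ec_number = e.ec_number
                   ) THEN 1 ELSE 0 END)
        FROM enzyme_activity e
        """
    ).fetchone()
    # SUM over an empty table yields NULL, so guard the empty case.
    return (missing / total) if total else 0.0
```

On the toy data, `fraction_without_sequence(make_toy_warehouse())` evaluates to 0.5, since two of the four sample EC numbers have no sequence row.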
The exploration of a category theory-based virtual Geometrical product specification system for design and manufacturing
In order to ensure quality of products and to facilitate global outsourcing, almost all
the so-called "world-class" manufacturing companies nowadays are applying various
tools and methods to maintain the consistency of a product's characteristics
throughout its manufacturing life cycle. Among these, for ensuring the consistency of
the geometric characteristics, a tolerancing language, the Geometrical Product
Specification (GPS), has been widely adopted to precisely transform the functional
requirements from customers into manufactured workpieces expressed as tolerance
notes in technical drawings. Although commonly acknowledged by industrial users as
one of the most successful efforts in integrating existing manufacturing life-cycle
standards, current GPS implementations and software packages suffer from several
drawbacks in their practical use, possibly the most significant being the difficulty of
inferring the data for the "best" solutions. The problem stems from the foundation
of data structures and knowledge-based system design. This indicates that there needs
to be a "new" software system to facilitate GPS applications.
The presented thesis introduced an innovative knowledge-based system, the
VirtualGPS, that provides an integrated GPS knowledge platform based on a stable
and efficient database structure with knowledge generation and accessing facilities.
The system focuses on solving the intrinsic product design and production problems
by acting as a virtual domain expert through translating GPS standards and rules into
the forms of computerized expert advice and warnings. Furthermore, this system can
be used as a training tool for young and new engineers to understand the huge amount
of GPS standards in a relatively "quicker" manner.
The thesis started with a detailed discussion of the proposed categorical modelling
mechanism, which has been devised based on Category Theory. It provided a
unified mechanism for knowledge acquisition and representation, knowledge-based
system design, and database schema modelling. As a core part for assessing this
knowledge-based system, the implementation of the categorical Database
Management System (DBMS) is also presented in this thesis. The focus then moved
on to demonstrate the design and implementation of the proposed VirtualGPS system.
The tests and evaluations of this system were illustrated in Chapter 6. Finally, the
thesis summarized the contributions to knowledge in Chapter 7.
After a thorough review of the project, the conclusions reached indicate that the
entire VirtualGPS system was designed and implemented to conform to Category
Theory and object-oriented programming rules. The initial tests and performance
analyses show that the system facilitates the geometric product manufacturing
operations and benefits the manufacturers and engineers alike from function designs,
to a manufacturing and verification
- …