20,903 research outputs found
Biochemical network matching and composition
This paper looks at biochemical network matching and compositio
SoK: Cryptographically Protected Database Search
Protected database search systems cryptographically isolate the roles of
reading from, writing to, and administering the database. This separation
limits unnecessary administrator access and protects data in the case of system
breaches. Since protected search was introduced in 2000, the area has grown
rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques.
Design of such systems is a balancing act between security, functionality,
performance, and usability. This challenge is made more difficult by ongoing
database specialization, as some users will want the functionality of SQL,
NoSQL, or NewSQL databases. This database evolution will continue, and the
protected search community should be able to quickly provide functionality
consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the
tradeoffs between different approaches. To address these challenges, we provide
the following contributions:
1) An identification of the important primitive operations across database
paradigms. We find there are a small number of base operations that can be used
and combined to support a large number of database paradigms.
2) An evaluation of the current state of protected search systems in
implementing these base operations. This evaluation describes the main
approaches and tradeoffs for each base operation. Furthermore, it puts
protected search in the context of unprotected search, identifying key gaps in
functionality.
3) An analysis of attacks against protected search for different base
queries.
4) A roadmap and tools for transforming a protected search system into a
protected database, including an open-source performance evaluation platform
and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac
Nonparametric Bayesian Modeling for Automated Database Schema Matching
The problem of merging databases arises in many government and commercial
applications. Schema matching, a common first step, identifies equivalent
fields between databases. We introduce a schema matching framework that builds
nonparametric Bayesian models for each field and compares them by computing the
probability that a single model could have generated both fields. Our
experiments show that our method is more accurate and faster than the existing
instance-based matching algorithms in part because of the use of nonparametric
Bayesian models
Evolving NoSQL Databases Without Downtime
NoSQL databases like Redis, Cassandra, and MongoDB are increasingly popular
because they are flexible, lightweight, and easy to work with. Applications
that use these databases will evolve over time, sometimes necessitating (or
preferring) a change to the format or organization of the data. The problem we
address in this paper is: How can we support the evolution of high-availability
applications and their NoSQL data online, without excessive delays or
interruptions, even in the presence of backward-incompatible data format
changes?
We present KVolve, an extension to the popular Redis NoSQL database, as a
solution to this problem. KVolve permits a developer to submit an upgrade
specification that defines how to transform existing data to the newest
version. This transformation is applied lazily as applications interact with
the database, thus avoiding long pause times. We demonstrate that KVolve is
expressive enough to support substantial practical updates, including format
changes to RedisFS, a Redis-backed file system, while imposing essentially no
overhead in general use and minimal pause times during updates.Comment: Update to writing/structur
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work
Active artefact management for distributed software engineering
We describe a software artefact repository that provides its contents with some awareness of their own creation. "Active" artefacts are distinguished from their passive counterparts by their enriched meta-data model which reflects the work-flow process that created them, the actors responsible, the actions taken to change the artefact, and various other pieces of organisational knowledge. This enriched view of an artefact is intended to support re-use of both software and the expertise gained when creating the software. Unlike other organisational knowledge systems, the meta-data is intrinsically part of the artefact and may be populated automatically from sources including existing data-format specific information, user supplied data and records of communication.
Such a system is of increased importance in the world of
"virtual teams" where transmission of vital organisational
knowledge, at best difficult, is further constrained by the
lack of direct contact between engineers and differing development cultures
- …