Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences
As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in the life sciences and consequently can serve as flexible platforms for implementing knowledge bases. Performing data analysis inside the database simplifies data management by minimizing the movement of data from disk to memory, allowing pre-filtering and post-processing of datasets, and keeping data in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and regular-expression searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.htm
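The regular-expression pattern matching described above can be illustrated outside the database as well. A minimal Python sketch (the motif and protein sequence below are invented for illustration) of scanning a sequence for a PROSITE-style motif, the kind of search Oracle's regular-expression functions could evaluate in-database:

```python
import re

# Hypothetical example: find an N-glycosylation-site motif (N, not P,
# then S or T, then not P) in a protein sequence. This mirrors the kind
# of pattern a database regular-expression search could evaluate.
motif = re.compile(r"N[^P][ST][^P]")

sequence = "MKTAYIAKQRNISTGLPA"  # invented sequence for illustration

# Collect (position, matched substring) pairs.
matches = [(m.start(), m.group()) for m in motif.finditer(sequence)]
print(matches)
```

Running the pattern in application code like this forfeits the pre-filtering and security benefits the abstract attributes to in-database execution, which is the article's point.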
Decision support method for the selection of OMSs
With the increasing demand for highly complex, integrated, and application-domain-specific systems engineering environments (SEEs), more or less specialized components of SEEs are being developed. An important component is the database management system (DBMS). As conventional DBMSs cannot fulfill the requirements of highly complex, persistent data structures, specialized DBMSs, namely object management systems (OMSs), have been developed. An advantage of OMSs is that they further enhance the integration not only of data but also of processes. Several specialized OMSs with significantly different properties, such as data model, architecture, and performance, are currently available. As it is very difficult for an SEE developer to select the most appropriate OMS, we propose a decision support method that enables an SEE developer to identify their requirements and compare the evaluation results of different OMSs. Additionally, we present a practical experiment in which we applied the decision support method to compare different OMSs. Experiences from the investigation are presented briefly.
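The abstract does not give the method's details, but comparing candidate OMSs against weighted requirements can be sketched as a simple decision matrix. All requirement names, weights, and ratings below are invented for illustration:

```python
# Hypothetical sketch of requirement-weighted OMS comparison.
# Weights reflect how important each requirement is to the SEE developer.
requirements = {"data model": 0.4, "architecture": 0.3, "performance": 0.3}

# Evaluation results: rating of each candidate OMS per requirement (1-5).
evaluations = {
    "OMS-A": {"data model": 5, "architecture": 3, "performance": 2},
    "OMS-B": {"data model": 3, "architecture": 4, "performance": 5},
}

def weighted_score(ratings):
    # Sum of weight * rating over all requirements.
    return sum(requirements[r] * ratings[r] for r in requirements)

best = max(evaluations, key=lambda oms: weighted_score(evaluations[oms]))
for oms, ratings in evaluations.items():
    print(oms, round(weighted_score(ratings), 2))
print("best:", best)
```

The point of such a matrix is that changing the weights (the developer's requirements profile) can change which OMS wins, which is why the method asks developers to identify their requirements first.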
Scalable and Highly Available Database Systems in the Cloud
Cloud computing allows users to tap into a massive pool of shared computing resources such as servers, storage, and network. These resources are provided as a service to the users, allowing them to "plug into the cloud" much as they would a utility grid. The promise of the cloud is to free users from the tedious and often complex task of managing and provisioning computing resources to run applications. At the same time, the cloud brings several additional benefits, including a pay-as-you-go cost model, easier deployment of applications, elastic scalability, high availability, and a more robust and secure infrastructure.

One important class of applications that users are increasingly deploying in the cloud is database management systems. Database management systems differ from other types of applications in that they manage large amounts of state that is frequently updated and that must be kept consistent at all scales and in the presence of failures. This makes it difficult to provide scalability and high availability for database systems in the cloud. In this thesis, we show how to exploit cloud technologies and relational database systems to provide a highly available and scalable database service in the cloud.

The first part of the thesis presents RemusDB, a reliable, cost-effective high availability solution implemented as a service provided by the virtualization platform. RemusDB can make any database system highly available with little or no code modification by exploiting the capabilities of virtualization. In the second part of the thesis, we present two systems that aim to provide elastic scalability for database systems in the cloud using two very different approaches. The three systems presented in this thesis bring us closer to the goal of building a scalable and reliable transactional database service in the cloud.
Scalable transactions in the cloud: partitioning revisited
Lecture Notes in Computer Science, 6427. Cloud computing is becoming one of the most widely used paradigms for deploying highly available and scalable systems. These systems usually demand the management of huge amounts of data, which cannot be handled by traditional or replicated database systems as we know them. Recent solutions store data in special key-value structures, in an approach that commonly lacks the consistency provided by transactional guarantees, which is traded for high scalability and availability. In order to ensure consistent access to the information, the use of transactions is required. However, it is well known that traditional replication protocols do not scale well in a cloud environment. Here we review current proposals for deploying transactional systems in the cloud and propose a new system that aims to be a step forward in achieving this goal. We then focus on data partitioning and describe the key role it plays in achieving high scalability. This work has been partially supported by the Spanish Government under grant TIN2009-14460-C03-02, by the Spanish MEC under grant BES-2007-17362, and by project ReD Resilient Database Clusters (PDTC/EIA-EIA/109044/2008).
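The data-partitioning idea the abstract highlights can be sketched minimally: route each key to exactly one partition, so that transactions touching a single partition scale without cross-node coordination. The partition count and keys below are invented for illustration:

```python
import hashlib

# Hypothetical sketch of hash partitioning. A transaction whose keys all
# map to one partition can commit locally, without a distributed protocol.
NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process
    # and would route keys differently across restarts.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

keys = ["user:1", "user:2", "order:17"]
for k in keys:
    print(k, "->", partition_for(k))
```

The hard part, which the paper addresses, is choosing a partitioning so that most transactions really are single-partition; keys accessed together should land on the same partition.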
Personal digital assistants: Essential tools for preparing dietetics professionals to use new generation information technology
Rapid integration of information technology into health care systems has included the use of highly portable systems, in particular personal digital assistants (PDAs). With their large built-in memories, fast processors, wireless connectivity, multimedia capacity, and large library of applications, PDAs have been widely adopted by physicians and nurses for patient tracking, disease management, and medical reference and drug information, enhancing the quality of health care. Many health-related PDA applications are available to both dietetics professionals and clients. Dietetics professionals can effectively use PDAs for client tracking and support, accessing hospital databases and information, and providing better self-monitoring tools to clients. Internship programs for dietetics professionals should include training in the use of PDAs and their dietetics applications, so that new practitioners can stay abreast of this rapidly evolving technology. Several considerations to keep in mind when selecting a PDA and its applications are discussed.
Evaluating data freshness in large scale replicated databases
There is nowadays an increasing need for database replication, as the construction of high-performance, highly available, and large-scale applications depends on it to keep data synchronized across multiple servers. A particularly popular approach, used for instance by Facebook, is the MySQL open source database management system and its built-in asynchronous replication mechanism. The limitations imposed by MySQL on replication topologies mean that data has to go through a number of hops or each server has to handle a large number of slaves. This is particularly worrisome when updates are accepted by multiple replicas and in large systems. It is, however, difficult to accurately evaluate the impact of replication on data freshness, since one has to compare observations at multiple servers while running a realistic workload and without disturbing the system under test. In this paper we address this problem by introducing a tool that can accurately measure replication delays for any workload, and we then apply it to the industry-standard TPC-C benchmark. This allows us to draw interesting conclusions about the scalability properties of MySQL replication.
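The core measurement idea (not the paper's actual tool) can be sketched as follows: record when a marker write commits on the master, observe when it becomes visible on a replica, and take the difference as the replication delay. The replica here is simulated in memory, so the numbers are illustrative only:

```python
import time

master_log = []      # committed (timestamp, value) pairs on the master
replica_state = []   # what the simulated replica has applied so far

def master_write(value):
    # Timestamp the write at commit time on the master.
    master_log.append((time.monotonic(), value))

def replica_apply_next():
    # In MySQL this would be the asynchronous SQL thread applying the
    # next binlog event; here we just copy the next committed entry.
    replica_state.append(master_log[len(replica_state)])

master_write("marker-1")
time.sleep(0.05)                 # simulated replication lag
replica_apply_next()

write_ts, value = replica_state[-1]
delay = time.monotonic() - write_ts
print(f"{value} visible after {delay:.3f}s")
```

The paper's actual challenge is doing this comparison across real servers without disturbing the system under test, which a naive marker-polling approach would do.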
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Log-Structured Tables (LSTs), also commonly referred to as table formats, have recently emerged to bring consistency and isolation to object stores. With the separation of compute and storage, object stores have become the go-to for highly scalable and durable storage. However, this comes with its own set of challenges, such as the lack of recovery and concurrency management that traditional database management systems provide. This is where LSTs such as Delta Lake, Apache Iceberg, and Apache Hudi come into play, providing an automatic metadata layer that manages tables defined over object stores, effectively addressing these challenges. A paradigm shift in the design of these systems necessitates the updating of evaluation methodologies. In this paper, we examine the characteristics of LSTs and propose extensions to existing benchmarks, including workload patterns and metrics, to accurately capture their performance. We introduce our framework, LST-Bench, which enables users to execute benchmarks tailored for the evaluation of LSTs. Our evaluation demonstrates how these benchmarks can be utilized to evaluate the performance, efficiency, and stability of LSTs. The code for LST-Bench is open sourced and is available at https://github.com/microsoft/lst-bench/
A generalized system performance model for object-oriented database applications
Although relational database systems have met many needs in traditional business applications, such technology is inadequate for non-traditional applications such as computer-aided design, computer-aided software engineering, and knowledge bases. Object-oriented database systems (OODB) enhance the data modeling power and performance of database management systems for these applications.
Response time is an important issue facing OODBs. However, standard measures of on-line transaction processing are irrelevant for OODBs. Benchmarks compare alternative implementations of OODB system software running a constant application workload. Few attempts have been made to characterize the performance implications of OODB application design, given a fixed OODB and operating system platform.
In this study, design features of the OO7 Benchmark database application (Carey, DeWitt, and Naughton, 1993) were varied to explore the impact on response time of database operations. Sensitivity to the degree of aggregation and to the degree of inheritance in the application was measured. Variability in response times was also measured, using a sequence of database operations to simulate a user transaction workload.
Degree of aggregation was defined as the number of relationship objects processed during a database operation. Response time was linear with the degree of aggregation. The size of the database segment processed, compared to the size of available memory, affected the coefficients of the regression line.
Degree of inheritance was defined as the Number of Children (Chidamber and Kemerer, 1994) in the application class definitions, and as the extent to which run-time polymorphism was implemented. In this study, increased inheritance caused a statistically significant increase in response time only for the OO7 Traversal 1, although this difference was not meaningful.
In the simulated transaction workload of nine OO7 operations, response times were highly variable. Response times per operation depended on the number of objects processed and on the effect of preceding operations on memory contents. Operations that used disparate physical segments or had large working sets relative to the size of memory caused large increases in response time. Average response times and variability were reduced by removing these operations from the sequence (equivalent to scheduling these transactions at a time when the impact would be minimized).
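The study reports that response time was linear in the degree of aggregation, with regression coefficients that depend on the memory regime. A least-squares fit over invented measurements illustrates how such coefficients would be estimated; the data points below are fabricated for illustration, not the study's results:

```python
# Ordinary least-squares fit of response time vs. degree of aggregation.
def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

degrees = [100, 200, 400, 800]       # relationship objects per operation
times_ms = [15.0, 25.0, 45.0, 85.0]  # invented response times

slope, intercept = linear_fit(degrees, times_ms)
print(f"time_ms ~= {slope:.3f} * degree + {intercept:.1f}")
```

In the study's terms, the segment-size-to-memory ratio would shift `slope` and `intercept`, while the linear form itself held.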
iVar, an interpretation‐oriented tool to manage the update and revision of variant annotation and classification
The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant reinterpretation given the constantly updated information, require robust data management systems and organized approaches. In this paper, we present iVar: a freely available and highly customizable tool with a user‐friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts variant call format (VCF) files and text annotation files and processes them, optimizing data organization and avoiding redundancy. Updated annotations can be periodically re‐uploaded and associated with variants as historically tracked attributes, i.e., modifications are recorded whenever an updated value is imported, thus keeping track of all changes. Data can be visualized through variant‐centered and sample‐centered interfaces. A customizable search function can be exploited to periodically check whether pathogenicity‐related data of a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients in the database carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance in terms of collecting and managing data from a medium‐throughput laboratory.
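The "historically tracked attribute" idea described above, where re-uploaded annotations append rather than overwrite so every past classification remains queryable, can be sketched as follows. The variant identifier, attribute names, and dates are invented for illustration and do not reflect iVar's actual schema:

```python
from datetime import date

# variant id -> append-only list of (date, attribute, value) records
history = {}

def annotate(variant, attribute, value, when):
    records = history.setdefault(variant, [])
    # Record only actual changes, preserving the full trail of past values.
    previous = [v for d, a, v in records if a == attribute]
    if not previous or previous[-1] != value:
        records.append((when, attribute, value))

annotate("chr17:g.41245466G>A", "classification", "VUS", date(2021, 3, 1))
annotate("chr17:g.41245466G>A", "classification", "VUS", date(2022, 1, 10))  # no change, not recorded
annotate("chr17:g.41245466G>A", "classification", "pathogenic", date(2023, 6, 2))

trail = history["chr17:g.41245466G>A"]
print([(str(d), v) for d, a, v in trail])
```

A reclassification from VUS to pathogenic, as in the last record, is exactly the event that would trigger the patient-recontacting workflow the abstract describes.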