LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
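The log-only idea in the abstract above can be illustrated with a toy example. The following is a minimal single-process sketch, not LogBase's implementation: the class name `LogOnlyStore`, its methods, and the integer version counter are all assumptions made for illustration. It shows an append-only log as the sole copy of the data, an in-memory multiversion index that maps each key to log offsets, and recovery that rebuilds the index by scanning the log.

```python
import bisect
from collections import defaultdict


class LogOnlyStore:
    """Toy log-only store (hypothetical): the append-only log is the only copy of the data."""

    def __init__(self):
        self.log = []                    # append-only list of (key, version, value)
        self.index = defaultdict(list)   # key -> list of (version, log offset), sorted by version
        self.clock = 0                   # monotonically increasing version counter

    def put(self, key, value):
        """Append one record to the log and index the new version in memory."""
        self.clock += 1
        offset = len(self.log)
        self.log.append((key, self.clock, value))
        # Versions only grow, so appending keeps the per-key list sorted.
        self.index[key].append((self.clock, offset))
        return self.clock

    def get(self, key, as_of=None):
        """Return the latest value, or the newest value with version <= as_of."""
        versions = self.index.get(key)
        if not versions:
            return None
        if as_of is None:
            offset = versions[-1][1]
        else:
            pos = bisect.bisect_right(versions, (as_of, float("inf")))
            if pos == 0:
                return None              # the key did not exist yet at that version
            offset = versions[pos - 1][1]
        return self.log[offset][2]

    def recover(self):
        """Rebuild the in-memory index from the log alone, as after a restart."""
        self.index = defaultdict(list)
        self.clock = 0
        for offset, (key, version, _value) in enumerate(self.log):
            self.index[key].append((version, offset))
            self.clock = max(self.clock, version)


store = LogOnlyStore()
v1 = store.put("acct:1", 100)
store.put("acct:1", 250)
print(store.get("acct:1"), store.get("acct:1", as_of=v1))
```

Because every write is a single append, there is no separate data file to update; the cost is that reads depend on the in-memory index, which is why recovery here consists of a log scan.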
Determinación de factores influyentes sobre una respuesta en un dominio poco estructurado
This report focuses on results obtained from a classification
technique applied to time series data in a medical ill-structured
domain.
The statistical analysis and classification of such data in
ill-structured domains are often inadequate because of the intrinsic
characteristics of those domains.
The database in this analysis contains information on patients
with major depressive disorders or schizophrenia; as a
consequence, many of its variables hold measurements taken at
different instants of time, forming curves.
This motivates the search for a useful technique for classifying
curves in a medical ill-structured domain.
Postprint (published version
The MultiDark Database: Release of the Bolshoi and MultiDark Cosmological Simulations
We present the online MultiDark Database -- a Virtual Observatory-oriented,
relational database for hosting various cosmological simulations. The data is
accessible via an SQL (Structured Query Language) query interface, which also
allows users to directly pose scientific questions, as shown in a number of
examples in this paper. Further examples for the usage of the database are
given in its extensive online documentation (www.multidark.org). The database
is based on the same technology as the Millennium Database, a fact that will
greatly facilitate the usage of both suites of cosmological simulations. The
first release of the MultiDark Database hosts two 8.6 billion particle
cosmological N-body simulations: the Bolshoi (250/h Mpc simulation box, 1/h kpc
resolution) and MultiDark Run1 simulation (MDR1, or BigBolshoi, 1000/h Mpc
simulation box, 7/h kpc resolution). The extraction methods for halos/subhalos
from the raw simulation data, and how this data is structured in the database
are explained in this paper. With the first data release, users get full access
to halo/subhalo catalogs, various profiles of the halos at redshifts z=0-15,
and raw dark matter data for one time-step of the Bolshoi and four time-steps
of the MultiDark simulation. Later releases will also include galaxy mock
catalogs and additional merging trees for both simulations as well as new large
volume simulations with high resolution. This project is further proof of the
viability of storing and presenting complex data using relational database
technology. We encourage other simulators to publish their results in a similar
manner.
Comment: 28 pages, 9 figures, submitted to New Astronom
Path constraints in semistructured databases
We investigate a class of path constraints that is of interest in connection with both semistructured and structured data. In standard database systems, constraints are typically expressed as part of the schema, but in semistructured data there is no explicit schema and path constraints provide a natural alternative. As with structured data, path constraints on semistructured data express integrity constraints associated with the semantics of data and are important in query optimization. We show that in semistructured databases, despite the simple syntax of the constraints, their associated implication problem is r.e. complete and their finite implication problem is co-r.e. complete. However, we establish the decidability of the implication and finite implication problems for several fragments of the path constraint language and demonstrate that these fragments suffice to express important semantic information such as extent constraints, inverse relationships, and local database constraints commonly found in object-oriented databases.
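To make the notions of extent constraint and inverse relationship concrete, here is a minimal sketch that checks two constraint shapes on a small edge-labeled graph. The abstract does not spell out the constraint language, so the reading below is an assumption: a "forward" check requires that whatever is reached from a node by one path is also reached by another, and a "backward" check requires that the second path leads back to the starting node. The graph encoding, the function names, and the `book`/`author`/`person`/`wrote` labels are all hypothetical. The sketch only checks satisfaction on one finite graph; it does not address the implication problems the abstract is about.

```python
def reach(graph, start, path):
    """Nodes reached from `start` by following the edge labels in `path`, in order."""
    frontier = {start}
    for label in path:
        frontier = {dst
                    for src in frontier
                    for dst in graph.get(src, {}).get(label, ())}
    return frontier


def holds_forward(graph, root, alpha, beta, gamma):
    """For every x reached from root via alpha: each y reached from x via beta
    is also reached from x via gamma."""
    return all(reach(graph, x, beta) <= reach(graph, x, gamma)
               for x in reach(graph, root, alpha))


def holds_backward(graph, root, alpha, beta, gamma):
    """For every x reached from root via alpha: each y reached from x via beta
    leads back to x via gamma."""
    return all(x in reach(graph, y, gamma)
               for x in reach(graph, root, alpha)
               for y in reach(graph, x, beta))


# Hypothetical data: node -> {edge label -> set of target nodes}
graph = {
    "r": {"book": {"b1"}, "person": {"p1"}},
    "b1": {"author": {"p1"}},
    "p1": {"wrote": {"b1"}},
}

# Extent-style check: every author of a book is also listed under `person`.
extent_ok = holds_forward(graph, "r", [], ["book", "author"], ["person"])

# Inverse-style check: if y is an author of book x, then y `wrote` x.
inverse_ok = holds_backward(graph, "r", ["book"], ["author"], ["wrote"])

print(extent_ok, inverse_ok)
```

An empty `alpha` path makes the root itself the only starting node, which is how the extent-style check above is anchored at the root.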