LogBase: A Scalable Log-structured Database System in the Cloud

Author: Agrawal Divyakant
Chen Gang
Ooi Beng Chin
Vo Hoang Tam
Wang Sheng
Publication date: 01/01/2012

Numerous applications, such as financial transactions (e.g., stock trading), are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads in write-heavy environments and hence adversely affects the write throughput and recovery time of the system. In this paper, we introduce LogBase, a scalable log-structured database system that adopts log-only storage to remove the write bottleneck and support fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes to support efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery. Comment: VLDB201
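
The idea of log-only storage with an in-memory multiversion index can be illustrated with a minimal single-process sketch. It is not LogBase's implementation: the class `LogOnlyStore`, its methods, and the integer version counter are hypothetical names chosen for illustration, and distribution, transactions, and persistence are left out.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LogOnlyStore:
    """Toy log-only store: the append-only log is the only copy of the data."""
    log: list = field(default_factory=list)    # (key, version, value) entries
    index: dict = field(default_factory=dict)  # key -> sorted [(version, log offset)]
    clock: int = 0                             # hypothetical version counter

    def put(self, key, value):
        """Append the write to the log and record its offset in the index."""
        self.clock += 1
        offset = len(self.log)
        self.log.append((key, self.clock, value))
        self.index.setdefault(key, []).append((self.clock, offset))
        return self.clock

    def get(self, key, as_of=None):
        """Return the newest value with version <= as_of (latest if None)."""
        versions = self.index.get(key, [])
        if not versions:
            return None
        if as_of is None:
            _, offset = versions[-1]
        else:
            # Versions are appended in increasing order, so binary search applies.
            pos = bisect.bisect_right(versions, (as_of, float("inf")))
            if pos == 0:
                return None
            _, offset = versions[pos - 1]
        return self.log[offset][2]

    def recover(self):
        """Rebuild the in-memory index with one scan of the log."""
        self.index = {}
        self.clock = 0
        for offset, (key, version, _value) in enumerate(self.log):
            self.index.setdefault(key, []).append((version, offset))
            self.clock = max(self.clock, version)
```

Because every write is a single append and the index holds only offsets, recovery in this sketch amounts to rescanning the log; there is no separate data file to bring back in sync.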

arXiv.org e-Print Archive

CiteSeerX

Determinación de factores influyentes sobre una respuesta en un dominio poco estructurado [Determining the factors that influence a response in an ill-structured domain]

Author: Gibert Karina
Rodas Osollo Jorge Enrique
Rojo Emilio
Publication date: 01/01/2001

This report focuses on results obtained from a classification technique applied to time series data in a medical ill-structured domain. The statistical analysis and classification of such data in ill-structured domains are often inadequate because of the intrinsic characteristics of those domains. The database in this analysis contains information on patients with major depressive disorders or schizophrenia; as a consequence, many database variables contain measurements taken at different instants of time, forming curves. For this reason, we are motivated to establish a useful technique for classifying curves in a medical ill-structured domain. Postprint (published version

The MultiDark Database: Release of the Bolshoi and MultiDark Cosmological Simulations

Author: Aarseth
Aarseth
Allgood
Bennett
Bower
Boylan-Kolchin
Bryan
Bullock
Conroy
Croton
Davis
De Lucia
Dubinski
Efstathiou
Gao
Gott
Iliev
Jenkins
Jing
Kauffmann
Kim
Klypin
Klypin
Knollmann
Kravtsov
Kravtsov
Kravtsov
Kuhlen
Lahav
Macciò
Moore
More
Muñoz-Cuartas
Navarro
Neto
Peebles
Prada
Prada
Schneider
Sheth
Somerville
Somerville
Springel
Springel
Springel
Stadel
Teyssier
Tinker
Tinker
Trujillo-Gomez
Vale
van den Bosch
Warren
Wechsler
Wetzel
White
Zentner
Zhao
Publication venue: 'Wiley'
Publication date: 02/09/2011

We present the online MultiDark Database, a Virtual Observatory-oriented relational database for hosting various cosmological simulations. The data is accessible via an SQL (Structured Query Language) query interface, which also allows users to directly pose scientific questions, as shown in a number of examples in this paper. Further examples of the usage of the database are given in its extensive online documentation (www.multidark.org). The database is based on the same technology as the Millennium Database, a fact that will greatly facilitate the usage of both suites of cosmological simulations. The first release of the MultiDark Database hosts two 8.6 billion particle cosmological N-body simulations: the Bolshoi (250/h Mpc simulation box, 1/h kpc resolution) and MultiDark Run1 simulation (MDR1, or BigBolshoi, 1000/h Mpc simulation box, 7/h kpc resolution). The extraction methods for halos/subhalos from the raw simulation data, and the way this data is structured in the database, are explained in this paper. With the first data release, users get full access to halo/subhalo catalogs, various profiles of the halos at redshifts z=0-15, and raw dark matter data for one time-step of the Bolshoi and four time-steps of the MultiDark simulation. Later releases will also include galaxy mock catalogs and additional merging trees for both simulations as well as new large volume simulations with high resolution. This project is further proof of the viability of storing and presenting complex data using relational database technology. We encourage other simulators to publish their results in a similar manner. Comment: 28 pages, 9 figures, submitted to New Astronom
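
As a rough illustration of posing a scientific question through an SQL interface, the sketch below counts halos above a mass threshold at one snapshot. Everything in it is an assumption made for illustration: the table `halos`, its columns, the helper `count_massive_halos`, and the four toy rows are invented, and it runs against a local in-memory SQLite database, not the MultiDark service. The real table and column names are given in the database's online documentation.

```python
import sqlite3

# Hypothetical schema and made-up toy rows; the real MultiDark tables differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE halos (halo_id INTEGER PRIMARY KEY, snapshot INTEGER, mass REAL)"
)
conn.executemany(
    "INSERT INTO halos (halo_id, snapshot, mass) VALUES (?, ?, ?)",
    [(1, 0, 2.0e12), (2, 0, 5.0e13), (3, 0, 8.0e14), (4, 1, 3.0e14)],
)


def count_massive_halos(connection, snapshot, min_mass):
    """Count halos in one snapshot whose mass is at least min_mass."""
    row = connection.execute(
        "SELECT COUNT(*) FROM halos WHERE snapshot = ? AND mass >= ?",
        (snapshot, min_mass),
    ).fetchone()
    return row[0]
```

Parameterized queries (the `?` placeholders) keep the threshold and snapshot out of the SQL string, which is a reasonable habit when the same question is asked for many values.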

arXiv.org e-Print Archive

Path constraints in semistructured databases

Author: Abiteboul
Abiteboul
Bancilhon
Barwise
Börger
Cattell
Chakravarthy
Ebbinghaus
Enderton
Florescu
Grädel
Immerman
Ito
Lamb
Mendelzon
Peter Buneman
Scott Weinstein
van Bommel
Wang
Wenfei Fan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

We investigate a class of path constraints that is of interest in connection with both semistructured and structured data. In standard database systems, constraints are typically expressed as part of the schema, but in semistructured data there is no explicit schema and path constraints provide a natural alternative. As with structured data, path constraints on semistructured data express integrity constraints associated with the semantics of data and are important in query optimization. We show that in semistructured databases, despite the simple syntax of the constraints, their associated implication problem is r.e. complete and their finite implication problem is co-r.e. complete. However, we establish the decidability of the implication and finite implication problems for several fragments of the path constraint language and demonstrate that these fragments suffice to express important semantic information such as extent constraints, inverse relationships, and local database constraints commonly found in object-oriented databases.
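
To make the notion concrete, the sketch below checks one simple kind of path constraint on a small edge-labeled graph: every node reached from the root by path alpha must also be reached by path beta. This is a simplified illustration, not the constraint language or any algorithm from the paper; the adjacency-list encoding and the function names `reachable` and `satisfies_inclusion` are assumptions. It only tests whether a given finite graph satisfies a constraint, which is a different and much easier task than the implication problems the abstract discusses.

```python
def reachable(graph, start, path):
    """Nodes reached from `start` by following the edge labels in `path` in order."""
    frontier = {start}
    for label in path:
        frontier = {
            target
            for node in frontier
            for (edge_label, target) in graph.get(node, [])
            if edge_label == label
        }
    return frontier


def satisfies_inclusion(graph, root, alpha, beta):
    """True if every node reached via `alpha` from the root is also reached via `beta`."""
    return reachable(graph, root, alpha) <= reachable(graph, root, beta)


# Toy data: every author of a book should also be listed as a person.
library = {
    "r": [("book", "b1"), ("person", "p1")],
    "b1": [("author", "p1")],
}
print(satisfies_inclusion(library, "r", ["book", "author"], ["person"]))
```

Adding a book whose author is not reachable through a `person` edge from the root would make the same check fail, which is the kind of integrity violation an extent constraint is meant to catch.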