518 research outputs found

    Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

    Cloud-based data analysis is now common practice because of its lower system management overhead and its pay-as-you-go pricing model. The pricing model, however, is not always suitable for query processing, as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries that frequently access the same data can become expensive. The problem is compounded by the limited options available to the user for optimizing query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting so that multiple concurrent queries are combined into a single query. Our experiments show that the aggregated amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as executing a single one of them. As an example, we demonstrate how the shared execution of the TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google BigQuery, respectively, than a query-at-a-time approach, while achieving a higher throughput.

    Report on the Second International Workshop on Data Management on Modern Hardware (DaMoN'06)

    This report summarizes the presentations and discussions that occurred during the Second International Workshop on Data Management on Modern Hardware (DaMoN). DaMoN was held in Chicago on June 25th, 2006, and was collocated with ACM SIGMOD 2006. The aim of this one-day workshop is to bring together researchers interested in optimizing database performance on modern computing infrastructure by designing new data management techniques and tools.

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases. Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    Quality predictors of abdominal fetal electrocardiography recording in antenatal ambulatory and bedside settings

    Background: Fetal electrocardiography using an abdominal monitor (Monica AN24™) could increase the diagnostic use of fetal heart rate (fHR) variability measurements. However, signal quality may depend on factors such as maternal physical activity, posture, and bedside versus ambulatory setting. Methods: Sixty-three healthy women wore the monitor at home and 42 women during a hospital stay. All women underwent a posture experiment, and all home and 13 hospital participants wore the monitor during daytime and nighttime. The success rate (SR) of fHR detection was analyzed in relation to maternal physical activity, posture, daytime versus nighttime, and other maternal and fetal predictors. Results: Ambulatorily, the SR was 86.8% for nighttime and 40.2% for daytime. The low daytime SR was largely due to effects of maternal physical activity and posture. The in-hospital SR was lower during nighttime (71.1%) and similar during daytime (43.3%). SR was related to gestational age, but not affected by pre-pregnancy and current body mass index or fetal growth restriction. Conclusions: The success of beat-to-beat fHR detection strongly depends on the home/hospital setting and predictors such as time of recording, activity levels, and maternal posture. Its clinical utility may be limited in periods of unsupervised recording with physical activity or posture shifts.

    Replacing the electric motor of the feedwater pump (PEN) with a turbine drive at the Kemerovo CHP plant

    This work considers the possibility of replacing the electric motor of the feedwater pump (PEN) with a turbine drive at the Kemerovo CHP plant, with the turbine drive installed on the existing foundation. The aim of the work is to assess the possibility of increasing the plant's electricity supply by reducing its auxiliary power consumption, and of improving the maneuverability of the CHP plant.

    Genome sequence analysis with MonetDB: a case study on Ebola virus diversity

    Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but results in terabytes of data to be stored and analysed. Biologists are often exposed to excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.

    GeoTriples: Transforming geospatial data into RDF graphs using R2RML and RML mappings

    A lot of geospatial data has become available at no charge in many countries recently. Geospatial data that is currently made available by government agencies usually does not follow the linked data paradigm. In the few cases where government agencies do follow the linked data paradigm (e.g., Ordnance Survey in the United Kingdom), specialized scripts have been used for transforming geospatial data into RDF. In this paper we present the open source tool GeoTriples, which generates and processes extended R2RML and RML mappings that transform geospatial data from many input formats into RDF. GeoTriples allows the transformation of geospatial data stored in raw files (shapefiles, CSV, KML, XML, GML and GeoJSON) and spatially-enabled RDBMS (PostGIS and MonetDB) into RDF graphs using well-known vocabularies like GeoSPARQL and stSPARQL, but without being tightly coupled to a specific vocabulary. GeoTriples has been developed in the European projects LEO and Melodies and has been used to transform many geospatial data sources into linked data. We study the performance of GeoTriples experimentally using large publicly available geospatial datasets, and show that GeoTriples is very efficient and scalable, especially when its mapping processor is implemented using Apache Hadoop.