518 research outputs found
Minimum Clinical Recommendations for diagnosis, treatment and follow-up of malignant pleural mesothelioma
Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution
Cloud-based data analysis is nowadays common practice because of the lower
system management overhead as well as the pay-as-you-go pricing model. The
pricing model, however, is not always suitable for query processing as heavy
use results in high costs. For example, in query-as-a-service systems, where
users are charged per processed byte, collections of queries accessing the same
data frequently can become expensive. The problem is compounded by the limited
options for the user to optimize query execution when using declarative
interfaces such as SQL. In this paper, we show how, without modifying existing
systems and without the involvement of the cloud provider, it is possible to
significantly reduce the overhead, and hence the cost, of query-as-a-service
systems. Our approach is based on query rewriting so that multiple concurrent
queries are combined into a single query. Our experiments show the aggregated
amount of work done by the shared execution is smaller than in a
query-at-a-time approach. Since queries are charged per byte processed, the
cost of executing a group of queries is often the same as executing a single
one of them. As an example, we demonstrate how the shared execution of the
TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google
BigQuery, respectively, than using a query-at-a-time approach while achieving
a higher throughput.
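The billing argument in the abstract above can be illustrated with a small arithmetic sketch. It is not the paper's rewriting method: it only assumes a columnar service that bills per byte scanned and reads each referenced column once per query. The column sizes and the helper names (`bytes_scanned`, `query_at_a_time_bytes`, `shared_execution_bytes`) are hypothetical.

```python
from typing import Dict, List, Set


def bytes_scanned(columns: Set[str], column_bytes: Dict[str, int]) -> int:
    """Bytes billed for one query that reads the given columns once."""
    return sum(column_bytes[c] for c in columns)


def query_at_a_time_bytes(queries: List[Set[str]], column_bytes: Dict[str, int]) -> int:
    """Each query is billed separately, so shared columns are paid for repeatedly."""
    return sum(bytes_scanned(q, column_bytes) for q in queries)


def shared_execution_bytes(queries: List[Set[str]], column_bytes: Dict[str, int]) -> int:
    """One combined query reads the union of all referenced columns once."""
    merged: Set[str] = set().union(*queries)
    return bytes_scanned(merged, column_bytes)


# Hypothetical column sizes (in bytes) for three columns of a TPC-H-like table.
sizes = {"l_quantity": 100, "l_extendedprice": 200, "l_shipdate": 50}
batch = [
    {"l_quantity", "l_shipdate"},
    {"l_extendedprice", "l_shipdate"},
    {"l_quantity", "l_shipdate"},
]
print(query_at_a_time_bytes(batch, sizes), shared_execution_bytes(batch, sizes))
```

Under these assumptions the saving grows with the number of queries that touch the same columns, which is consistent with the claim that a group of queries can cost about as much as a single one.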
Report on the Second International Workshop on Data Management on Modern Hardware (DaMoN'06)
This report summarizes the presentations and discussions that occurred during the Second International Workshop on Data Management on Modern Hardware (DaMoN). DaMoN was held in Chicago on June 25th, 2006, and was collocated with ACM SIGMOD 2006. The aim of this one-day workshop is to bring together researchers interested in optimizing database performance on modern computing infrastructure by designing new data management techniques and tools.
Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.
Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
Quality predictors of abdominal fetal electrocardiography recording in antenatal ambulatory and bedside settings
Background: Fetal electrocardiography using an abdominal monitor (Monica AN24™) could increase the diagnostic use of fetal heart rate (fHR) variability measurements. However, signal quality may depend on factors such as maternal physical activity, posture, and bedside versus ambulatory setting. Methods: Sixty-three healthy women wore the monitor at home and 42 women during a hospital stay. All women underwent a posture experiment, and all home and 13 hospital participants wore the monitor during daytime and nighttime. The success rate (SR) of fHR detection was analyzed in relation to maternal physical activity, posture, daytime versus nighttime, and other maternal and fetal predictors. Results: In the ambulatory setting, the SR was 86.8% for nighttime and 40.2% for daytime. The low daytime SR was largely due to effects of maternal physical activity and posture. The in-hospital SR was lower during nighttime (71.1%) and similar during daytime (43.3%). SR was related to gestational age, but not affected by pre-pregnancy and current body mass index or fetal growth restriction. Conclusions: The success of beat-to-beat fHR detection strongly depends on the home/hospital setting and predictors such as time of recording, activity levels, and maternal posture. Its clinical utility may be limited in periods of unsupervised recording with physical activity or posture shifts.
Replacing the feedwater pump (PEN) electric motor with a turbine drive at the Kemerovo CHP plant
This work considers the possibility of replacing the electric motor of the feedwater pump (PEN) with a turbine drive at the Kemerovo CHP plant, with the turbine drive installed on the existing foundation. The aim of the work is to assess the possibility of increasing the electricity supplied by the plant as a result of reducing the plant's own auxiliary consumption, and of improving the maneuverability of the CHP plant.
Docetaxel-based induction therapy prior to radiotherapy with or without docetaxel for non-small-cell lung cancer.
Genome sequence analysis with MonetDB: a case study on Ebola virus diversity
Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but results in terabytes of data to be stored and analysed. Biologists are often exposed to excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
GeoTriples: Transforming geospatial data into RDF graphs using R2RML and RML mappings
A lot of geospatial data has become available at no charge in many countries recently. Geospatial data that is currently made available by government agencies usually does not follow the linked data paradigm. In the few cases where government agencies do follow the linked data paradigm (e.g., Ordnance Survey in the United Kingdom), specialized scripts have been used for transforming geospatial data into RDF. In this paper we present the open source tool GeoTriples, which generates and processes extended R2RML and RML mappings that transform geospatial data from many input formats into RDF. GeoTriples allows the transformation of geospatial data stored in raw files (shapefiles, CSV, KML, XML, GML and GeoJSON) and spatially-enabled RDBMS (PostGIS and MonetDB) into RDF graphs using well-known vocabularies like GeoSPARQL and stSPARQL, but without being tightly coupled to a specific vocabulary. GeoTriples has been developed in the European projects LEO and Melodies and has been used to transform many geospatial data sources into linked data. We study the performance of GeoTriples experimentally using large publicly available geospatial datasets, and show that GeoTriples is very efficient and scalable, especially when its mapping processor is implemented using Apache Hadoop.
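To make the kind of transformation described above concrete, here is a minimal hand-written sketch that turns one GeoJSON point feature into N-Triples using the GeoSPARQL vocabulary. It does not use GeoTriples or R2RML/RML mappings. The base IRI `http://example.org/feature/`, the property IRI scheme, and the function name `point_feature_to_ntriples` are assumptions made for illustration only.

```python
import json
from typing import List

GEO = "http://www.opengis.net/ont/geosparql#"  # GeoSPARQL namespace
BASE = "http://example.org/feature/"           # hypothetical base IRI


def point_feature_to_ntriples(feature: dict) -> List[str]:
    """Serialize a GeoJSON Point feature as N-Triples lines."""
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")

    lon, lat = geometry["coordinates"][:2]
    subject = f"<{BASE}{feature['id']}>"
    geom_node = f"<{BASE}{feature['id']}/geometry>"
    wkt = f"POINT({lon} {lat})"

    triples = [
        f"{subject} <{GEO}hasGeometry> {geom_node} .",
        f'{geom_node} <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]
    # Thematic attributes become plain string literals; json.dumps handles quoting.
    for key, value in sorted(feature.get("properties", {}).items()):
        triples.append(f"{subject} <{BASE}property/{key}> {json.dumps(str(value))} .")
    return triples


example = {
    "type": "Feature",
    "id": "1",
    "geometry": {"type": "Point", "coordinates": [23.7, 37.9]},
    "properties": {"name": "Athens"},
}
for line in point_feature_to_ntriples(example):
    print(line)
```

A mapping-driven tool replaces the hard-coded IRI templates in this sketch with declarative rules, which is what allows the same processor to target different vocabularies.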