    A Sample Advisor for Approximate Query Processing

    The rapid growth of current data warehouse systems makes random sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatic sample selection has remained (almost) unaddressed. In this paper, we tackle the problem with a sample advisor. We propose a cost model to evaluate a sample for a given query. Based on this model, our sample advisor determines the optimal set of samples for a given set of queries specified by an expert. We further propose an extension that utilizes recorded workload information; in this case, the sample advisor takes the set of queries and a given memory bound into account when computing the sample advice. Additionally, we consider merging samples in case of overlapping sample advice and present both an exact and a heuristic solution. In our evaluation, we analyze the properties of the cost model and compare the proposed algorithms. We further demonstrate the effectiveness and efficiency of the heuristic solutions with a variety of experiments.
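
    To make the advisor's selection step concrete, here is a minimal sketch of a greedy, budget-bounded sample selection. It is an illustration only: the benefit and footprint numbers are made up, and the paper's actual cost model and algorithms are not reproduced here.

        # Hypothetical sketch of a greedy sample advisor: given candidate samples,
        # each with an estimated benefit (cost reduction over the query set) and a
        # memory footprint, pick a set that fits a memory bound.
        def advise(candidates, memory_bound):
            """candidates: list of (name, benefit, footprint) tuples."""
            chosen, used = [], 0
            # Greedy by benefit density, a common heuristic for budgeted selection.
            for name, benefit, footprint in sorted(
                    candidates, key=lambda c: c[1] / c[2], reverse=True):
                if used + footprint <= memory_bound:
                    chosen.append(name)
                    used += footprint
            return chosen

        samples = [("lineitem_1pct", 120.0, 40), ("orders_5pct", 90.0, 25),
                   ("customer_10pct", 30.0, 20)]
        print(advise(samples, memory_bound=60))  # -> ['orders_5pct', 'customer_10pct']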

    Optimizing Sample Design for Approximate Query Processing

    The rapid increase of data volumes makes sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatically determining the optimal sample for a given query has remained (almost) unaddressed. To tackle this problem, the authors propose a sample advisor based on a novel cost model. Primarily designed for advising samples for a few queries specified by an expert, the sample advisor is additionally extended in two ways. The first extension enhances applicability by utilizing recorded workload information and taking memory bounds into account. The second extension increases effectiveness by merging samples in case of overlapping pieces of sample advice. For both extensions, the authors present exact and heuristic solutions. In their evaluation, the authors analyze the properties of the cost model and demonstrate the effectiveness and efficiency of the heuristic solutions with a variety of experiments.
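
    The merging idea can be pictured as follows; this is a minimal sketch with made-up cost terms, not the authors' actual model: two pieces of advice on the same table can be served by a single sample at the larger sampling rate, trading some precision for memory.

        # Hypothetical illustration of merging overlapping sample advice: two advised
        # samples on the same table are replaced by one sample at the larger rate,
        # which can answer both query groups. The cost terms are placeholders.
        def merge_gain(table_rows, rate_a, rate_b, row_bytes):
            """Memory saved by keeping one merged sample instead of two."""
            separate = table_rows * (rate_a + rate_b) * row_bytes
            merged = table_rows * max(rate_a, rate_b) * row_bytes
            return separate - merged

        # Two pieces of advice on the same 10M-row table: a 1% and a 2% sample.
        print(merge_gain(10_000_000, 0.01, 0.02, row_bytes=100))  # -> ~10 MB saved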

    Sample Footprints for Data Warehouse Databases

    With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This provides efficient support for the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses. Among other things, we address the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for use in data warehouse databases. We show how synopses can be kept up-to-date incrementally when the underlying data changes. Further, we suggest additional outlier handling methods to reduce the estimation error of approximate answers to aggregation queries with foreign-key joins. With a variety of experiments, we show that Linked Bernoulli Synopses and the proposed techniques have great potential in the context of data warehouse databases.
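
    As a toy illustration of the space-partitioning question raised above, the following sketch allocates a row budget among per-table samples proportionally to table size. This is an assumption for illustration only; proportional allocation is just one candidate strategy, not necessarily the article's answer.

        # Hypothetical sketch: partition a memory budget (expressed in rows) among
        # the per-table samples of a schema-level synopsis, proportionally to size.
        def partition_budget(table_rows, budget_rows):
            """table_rows: dict table -> row count; returns dict table -> sample size."""
            total = sum(table_rows.values())
            return {t: int(budget_rows * n / total) for t, n in table_rows.items()}

        schema = {"sales": 10_000_000, "customer": 500_000, "product": 50_000}
        print(partition_budget(schema, budget_rows=100_000))
        # -> {'sales': 94786, 'customer': 4739, 'product': 473}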

    Vacuum Electromagnetic Counterparts of Binary Black-Hole Mergers

    As one step towards a systematic modeling of the electromagnetic (EM) emission from an inspiralling black hole binary, we consider a simple scenario in which the binary moves in a uniform magnetic field anchored to a distant circumbinary disc. We study this system by solving the Einstein-Maxwell equations, in which the EM fields are chosen with astrophysically consistent strengths. We consider binaries with spins aligned or anti-aligned with the orbital angular momentum and study the dependence of the gravitational and EM signals on these spin configurations. Overall, we find that the EM radiation in the lowest l=2, m=2 multipole accurately reflects the gravitational one, with identical phase evolutions and amplitudes that differ only by a scaling factor. We also compute the efficiency of the energy emission in EM waves and find that it is given by E^rad_EM/M ~ 10^-15 (M/10^8 M_Sun)^2 (B/10^4 G)^2, hence 13 orders of magnitude smaller than the gravitational energy for realistic magnetic fields. The corresponding luminosity is much smaller than the accretion luminosity if the system is accreting at near the Eddington rate. Most importantly, this EM emission is at frequencies of ~10^-4 (10^8 M_Sun/M) Hz, well outside those accessible to astronomical radio observations. As a result, it is unlikely that the EM emission discussed here can be detected directly and simultaneously with the gravitational-wave one. However, indirect processes, driven by changes in the behavior of the EM fields, could yield observable events. In particular, if the accretion rate of the circumbinary disc is small and sufficiently stable over the timescale of the final inspiral, the EM emission may be observable indirectly, as it will alter the accretion rate through the magnetic torques exerted by the distorted magnetic field lines.
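
    As a quick check of the quoted scaling relations (the numbers are taken directly from the abstract, including the ~13-orders-of-magnitude gap to the gravitational-wave efficiency), the following evaluates the EM energy efficiency and characteristic frequency at the reference values M = 10^8 M_Sun and B = 10^4 G.

        # Worked example of the scaling relations quoted in the abstract:
        # E_EM/M ~ 1e-15 * (M / 1e8 M_sun)^2 * (B / 1e4 G)^2
        # f ~ 1e-4 * (1e8 M_sun / M) Hz
        def em_efficiency(mass_msun, b_gauss):
            return 1e-15 * (mass_msun / 1e8) ** 2 * (b_gauss / 1e4) ** 2

        def em_frequency_hz(mass_msun):
            return 1e-4 * (1e8 / mass_msun)

        print(em_efficiency(1e8, 1e4))   # -> 1e-15, vs. ~1e-2 for gravitational waves
        print(em_frequency_hz(1e8))      # -> 1e-4 Hz, far below radio-observable bands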

    Linked Bernoulli Synopses: Sampling along Foreign Keys

    Random sampling is a popular technique for providing fast approximate query answers, especially in data warehouse environments. Compared to other types of synopses, random sampling bears the advantage of retaining the dataset's dimensionality; it also associates probabilistic error bounds with the query results. Most of the available sampling techniques focus on table-level sampling, that is, they produce a sample of only a single database table. Queries that contain joins over multiple tables cannot be answered with such samples because join results on random samples are often small and skewed. In contrast, schema-level sampling techniques support queries containing joins by design. In this paper, we introduce Linked Bernoulli Synopses, a schema-level sampling scheme based upon the well-known Join Synopses. Both schemes rely on the idea of maintaining foreign-key integrity in the synopses; they are therefore suited to process queries containing arbitrary foreign-key joins. In contrast to Join Synopses, however, Linked Bernoulli Synopses correlate the sampling processes of the different tables in the database so as to minimize the space overhead, without destroying the uniformity of the individual samples. We also discuss how to compute Linked Bernoulli Synopses that maximize the effective sampling fraction for a given memory budget. Computing the optimal solution is often prohibitively expensive, so approximate solutions are needed. We propose a simple heuristic approach that is fast and seems to produce close-to-optimum results in practice. We conclude the paper with an evaluation of our methods on both synthetic and real-world datasets.
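
    To illustrate the foreign-key integrity idea, here is a simplified, Join-Synopses-style sketch: after Bernoulli-sampling a child table, every parent row referenced by a sampled child row must also be in the synopsis, so foreign-key joins on the samples never dangle. The correlation of per-table coin flips that distinguishes Linked Bernoulli Synopses, and that minimizes this reference overhead, is omitted here.

        # Simplified sketch of foreign-key-aware sampling (Join-Synopses style).
        import random

        def sample_with_fk_integrity(child_rows, fk_index, rate, seed=42):
            rng = random.Random(seed)
            child_sample = [r for r in child_rows if rng.random() < rate]
            # Parent keys that must be present for joins on the sample to be exact.
            needed_parent_keys = {r[fk_index] for r in child_sample}
            return child_sample, needed_parent_keys

        orders = [(1, "c1"), (2, "c1"), (3, "c2"), (4, "c3")]  # (order_id, customer_id)
        sample, parent_keys = sample_with_fk_integrity(orders, fk_index=1, rate=0.5)
        print(sample, parent_keys)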

    Efficient Forecasting for Hierarchical Time Series

    Forecasting is used as the basis for business planning in many application areas such as energy, sales, and traffic management. Time series data used in these areas is often hierarchically organized and thus aggregated along the hierarchy levels based on its dimensional features. Calculating forecasts in these environments is very time-consuming, because forecasting consistency between hierarchy levels must be ensured. To increase forecasting efficiency for hierarchically organized time series, we introduce a novel forecasting approach that takes advantage of the hierarchical organization: we reuse the forecast models maintained on the lowest level of the hierarchy to almost instantly create already-estimated forecast models on higher hierarchy levels. In addition, we define a hierarchical communication framework that increases communication flexibility and efficiency. Our experiments show significant runtime improvements for creating a forecast model at higher hierarchy levels, while still providing very high accuracy.
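
    One way to picture the model-reuse idea (a minimal sketch under simplifying assumptions, not the paper's algorithm): instead of estimating a fresh model on an aggregated series, forecast each base-level series with its already fitted model and aggregate the forecasts.

        # Minimal sketch of forecast-model reuse across hierarchy levels. The simple
        # exponential-smoothing "model" stands in for whatever models the entities
        # on the lowest hierarchy level actually maintain.
        def ses_forecast(series, alpha=0.3):
            """Simple exponential smoothing; returns the one-step-ahead forecast."""
            level = series[0]
            for x in series[1:]:
                level = alpha * x + (1 - alpha) * level
            return level

        base_series = {
            "entity_a": [2.1, 2.3, 2.2, 2.5],
            "entity_b": [1.0, 1.1, 0.9, 1.2],
        }
        # Higher-level forecast without re-estimating a model on the aggregate:
        print(sum(ses_forecast(s) for s in base_series.values()))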

    pEDM: Online-Forecasting for Smart Energy Analytics

    Continuous balancing of energy demand and supply is a fundamental prerequisite for the stability of energy grids and requires accurate forecasts of electricity consumption and production at any point in time. Today's Energy Data Management (EDM) systems already provide accurate predictions, but typically employ a very time-consuming and inflexible forecasting process. However, emerging trends such as intra-day trading and an increasing share of renewable energy sources demand higher forecasting efficiency. Additionally, the wide variety of applications in the energy domain poses different requirements with respect to runtime and accuracy and thus requires flexible control of the forecasting process. To address this issue, we introduce a novel online forecasting process as part of our EDM system pEDM. The online forecasting process rapidly provides forecasting results and iteratively refines them over time. Thus, we avoid long calculation times and allow applications to adapt the process to their needs. Our evaluation shows that our online forecasting process offers a very efficient and flexible way of providing forecasts to the requesting applications.
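
    The online process can be pictured as follows; this is a hypothetical sketch in which a usable forecast is produced immediately and then refined, with the grid search over a smoothing parameter standing in for pEDM's actual refinement strategy.

        # Hypothetical sketch of online forecasting: yield a usable forecast early
        # and keep refining it; the caller may stop whenever its runtime budget or
        # accuracy requirement is met.
        def online_forecast(series):
            best_err, best = float("inf"), None
            for alpha in [0.1 * i for i in range(1, 10)]:  # each step refines alpha
                level, err = series[0], 0.0
                for x in series[1:]:
                    err += (x - level) ** 2   # one-step-ahead squared error
                    level = alpha * x + (1 - alpha) * level
                if err < best_err:
                    best_err, best = err, (alpha, level)
                yield best                    # current best (alpha, forecast) so far

        for alpha, forecast in online_forecast([10, 12, 11, 13, 12, 14]):
            print(round(alpha, 1), round(forecast, 2))  # application may stop early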

    Forecasting in Hierarchical Environments

    Forecasting is an important data analysis technique and serves as the basis for business planning in many application areas such as energy, sales, and traffic management. The statistical models currently employed already provide very accurate predictions, but the forecasting calculation process is very time-consuming. This is especially true since many application domains deal with hierarchically organized data. Forecasting in these environments is especially challenging because forecasting consistency between hierarchy levels must be ensured, which increases the data processing and communication effort. For this purpose, we introduce our novel hierarchical forecasting approach, in which we propose to push forecast models to the entities on the lowest hierarchy level and to reuse these models to efficiently create forecast models on higher hierarchy levels. With that, we avoid the time-consuming parameter estimation process and allow an almost instant calculation of forecasts.

    The assessment of left ventricular mechanical dyssynchrony from gated 99mTc-tetrofosmin SPECT and gated 18F-FDG PET by QGS: a comparative study

    BACKGROUND Due to partly conflicting studies, further research is warranted with the QGS software package regarding the performance of gated FDG PET phase analysis compared to gated MPS, as well as the establishment of possible cut-off values for FDG PET to define dyssynchrony. METHODS Gated MPS and gated FDG PET datasets of 93 patients were analyzed with the QGS software. Phase histogram bandwidth (BW), phase standard deviation (Phase SD), and Entropy were calculated and compared between the methods. The performance of gated PET in identifying dyssynchrony was measured against SPECT as the reference standard. ROC analysis was performed to identify the best discriminator of dyssynchrony and to define cut-off values. RESULTS BW and Phase SD differed significantly between SPECT and PET. There was no significant difference in Entropy, with a high linear correlation between the methods. There was only moderate agreement between SPECT and PET in identifying dyssynchrony. Entropy was the best single PET parameter to predict dyssynchrony, with a cut-off point at 62%. CONCLUSION Gated MPS and gated FDG PET can both assess left ventricular mechanical dyssynchrony (LVMD), but the methods cannot be used interchangeably. Establishing reference ranges and cut-off values is difficult due to the lack of an external gold standard. Further prospective research is necessary.
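
    The three dyssynchrony parameters are derived from the phase histogram of contraction onset across the left ventricle. As a rough, hypothetical sketch of how such parameters can be computed (this mirrors their usual textbook definitions, not the QGS software's proprietary implementation):

        # Rough sketch of phase-analysis parameters from contraction-onset phases
        # in degrees (0-360). Simplified definitions; not the QGS implementation.
        import math

        def phase_parameters(phases_deg, n_bins=72):
            mean = sum(phases_deg) / len(phases_deg)
            sd = math.sqrt(sum((p - mean) ** 2 for p in phases_deg) / len(phases_deg))
            bandwidth = max(phases_deg) - min(phases_deg)  # full range, simplified
            counts = [0] * n_bins
            for p in phases_deg:
                counts[min(int(p / 360 * n_bins), n_bins - 1)] += 1
            probs = [c / len(phases_deg) for c in counts if c]
            # Normalized Shannon entropy of the phase distribution, in percent.
            entropy = -sum(q * math.log2(q) for q in probs) / math.log2(n_bins) * 100
            return bandwidth, sd, entropy

        print(phase_parameters([130, 135, 140, 150, 155, 160, 200]))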