903 research outputs found

    The data cyclotron query processing scheme

    Get PDF
    Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global co-ordination. In this paper, we introduce the Data Cyclotron architecture which addresses the challenges using turbulent data movement through a storage ring built from distributed main memory capitalizing modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up data fragments continuously flowing around, i.e., the hot set. Each data fragment carries a level of interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with a LOI below a given threshold, inversely proportional to the ring load, is pulled o

    The Database Architectures Research Group at CWI

    Get PDF
    The Database research group at CWI was established in 1985. It has steadily grown from two PhD students to a group of 17 people ultimo 2011. The group is supported by a scientific programmer and a system engineer to keep our machines running. In this short note, we look back at our past and highlight the multitude of topics being addressed

    Solving the Optimal Trading Trajectory Problem Using a Quantum Annealer

    Get PDF
    We solve a multi-period portfolio optimization problem using D-Wave Systems' quantum annealer. We derive a formulation of the problem, discuss several possible integer encoding schemes, and present numerical examples that show high success rates. The formulation incorporates transaction costs (including permanent and temporary market impact), and, significantly, the solution does not require the inversion of a covariance matrix. The discrete multi-period portfolio optimization problem we solve is significantly harder than the continuous variable problem. We present insight into how results may be improved using suitable software enhancements, and why current quantum annealing technology limits the size of problem that can be successfully solved today. The formulation presented is specifically designed to be scalable, with the expectation that as quantum annealing technology improves, larger problems will be solvable using the same techniques.Comment: 7 pages; expanded and update

    MonetDB: Two Decades of Research in Column-oriented Database Architectures

    Get PDF
    MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need for analytics over large collections of data. MonetDB is actively used nowadays in health care, in telecommunications as well as in scientific databases and in data management research, accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights which drive the current MonetDB design and form the basis for its future evolution

    Just-in-time Data Distribution for Analytical Query Processing

    Get PDF
    Distributed processing commonly requires data spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database, and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need to know policy, and is being reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a scheduling with the most efficient reuse of previously shipped data, minimizing the data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small scale MapReduce-type of settings

    Assessment of Metabolome Annotation Quality: A Method for Evaluating the False Discovery Rate of Elemental Composition Searches

    Get PDF
    BACKGROUND: In metabolomics researches using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rates (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search, including setting a threshold, estimating FDR, and the types of elemental composition databases most reliable for searching are discussed. METHODOLOGY/PRINCIPAL FINDINGS: The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30-50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that FDR could be improved by using compound database with smaller but higher completeness entries. CONCLUSIONS/SIGNIFICANCE: High accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data
    corecore