The data cyclotron query processing scheme
Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global co-ordination.
In this paper, we introduce the Data Cyclotron architecture, which addresses these challenges using turbulent data movement through a storage ring built from distributed main memory, capitalizing on modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up data fragments that continuously flow around the ring, i.e., the hot set.
Each data fragment carries a level of interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with an LOI below a given threshold, inversely proportional to the ring load, is pulled o
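The LOI rule described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not give the update formula, so the additive update per pass, the constant `k`, and all function names here are assumptions made for illustration. What happens to a fragment that falls below the threshold is cut off in the text, so the sketch only reports the comparison.

```python
def update_loi(loi: float, interested_queries: int) -> float:
    """Accumulate the interest observed during one pass around the ring.

    Assumption: interest is simply added; the abstract only says the
    metric is cumulative over multiple passes.
    """
    return loi + interested_queries


def loi_threshold(ring_load: float, k: float = 1.0) -> float:
    """Threshold inversely proportional to the ring load, as the abstract states.

    `k` is a hypothetical proportionality constant.
    """
    if ring_load <= 0:
        raise ValueError("ring_load must be positive")
    return k / ring_load


def below_threshold(loi: float, ring_load: float, k: float = 1.0) -> bool:
    """Return True when a fragment's LOI is below the current threshold."""
    return loi < loi_threshold(ring_load, k)
```

For example, with `k = 1.0` and a ring load of `0.5`, the threshold is `2.0`, so a fragment with an LOI of `1.0` falls below it while one with an LOI of `3.0` does not.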
The Database Architectures Research Group at CWI
The Database research group at CWI was established in 1985. It has steadily grown from two PhD students to a group of 17 people by the end of 2011. The group is supported by a scientific programmer and a system engineer to keep our machines running. In this short note, we look back at our past and highlight the multitude of topics being addressed
Solving the Optimal Trading Trajectory Problem Using a Quantum Annealer
We solve a multi-period portfolio optimization problem using D-Wave Systems'
quantum annealer. We derive a formulation of the problem, discuss several
possible integer encoding schemes, and present numerical examples that show
high success rates. The formulation incorporates transaction costs (including
permanent and temporary market impact), and, significantly, the solution does
not require the inversion of a covariance matrix. The discrete multi-period
portfolio optimization problem we solve is significantly harder than the
continuous variable problem. We present insight into how results may be
improved using suitable software enhancements, and why current quantum
annealing technology limits the size of problem that can be successfully solved
today. The formulation presented is specifically designed to be scalable, with
the expectation that as quantum annealing technology improves, larger problems
will be solvable using the same techniques. Comment: 7 pages; expanded and update
MonetDB: Two Decades of Research in Column-oriented Database Architectures
MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in
health care, in telecommunications as well as in scientific databases and in data management research,
accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief
overview of the MonetDB technology as it developed over the past two decades and the main research
highlights which drive the current MonetDB design and form the basis for its future evolution
Just-in-time Data Distribution for Analytical Query Processing
Distributed processing commonly requires data spread across machines using a
priori static or hash-based data allocation. In this paper, we explore
an alternative approach that starts from a master node in control of the
complete database, and a variable number of worker nodes for delegated
query processing. Data is shipped just-in-time to the worker nodes using
a need-to-know policy, and is reused, if possible, in subsequent
queries. A bidding mechanism among the workers yields a schedule with
the most efficient reuse of previously shipped data, minimizing the data
transfer costs.
Just-in-time data shipment allows our system to benefit from locally
available idle resources to boost overall performance. The system is
maintenance-free and allocation is fully transparent to users. Our
experiments show that the proposed adaptive distributed architecture is a
viable and flexible alternative for small-scale MapReduce-type
settings
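One possible reading of the bidding mechanism in the abstract above is sketched below in Python. The abstract does not specify how bids are computed, so this assumes each worker bids the number of bytes that would still have to be shipped to it and the lowest bid wins. The `Worker` class, the `schedule` function, and the fragment names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Worker:
    name: str
    # Fragments already shipped to this worker by earlier queries.
    cached: Set[str] = field(default_factory=set)

    def bid(self, needed: Set[str], sizes: Dict[str, int]) -> int:
        """Bid the bytes that would still have to be shipped (assumed cost model)."""
        return sum(sizes[f] for f in needed - self.cached)


def schedule(needed: Set[str], workers: List[Worker],
             sizes: Dict[str, int]) -> Tuple[Worker, int]:
    """Assign the query to the lowest bidder and record the shipment."""
    winner = min(workers, key=lambda w: w.bid(needed, sizes))
    cost = winner.bid(needed, sizes)
    winner.cached |= needed  # shipped data becomes reusable by later queries
    return winner, cost


sizes = {"a": 10, "b": 20, "c": 5}
workers = [Worker("w1", {"a"}), Worker("w2", {"b"})]
first, first_cost = schedule({"a", "b"}, workers, sizes)
second, second_cost = schedule({"b", "c"}, workers, sizes)
```

In this example the first query goes to `w2`, which only needs fragment `a` shipped, and the second query goes to `w2` again because it already holds `b`.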
Assessment of Metabolome Annotation Quality: A Method for Evaluating the False Discovery Rate of Elemental Composition Searches
BACKGROUND: In metabolomics research using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation, used to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, this study presents a novel methodology for evaluating the false discovery rate (FDR). Based on the FDR analyses, several aspects of an elemental composition search are discussed, including setting a threshold, estimating the FDR, and the types of elemental composition databases most reliable for searching. METHODOLOGY/PRINCIPAL FINDINGS: The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30-50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that the FDR could be improved by using a compound database that is smaller but more complete. CONCLUSIONS/SIGNIFICANCE: High-accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data