On the Verge of One Petabyte - the Story Behind the BaBar Database System
The BaBar database has pioneered the use of a commercial ODBMS within the HEP
community. The unique object-oriented architecture of Objectivity/DB has made
it possible to manage over 700 terabytes of production data generated since
May'99, making the BaBar database the world's largest known database. The
ongoing development includes new features, addressing the ever-increasing
luminosity of the detector as well as other changing physics requirements.
Significant efforts are focused on reducing space requirements and operational
costs. The paper discusses our experience with developing a large-scale
database system, emphasizing universal aspects that may be applied to any
large-scale system, independent of the underlying technology used.

Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
conference (CHEP03), La Jolla, CA, USA, March 2003, 6 pages. PSN MOKT01
Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
To monitor critical infrastructure, high-quality sensors sampled at a high
frequency are increasingly used. However, as they produce huge amounts of data,
only simple aggregates are stored. This removes outliers and fluctuations that
could indicate problems. As a remedy, we present a model-based approach for
managing time series with dimensions that exploits correlation within and among
time series. Specifically, we propose compressing groups of correlated time
series using an extensible set of model types within a user-defined error bound
(possibly zero). We name this new category of model-based compression methods
for time series Multi-Model Group Compression (MMGC). We present GOLEMM, the
first MMGC method, and extend model types to compress time series groups. We propose
primitives for users to effectively define groups for differently sized data
sets, and based on these, an automated grouping method using only the time
series dimensions. We propose algorithms for executing simple and
multi-dimensional aggregate queries on models. Last, we implement our methods
in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our
evaluation shows that compared to widely used formats, ModelarDB+ provides up
to 13.7 times faster ingestion due to high compression, 113 times better
compression due to the adaptivity of GOLEMM, 630 times faster aggregates by
using models, and close to linear scalability. It is also extensible and
supports online query processing.

Comment: 12 Pages, 28 Figures, and 1 Table
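The core idea of compressing a time series within a user-defined error bound can be illustrated with a minimal sketch. This is not the actual GOLEMM or ModelarDB+ implementation; it shows a single constant-value model type (similar in spirit to a PMC-Mean-style model), where each segment is replaced by one representative value guaranteed to be within the error bound of every point it stands for. Function names and the greedy segmentation strategy are illustrative assumptions.

```python
# Illustrative sketch only, NOT the GOLEMM method: lossy model-based
# compression of one time series using a constant-value model type
# within a user-defined absolute error bound (possibly zero).

def compress_constant(values, error_bound):
    """Greedily segment `values`; each segment is stored as
    (length, representative) with max absolute error <= error_bound."""
    segments = []
    lo = hi = values[0]
    count = 1
    for v in values[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        # The midrange of a segment is within error_bound of every point
        # iff the segment's value range spans at most 2 * error_bound.
        if new_hi - new_lo <= 2 * error_bound:
            lo, hi = new_lo, new_hi
            count += 1
        else:
            segments.append((count, (lo + hi) / 2))  # close segment
            lo = hi = v
            count = 1
    segments.append((count, (lo + hi) / 2))
    return segments

def decompress(segments):
    """Reconstruct an approximate series from (length, value) segments."""
    return [rep for count, rep in segments for _ in range(count)]

series = [10.0, 10.2, 9.9, 20.0, 20.1]
segments = compress_constant(series, error_bound=0.5)
approx = decompress(segments)
```

With an error bound of 0, each segment would hold only identical values, making the compression lossless; the abstract's aggregate queries could then be answered directly on the (length, value) segments without reconstructing the raw points.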
Data assurance in opaque computations
The chess endgame is increasingly being seen through the lens of, and therefore effectively defined by, a data ‘model’ of itself. It is vital that such models are clearly faithful to the reality they purport to represent. This paper examines that issue and systems engineering responses to it, using the chess endgame as the exemplar scenario. A structured survey has been carried out of the intrinsic challenges and complexity of creating endgame data, by reviewing the past pattern of errors: those caught during work in progress, those surfacing in publications, and those occurring after the data was generated. Specific measures are proposed to counter the observed classes of error-risk, including a preliminary survey of techniques for using state-of-the-art verification tools to generate EGTs that are correct by construction. The approach may be applied generically beyond the game domain.