Search CORE

2,474 research outputs found

Software Challenges For HL-LHC Data Analysis

Author: Amadio Guilherme
An Sitong
Bellenot Bertrand
Blomer Jakob
Brann Kim Albertsson
Canal Philippe
Couet Olivier
Galli Massimiliano
Guiraud Enrico
Hageboeck Stephan
Linev Sergey
Moneta Lorenzo
Naumann Axel
Padulano Vincenzo Eduardo
Pla Xavier Valls
Rademakers Fons
ROOT Team
Saavedra Enric Tejedor
Shadura Oksana
Tadel Alja Mrak
Tadel Matevz
Vassilev Vassil
Vila Pere Mato
Wunsch Stefan
Publication venue
Publication date: 04/05/2020
Field of study

The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project is one of the central software players in high energy physics since decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, for making best use of HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and cooperations

arXiv.org e-Print Archive

CERN Document Server

Improving memory efficiency for processing large-scale models

Author: Benelallam Amine
Daniel Gwendal
Sunyé Gerson
Tisi Massimo
Publication venue: HAL CCSD
Publication date: 24/07/2014
Field of study

International audienceScalability is a main obstacle for applying Model-Driven Engineering to reverse engineering, or to any other activity manipulating large models. Existing solutions to persist and query large models are currently ine cient and strongly linked to memory availability. In this paper, we propose a memory unload strategy for Neo4EMF, a persistence layer built on top of the Eclipse Modeling Framework and based on a Neo4j database backend. Our solution allows us to partially unload a model during the execution of a query by using a periodical dirty saving mechanism and transparent reloading. Our experiments show that this approach enables to query large models in a restricted amount of memory with an acceptable performance

INRIA a CCSD electronic archive server

HAL Mines Nantes

Survey over Existing Query and Transformation Languages

Author: Bolzer Oliver
Bry François
Furche Tim
Horrocks Ian
Kraus Michael
Orsini Renzo
Schaffert Sebastian
Publication venue
Publication date: 01/01/2004
Field of study

A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas

CiteSeerX

Open Access LMU

Optimistic Concurrency Control for Distributed Unsupervised Learning

Author: Broderick Tamara
Gonzalez Joseph E.
Jegelka Stefanie
Jordan Michael I.
Pan Xinghao
Publication venue
Publication date: 01/01/2013
Field of study

Research on distributed machine learning algorithms has focused primarily on one of two extremes - algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this "optimistic concurrency control" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.Comment: 25 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

Oze: Decentralized Graph-based Concurrency Control for Real-world Long Transactions on BoM Benchmark

Author: Hoshino Takashi
Kambayashi Takashi
Kawashima Hideyuki
Nemoto Jun
Publication venue
Publication date: 09/10/2022
Field of study

In this paper, we propose Oze, a new concurrency control protocol that handles heterogeneous workloads which include long-running update transactions. Oze explores a large scheduling space using a fully precise multi-version serialization graph to reduce false positives. Oze manages the graph in a decentralized manner to exploit many cores in modern servers. We also propose a new OLTP benchmark, BoMB (Bill of Materials Benchmark), based on a use case in an actual manufacturing company. BoMB consists of one long-running update transaction and five short transactions that conflict with each other. Experiments using BoMB show that Oze keeps the abort rate of the long-running update transaction at zero while reaching up to 1.7 Mtpm for short transactions with near linear scalability, whereas state-of-the-art protocols cannot commit the long transaction or experience performance degradation in short transaction throughput

arXiv.org e-Print Archive

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication venue
Publication date: 01/01/2019
Field of study

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

arXiv.org e-Print Archive

Online Research Database In Technology