2,474 research outputs found
Software Challenges For HL-LHC Data Analysis
The high energy physics community is discussing where investment is needed to
prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project is one of the central software players in high energy physics since
decades. From its experience and expectations, the ROOT team has distilled a
comprehensive set of areas that should see research and development in the
context of data analysis software, for making best use of HL-LHC's physics
potential. This work shows what these areas could be, why the ROOT team
believes investing in them is needed, which gains are expected, and where
related work is ongoing. It can serve as an indication for future research
proposals and cooperations
Improving memory efficiency for processing large-scale models
International audienceScalability is a main obstacle for applying Model-Driven Engineering to reverse engineering, or to any other activity manipulating large models. Existing solutions to persist and query large models are currently ine cient and strongly linked to memory availability. In this paper, we propose a memory unload strategy for Neo4EMF, a persistence layer built on top of the Eclipse Modeling Framework and based on a Neo4j database backend. Our solution allows us to partially unload a model during the execution of a query by using a periodical dirty saving mechanism and transparent reloading. Our experiments show that this approach enables to query large models in a restricted amount of memory with an acceptable performance
Survey over Existing Query and Transformation Languages
A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability
of many current Semantic Web approaches to cope with data available in such diverging
representation formalisms as XML, RDF, or Topic Maps. A common query language is the first
step to allow transparent access to data in any of these formats. To further the understanding
of the requirements and approaches proposed for query languages in the conventional as well
as the Semantic Web, this report surveys a large number of query languages for accessing
XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from
all these areas. From the detailed survey of these query languages, a common classification
scheme is derived that is useful for understanding and differentiating languages within and
among all three areas
Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on
one of two extremes - algorithms that obey strict concurrency constraints or
algorithms that obey few or no such constraints. We consider an intermediate
alternative in which algorithms optimistically assume that conflicts are
unlikely and if conflicts do arise a conflict-resolution protocol is invoked.
We view this "optimistic concurrency control" paradigm as particularly
appropriate for large-scale machine learning algorithms, particularly in the
unsupervised setting. We demonstrate our approach in three problem areas:
clustering, feature learning and online facility location. We evaluate our
methods via large-scale experiments in a cluster computing environment.Comment: 25 pages, 5 figure
Oze: Decentralized Graph-based Concurrency Control for Real-world Long Transactions on BoM Benchmark
In this paper, we propose Oze, a new concurrency control protocol that
handles heterogeneous workloads which include long-running update transactions.
Oze explores a large scheduling space using a fully precise multi-version
serialization graph to reduce false positives. Oze manages the graph in a
decentralized manner to exploit many cores in modern servers. We also propose a
new OLTP benchmark, BoMB (Bill of Materials Benchmark), based on a use case in
an actual manufacturing company. BoMB consists of one long-running update
transaction and five short transactions that conflict with each other.
Experiments using BoMB show that Oze keeps the abort rate of the long-running
update transaction at zero while reaching up to 1.7 Mtpm for short transactions
with near linear scalability, whereas state-of-the-art protocols cannot commit
the long transaction or experience performance degradation in short transaction
throughput
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
- …