LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs
The number of linked data sources and the size of the linked open data graph
keep growing every day. As a consequence, semantic RDF services are more and
more confronted with various "big data" problems. Query processing in the
presence of inferences is one of them. For instance, to complete the answer set of
SPARQL queries, RDF database systems evaluate semantic RDFS relationships
(subPropertyOf, subClassOf) through time-consuming query rewriting algorithms
or space-consuming data materialization solutions. To reduce the memory
footprint and ease the exchange of large datasets, these systems generally
apply a dictionary approach for compressing triple data sizes by replacing
resource identifiers (IRIs), blank nodes and literals with integer values. In
this article, we present a structured resource identification scheme using a
clever encoding of concept and property hierarchies for efficiently evaluating
the main common RDFS entailment rules while minimizing triple materialization
and query rewriting. We show how this encoding can be computed by a
scalable parallel algorithm and implemented directly on the Apache Spark
framework. An evaluation conducted over both synthetic and real-world
datasets demonstrates the efficiency of our encoding scheme.
Comment: 8 pages, 1 figure
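The abstract above does not spell out the encoding itself, so the following is only a minimal sketch of the general idea of a prefix-based hierarchy encoding, not the LiteMat scheme as published. It assumes a tree-shaped class hierarchy, and the names `encode_hierarchy` and `is_subclass_of` are hypothetical. Each class receives an integer identifier whose leading bits repeat its parent's identifier, so a subClassOf check becomes a bit shift and a comparison instead of a materialized triple or a rewritten query.

```python
def encode_hierarchy(children, root):
    """Assign prefix-based integer ids to a tree-shaped class hierarchy.

    `children` maps a class name to the list of its direct subclasses.
    Returns (ids, lengths, total_bits). Assumption: single inheritance only.
    """
    prefixes = {}  # class name -> (prefix value, prefix length in bits)

    def visit(node, prefix, length):
        prefixes[node] = (prefix, length)
        kids = children.get(node, [])
        if not kids:
            return
        # Local codes 1..n are used for the children; 0 is left for the parent.
        width = len(kids).bit_length()
        for local_code, kid in enumerate(kids, start=1):
            visit(kid, (prefix << width) | local_code, length + width)

    visit(root, 1, 1)
    total_bits = max(length for _, length in prefixes.values())
    # Pad every prefix with trailing zeros so all ids share the same bit length.
    ids = {name: p << (total_bits - n) for name, (p, n) in prefixes.items()}
    lengths = {name: n for name, (_, n) in prefixes.items()}
    return ids, lengths, total_bits


def is_subclass_of(sub, sup, ids, lengths, total_bits):
    """True if `sub` equals `sup` or lies below it in the hierarchy."""
    shift = total_bits - lengths[sup]
    return (ids[sub] >> shift) == (ids[sup] >> shift)


hierarchy = {
    "Thing": ["Agent", "Place"],
    "Agent": ["Person", "Organization"],
    "Person": ["Student"],
}
ids, lengths, total_bits = encode_hierarchy(hierarchy, "Thing")
print(is_subclass_of("Student", "Agent", ids, lengths, total_bits))
print(is_subclass_of("Student", "Place", ids, lengths, total_bits))
```

With identifiers of this shape, all subclasses of a class occupy one contiguous integer range, which is what allows an entailment check to be answered from the identifiers alone.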
School development with choir and wind-instrument classes. A qualitative case study at the "Evangelisches Gymnasium am Dom zu Brandenburg"
This conference contribution is embedded in the accompanying research on the development of all-day schools in Germany. It builds on fundamental findings obtained in the "Study on Musical and Cultural Education at All-Day Schools (MUKUS)". The subject of the work is a qualitative empirical study of a denominational school in the federal state of Brandenburg. The central aim of the work was explicitly not to measure the efficiency and performance of individual school actors, or of a school system, at a point in time X. Rather, the aim was to reconstruct the interdependencies, changes and dynamic interactions, as well as the inhibiting or supporting factors, that a school faces in connection with school development processes. (DIPF/Orig.)
ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives
This paper presents ATEM, a novel framework for studying topic evolution in
scientific archives. ATEM is based on dynamic topic modeling and dynamic graph
embedding techniques that explore the dynamics of content and citations of
documents within a scientific corpus. ATEM explores a new notion of contextual
emergence for the discovery of emerging interdisciplinary research topics based
on the dynamics of citation links in topic clusters. Our experiments show that
ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP
archive of over five million computer science articles.
ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics
This paper presents an algorithmic family of dynamic topic models called
Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms
to provide a modular framework for discovering evolving topics. ANTM maintains
the temporal continuity of evolving topics by extracting time-aware features
from documents using advanced pre-trained Large Language Models (LLMs) and
employing an overlapping sliding window algorithm for sequential document
clustering. This overlapping sliding window algorithm identifies a different
number of topics within each time frame and aligns semantically similar
document clusters across time periods. This process captures emerging and
fading trends across different periods and allows for a more interpretable
representation of evolving topics. Experiments on four distinct datasets show
that ANTM outperforms probabilistic dynamic topic models in terms of topic
coherence and diversity metrics. Moreover, it improves the scalability and
flexibility of dynamic topic models by being accessible and adaptable to
different types of algorithms. Additionally, a Python package is developed for
researchers and scientists who wish to study the trends and evolving patterns
of topics in large-scale textual data.
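To make the overlapping sliding window idea concrete, here is a small sketch that only covers the windowing step, under assumed parameters. The function name `overlapping_windows` and its arguments `window_size` and `overlap` are hypothetical and are not taken from the ANTM package; the clustering and cluster alignment described in the abstract are not shown.

```python
def overlapping_windows(timestamps, window_size, overlap):
    """Group document indices into overlapping time windows.

    Each window covers [start, start + window_size) and begins
    `window_size - overlap` time units after the previous one, so
    consecutive windows share `overlap` time units.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if not timestamps:
        return []
    step = window_size - overlap
    first, last = min(timestamps), max(timestamps)
    windows = []
    start = first
    while start <= last:
        end = start + window_size
        members = [i for i, t in enumerate(timestamps) if start <= t < end]
        windows.append((start, end, members))
        start += step
    return windows


# Five documents, one per year; windows of 3 years overlapping by 1 year.
years = [2000, 2001, 2002, 2003, 2004]
windows = overlapping_windows(years, window_size=3, overlap=1)
for start, end, members in windows:
    print(start, end, members)
```

Documents that fall into the shared part of two consecutive windows appear in both, which gives a later step something to align clusters on across neighbouring time frames.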
On Distributed SPARQL Query Processing Using Triangles of RDF Triples
Knowledge Graphs provide valuable functionalities, such as data integration and reasoning, to an increasing number of applications in all kinds of companies. These applications partly depend on the efficiency of a Knowledge Graph management system, which is often based on the RDF data model and queried with SPARQL. In this context, query performance is paramount and relies on an optimizer that usually makes intensive use of a large set of indexes. Generally, these indexes correspond to different re-orderings of the subject, predicate and object of a triple pattern. In this work, we present a novel approach that considers indexes formed by a frequently encountered basic graph pattern: the triangle of triples. We propose dedicated data structures to store these triangles, provide distributed algorithms to discover and materialize them, including inferred triangles, and detail query optimization techniques, including a data partitioning approach for biased data. We provide an implementation that runs on top of Apache Spark and experiment on two real-world RDF data sets. This evaluation emphasizes the performance boost (up to 40x on query processing) that one can obtain by using our approach when facing triangles of triples.
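As an illustration of what a triangle of triples looks like, the sketch below finds triangles in a small in-memory set of triples. It is a single-machine toy, not the distributed Spark algorithm the abstract describes. It rests on the assumption that a triangle means three distinct resources that are pairwise connected by some triple, regardless of direction or predicate. The function name `find_triangles` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_triangles(triples):
    """Return the set of node triples (a, b, c) that are pairwise connected.

    Assumption: edge direction and predicate are ignored, and node
    labels are comparable (e.g. all strings) so they can be sorted.
    """
    neighbours = defaultdict(set)
    for s, _p, o in triples:
        if s != o:  # self-loops cannot contribute to a triangle
            neighbours[s].add(o)
            neighbours[o].add(s)

    triangles = set()
    for node, adjacent in neighbours.items():
        # Two neighbours of `node` that are themselves connected close a triangle.
        for a, b in combinations(sorted(adjacent), 2):
            if b in neighbours[a]:
                triangles.add(tuple(sorted((node, a, b))))
    return triangles


triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "worksWith", "c"),
    ("c", "livesIn", "d"),
]
print(find_triangles(triples))
```

Storing each discovered triangle once, in a canonical node order, is one simple way to let a query that matches the pattern read it back without joining the three underlying triples again.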
