163 research outputs found

LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

Author: Amann Bernd
Curé Olivier
Naacke Hubert
Randriamalala Tendry
Publication date: 12/10/2015

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are increasingly confronted with various "big data" problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme using a clever encoding of concepts and property hierarchies for efficiently evaluating the main common RDFS entailment rules while minimizing triple materialization and query rewriting. We will show how this encoding can be computed by a scalable parallel algorithm and implemented directly over the Apache Spark framework. The efficiency of our encoding scheme is emphasized by an evaluation conducted over both synthetic and real-world datasets. Comment: 8 pages, 1 figure
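
The idea of encoding a class hierarchy into integer identifiers can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single-inheritance (tree-shaped) hierarchy, and the function names and the example classes are hypothetical. Each class receives a bit string that extends its parent's bit string, so that a subClassOf check becomes an integer range comparison instead of a query rewriting or a materialized triple.

```python
def encode_hierarchy(children, root):
    """Give every class a bit string that starts with its parent's bit string.

    `children` maps a class name to the list of its direct subclasses
    (assumption: the hierarchy is a tree).
    """
    codes = {root: "1"}
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        # Local ids start at 1; 0 stays reserved for the parent itself.
        width = len(kids).bit_length()
        for local_id, kid in enumerate(kids, start=1):
            codes[kid] = codes[parent] + format(local_id, f"0{width}b")
            stack.append(kid)
    total_bits = max(len(code) for code in codes.values())
    # Right-pad with zeros so that all identifiers share one length, and
    # remember how many leading bits are significant for each class.
    encoded = {
        name: (int(code.ljust(total_bits, "0"), 2), len(code))
        for name, code in codes.items()
    }
    return encoded, total_bits


def subclass_interval(encoded, total_bits, name):
    """Return the half-open id range [low, high) covering `name` and its subclasses."""
    ident, significant_bits = encoded[name]
    return ident, ident + (1 << (total_bits - significant_bits))


def is_subclass_of(encoded, total_bits, sub, sup):
    """Reflexive subsumption test expressed as a range check on integers."""
    low, high = subclass_interval(encoded, total_bits, sup)
    return low <= encoded[sub][0] < high


# Hypothetical toy hierarchy, for illustration only.
hierarchy = {
    "Thing": ["Agent", "Place"],
    "Agent": ["Person", "Organization"],
    "Person": ["Student"],
}
encoded, total_bits = encode_hierarchy(hierarchy, "Thing")
print(is_subclass_of(encoded, total_bits, "Student", "Agent"))  # True
print(is_subclass_of(encoded, total_bits, "Student", "Place"))  # False
```

With such identifiers, a query asking for all instances of Agent could filter on the interval returned by `subclass_interval` rather than enumerating every subclass.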

arXiv.org e-Print Archive

Crossref

HAL: Hyper Article en Ligne

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Current trends in the European asset management industry: Lot 1

Author: Borell Mariela
Davydoff Didier
Naacke Grégoire
Schröder Michael
Publication venue: Mannheim: Zentrum für Europäische Wirtschaftsforschung (ZEW)
Publication date: 01/01/2006

EconStor (ZBW Kiel)

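Schulentwicklung mit Chor- und Bläserklassen. Eine qualitative Fallstudie am "Evangelischen Gymnasium am Dom zu Brandenburg" [School development with choir and wind-band classes: a qualitative case study at the "Evangelisches Gymnasium am Dom zu Brandenburg"]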

Author: Naacke Susanne
Publication venue: pedocs-Dokumentenserver/DIPF
Publication date: 01/01/2010

This conference paper is embedded in the accompanying research on the development of all-day schools in Germany. It builds on fundamental findings from the "Study on Musical and Cultural Education at All-Day Schools (MUKUS)". The subject of the work is a qualitative empirical investigation of a denominational school in the federal state of Brandenburg. The central aim was explicitly not to measure the efficiency and performance of individual school actors, or of a school system, at a given point in time X. Instead, the work set out to reconstruct the interactions, changes and dynamic interplay, as well as the inhibiting or supporting factors, that a school faces in connection with school development processes. (DIPF/Orig.)

Fachlicher Dokumentenserver Paedagogik/Erziehungswissenschaften

ATEM: A Topic Evolution Model for the Detection of Emerging Topics in Scientific Archives

Author: Amann Bernd
Constantin Camelia
Naacke Hubert
Rahimi Hamed
Publication date: 03/06/2023

This paper presents ATEM, a novel framework for studying topic evolution in scientific archives. ATEM is based on dynamic topic modeling and dynamic graph embedding techniques that explore the dynamics of content and citations of documents within a scientific corpus. ATEM explores a new notion of contextual emergence for the discovery of emerging interdisciplinary research topics based on the dynamics of citation links in topic clusters. Our experiments show that ATEM can efficiently detect emerging cross-disciplinary topics within the DBLP archive of over five million computer science articles.

arXiv.org e-Print Archive

ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

Author: Amann Bernd
Constantin Camelia
Naacke Hubert
Rahimi Hamed
Publication date: 04/06/2023

This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms to provide a modular framework for discovering evolving topics. ANTM maintains the temporal continuity of evolving topics by extracting time-aware features from documents using advanced pre-trained Large Language Models (LLMs) and employing an overlapping sliding window algorithm for sequential document clustering. This overlapping sliding window algorithm identifies a different number of topics within each time frame and aligns semantically similar document clusters across time periods. This process captures emerging and fading trends across different periods and allows for a more interpretable representation of evolving topics. Experiments on four distinct datasets show that ANTM outperforms probabilistic dynamic topic models in terms of topic coherence and diversity metrics. Moreover, it improves the scalability and flexibility of dynamic topic models by being accessible and adaptable to different types of algorithms. Additionally, a Python package has been developed for researchers and scientists who wish to study the trends and evolving patterns of topics in large-scale textual data.
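
The overlapping sliding window step can be pictured with a small sketch. It covers only the segmentation of documents into overlapping time frames, not the embedding, clustering or alignment stages, and it is not taken from the ANTM package: the function name, the parameters `window_length` and `overlap`, and the integer timestamps are assumptions made for illustration.

```python
def overlapping_windows(timestamps, window_length, overlap):
    """Split documents into time frames that share `overlap` time units.

    `timestamps` holds one integer timestamp (for example a year) per
    document. Returns a list of (start, end, document_indices) tuples,
    where `end` is exclusive.
    """
    if not 0 <= overlap < window_length:
        raise ValueError("overlap must satisfy 0 <= overlap < window_length")
    step = window_length - overlap
    last = max(timestamps)
    start = min(timestamps)
    windows = []
    while True:
        end = start + window_length
        members = [i for i, t in enumerate(timestamps) if start <= t < end]
        windows.append((start, end, members))
        if end > last:  # the latest document is covered, so stop
            break
        start += step
    return windows


# Hypothetical corpus: five documents, one per year.
years = [2000, 2001, 2002, 2003, 2004]
for start, end, members in overlapping_windows(years, window_length=3, overlap=1):
    print(start, end, members)
# 2000 2003 [0, 1, 2]
# 2002 2005 [2, 3, 4]
```

Because document 2 falls into both frames, clusters built independently in each frame share members, which is one plausible basis for aligning similar clusters across adjacent periods.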

arXiv.org e-Print Archive

On Distributed SPARQL Query Processing Using Triangles of RDF Triples

Author: Hubert Naacke
Olivier Curé
Publication venue: RonPub
Publication date: 01/01/2020

Knowledge Graphs provide valuable functionalities, such as data integration and reasoning, to an increasing number of applications in all kinds of companies. These applications partly depend on the efficiency of a Knowledge Graph management system, which is often based on the RDF data model and queried with SPARQL. In this context, query performance is paramount and relies on an optimizer that usually makes intensive use of a large set of indexes. Generally, these indexes correspond to different re-orderings of the subject, predicate and object of a triple pattern. In this work, we present a novel approach that considers indexes formed by a frequently encountered basic graph pattern: the triangle of triples. We propose dedicated data structures to store these triangles, provide distributed algorithms to discover and materialize them, including inferred triangles, and detail query optimization techniques, including a data partitioning approach for biased data. We provide an implementation that runs on top of Apache Spark and run experiments on two real-world RDF data sets. This evaluation emphasizes the performance boost (up to 40x on query processing) that one can obtain by using our approach when facing triangles of triples.
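
To make the notion of a triangle of triples concrete, the following single-machine sketch enumerates one possible triangle shape. It is not the distributed algorithm of the paper and does not use Apache Spark. It assumes that a triangle is the pattern `?a p1 ?b . ?b p2 ?c . ?a p3 ?c`; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_triangles(triples):
    """Return every (t1, t2, t3) with t1 = (a, p1, b), t2 = (b, p2, c), t3 = (a, p3, c).

    Assumption: this orientation is only one of several possible
    triangle shapes over three distinct nodes.
    """
    out_edges = defaultdict(list)  # subject -> [(predicate, object), ...]
    for s, p, o in triples:
        out_edges[s].append((p, o))

    triangles = []
    for a, edges in out_edges.items():
        for p1, b in edges:
            # .get avoids inserting new keys while iterating over the dict
            for p2, c in out_edges.get(b, []):
                if len({a, b, c}) < 3:
                    continue  # require three distinct nodes
                for p3, target in edges:
                    if target == c:
                        triangles.append(((a, p1, b), (b, p2, c), (a, p3, c)))
    return triangles


# Hypothetical sample data.
data = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("alice", "worksAt", "acme"),
    ("carol", "knows", "alice"),
]
for triangle in find_triangles(data):
    print(triangle)
```

Materializing the triangles found this way would let a query that matches the same three-triple pattern read them directly instead of joining three triple patterns at query time.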

RonPub -- Research Online Publishing

HAL-Ecole des Ponts ParisTech