46 research outputs found

Log Parsing Evaluation in the Era of Modern Software Systems

Authors: Hengst Floris den; Petrescu Stefan; Rellermeyer Jan S.; Uta Alexandru
Publication date: 17/08/2023

Due to the complexity and size of modern software systems, the amount of logs generated is tremendous. Hence, it is infeasible to investigate these data manually in a reasonable time, which calls for automated log analysis to derive insights about the functioning of the systems. Motivated by an industry use case, we zoom in on one integral part of automated log analysis, log parsing, which is the prerequisite to deriving any insights from logs. Our investigation reveals problematic aspects within the log parsing field, particularly its inefficiency in handling heterogeneous real-world logs. We show this by assessing the 14 most-recognized log parsing approaches in the literature using (i) nine publicly available datasets, (ii) one dataset composed of combined publicly available data, and (iii) one dataset generated within the infrastructure of a large bank. Subsequently, toward improving log parsing robustness in real-world production scenarios, we propose a tool, Logchimera, that enables estimating log parsing performance in industry contexts by generating synthetic log data that resemble industry logs. Our contributions serve as a foundation to consolidate past research efforts, facilitate future research advancements, and establish a strong link between research and industry log parsing.

arXiv.org e-Print Archive

Log Parsing Evaluation in the Era of Modern Software Systems

Authors: Hengst Floris den; Petrescu Stefan; Rellermeyer Jan S.; Uta Alexandru
Publication date: 17/08/2023

VU Research Portal

Is big data performance reproducible in modern cloud networks?

Authors: Custura Alexandru; Duplyakin Dmitry; Iosup Alexandru; Jimenez Ivo; Maltzahn Carlos; Rellermeyer Jan; Ricci Robert; Uta Alexandru
Publication venue: USENIX Association
Publication date: 01/02/2020

VU Research Portal

The Performance of Distributed Applications: A Traffic Shaping Perspective

Authors: Cardellini Valeria; Di Marco Antinisca; Hasenoot Jasper A.; Rellermeyer Jan S.; Tuma Petr; Uta Alexandru; Vieira Marco
Publication venue: New York, NY: Association for Computing Machinery
Publication date: 01/01/2023

Widely used in datacenters and clouds, network traffic shaping is a performance-influencing factor that is often overlooked when benchmarking or simply deploying distributed applications. While in theory traffic shaping should allow for a fairer sharing of network resources, in practice it also introduces new problems: performance (measurement) inconsistency and long tails. In this paper we investigate the effects of traffic shaping mechanisms on common distributed applications. We characterize the performance of a distributed key-value store, big data workloads, and high-performance computing under state-of-the-art benchmarks, while the underlying network's traffic is shaped using state-of-the-art mechanisms such as token buckets or priority queues. Our results show that the impact of traffic shaping needs to be taken into account when benchmarking or deploying distributed applications. To help researchers, practitioners, and application developers, we uncover several practical implications and make recommendations on how certain applications are to be deployed so that performance is least impacted by the shaping protocols.

Institutionelles Repositorium der Leibniz Universität Hannover

In-Memory Indexed Caching for Distributed Data Processing

Authors: Boncz P.A. (Peter); Dave A. (Ankur); Ghit B. (Bogdan); Rellermeyer J. (Jan); Uta A. (Alexandru)
Publication date: 12/12/2021

Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de facto distributed data processing framework, Apache Spark, is poorly suited for modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that supports a dataframe abstraction which incorporates indexing capabilities to support fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library which can be integrated with minimal effort in existing Spark programs. We analyze the performance of the Indexed DataFrame in cluster and cloud deployments with real-world datasets and benchmarks using both Apache Spark and Databricks Runtime. In our evaluation, we show that the Indexed DataFrame significantly speeds up query execution when compared to a non-indexed dataframe, while incurring modest memory overhead.

CWI's Institutional Repository

Modularity as a systems design principle

Author: Rellermeyer Jan
Publication venue: ETH
Publication date: 01/01/2011

Repository for Publications and Research Data