PlayMyData: a curated dataset of multi-platform video games
Despite being predominant in digital entertainment for decades, video games have only recently been recognized as valuable software artifacts by the software engineering (SE) community. This acknowledgment has opened up several research opportunities, ranging from empirical studies to the application of AI techniques for classification tasks. In this respect, several curated game datasets have been released for research purposes, even though the collected data are insufficient to support the application of advanced models or to enable interdisciplinary studies. Moreover, most of them are limited to PC games, thus excluding popular gaming platforms, e.g., PlayStation, Xbox, and Nintendo. In this paper, we propose PlayMyData, a curated dataset composed of 99,864 multi-platform games gathered from the IGDB website. By exploiting a dedicated API, we collect relevant metadata for each game, e.g., description, genre, rating, gameplay video URLs, and screenshots. Furthermore, we enrich PlayMyData with the time needed to complete each game by mining the HowLongToBeat (HLTB) website. To the best of our knowledge, this is the most comprehensive dataset in the domain that can be used to support different automated tasks in SE. More importantly, PlayMyData can be used to foster cross-domain investigations built on top of the provided multimedia data.
Comment: Accepted at the 21st International Conference on Mining Software Repositories (MSR 2024)
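
As an illustration of the metadata harvesting step described above, the following is a minimal Python sketch of querying the IGDB v4 API. The endpoint, headers, and field names follow IGDB's public documentation, but the credentials are placeholders and the exact field list used by PlayMyData is an assumption, not taken from the paper.

    import requests

    IGDB_URL = "https://api.igdb.com/v4/games"
    HEADERS = {
        "Client-ID": "<your-client-id>",          # placeholder credentials;
        "Authorization": "Bearer <your-token>",   # IGDB authenticates via Twitch OAuth
    }

    # Apicalypse query asking for the kinds of metadata the abstract mentions.
    query = (
        "fields name, summary, genres.name, rating, "
        "videos.video_id, screenshots.url, platforms.name; "
        "limit 500;"
    )

    response = requests.post(IGDB_URL, headers=HEADERS, data=query)
    response.raise_for_status()
    for game in response.json():
        print(game.get("name"), game.get("rating"))

In practice, a crawler would page through results with IGDB's offset clause and respect the API's rate limits.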
Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset
Software engineering (SE) activities have been revolutionized by the advent
of pre-trained models (PTMs), defined as large machine learning (ML) models
that can be fine-tuned to perform specific SE tasks. However, users with limited expertise may struggle to select the appropriate model for the task at hand. To tackle this issue, the Hugging Face (HF) platform simplifies
the use of PTMs by collecting, storing, and curating several models.
Nevertheless, the platform currently lacks a comprehensive categorization of
PTMs designed specifically for SE, i.e., the existing tags are more suited to
generic ML categories.
This paper introduces an approach to address this gap by enabling the
automatic classification of PTMs for SE tasks. First, we utilize a public dump of HF to extract PTM information, including model documentation and associated
tags. Then, we employ a semi-automated method to identify SE tasks and their
corresponding PTMs from existing literature. The approach involves creating an
initial mapping between HF tags and specific SE tasks, using a similarity-based
strategy to identify PTMs with relevant tags. The evaluation shows that model cards are informative enough to classify PTMs according to the pipeline tag. Moreover, we provide a mapping between SE tasks and stored PTMs by relying on model names.
Comment: Accepted at The International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition
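
The abstract does not spell out the similarity-based strategy, so the sketch below is only a plausible illustration: it maps HF tags to SE tasks using a simple normalized string similarity from Python's standard library, with hypothetical tag and task lists and an arbitrary threshold.

    from difflib import SequenceMatcher

    # Hypothetical examples: SE tasks drawn from the literature, HF tags from model cards.
    se_tasks = ["code generation", "code summarization", "defect prediction"]
    hf_tags = ["text-generation", "summarization", "fill-mask"]

    def similarity(a: str, b: str) -> float:
        # Normalized string similarity in [0, 1].
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    THRESHOLD = 0.5  # illustrative cut-off, not a value from the paper
    for tag in hf_tags:
        task, score = max(((t, similarity(tag, t)) for t in se_tasks),
                          key=lambda pair: pair[1])
        if score >= THRESHOLD:
            print(f"{tag} -> {task} (score={score:.2f})")

A real pipeline would likely replace the string ratio with embedding-based similarity over model-card text, but the mapping logic stays the same.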
Monte Carlo generators for top quark physics at the LHC
We review the main features of Monte Carlo generators for top quark
phenomenology and present some results for t-tbar and single-top signals and
backgrounds at the LHC.
Comment: 7 pages, 5 figures. Talk given at `V Workshop Italiano sulla Fisica pp a LHC', Perugia, Italy, 30 January - 2 February 2008. References updated
Re-discovery of the top quark at the LHC and first measurements
This paper describes the top quark physics measurements that can be performed
with the first LHC data in the ATLAS and CMS experiments.
Comment: 6 pages, 2 figures. Talk given at `V Workshop Italiano sulla Fisica pp a LHC', Perugia, Italy, 30 January - 2 February 2008
GPGPU for track finding in High Energy Physics
The LHC experiments are designed to detect a large number of physics events produced at a very high rate. Considering the future upgrades, the data acquisition rate will become even higher, and new computing paradigms must be adopted for fast data processing: General-Purpose computing on Graphics Processing Units (GPGPU) is a novel approach based on massively parallel computing. The intense computational power provided by Graphics Processing Units (GPUs) is expected to reduce the computation time and to speed up the low-latency applications used for fast decision making. In particular, this approach could hence be used for high-level triggering in very complex environments, like the typical inner tracking systems of the multi-purpose experiments at the LHC, where a large number of charged-particle tracks will be produced with the luminosity upgrade. In this article we discuss a track pattern-recognition algorithm based on the Hough Transform, where a parallel approach is expected to dramatically reduce
the execution time.
Comment: 6 pages, 4 figures, proceedings prepared for the GPU-HEP 2014 conference, submitted to DESY-PROC-2014
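
To make the voting step concrete, here is a minimal NumPy sketch of Hough-Transform straight-line finding over detector hits. The vectorized accumulation stands in for the one-thread-per-(hit, theta) parallelism a GPGPU implementation would exploit; the bin counts and toy hits are illustrative assumptions, not values from the article.

    import numpy as np

    # Toy (x, y) hits; three of them lie roughly on the line y = 2x.
    hits = np.array([[1.0, 2.1], [2.0, 4.0], [3.0, 6.2], [0.5, 3.3]])

    n_theta, n_r = 180, 200  # illustrative accumulator granularity
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    r_max = np.hypot(hits[:, 0], hits[:, 1]).max()
    r_edges = np.linspace(-r_max, r_max, n_r + 1)

    # Each hit votes over all theta bins at once: r = x*cos(t) + y*sin(t).
    r_values = hits[:, :1] * np.cos(thetas) + hits[:, 1:2] * np.sin(thetas)
    r_bins = np.clip(np.digitize(r_values, r_edges) - 1, 0, n_r - 1)
    accumulator = np.zeros((n_theta, n_r), dtype=np.int32)
    np.add.at(accumulator,
              (np.tile(np.arange(n_theta), len(hits)), r_bins.ravel()), 1)

    # The accumulator maximum is the best straight-line track candidate.
    t_idx, r_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
    print(f"best line: theta={thetas[t_idx]:.2f} rad, r~{r_edges[r_idx]:.2f}")

On a GPU, the same voting becomes one thread per (hit, theta) pair with atomic increments into a shared accumulator, which is where the expected speed-up comes from.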
Supporting Early-Safety Analysis of IoT Systems by Exploiting Testing Techniques
IoT systems' complexity and susceptibility to failures pose significant challenges in ensuring their reliable operation. Failures can be internally generated or caused by external factors, impacting both the system's correctness and its surrounding environment. To investigate these complexities, various modeling approaches have been proposed to raise the level of abstraction, facilitating automation and analysis. Failure Logic Analysis (FLA) is a technique that helps predict potential failure scenarios by defining how a component's failure logic behaves and spreads throughout the system. However, manually specifying FLA rules can be arduous and error-prone, leading to incomplete or inaccurate specifications. In this paper, we propose adopting testing methodologies to improve the completeness and correctness of these rules. How failures may propagate within an IoT system can be observed by systematically injecting failures while running test cases to collect evidence useful to add, complete, and refine FLA rules.
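
As a hedged illustration of the proposed idea, the sketch below injects a value failure into a toy sensor component during a test run and observes how it propagates to a downstream controller. The component names, failure mode, and rule notation are hypothetical, not taken from the paper.

    class TemperatureSensor:
        def read(self) -> float:
            return 21.5  # nominal reading

    class Controller:
        def __init__(self, sensor: TemperatureSensor):
            self.sensor = sensor

        def act(self) -> str:
            # Turn the heater on below an 18 C threshold.
            return "heat_on" if self.sensor.read() < 18.0 else "idle"

    def inject_value_failure(sensor: TemperatureSensor, bad_value: float) -> None:
        # Force an erroneous output value for the duration of the test.
        sensor.read = lambda: bad_value

    sensor = TemperatureSensor()
    controller = Controller(sensor)
    inject_value_failure(sensor, -40.0)  # inject: wrong value at the sensor output
    observed = controller.act()          # observe: wrong actuation downstream
    print(f"injected value failure -> controller output: {observed}")
    # Evidence like this backs an FLA-style rule such as:
    #   sensor.wrongValue-out leads to controller.wrongActuation-out

Repeating such injections across test cases yields the evidence needed to add missing rules and refine inaccurate ones.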