TransFeatEx: a NLP pipeline for feature extraction

Franch Gutiérrez, Javier; Gallego Marfa, Agustí; Marco Gómez, Jordi; Motger de la Encarnación, Joaquim

Authors: Javier Franch Gutiérrez, Agustí Gallego Marfa, Jordi Marco Gómez, Joaquim Motger de la Encarnación
Publication date: 1 January 2023
Publisher: CEUR-WS.org

Abstract

Mobile app stores provide centralized access to a large body of natural language text about mobile apps, including developer documentation (e.g., descriptions, changelogs) and user-generated data (e.g., user reviews). Motivated by this context, multiple studies have focused on data-driven elicitation processes for automatically extracting the set of features exposed by a catalogue of applications, along with the extended knowledge that can be inferred from this information. Moreover, with the emergence and widespread adoption of large language models, traditional linguistic-based approaches can be significantly improved by the knowledge embedded in these models. In this paper, we present TransFeatEx, an NLP-based feature extraction pipeline that combines a RoBERTa-based model with consolidated syntactic and semantic techniques. The pipeline is designed as a customizable, standalone service to be used either as a playground and experimentation tool or as a software component embedded into a third-party software system for batch-processing large document corpora. An example of a demo plan is showcased here: https://youtu.be/gfFyi_i_uTw

With the support of the Secretariat for Universities and Research of the Ministry of Business and Knowledge of the Government of Catalonia and the European Social Fund. This paper has been funded by the Spanish Ministerio de Ciencia e Innovación under project / funding scheme PID2020-117191RB-I00 / AEI/10.13039/501100011033.

Peer reviewed. Postprint (published version).

Available Versions

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/388...

Last time updated on 09/08/2023