Search CORE

4 research outputs found

The fairGRecs dataset: A dataset for producing health-related recommendations

Authors: Kondylakis Haridimos; Stefanidis Kostas; Stratigi Maria
Publication venue: CEUR-WS
Publication date: 01/01/2018

Trepo - Institutional Repository of Tampere University

Information between Data and Knowledge: Information Science and its Neighbors from Data Science to Digital Humanities

Publication venue: Werner Hülsbusch
Publication date: 01/01/2021

Digital humanities and data science, as neighboring fields, pose new challenges and opportunities for information science. The recent focus on data in the context of big data and deep learning brings new tasks for information scientists, for example in research data management. At the same time, information behavior is changing in light of the increasing digital availability of information in academia and in everyday life. In this volume, contributions from fields such as information behavior and information literacy, information retrieval, digital humanities, knowledge representation, emerging technologies, and information infrastructure showcase the development of information science research in recent years. Topics as diverse as social media analytics, fake news on Facebook, collaborative search practices, open educational resources, and recent developments in research data management are among the highlights of this volume. For more than 30 years, the International Symposium of Information Science has been the venue for bringing together information scientists from the German-speaking countries. In addition to the regular scientific contributions, six of the top candidates for the prize for the best master's thesis in information science present their work.

University of Regensburg Publication Server

Analysis and Application of Language Models to Human-Generated Textual Content

Author: Di Giovanni Marco <1993>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 21/03/2022

Social networks are enormous sources of human-generated content. Users continuously create information that is useful but hard to detect, extract, and categorize. Language Models (LMs) have long been among the most useful and widely used approaches for processing textual data. Initially designed as simple unigram models, they improved over the years, culminating in the recent release of BERT, a pre-trained Transformer-based model that achieves state-of-the-art performance on many heterogeneous benchmark tasks, such as text classification and tagging. In this thesis, I apply LMs to textual content publicly shared on social media. I selected Twitter as the principal data source for the experiments because its users mainly share short, noisy texts. My goal is to build models that generate meaningful representations of users, encoding their syntactic and semantic features. Once appropriate embeddings are defined, I compute similarities between users to perform higher-level analyses. Tested tasks include: the extraction of emerging knowledge, represented by users similar to a given set of well-known accounts; controversy detection, obtaining controversy scores for topics discussed online; community detection and characterization, clustering similar users and detecting outliers; and stance classification of users and tweets (e.g., political inclination, position on COVID-19 vaccines). The results suggest that publicly available data contains sensitive information about users and that Language Models can now extract it, threatening users' privacy.

AMS Tesi di Dottorato

Grundlagen der Informationswissenschaft (Foundations of Information Science)

Publication venue: Walter de Gruyter GmbH

OAPEN Library