5 research outputs found

Style over substance: A psychologically informed approach to feature selection and generalisability for author classification

Author: Cribbin T
Ferenczi N
Holmes I
Publication venue: 'Elsevier BV'
Publication date: 13/01/2023

Data availability: Data will be made available on request.

Copyright © 2023 The Authors.

Author profiling, or classifying user-generated content based on demographic or other personal attributes, is a key task in social media-based research. Whilst high accuracy has been achieved on many attributes, most studies train and test models on a single domain only, ignoring cross-domain performance. Research shows that models often transfer poorly to new domains because they tend to depend heavily on topic-specific (i.e., lexical) features. Knowledge specific to the field (e.g., Psychology, Political Science) is often ignored, with a reliance on data-driven algorithms for feature development and selection. Focusing on political affiliation, we evaluate an approach that selects stylistic features according to known psychological correlates (personality traits) of this attribute. Training data was collected from Reddit posts made by regular users of the political subreddits r/republican and r/democrat. A second, non-political dataset was created by collecting posts by the same users in different subreddits. Our results show that introducing domain-specific knowledge in the form of psychologically informed stylistic features resulted in better out-of-training-domain performance than lexical or more commonly used stylistic features.

Brunel University Research Archive

Making Predictions with Textual Contents

Author: Brito Indira
Publication date: 01/04/2014

Forecasting real-world quantities based on information from textual descriptions has recently attracted significant interest as a research problem, although previous studies have focused on applications involving only the English language. This document presents an experimental study on making predictions from textual content written in Portuguese, using documents from three distinct domains. I specifically report on experiments using different types of regression models, state-of-the-art feature weighting schemes, and features derived from cluster-based word representations. Through controlled experiments, I show that prediction models using the textual information achieve better results than simple baselines such as taking the average value over the training data, and that richer document representations (i.e., using Brown clusters and the Delta TF-IDF feature weighting scheme) result in slight performance improvements.

Portal do Conhecimento

Sociolinguistic bibliography of European countries 2011: Soziolinguistische Bibliographie europäischer Länder für 2011

Author: Archakis Argyris
Augusto Maria Celeste
Berruto Gaetano
Borbély Anna
Broermann Marianne
Bugarski Ranko
Darquennes Jeroen
Druviete Ina
Gilles Peter
Goebl Hans
Goutsos Dionysos
Held Gurdrun
Jorgensen Jens Normann
Kaderka Petr
Kalediene Laima
Karlsson Anna-Malin
Kellermeier-Rehbein Birte
Kroon Sjaak
Ledegen Gudrun
Lüdi Georges
Novak-Lukanovic Sonja
Oakes Leigh
Pachev Angel
Pärn Hele
Sandoy Helge
Selas Magnhild
Skelin-Horvath Anita
Troschina Natalia
Zamora Francisco
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2013

Repository of the University of Namur

Opinion Mining of Sociopolitical Comments from Social Media

Author: GOTTIPATI Swapna
Publication venue: Singapore Management University
Publication date: 01/08/2014

Institutional Knowledge at Singapore Management University

Workshop Proceedings of the 12th edition of the KONVENS conference

Author: Faaß Gertrud
Ruppenhofer Josef
Publication venue: Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
Publication date: 11/07/2023

The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions which deal with natural language as a container of knowledge and with methods to extract and manage knowledge that is linguistically represented, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.

Publikationsserver des Instituts für Deutsche Sprache