814 research outputs found

Towards robust multi-tool tagging: an OWL/DL-based approach

Author: Chiarcos Christian
Publication date: 19/05/2023

Deep Learning for User Comment Moderation

Authors: Androutsopoulos Ion; Malakasiotis Prodromos; Pavlopoulos John
Publication date: 01/01/2017

Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

arXiv.org e-Print Archive

Crossref

A Multilingual Text Normalization Approach

Author: Bigi Brigitte
Publication venue: Springer Berlin Heidelberg
Publication date: 01/01/2014

The creation of text corpora requires a sequence of processing steps to build the corpus, normalize it, and then make it directly usable by a given application. This paper presents a generic approach for text normalization and concentrates on the aspects of methodology and linguistic engineering, which serve to develop a multipurpose multilingual text corpus. This approach was applied to French, English, Spanish, Vietnamese, Khmer and Chinese. It consists in splitting the text normalization problem into a set of minor sub-problems that are as language-independent as possible. A set of text corpus normalization tools with linked resources and a document structuring method are proposed.

HAL AMU

Annotating the Dutch Parallel Corpus

Author: Paulussen Hans
Publication date: 30/11/2010

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 63-72. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

DSpace at Tartu University Library

Data Representativeness in Accessibility Datasets: A Meta-Analysis

Authors: Kacorri Hernisa; Kamikubo Rie; Mahmood Amnah; Marte Crystal; Wang Lining
Publication venue: Association for Computing Machinery (ACM)
Publication date: 19/09/2022

As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets - datasets sourced from people with disabilities and older adults - that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced by people with disabilities by reviewing publicly available information on 190 datasets, which we call accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables (e.g., gender, race & ethnicity) makes classification difficult and inconsistent, with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems. Comment: Preprint, The 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2022), 15 pages

arXiv.org e-Print Archive

Developing a multilayer semantic annotation scheme based on ISO standards for the visualization of a newswire corpus

Authors: Cantante Inês; Jorge Alípio Mário; Leal António; Oliveira Fátima; Silva Maria de Fátima Henriques da; Silvano Maria da Purificação
Publication date: 01/01/2021

In this paper, we describe the process of developing a multilayer semantic annotation scheme designed for extracting information from a European Portuguese corpus of news articles at three levels: temporal, referential and semantic role labelling. The novelty of this scheme is the harmonization of parts 1, 4 and 9 of the ISO 24617 Language resource management - Semantic annotation framework. This annotation framework includes a set of entity structures (participants, events, times) and a set of links (temporal, aspectual, subordination, objectal and semantic roles) with several tags and attribute values that ensure adequate semantic and visual representations of news stories.

Repositório Aberto da Universidade do Porto