Deep Learning for User Comment Moderation
Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism further improves the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation.
A Multilingual Text Normalization Approach
The creation of text corpora requires a sequence of processing steps to build the corpus, normalize it, and then make it directly usable by a given application. This paper presents a generic approach to text normalization and concentrates on the aspects of methodology and linguistic engineering that serve to develop a multipurpose multilingual text corpus. The approach was applied to French, English, Spanish, Vietnamese, Khmer and Chinese. It consists of splitting the text normalization problem into a set of smaller sub-problems that are as language-independent as possible. A set of text corpus normalization tools with linked resources and a document structuring method are proposed.
Annotating the Dutch Parallel Corpus
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010).
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 63-72.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893
Data Representativeness in Accessibility Datasets: A Meta-Analysis
As data-driven systems are increasingly deployed at scale, ethical concerns
have arisen around unfair and discriminatory outcomes for historically
marginalized groups that are underrepresented in training data. In response,
work around AI fairness and inclusion has called for datasets that are
representative of various demographic groups. In this paper, we contribute an
analysis of the representativeness of age, gender, and race & ethnicity in
accessibility datasets - datasets sourced from people with disabilities and
older adults - that can potentially play an important role in mitigating bias
for inclusive AI-infused applications. We examine the current state of
representation within datasets sourced from people with disabilities by reviewing
publicly available information on 190 datasets, which we call accessibility
datasets. We find that accessibility datasets represent diverse ages, but have
gender and race representation gaps. Additionally, we investigate how the
sensitive and complex nature of demographic variables makes classification
difficult and inconsistent (e.g., gender, race & ethnicity), with the source of
labeling often unknown. By reflecting on the current challenges and
opportunities for representation of disabled data contributors, we hope our
effort expands the space of possibility for greater inclusion of marginalized
communities in AI-infused systems.
Comment: Preprint, The 24th International ACM SIGACCESS Conference on
Computers and Accessibility (ASSETS 2022), 15 pages
Developing a multilayer semantic annotation scheme based on ISO standards for the visualization of a newswire corpus
In this paper, we describe the process of developing a multilayer semantic
annotation scheme designed for extracting information from a European
Portuguese corpus of news articles at three levels: temporal, referential and
semantic role labelling. The novelty of this scheme is the harmonization of
parts 1, 4 and 9 of the ISO 24617 Language resource management - Semantic
annotation framework. This annotation framework includes a set of entity
structures (participants, events, times) and a set of links (temporal,
aspectual, subordination, objectal and semantic roles) with several tags and
attribute values that ensure adequate semantic and visual representations of
news stories.