Is writing style predictive of scientific fraud?
The problem of detecting scientific fraud using machine learning was recently introduced, with initial positive results from a model that takes into account various general indicators. These results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments and show that the leave-one-out testing procedure used there likely leads to a slight overestimate of predictability, but also that simple models can outperform the proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, however, we do see some interesting patterns: scientific fraud, for example, contains fewer comparisons, as well as different types of hedging and ways of presenting logical reasoning.
Comment: To appear in the Proceedings of the Workshop on Stylistic Variation 2017 (EMNLP), 6 pages
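Leave-one-out testing holds out each example in turn, trains on all remaining examples and tests on the held-out one. The following is a minimal generic sketch, assuming one numeric feature per paper and a hypothetical 1-nearest-neighbour classifier (`nearest_neighbour`); it is not the model or feature set from either study, and it does not by itself show where the over-estimate comes from.

```python
from typing import Callable, List, Sequence

# Hypothetical classifier signature: (train_x, train_y, x) -> predicted label.
Classifier = Callable[[Sequence[float], Sequence[int], float], int]


def nearest_neighbour(train_x: Sequence[float], train_y: Sequence[int], x: float) -> int:
    """Predict the label of the closest training point (ties go to the earliest one)."""
    _, label = min(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
    return label


def leave_one_out_accuracy(xs: List[float], ys: List[int], classify: Classifier) -> float:
    """Hold out each example once, fit on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if classify(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data: one hypothetical stylistic score per paper; label 1 = fraudulent.
scores = [0.1, 0.2, 0.9, 1.0]
labels = [0, 0, 1, 1]
print(leave_one_out_accuracy(scores, labels, nearest_neighbour))  # 1.0
```

Each held-out prediction comes from a model trained on all other examples, so the procedure needs one model fit per example.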
Detection of Abusive Language from Tweets in Social Networks
Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.
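The blacklist-and-regular-expression approach attributed above to commercial systems can be sketched as follows. The word list and the obfuscation pattern are hypothetical placeholders, not any real product's rules; the last example shows the kind of subtle comment, containing no listed word, that such rules let through.

```python
import re

# Hypothetical blacklist; production lists are far larger and curated by hand.
BLACKLIST = {"idiot", "moron"}

# Hypothetical pattern catching simple character substitutions such as "id1ot" or "id!ot".
PATTERNS = [re.compile(r"\bid[i1!]+[o0]t\b", re.IGNORECASE)]


def flag_comment(text: str) -> bool:
    """Return True if any token is blacklisted or any regular expression matches."""
    tokens = re.findall(r"[a-z0-9!]+", text.lower())
    if any(token in BLACKLIST for token in tokens):
        return True
    return any(pattern.search(text) for pattern in PATTERNS)


for comment in ["What an idiot.", "What an id1ot.",
                "People like you should go back where you came from."]:
    print(flag_comment(comment), "|", comment)
```

The first two comments are flagged; the third is not, because no rule anticipates its wording.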
Discourse Structures and Language Technologies
Proceedings of the 18th Nordic Conference of Computational Linguistics, NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 12-16.
© 2011 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT): http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/16955
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry in the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers that feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.
Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019)
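Model stacking trains a metalearner on the outputs of several base classifiers. The sketch below is a deliberately small, stdlib-only illustration for discourse unit segmentation (does a token start a new unit?), assuming three hypothetical rule-based base learners and a hypothetical lookup-table metalearner that learns the majority label for each combination of base outputs. GumDrop's actual base classifiers, features and metalearner are not reproduced here. Because these base learners are fixed rules, the metalearner can be fit on their outputs directly; with trained base models, it would normally be fit on held-out (for example, cross-validated) predictions to avoid leakage.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


# Hypothetical base learners: each maps (tokens, position) to a 0/1 guess
# that the token begins a new discourse unit.
def after_punctuation(tokens: List[str], i: int) -> int:
    return int(i > 0 and tokens[i - 1] in {".", ",", ";"})


def is_connective(tokens: List[str], i: int) -> int:
    return int(tokens[i].lower() in {"because", "but", "although", "when"})


def is_capitalized(tokens: List[str], i: int) -> int:
    return int(tokens[i][:1].isupper())


BASE_LEARNERS = [after_punctuation, is_connective, is_capitalized]


def base_outputs(tokens: List[str], i: int) -> Row:
    """Collect every base learner's output for one token; this is the metalearner's input."""
    return tuple(learner(tokens, i) for learner in BASE_LEARNERS)


class TableMetaLearner:
    """Hypothetical metalearner: majority label per combination of base outputs."""

    def fit(self, rows: List[Row], labels: List[int]) -> "TableMetaLearner":
        votes: Dict[Row, Counter] = defaultdict(Counter)
        for row, label in zip(rows, labels):
            votes[row][label] += 1
        self.table = {row: counts.most_common(1)[0][0] for row, counts in votes.items()}
        # Unseen combinations fall back to the overall majority label.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row: Row) -> int:
        return self.table.get(row, self.default)


def segment(tokens: List[str], meta: TableMetaLearner) -> List[int]:
    return [meta.predict(base_outputs(tokens, i)) for i in range(len(tokens))]


train_tokens = ["I", "stayed", "home", "because", "it", "rained", ".",
                "We", "read", ",", "but", "nobody", "slept", "."]
train_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 1 = token starts a unit

meta = TableMetaLearner().fit(
    [base_outputs(train_tokens, i) for i in range(len(train_tokens))], train_labels
)
print(segment(["She", "left", "when", "he", "arrived", "."], meta))  # [1, 0, 1, 0, 0, 0]
```

Swapping in other base learners only changes the length of each input row; the metalearner itself is untouched, which is the modularity stacking is meant to provide.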
Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora, AEPC 2010.
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 83-92.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT): http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/15893
HLT-FBK: a Complete Temporal Processing System for QA TempEval
The HLT-FBK system is a suite of SVM-based classification models for extracting time expressions, events and temporal relations, each with a set of features obtained with the NewsReader NLP pipeline. HLT-FBK's best system runs ranked 1st in all three domains, with a recall of 0.30 over all domains. Our attempts at increasing recall, by considering all SRL predicates as events and by utilizing event co-reference information when extracting temporal links, result in significant improvements.
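Why treating more candidates as events (such as all SRL predicates) can raise recall is easiest to see with the standard set-based definitions of precision and recall. The event IDs and candidate sets below are made up for illustration, and the sketch does not reproduce how QA TempEval itself scores systems.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = share of predictions that are correct; recall = share of gold items found."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold events and two hypothetical candidate sets.
gold_events = {"e1", "e2", "e3", "e4"}
narrow = {"e1", "x1"}                      # e.g. only confidently detected events
broad = narrow | {"e2", "e3", "x2", "x3"}  # e.g. every predicate treated as a candidate event

print(precision_recall(narrow, gold_events))  # (0.5, 0.25)
print(precision_recall(broad, gold_events))   # (0.5, 0.75)
```

Widening the candidate set can only keep or raise recall; whether precision also drops depends on how many of the extra candidates are wrong.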