Event Detection and Classification in Hungarian Natural Texts
The detection and analysis of events in natural language texts plays an important role in several NLP applications such as summarization and question answering. This paper introduces a machine learning-based approach that can detect and classify verbal and infinitival events in Hungarian texts. First, the multiword noun + verb and noun + infinitive expressions are identified. Then the events are detected and the identified events are classified. For each problem, binary classifiers based on rich feature sets were applied. The models were expanded with rule-based methods
Event Detection and Classification in Natural Texts
The detection and analysis of events in natural language texts plays an important role in several NLP applications such as summarization and question answering. In this study we introduce a machine learning-based approach that can detect and classify verbal and infinitival events in Hungarian texts. First we identify the multiword noun + verb and noun + infinitive expressions. Then the events are detected and the identified events are classified. For each problem, we applied binary classifiers based on rich feature sets. The models were expanded with rule-based methods too
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
What are the limits of automated Twitter sentiment classification? We analyze
a large set of manually labeled tweets in different languages, use them as
training data, and construct automated classification models. It turns out that
the quality of classification models depends much more on the quality and size
of training data than on the type of the model trained. Experimental results
indicate that there is no statistically significant difference between the
performance of the top classification models. We quantify the quality of
training data by applying various annotator agreement measures, and identify
the weakest points of different datasets. We show that the model performance
approaches the inter-annotator agreement when the size of the training set is
sufficiently large. However, it is crucial to regularly monitor the self- and
inter-annotator agreements since this improves the training datasets and
consequently the model performance. Finally, we show that there is strong
evidence that humans perceive the sentiment classes (negative, neutral, and
positive) as ordered
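
The abstract above does not name the agreement measures it applies. As an illustration only, the following minimal sketch computes Cohen's kappa, one widely used chance-corrected agreement score, for two annotators who labeled the same tweets. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative sketch: the paper's actual agreement measures are not
    specified in the abstract, so this choice is an assumption.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four tweets.
annotator_1 = ["negative", "neutral", "positive", "positive"]
annotator_2 = ["negative", "neutral", "positive", "neutral"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Note that Cohen's kappa treats the classes as unordered. Since the abstract reports that the sentiment classes are perceived as ordered, a measure that penalizes negative/positive confusions more than neutral ones may be a better fit in practice.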
Exploring Different Dimensions of Attention for Uncertainty Detection
Neural networks with attention have proven effective for many natural
language processing tasks. In this paper, we develop attention mechanisms for
uncertainty detection. In particular, we generalize standardly used attention
mechanisms by introducing external attention and sequence-preserving attention.
These novel architectures differ from standard approaches in that they use
external resources to compute attention weights and preserve sequence
information. We compare them to other configurations along different dimensions
of attention. Our novel architectures set the new state of the art on a
Wikipedia benchmark dataset and perform similarly to the state-of-the-art model
on a biomedical benchmark which uses a large set of linguistic features.
Comment: accepted at EACL 201
Dynamics of conflicts in Wikipedia
In this work we study the dynamical features of editorial wars in Wikipedia
(WP). Based on our previously established algorithm, we build up samples of
controversial and peaceful articles and analyze the temporal characteristics of
the activity in these samples. On short time scales, we show that there is a
clear correspondence between conflict and burstiness of activity patterns, and
that memory effects play an important role in controversies. On long time
scales, we identify three distinct developmental patterns for the overall
behavior of the articles. We are able to distinguish cases eventually leading
to consensus from those cases where a compromise is far from achievable.
Finally, we analyze discussion networks and conclude that edit wars are mainly
fought by only a few editors.
Comment: Supporting information added
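
The abstract above does not say how burstiness is quantified. As an illustration only, the sketch below computes one common burstiness coefficient, B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of the gaps between consecutive events. The function name `burstiness` and the example timestamps are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Burstiness coefficient of an event sequence.

    Returns a value in [-1, 1]: -1 for perfectly regular activity,
    about 0 for Poisson-like activity, and values near 1 for highly
    bursty activity. Whether the paper uses this exact statistic is
    an assumption.
    """
    times = sorted(timestamps)
    # Inter-event times between consecutive edits.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")

    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events share the same timestamp")
    return (sigma - mu) / (sigma + mu)


# Hypothetical edit times: evenly spaced edits versus two short bursts.
regular = burstiness([0, 1, 2, 3, 4])
bursty = burstiness([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
```

Under this definition, evenly spaced edits give the minimum value of -1, while edits clustered into short bursts separated by long pauses give a positive value.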
A multimodal deep learning architecture for smoking detection with a small data approach
Introduction: Covert tobacco advertisements often prompt regulatory measures.
This paper shows that artificial intelligence, particularly deep learning,
has great potential for detecting hidden advertising and allows unbiased,
reproducible, and fair quantification of tobacco-related media content.
Methods: We propose an integrated text and image processing model based on deep
learning, generative methods, and human reinforcement, which can detect smoking
cases in both textual and visual formats, even with little available training
data. Results: Our model can achieve 74% accuracy for images and 98% for
text. Furthermore, our system integrates the possibility of expert intervention
in the form of human reinforcement. Conclusions: Using the pre-trained
multimodal, image, and text processing models available through deep learning
makes it possible to detect smoking in different media even with little training
data