Czech Text Document Corpus v 2.0
This paper introduces "Czech Text Document Corpus v 2.0", a collection of
text documents for automatic document classification in the Czech language. It
is composed of text documents provided by the Czech News Agency and is freely
available for research purposes at http://ctdc.kiv.zcu.cz/. The corpus was
created to facilitate a straightforward comparison of document classification
approaches on Czech data. It is particularly intended for the evaluation of
multi-label document classification approaches, because a document is usually
labelled with more than one label. Besides the information about the document
classes, the corpus is also annotated at the morphological layer. The paper
further reports the results of selected state-of-the-art methods on this corpus
to enable an easy comparison with these approaches. Comment: Accepted for LREC 201
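Because one document usually carries several labels, predictions are naturally compared as label sets. The sketch below assumes a micro-averaged F-measure pooled over all (document, label) pairs, a common choice for multi-label evaluation; the abstract does not say which averaging the paper uses, and the label names in the example are hypothetical, not the corpus's actual categories.

```python
from typing import Iterable, Set


def micro_f1(gold_sets: Iterable[Set[str]], pred_sets: Iterable[Set[str]]) -> float:
    """Micro-averaged F-measure for multi-label classification (assumed metric).

    True positives, false positives and false negatives are pooled over all
    documents before precision and recall are computed.
    """
    gold_list, pred_list = list(gold_sets), list(pred_sets)
    if len(gold_list) != len(pred_list):
        raise ValueError("gold and predicted label sets must be aligned per document")
    tp = fp = fn = 0
    for gold, pred in zip(gold_list, pred_list):
        tp += len(gold & pred)  # labels predicted and correct
        fp += len(pred - gold)  # labels predicted but wrong
        fn += len(gold - pred)  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
gold = [{"politics", "economy"}, {"sport"}]
pred = [{"politics"}, {"sport", "culture"}]
print(micro_f1(gold, pred))
```

In this toy example two of the three predicted labels are correct and two of the three gold labels are found, so precision, recall and F-measure are all 2/3.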
Optimization of the Computation of Simultaneous Masking of a Speech Signal
This paper proposes an optimization of the calculation of simultaneous (frequency) masking of a speech signal for real-time applications. The computational complexity of circular convolution is shown for the iterative Toom-Cook algorithm of length 4 and for an FFT-based algorithm. Conclusions about the efficiency of the proposed solutions are drawn with respect to computational complexity and memory usage.
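The quantity both algorithms evaluate is a circular convolution. As a hedged illustration, the sketch below contrasts the direct O(N²) definition with an FFT-based evaluation via the convolution theorem (O(N log N)). It does not reproduce the paper's iterative Toom-Cook scheme, and the radix-2 `_fft` helper is a generic textbook implementation, not the authors' code.

```python
import cmath
from typing import List, Sequence


def circular_convolution_direct(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """y[n] = sum_k x[k] * h[(n - k) mod N]; O(N^2) multiplications."""
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length")
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def _fft(a: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = _fft(a[0::2], inverse)
    odd = _fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolution_fft(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Circular convolution via the convolution theorem: IFFT(FFT(x) * FFT(h))."""
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length")
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two for this radix-2 sketch")
    spectrum = [a * b for a, b in zip(_fft(x), _fft(h))]
    # Inverse transform, then normalize by N; imaginary parts are rounding noise.
    return [v.real / n for v in _fft(spectrum, inverse=True)]


x = [1.0, 2.0, 3.0, 4.0]
h = [0.0, 1.0, 0.0, 0.0]  # a unit delay: circularly shifts x by one sample
print(circular_convolution_direct(x, h))
print(circular_convolution_fft(x, h))
```

Both calls return the circularly shifted sequence [4, 1, 2, 3], the FFT version up to floating-point rounding.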
Modelling of Diagnostics Influence on Control System Safety
If a control system, besides its standard control functions, also realizes functions known as safety functions, whose failures can influence the safety of the controlled process, then the control system may be a source of risk to assets within the scope of the controlled process. Early detection of these failures and subsequent neutralization of their effects can significantly influence the safety integrity level of the safety function and thus also the elimination of risks related to the controlled process. Therefore, diagnostics, if appropriately applied, can increase not only the availability but also the safety of the control system. The paper deals with using homogeneous Markov chains to evaluate the influence of on-line diagnostics on the hardware safety integrity of the safety function, depending on how several simultaneously operating diagnostic mechanisms are applied and on their basic parameters: the diagnostic coverage of failures and the failure diagnostic time.
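To make the modelling idea concrete, the sketch below builds a small discrete-time homogeneous Markov chain for a single channel with one diagnostic mechanism. The four states, the constant failure rate `lam`, the diagnostic coverage `dc`, the diagnostic time `tau` and the time step `dt` are illustrative assumptions, not the paper's model. The result is the time-averaged probability of being in a dangerous state (failed and not yet detected), loosely analogous to an average probability of dangerous failure.

```python
from typing import List, Tuple

# States (assumed for illustration):
# 0 = operational
# 1 = failed, detectable, diagnosis still running (dangerous)
# 2 = failed, not detectable by the diagnostics (dangerous)
# 3 = failure detected, system in its safe state (no repair modelled)


def transition_matrix(lam: float, dc: float, tau: float, dt: float) -> List[List[float]]:
    """One-step transition probabilities of the homogeneous chain (time step dt)."""
    p_fail = lam * dt  # probability of a failure within one step
    p_diag = dt / tau  # probability that a pending diagnosis completes within one step
    if not (0 <= p_fail <= 1 and 0 < p_diag <= 1 and 0 <= dc <= 1):
        raise ValueError("need lam*dt <= 1, 0 < dt <= tau and 0 <= dc <= 1")
    return [
        [1 - p_fail, p_fail * dc, p_fail * (1 - dc), 0.0],
        [0.0, 1 - p_diag, 0.0, p_diag],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def average_dangerous_probability(
    lam: float, dc: float, tau: float, mission_time: float, dt: float
) -> Tuple[float, List[float]]:
    """Time-averaged probability of states 1 and 2, plus the final state distribution."""
    P = transition_matrix(lam, dc, tau, dt)
    steps = int(round(mission_time / dt))
    if steps < 1:
        raise ValueError("mission_time must cover at least one time step")
    p = [1.0, 0.0, 0.0, 0.0]  # start in the operational state
    total = 0.0
    for _ in range(steps):
        # Row vector times matrix: p'[j] = sum_i p[i] * P[i][j]
        p = [sum(p[i] * P[i][j] for i in range(4)) for j in range(4)]
        total += p[1] + p[2]
    return total / steps, p


# Hypothetical parameters: lam per hour, tau and mission time in hours.
for dc in (0.0, 0.6, 0.9, 0.99):
    avg, _ = average_dangerous_probability(lam=1e-5, dc=dc, tau=8.0, mission_time=8760.0, dt=1.0)
    print(f"DC={dc:.2f}: average dangerous-state probability {avg:.2e}")
```

Raising the diagnostic coverage moves probability mass from the undetected state to the detected-and-safe path, while a longer diagnostic time keeps more mass in the dangerous "detectable but not yet detected" state.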
An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs
The effect of amplifiers, downtoners, and negations has been studied in
general and particularly in the context of sentiment analysis. However, there
is only limited work that aims at transferring the results and methods to
discrete classes of emotions, e.g., joy, anger, fear, sadness, surprise, and
disgust. For instance, it is not straightforward to interpret which emotion
the phrase "not happy" expresses. With this paper, we aim at obtaining a better
understanding of such modifiers in the context of emotion-bearing words and
their impact on document-level emotion classification, namely, of microposts on
Twitter. We select an appropriate scope detection method for modifiers of
emotion words, incorporate it into a document-level emotion classification
model as an additional bag of words, and show that this approach improves the
performance of emotion classification. In addition, we incorporate a term
weighting approach based on the different modifiers into a lexical model for
the analysis of the semantics of modifiers and their impact on emotion meaning.
We show that amplifiers separate emotions expressed with an emotion-bearing
word more clearly from other secondary connotations. Downtoners have the
opposite effect. In addition, we discuss the meaning of negations of
emotion-bearing words. For instance, we show empirically that "not happy" is
closer to sadness than to anger and that fear-expressing words in the scope of
downtoners often express surprise. Comment: Accepted for publication at The 5th
IEEE International Conference on Data Science and Advanced Analytics (DSAA),
https://dsaa2018.isi.it
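The idea of adding modifier-scoped emotion words as an additional bag of words can be sketched as follows. The fixed window of three tokens, cut at punctuation, is a simple stand-in for the scope detection method the paper selects, and the tiny modifier and emotion-word lexicons are hypothetical. Each emotion word found in a modifier's scope is emitted with a prefix (e.g. `NEG_happy`) and added to the ordinary token counts.

```python
import re
from collections import Counter
from typing import List

# Hypothetical mini-lexicons, for illustration only.
NEGATIONS = {"not", "no", "never", "don't", "isn't", "can't"}
AMPLIFIERS = {"very", "really", "so", "extremely"}
DOWNTONERS = {"slightly", "somewhat", "barely", "hardly"}
EMOTION_WORDS = {"happy", "sad", "angry", "afraid", "scared", "surprised", "disgusted"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens (apostrophes kept) and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def modifier_prefix(token: str) -> str:
    """Feature prefix for a modifier token, or an empty string for other tokens."""
    if token in NEGATIONS:
        return "NEG"
    if token in AMPLIFIERS:
        return "AMP"
    if token in DOWNTONERS:
        return "DOWN"
    return ""


def modifier_features(tokens: List[str], window: int = 3) -> List[str]:
    """Emit PREFIX_word for every emotion word inside a modifier's scope.

    Scope (assumption): up to `window` tokens after the modifier, cut at punctuation.
    """
    features = []
    for i, token in enumerate(tokens):
        prefix = modifier_prefix(token)
        if not prefix:
            continue
        for nxt in tokens[i + 1 : i + 1 + window]:
            if nxt in PUNCTUATION:
                break
            if nxt in EMOTION_WORDS:
                features.append(f"{prefix}_{nxt}")
    return features


def document_features(text: str) -> Counter:
    """Plain bag of words plus the additional bag of modifier-scoped emotion words."""
    tokens = tokenize(text)
    words = [t for t in tokens if t not in PUNCTUATION]
    return Counter(words) + Counter(modifier_features(tokens))


print(modifier_features(tokenize("So happy, not sad!")))
```

The example yields `AMP_happy` and `NEG_sad`; the comma ends the amplifier's scope before it can reach "sad". A downstream classifier can then learn separate weights for "happy" and "NEG_happy".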
Profiling a set of personality traits of text author: what our words reveal about us
Authorship profiling (AP), i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of AP is selecting text parameters that may correlate with an author’s personality. In most studies, the selection of these parameters is not grounded in any theory. This article proposes an approach to AP that draws on neuroscience data. The aim of the study is to assess the probability of self-destructive behaviour of an individual from formal parameters of their texts. We used the “Personality Corpus”, which consists of Russian-language texts. We calculated a set of correlations between text variables (average sentence length, lexical diversity, etc.) and scores on the Freiburg Personality Inventory scales known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”). Finally, we obtained a mathematical model that predicts the probability of self-destructive behaviour.
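The correlation step can be illustrated with two of the named text variables. In the sketch below, lexical diversity is measured as the type-token ratio and the correlation as Pearson's r; both choices are assumptions, since the abstract names neither the exact diversity measure nor the correlation coefficient. Python's `\w` is Unicode-aware for `str` patterns, so the tokenizer also handles Cyrillic text.

```python
import re
from math import sqrt
from typing import Dict, Sequence


def text_variables(text: str) -> Dict[str, float]:
    """Average sentence length (in words) and type-token ratio of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence with words")
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "lexical_diversity": len(set(words)) / len(words),  # type-token ratio (assumed measure)
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


print(text_variables("Мама мыла раму. Мама пела."))
# {'avg_sentence_length': 2.5, 'lexical_diversity': 0.8}
```

Given one text per author and that author's score on a scale such as "Depressiveness", `pearson` over the per-author values of a text variable and the scores yields one entry of the correlation set; any such scores used for trying this out would be hypothetical.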
Media monitoring and information extraction for the highly inflected agglutinative language Hungarian
The Europe Media Monitor (EMM) is a fully automatic system that analyses written online news by gathering articles in over 70 languages and currently applies text analysis software to 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort to add Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with Hungarian is its high degree of inflection and agglutination. We present several experiments in which we apply linguistically light-weight methods to deal with inflection, and we propose a method to overcome these challenges. We also present detailed frequency lists of Hungarian person and location name suffixes as found in real-life news texts. These empirical data can be used to draw further conclusions and to improve existing Named Entity Recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages, such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web page
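A linguistically light-weight way to handle inflected names is to strip candidate suffixes and check the remainder against a list of known base names. The sketch below tries the longest suffix first. The short suffix list is a hypothetical sample of common Hungarian case endings (including the assimilated instrumental in Orbánnal), not the frequency lists reported in the paper, and stem alternations such as Európa/Európában are deliberately not handled.

```python
from typing import Iterable, Optional, Set

# Hypothetical sample of Hungarian case suffixes (not the paper's frequency list).
SUFFIXES = [
    "nak", "nek",        # dative
    "ban", "ben",        # inessive
    "ról", "ről",        # delative
    "nal", "nel",        # instrumental after a final -n (assimilated -val/-vel)
    "val", "vel",        # instrumental after a vowel
    "ra", "re",          # sublative
    "en", "on", "ön",    # superessive after a consonant
    "ot", "et", "öt",    # accusative with a linking vowel
    "t",                 # accusative
]


def lemmatise_name(
    token: str, known_names: Set[str], suffixes: Iterable[str] = SUFFIXES
) -> Optional[str]:
    """Return the known base name that `token` inflects, or None if there is no match."""
    if token in known_names:
        return token
    for suffix in sorted(suffixes, key=len, reverse=True):  # longest suffix first
        if token.endswith(suffix):
            base = token[: -len(suffix)]
            if base in known_names:
                return base
    return None


names = {"Orbán", "Budapest"}
for form in ("Orbánnak", "Orbánnal", "Orbánt", "Budapesten", "Budapestről", "alma"):
    print(form, "->", lemmatise_name(form, names))
```

Checking the remainder against known names keeps the stripping conservative: a suffix is only removed when it leaves an existing name, so ordinary words that happen to end in a suffix string are not mangled.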
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts
The paper describes the RuSentNE-2023 evaluation devoted to targeted
sentiment analysis in Russian news texts. The task is to predict the sentiment
towards a named entity in a single sentence. The dataset for the RuSentNE-2023
evaluation is based on the Russian news corpus RuSentNE, which has rich
sentiment-related annotation. The corpus is annotated with named entities and
sentiments towards these entities, along with related effects and emotional
states. The evaluation was organized using the CodaLab competition framework.
The main evaluation measure was the macro-averaged F-measure over the positive
and negative classes. The best result achieved was 66% macro F-measure
(positive + negative classes). We also tested ChatGPT on the test set from our
evaluation and found that its zero-shot answers reached a 60% F-measure, which
corresponds to 4th place in the evaluation and can be considered quite high for
a zero-shot application. ChatGPT also provided detailed explanations of its
conclusions. Comment: 12 pages, 5 tables, 3 figures
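The main measure can be sketched as follows: the F-measure is computed separately for the positive and the negative class and the two values are averaged, while neutral labels still count as errors whenever they replace a positive or negative one. The label strings and toy data are hypothetical, and this is not the official RuSentNE-2023 evaluation script.

```python
from typing import Sequence, Tuple


def macro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    classes: Tuple[str, ...] = ("positive", "negative"),
) -> float:
    """Macro-averaged F-measure restricted to the given classes (neutral is excluded)."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted labels must be aligned")
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "negative"]
print(macro_f1(gold, pred))
```

In the toy example the positive class has precision 1 and recall 1/2 (F = 2/3) and the negative class has no correct prediction (F = 0), so the macro F-measure is 1/3; the correctly predicted neutral item does not raise the score.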
- …