69 research outputs found

    Czech Text Document Corpus v 2.0

    Full text link
    This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in Czech language. It is composed of the text documents provided by the Czech News Agency and is freely available for research purposes at http://ctdc.kiv.zcu.cz/. This corpus was created in order to facilitate a straightforward comparison of the document classification approaches on Czech data. It is particularly dedicated to evaluation of multi-label document classification approaches, because one document is usually labelled with more than one label. Besides the information about the document classes, the corpus is also annotated at the morphological layer. This paper further shows the results of selected state-of-the-art methods on this corpus to offer the possibility of an easy comparison with these approaches.Comment: Accepted for LREC 201

    Оптимизация вычисления одновременной маскировки речевого сигнала

    Get PDF
    Paper proposes optimization of calculation of frequency masking of speech signal for use in real time applications. The complexity of circular convolution is shown for iterative Toom-Cook algorithm with length 4 and an algorithm based on the FFT. The conclusion about the efficiency of the proposed solutions is drawn on computation complexity and memory.В настоящей статье предлагается оптимизация вычисления одновременноймаскировки речевого сигнала для реализации в задачах реального времени. Показанатрудоемкость циклической свертки для итерационного алгоритма Тоома-Кука длины 4 иалгоритма на основе БПФ. Делается вывод об эффективности предлагаемых решений повычислительной сложности и по объемам занимаемой память

    Modelling of Diagnostics Influence on Control System Safety

    Get PDF
    If the control system besides the standard control functions also realizes the functions (known as safety functions), failures of which can influence safety of the controlled process, then the control system may be a source of risk for assets, that are within the scope of the controlled process. Early detection of these failures and subsequent negation of their effects can have a significant influence on the safety integrity level of the safety function and thus also on the elimination of risks related to the controlled process. Therefore, the diagnostics is the means which, if appropriately applied, can increase not only the availability, but also the safety of the control system. The paper deals with using the homogeneous Markov chains to influence the evaluation of on-line diagnostics on the hardware safety integrity of the safety function, depending on the application method of several simultaneously operating diagnostics mechanisms and their basic parameters - the failures diagnostic coverage coefficient and the failure diagnostics time

    An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs

    Full text link
    The effect of amplifiers, downtoners, and negations has been studied in general and particularly in the context of sentiment analysis. However, there is only limited work which aims at transferring the results and methods to discrete classes of emotions, e. g., joy, anger, fear, sadness, surprise, and disgust. For instance, it is not straight-forward to interpret which emotion the phrase "not happy" expresses. With this paper, we aim at obtaining a better understanding of such modifiers in the context of emotion-bearing words and their impact on document-level emotion classification, namely, microposts on Twitter. We select an appropriate scope detection method for modifiers of emotion words, incorporate it in a document-level emotion classification model as additional bag of words and show that this approach improves the performance of emotion classification. In addition, we build a term weighting approach based on the different modifiers into a lexical model for the analysis of the semantics of modifiers and their impact on emotion meaning. We show that amplifiers separate emotions expressed with an emotion- bearing word more clearly from other secondary connotations. Downtoners have the opposite effect. In addition, we discuss the meaning of negations of emotion-bearing words. For instance we show empirically that "not happy" is closer to sadness than to anger and that fear-expressing words in the scope of downtoners often express surprise.Comment: Accepted for publication at The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA), https://dsaa2018.isi.it

    Profiling a set of personality traits of text author: what our words reveal about us

    Get PDF
    Authorship profiling, i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of authorship profiling (AP) is selecting text parameters which may correlate to an author’s personality. Most researchers’ selection of these is not underpinned by any theory. This article proposes an approach to AP which applies neuroscience data. The aim of the study is to assess the probability of self-destructive behaviour of an individual via formal parameters of their texts. Here we have used the “Personality Corpus”, which consists of Russian-language texts. A set of correlations between scores on the Freiburg Personality Inventory scales that are known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”) and text variables (average sentence length, lexical diversity etc.) has been calculated. Further, a mathematical model which predicts the probability of self-destructive behaviour has been obtained

    Media monitoring and information extraction for the highly inflected agglutinative language Hungarian

    Get PDF
    The Europe Media Monitor (EMM) is a fully-automatic system that analyses written online news by gathering articles in over 70 languages and by applying text analysis software for currently 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding to EMM Hungarian text mining tools for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with the Hungarian language is its high degree of inflection and agglutination. We present several experiments where we apply linguistically light-weight methods to deal with inflection and we propose a method to overcome the challenges. We also present detailed frequency lists of Hungarian person and location name suffixes, as found in real-life news texts. This empirical data can be used to draw further conclusions and to improve existing Named Entity Recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web pag

    RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts

    Full text link
    The paper describes the RuSentNE-2023 evaluation devoted to targeted sentiment analysis in Russian news texts. The task is to predict sentiment towards a named entity in a single sentence. The dataset for RuSentNE-2023 evaluation is based on the Russian news corpus RuSentNE having rich sentiment-related annotation. The corpus is annotated with named entities and sentiments towards these entities, along with related effects and emotional states. The evaluation was organized using the CodaLab competition framework. The main evaluation measure was macro-averaged measure of positive and negative classes. The best results achieved were of 66% Macro F-measure (Positive+Negative classes). We also tested ChatGPT on the test set from our evaluation and found that the zero-shot answers provided by ChatGPT reached 60% of the F-measure, which corresponds to 4th place in the evaluation. ChatGPT also provided detailed explanations of its conclusion. This can be considered as quite high for zero-shot application.Comment: 12 pages, 5 tables, 3 figure
    corecore