3 research outputs found

    Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings

    No full text
    An open issue in the sentiment classification of texts written in Serbian is the effect of different forms of morphological normalization and the usefulness of leveraging large amounts of unlabeled texts. In this paper, we assess the impact of lemmatizers and stemmers for Serbian on classifiers trained and evaluated on the Serbian Movie Review Dataset. We also consider the effectiveness of using word embeddings, generated from a large unlabeled corpus, as classification features

    Evaluation and Classification of Syntax Usage in Determining Short-Text Semantic Similarity

    No full text
    This paper outlines and categorizes ways of using syntactic information in a number of algorithms for determining the semantic similarity of short texts. We consider the use of word order information, part-of-speech tagging, parsing and semantic role labeling. We analyze and evaluate the effects of syntax usage on algorithm performance by utilizing the results of a paraphrase detection test on the Microsoft Research Paraphrase Corpus. We also propose a new classification of algorithms based on their applicability to languages with scarce natural language processing tools
    corecore