2,582 research outputs found

    Building English-to-Serbian machine translation system for IMDb movie reviews

    This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translating English IMDb user movie reviews into Serbian in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles, and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach even in this low-resource, mismatched-domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require or allow transcription) should be investigated more systematically.

    An overview of Sentiment Analysis of Twitter Data

    In the last few years, social media has seen tremendous growth in the number of users. In particular, Twitter has proved to be one of the most widespread microblogging services for instantly publishing and sharing opinions, feedback, ratings, etc., contributing to the emerging role of users as sensors. Twitter has become the largest source of data worldwide. This project proposes a method to predict the future of the entertainment, telecommunications, and various other industries. However, because of the huge amount of data to be collected and analyzed and the limits on data access imposed by Twitter's public APIs, analytics tools need to be more efficient, both in data ingestion and processing and in computing the analysis metrics that provide deeper statistical insights and support further investigation. This project evaluates people's feelings about different products related to various industries. The Twitter API is used to access tweets directly from Twitter and to build a model for sentiment classification. The analysis classifies users' opinions as positive, negative, or neutral.

    Data analytics 2016: proceedings of the fifth international conference on data analytics


    GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets

    The rapid growth of online social media platforms has made opinion mining and sentiment analysis a critical area of research. This paper focuses on analyzing Twitter posts (tweets) written in the Greek language and politically charged in content. This is a rather underexplored topic, due to the inadequacy of publicly available annotated datasets. Thus, we present and release GreekPolitics: a dataset of Greek tweets with politically charged content, annotated for four different sentiments: polarity, figurativeness, aggressiveness and bias. GreekPolitics has been evaluated comprehensively using state-of-the-art Deep Neural Networks (DNNs) and data augmentation methods. This paper details the dataset, the evaluation process and the experimental results.