    Creating a morphological and syntactic tagged corpus for the Uzbek language

    Creating tagged corpora has become one of the most important tasks in Natural Language Processing (NLP). For the low-resource Uzbek language, there are not enough tagged corpora to build machine learning models. In this paper, we try to fill that gap by developing a novel Part-of-Speech (POS) and syntactic tagset for creating a morphologically and syntactically tagged corpus of the Uzbek language. The work also includes a detailed description and presentation of a web-based application for tagging. Based on the developed annotation tool and software, we share our experience and the results of the first stage of the tagged corpus creation

    A Survey of Cross-Lingual Sentiment Analysis Based on Pre-Trained Models

    With the development of natural language processing technology, many researchers have widely studied Machine Learning (ML), Deep Learning (DL), and monolingual Sentiment Analysis (SA). However, there is little work on Cross-Lingual SA (CLSA), although it is beneficial when dealing with low-resource languages (e.g., Tamil, Malayalam, Hindi, and Arabic). This paper surveys the main challenges and issues of CLSA based on several pre-trained language models and describes the leading methods for addressing CLSA. In particular, we compare and analyze their pros and cons. Moreover, we summarize the valuable cross-lingual resources and point out the main problems researchers need to solve in the future.