5,268 research outputs found

    Transfer Learning for Low-Resource Sentiment Analysis

    Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in automatic cross-lingual approaches, the implementation and evaluation of sentiment analysis systems require language-specific data that reflect various sociocultural and linguistic peculiarities. In this paper, we describe the collection and annotation of a dataset for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F1 score and accuracy despite the difficulty of the task.
    Comment: 14 pages - under review at ACM TALLI

    A Method for Proper Noun Extraction in Kurdish

    This paper proposes a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized, and they also assume other part-of-speech roles, which leads to a broad ambiguity that must be addressed in Kurdish proper noun recognition applications. Kurdish is also among the less-resourced languages. We developed an application that recognizes Kurdish person names, based on an architecture that includes a number of name lists, a set of rules, and a set of processes. This can help advance the study of Information Retrieval (IR) in Kurdish and can also be used in Kurdish machine translation. We conducted several experiments, which showed that the precision of the method is more than 95%, the recall is between 40% and 80%, and the F-measure ranges from close to 60% to more than 80%. The low recall arose because our name lists were not exhaustive enough to cover the vast majority of Kurdish names.
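The reported figures can be checked against each other with a short calculation. The sketch below assumes the abstract's "F-measure" is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and the precision value of 0.95 are illustrative assumptions; the abstract only says precision is "more than 95%".

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced F1 (harmonic mean)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumed illustrative precision of 0.95 with the two recall bounds from the abstract.
low = f_measure(0.95, 0.40)
high = f_measure(0.95, 0.80)
print(f"F1 at recall 40%: {low:.3f}")
print(f"F1 at recall 80%: {high:.3f}")
```

Under these assumptions the two values come out near 0.56 and 0.87, which is in line with the stated range of "close to 60% to more than 80%", and shows how the low end of the recall range pulls the F-measure down even when precision is high.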

    PARALLEL CREATION OF GIGAWORD CORPORA FOR MEDIUM DENSITY LANGUAGES: AN INTERIM REPORT

    To speed up the development of gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

    Political parties and the press in the Kurdistan Region of Iraq

    Doctoral thesis, Political Science (Comparative Politics), Universidade de Lisboa, Instituto de Ciências Sociais, 2018.
    This thesis studies the political system in the Kurdistan Region of Iraq (KRI), specifically its relationship to the media system and the interplay between the two. The research is one of the very first attempts to present a comparative study of politics and the press in the KRI, in order to understand the dynamics of the media system and to contribute to the theoretical discussion of media and politics. A triangulation of methods and different sources is employed, such as qualitative analyses of current and archived laws, party and media documents, as well as personal semi-structured interviews and anonymous questionnaires conducted as part of this research. The framework adopted for studying the case was that of Hallin and Mancini (2004, 2012). The aim was not to apply this framework in full, but to use its variables to help deepen the understanding of the KRI media system. The results show that political parallelism is high, which explains full party ownership of the media. The interdependence of media and politics is inevitable, and one is not able to easily survive without the other. In addition, journalists do not necessarily meet professional requirements; being a member of one of the dominant parties that own the media is sufficient. The state plays an important controlling role, and media-related legislation remains mostly on paper rather than being fully implemented. Due to party ownership, finding a market is the lowest priority for the majority of the press in the KRI. This thesis employs categories and dimensions used in comparative studies. It uses a theoretical framework developed on the basis of Western cases, which makes it possible for a new case to be available on the map of comparative scholars, a case that otherwise would not be studied.
    Fundação Calouste Gulbenkia

    Hierarchical Character-Word Models for Language Identification

    Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.