Transfer Learning for Low-Resource Sentiment Analysis

Authors: Ahmadi Sina, Daneshfar Fatemeh, Hameed Razhan
Publication date: 10/04/2023

Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in automatic cross-lingual approaches, the implementation and evaluation of sentiment analysis systems require language-specific data that account for various sociocultural and linguistic peculiarities. In this paper, the collection and annotation of a dataset are described for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F1 score and accuracy despite the difficulty of the task.

Comment: 14 pages - under review at ACM TALLI

arXiv.org e-Print Archive

A Method for Proper Noun Extraction in Kurdish

Author: Hassani Hossein
Publication venue: OASIcs - OpenAccess Series in Informatics. 6th Symposium on Languages, Applications and Technologies (SLATE 2017)
Publication date: 01/01/2017

This paper suggests a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized and they also assume other part-of-speech roles, which leads to a broad ambiguity that should be addressed in Kurdish proper noun recognition applications. Kurdish is also among the less-resourced languages. We developed an application based on an architecture which includes a number of name lists, a set of rules, and a set of processes that recognizes Kurdish person names. This can help the study of Information Retrieval (IR) in Kurdish to advance and can also be used in Kurdish machine translation. We conducted several experiments which showed that the precision of the method is more than 95%, the recall is between 40% and 80%, and the F-measure ranges from close to 60% to more than 80%. The recall was low because our name lists were not exhaustive enough to cover the vast majority of Kurdish names.
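The reported figures can be sanity-checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly, and it takes 0.95 as an illustrative precision value. The function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: precision held at 0.95 (the abstract says "more than 95%"),
# recall taken at the two ends of the reported 40%-80% range.
low = f_measure(0.95, 0.40)
high = f_measure(0.95, 0.80)
print(f"{low:.3f} {high:.3f}")  # prints "0.563 0.869"
```

Under these assumptions, the results agree with the abstract's range of roughly 60% to above 80%, and they show that recall, not precision, bounds the F-measure here.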

Dagstuhl Research Online Publication Server

Sticks, carrots and great expectations: human rights conditionality and Turkey’s path towards membership of the European Union

Author: Zalwski Piotr
Publication date: 01/01/2004

Policy Documentation Center

PARALLEL CREATION OF GIGAWORD CORPORA FOR MEDIUM DENSITY LANGUAGES: AN INTERIM REPORT

Authors: Halácsy Péter, Kornai András, NEMETH P, Varga Dániel
Publication date: 01/01/2008

For increased speed in developing gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

CiteSeerX

SZTAKI Publication Repository

Political parties and the press in the Kurdistan Region of Iraq

Author: Yaseen Taha Mohammedali
Publication date: 01/01/2017

Doctoral thesis, Political Science (Comparative Politics), Universidade de Lisboa, Instituto de Ciências Sociais, 2018

This thesis studies the political system in the Kurdistan Region of Iraq (KRI), specifically its relation to the media system and the interplay between the two. The research is one of the very first attempts to present a comparative study of politics and the press in the KRI, in order to understand the dynamics of the media system and to contribute to the theoretical discussion of media and politics. A triangulation of methods and different sources is employed, such as qualitative analyses of current and archived laws, party and media documents, as well as personal semi-structured interviews and anonymous questionnaires conducted as part of this research. The framework adopted for studying the case was that of Hallin and Mancini (2004, 2012). The attempt was not to fully apply this framework, but to use its variables to help deepen the understanding of the KRI media system. The results show that political parallelism is high, which explains full party ownership of the media. The interdependence of media and politics is inevitable, and one is not able to easily survive without the other. In addition, journalists do not necessarily meet the professional requirements; being a member of one of the dominant parties which own the media is sufficient. The state plays an important role in control, and media-related legislation remains mostly on paper rather than being fully implemented. Due to party ownership, finding a market is the lowest priority for the majority of the press in the KRI. This thesis employs categories and dimensions used in comparative studies. It uses a theoretical framework developed on the basis of Western cases, which makes it possible for a new case to be available on the map of comparative scholars, a case that otherwise would not be studied.

Fundação Calouste Gulbenkia

Universidade de Lisboa: Repositório.UL

Hierarchical Character-Word Models for Language Identification

Authors: Hathi Shobhit, Jaech Aaron, Mulcaire George, Ostendorf Mari, Smith Noah A.
Publication date: 01/01/2016

Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.

arXiv.org e-Print Archive

Crossref