Transfer Learning for Low-Resource Sentiment Analysis
Sentiment analysis is the process of identifying and extracting subjective
information from text. Despite advances in automatic cross-lingual approaches,
the implementation and evaluation of sentiment analysis
systems require language-specific data to consider various sociocultural and
linguistic peculiarities. In this paper, the collection and annotation of a
dataset are described for sentiment analysis of Central Kurdish. We explore a
few classical machine learning and neural network-based techniques for this
task. Additionally, we employ an approach in transfer learning to leverage
pretrained models for data augmentation. We demonstrate that data augmentation
achieves a high F score and accuracy despite the difficulty of the task. Comment: 14 pages - under review at ACM TALLI
A Method for Proper Noun Extraction in Kurdish
This paper proposes a method for proper noun identification in Kurdish texts. Kurdish proper nouns are not capitalized, and they also assume other part-of-speech roles, which creates a broad ambiguity that Kurdish proper noun recognition applications must address. Kurdish is also among the less-resourced languages. We developed an application that recognizes Kurdish person names, based on an architecture that includes a number of name lists, a set of rules, and a set of processes. This can help advance the study of Information Retrieval (IR) in Kurdish and can also be used in Kurdish machine translation. We conducted several experiments, which showed that the precision of the method is more than 95%, the recall is between 40% and 80%, and the F-measure ranges from close to 60% to more than 80%. The low recall arose because our name lists were not exhaustive enough to cover the vast majority of Kurdish names.
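The reported F-measure range follows from the precision and recall figures if the balanced F-measure (F1, the harmonic mean of precision and recall) is assumed; the abstract does not say which variant was used. A minimal sketch under that assumption, with precision fixed at 0.95 and recall set to the two ends of the reported range (the function name `f_measure` is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed inputs: precision 0.95 (the abstract says "more than 95%"),
# recall at the lower and upper ends of the reported 40%-80% range.
low = f_measure(0.95, 0.40)
high = f_measure(0.95, 0.80)
print(f"{low:.3f} {high:.3f}")
```

Under these assumptions the two values come out at roughly 56% and 87%, which agrees with the abstract's "close to 60% to more than 80%".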
PARALLEL CREATION OF GIGAWORD CORPORA FOR MEDIUM DENSITY LANGUAGES: AN INTERIM REPORT
For increased speed in developing gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.
Political parties and the press in the Kurdistan Region of Iraq
Doctoral thesis, Political Science (Comparative Politics), Universidade de Lisboa, Instituto de Ciências Sociais, 2018. This thesis studies the political system in the Kurdistan Region of Iraq (KRI), specifically its relation to the media system and the interplay between the two. The research is one of the very first attempts to present a comparative study of politics and the press in the KRI, in order to understand the dynamics of the media system and to contribute to the theoretical discussion of media and politics. A triangulation of methods and sources is employed, including qualitative analyses of current and archived laws and of party and media documents, as well as personal semi-structured interviews and anonymous questionnaires conducted as part of this research. The framework adopted for studying the case was that of Hallin and Mancini (2004, 2012). The aim was not to apply this framework fully, but to use its variables to deepen the understanding of the KRI media system. The results show that political parallelism is high, which explains full party ownership of the media. The interdependence of media and politics is inevitable, and neither can easily survive without the other. In addition, journalists do not necessarily meet professional requirements; membership of one of the dominant parties that own the media is sufficient. The state plays an important role in controlling the media, and media-related legislation remains mostly on paper rather than being fully implemented. Due to party ownership, finding a market is the lowest priority for the majority of the press in the KRI. This thesis employs categories and dimensions used in comparative studies. It uses a theoretical framework developed on the basis of Western cases, which makes a new case available on the map of comparative scholars, a case that otherwise would not be studied. Fundação Calouste Gulbenkia
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong baselines, and can
also reveal code-switching.