18 research outputs found
A survey on sentiment analysis in Urdu: A resource-poor language
© 2020 Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subject matters such as products, services, events and political opinions. While the volume of studies conducted on sentiment analysis is rapidly expanding, these studies mostly address English language concerns. The primary goal of this study is to present state-of-art survey for identifying the progress and shortcomings saddling Urdu sentiment analysis and propose rectifications. Methods: We described the advancements made thus far in this area by categorising the studies along three dimensions, namely: text pre-processing lexical resources and sentiment classification. These pre-processing operations include word segmentation, text cleaning, spell checking and part-of-speech tagging. An evaluation of sophisticated lexical resources including corpuses and lexicons was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, negations. Results and conclusions: Performance is reported for each of the reviewed study. Based on experimental results and proposals forwarded through this paper provides the groundwork for further studies on Urdu sentiment analysis
Recommended from our members
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, the sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges in processing the morphologically complex Arabic natural language and the scarcity of Arabic NLP tools and resources. This complex task is further exacerbated when analysing dialectal Arabic that do not abide by the formal grammatical structure. Based on the semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods for sentiment analysis classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets was collected focusing on the unemployment domain, which is annotated manually. The tweets cover different dialectal Arabic in Saudi Arabia for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism towards improved Saudi dialectal Arabic stemming. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. Applying this model to a central problem of sentiment analysis in dialectical Arabic, these operational techniques were deployed in order to assess analytical performance across social media channels which are vulnerable to semantic and colloquial variations. Finally, this study presented a new hybrid approach to sentiment analysis where domain knowledge is utilised in two methods to combine computational linguistics and machine learning; the first method integrates the problem domain semantic knowledgebase in the machine learning training features set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectical Arabic text
Creating large semantic lexical resources for the Finnish language
Finnish belongs into the Finno-Ugric language family, and it is spoken by the vast majority of the people living in Finland. The motivation for this thesis is to contribute to the development of a semantic tagger for Finnish. This tool is a parallel of the English Semantic Tagger which has been developed at the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University since the beginning of the 1990s and which has over the years proven to be a very powerful tool in automatic semantic analysis of English spoken and written data. The English Semantic Tagger has various successful applications in the fields of natural language processing and corpus linguistics, and new application areas emerge all the time. The semantic lexical resources that I have created in this thesis provide the knowledge base for the Finnish Semantic Tagger. My main contributions are the lexical resources themselves, along with a set of methods and guidelines for their creation and expansion as a general language resource and as tailored for domain-specific applications. Furthermore, I propose and carry out several methods for evaluating semantic lexical resources. In addition to the English Semantic Tagger, which was developed first, and the Finnish Semantic Tagger second, equivalent semantic taggers have now been developed for Czech, Chinese, Dutch, French, Italian, Malay, Portuguese, Russian, Spanish, Urdu, and Welsh. All these semantic taggers taken together form a program framework called the UCREL Semantic Analysis System (USAS) which enables the development of not only monolingual but also various types of multilingual applications. Large-scale semantic lexical resources designed for Finnish using semantic fields as the organizing principle have not been attempted previously. Thus, the Finnish semantic lexicons created in this thesis are a unique and novel resource. The lexical coverage on the test corpora containing general modern standard Finnish, which has been the focus of the lexicon development, ranges from 94.58% to 97.91%. However, the results are also very promising in the analysis of domain-specific text (95.36%), older Finnish text (92.11–93.05%), and Internet discussions (91.97–94.14%). The results of the evaluation of lexical coverage are comparable to the results obtained with the English equivalents and thus indicate that the Finnish semantic lexical resources indeed cover the majority of core Finnish vocabulary
Sentiment Analysis for Social Media
Sentiment analysis is a branch of natural language processing concerned with the study of the intensity of the emotions expressed in a piece of text. The automated analysis of the multitude of messages delivered through social media is one of the hottest research fields, both in academy and in industry, due to its extremely high potential applicability in many different domains. This Special Issue describes both technological contributions to the field, mostly based on deep learning techniques, and specific applications in areas like health insurance, gender classification, recommender systems, and cyber aggression detection
Beyond the “Bhai-Bhai” Rhetoric : China-India Literary Relations, 1950-1990
This thesis examines the multi-layered relationship between the literary spheres of the People’s Republic of China (1949-) and the Republic of India (1947-) from the 1950s to the 1980s. Drawing on previously underexplored materials in Chinese, Hindi, and English, this thesis focuses on a range of writerly, textual, and readerly contacts — three aspects of what, following Karen Thornber (2009), I call “literary relations” — between the two newly established Asian nation-states. Considering literary relations as inextricable from political relations, I argue that China and India embarked on similar and related paths since 1950, but in order to understand these relations we need to keep multiple frames in mind: of each country’s national culture and foreign policy; of bilateral relations; and of broader leftist internationalism, the anti-imperialist Third World solidarity movement, and Cold War world politics. Specifically, I identify and analyse five different yet overlapping trajectories that tied modern Chinese and Indian literatures together: first, a bilateral mechanism of writerly contact intended to enhance the China-India friendship; second, a multinational forum of Afro-Asian writers designed to advance cultural self-determination and literary solidarity in the Third World; third, India’s enthusiastic import of modern Chinese fiction under the rubric of “revolutionary” with the Foreign Languages Press acting as the main text provider; fourth, China’s systematic reception of “progressive” Indian fiction as part of the PRC’s model of world literature; and fifth, a counter-intuitive yet strikingly productive and cross-media transplantation of Hindi popular fiction in 1980s China. Although post-1950 China and India shared considerable common grounds for developing literary contact, nevertheless the ways they engaged with each other’s modern literature differed significantly due to their different literary cultures, political systems, and Cold War ideologies. The result is a landscape of literary relations that is markedly horizontal but nonetheless asymmetrical
Developing a large scale FrameNet for Italian - The IFrameNet experience
In this thesis we present the development and the current status of the IFrameNet project, aimed at the construction of a large-scale lexical semantic resource for the Italian language based on Frame Semantics theories. We will begin by contextualizing our work in the wider context of Frame Semantics and of the FrameNet project, which, since 1997, has attempted to apply these theories to lexicography. We will then analyse and discuss the applicability of the structure of the American resource to Italian and more specifically we will focus on the domain of fear, worry, and anxiety. We will finally propose some modifications aimed at improving this domain of the resource in relation to its coherence, its ability to accurately represent the linguistic reality and in particular in order to make it possible to apply it to Italian