186,705 research outputs found

    Bertutur Santun Melalui TTL (Speaking Politely Through TTL)

    Politeness is one of the strategies used to maintain good relations between speaker and hearer. In this study, politeness is defined as the speaker's awareness of the hearer's self-image, a concept called 'face' (Brown and Levinson, 1987). One way to express politeness is through indirect speech acts (TTL). For example, to perform a directive function, speakers can use a direct speech act (TL) with an imperative sentence, or TTL with a declarative or interrogative sentence. This study aims to identify the forms of directive utterances in Japanese and the politeness strategies they realize. Its benefit is to give learners of Japanese options for how to speak, especially for expressing orders using TTL. Data were obtained by identifying utterances suspected of carrying a command meaning. This step began by identifying and marking dialogues that contain a directive speech event. The directive utterances were then transcribed (romanized), that is, converted from Japanese characters into Latin letters. After transcription, the data were triangulated with native speakers. The utterances were then translated from Japanese as the source language (BS) into Indonesian as the target language (BT). The translation process included: (1) literal translation, in which each word forming the utterance or discourse is glossed; and (2) free translation, a context-bound translation that focuses on BT, so that the translation is communicative. The study found seven forms of TTL directive expressions used to express politeness in Japanese: [VTE], [~mashō], [~kara], [~te hoshii], [~yattorun?], [~U/yo], and [~yoni suru shikanai].

    Natural language processing for similar languages, varieties, and dialects: A survey

    There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges in dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects. Non peer reviewed

    Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications such as Machine Translation, Part-of-Speech tagging, Information Retrieval, and Question Answering. Feature selection is an important factor in recognizing Manipuri MWEs with a Conditional Random Field (CRF). The drawbacks of manually choosing appropriate features for the CRF motivated the use of a Genetic Algorithm (GA). Using the GA, we are able to find the optimal features for running the CRF. We ran fifty generations of feature selection, with three-fold cross-validation as the fitness function. The model achieved a Recall (R) of 64.08%, a Precision (P) of 86.84% and an F-measure (F) of 73.74%, an improvement over CRF-based Manipuri MWE identification without the GA. Comment: 14 pages, 6 figures, see http://airccse.org/journal/jcsit/1011csit05.pd
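As a rough illustration of GA-driven feature selection as described in the abstract above, the following minimal sketch evolves a 0/1 mask over candidate features for fifty generations. It is not the authors' implementation: the function name `evolve_feature_mask`, the population size, the mutation rate, the selection scheme, and the toy fitness function are all assumptions. In the paper the fitness would come from three-fold cross-validation of a CRF, which is replaced here by a stand-in score.

```python
import random

def evolve_feature_mask(n_features, fitness, generations=50, pop_size=20,
                        mutation_rate=0.1, seed=0):
    """Return the best 0/1 feature mask found by a simple generational GA."""
    rng = random.Random(seed)
    # Each individual is a tuple of 0/1 flags, one per candidate feature.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]             # elitism: keep the two best masks
        parents = ranked[:pop_size // 2]  # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple((bit ^ 1) if rng.random() < mutation_rate else bit
                          for bit in child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Hypothetical stand-in for "cross-validated CRF score": reward a made-up
# set of useful feature indices and penalise every extra feature selected.
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for i in USEFUL if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras

best = evolve_feature_mask(8, toy_fitness)
print(best, toy_fitness(best))
```

To use this with a real tagger, `toy_fitness` would be swapped for a function that trains and evaluates the model on the selected features. That evaluation is far more expensive, so caching the fitness of each mask would likely be worthwhile.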

    Ways of Rendering English-Language Utterances with Features of Hate Speech in Ukrainian

    The article studies the peculiarities of translating hate speech into Ukrainian. Among the research methods, the structural method occupies a prominent place: it treats hate speech as a system with characteristic components, namely verbal and nonverbal means and forms of realization in language. The comparative method establishes the similarities, differences and specifics of English-language means of expressing hate speech through systematic comparison with Ukrainian. The dominant method is quantitative analysis of the source and target texts, used to determine the most frequent ways of rendering hate speech in translation. Particular attention is paid to identifying the most frequent translation methods that preserve its linguopragmatic and sociocultural features. The linguopragmatic and sociocultural aspects are also considered. The article presents the results of a quantitative analysis of the translation transformations used in rendering hate speech for a foreign linguaculture, based on nearly 400 English-language utterances recorded in different spheres of use and in different forms of manifestation. The most prominent examples are analyzed in detail, with a justification of the chosen translation method. The analysis is structured as follows: hate speech in modern social networks; hate speech in modern advertising campaigns and vintage advertisements; hate speech prompted by COVID-19; hate speech in television shows, series, and movies; and hate speech in Internet memes. The phenomenon of "hate speech" is also treated as a topical interdisciplinary phenomenon that requires thorough linguistic research in order to prevent its dangerous consequences.

    Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through Dialect Identification using Transformer-based Approach

    In this paper, we present our approach to the "Nuanced Arabic Dialect Identification (NADI) Shared Task 2023". We describe our methodology for subtask 1, which deals with country-level dialect identification. Recognizing dialects plays an instrumental role in improving the performance of downstream NLP tasks such as speech recognition and translation. The task uses the Twitter dataset (TWT-2023), which covers 18 dialects, as a multi-class classification problem. We employ several transformer-based models pre-trained on Arabic to identify country-level dialects, fine-tuning them on the provided dataset. Ensembling is then used to improve the performance of the system. We achieved an F1-score of 76.65 (11th rank on the leaderboard) on the test dataset. Comment: 5 pages, 1 figure, accepted at the NADI ArabicNLP Workshop, EMNLP 202
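The abstract above does not say which ensembling method was used. One common option is soft voting, sketched below under that assumption: each fine-tuned model emits a probability distribution over dialect classes, the distributions are averaged, and the highest-scoring class wins. The function name `soft_vote` and the three-class example values are hypothetical. The actual task has 18 classes.

```python
def soft_vote(prob_rows):
    """Average per-model class probabilities and return (label, averages)."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_models
           for c in range(n_classes)]
    # The predicted label is the index of the highest averaged probability.
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg

# Made-up outputs of three models for one tweet over three dialect classes.
rows = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
label, avg = soft_vote(rows)
print(label)  # prints 1
```

In this example the first model alone would pick class 0, but the averaged distribution favours class 1. Averaging in this way smooths over the errors of individual models.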

    A Computational Study of Speech Acts in Social Media

    Speech acts are utterances in daily communication that perform an action (e.g. requesting, suggesting, promising, apologizing). Modeling speech acts is important for improving natural language understanding (i.e. human-computer interaction through computers' comprehension of human language) and for developing other natural language processing (NLP) tasks such as question answering and machine translation. Analyzing speech acts on a large scale with computational methods could help linguists and social scientists gain insights into human language and behavior. Speech acts such as suggesting, questioning and irony have attracted considerable attention in previous NLP research. However, two common speech acts, complaining and bragging, have remained underexplored. Complaints are used to express a mismatch between reality and expectations towards an entity or event. Previous research has focused only on binary complaint identification (i.e. whether a social media post contains a complaint or not) using traditional machine learning models with feature engineering. Bragging is one of the most common forms of self-presentation, which aims to create a favorable image by disclosing positive statements about speakers or their in-group. Previous studies on bragging have been limited to manual analyses of small data sets, e.g. fewer than 300 posts. The main aim of this thesis is to enrich the study of speech acts in computational linguistics. First, we introduce the task of classifying complaint severity levels and propose a method for injecting external linguistic information into novel pretrained neural language models (e.g. BERT). We show that incorporating linguistic features is beneficial to complaint severity classification. We also improve the performance of binary complaint prediction with the help of complaint severity information in multi-task learning settings (i.e. jointly modeling these two tasks).
    Second, we introduce the task of identifying bragging and classifying its types, together with a new annotated data set. We analyze the linguistic patterns of bragging and its types and present an error analysis to identify model limitations. Finally, we examine the relationship between online bragging and a range of common socio-demographic factors including gender, age, education, income and popularity.
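The multi-task setting mentioned in the abstract above (jointly modeling binary complaint prediction and complaint severity) is often trained with a weighted sum of the two task losses. The sketch below assumes that formulation. The thesis may combine the tasks differently, and the function names, the weight `alpha`, and the number of severity levels (four here) are hypothetical.

```python
import math

def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class under predicted probabilities."""
    return -math.log(probs[gold])

def joint_loss(binary_probs, binary_gold, severity_probs, severity_gold,
               alpha=0.5):
    """Main-task loss plus an alpha-weighted auxiliary-task loss."""
    main = cross_entropy(binary_probs, binary_gold)      # complaint vs. not
    aux = cross_entropy(severity_probs, severity_gold)   # severity level
    return main + alpha * aux

# One made-up post: uniform predictions on both tasks.
loss = joint_loss([0.5, 0.5], 1, [0.25, 0.25, 0.25, 0.25], 2, alpha=0.5)
print(loss)
```

With `alpha = 0` the objective reduces to the single-task binary loss. Larger values let the severity labels influence the shared representation more strongly.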