    Opinion-Mining on Marglish and Devanagari Comments of YouTube Cookery Channels Using Parametric and Non-Parametric Learning Models

    YouTube lets people educate, entertain, and express themselves on a wide range of topics. YouTube India has millions of active users, so the volume of data on the platform is large. India is a very diverse country and many people are multilingual, so opinions are often expressed in code-mixed form, that is, a mixture of two or more languages. Sentiment Analysis on code-mixed languages has become necessary because there is little research on Indian code-mixed language data. In this paper, Sentiment Analysis (SA) is carried out on Marglish (Marathi + English) and Devanagari Marathi comments extracted through the YouTube API from top Marathi channels. Several machine-learning models are applied to the dataset along with 3 different vectorizing techniques. Multilayer Perceptron (MLP) with the Count vectorizer gives the best accuracy, 62.68%, on the Marglish dataset, and Bernoulli Naïve Bayes with the Count vectorizer gives the best accuracy, 60.60%, on the Devanagari dataset. Multilayer Perceptron and Bernoulli Naïve Bayes are therefore considered the best-performing algorithms. 10-fold cross-validation and statistical testing were also carried out on the dataset to confirm the results.
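
The pairing of a count vectorizer with Bernoulli Naïve Bayes mentioned in this abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the toy English comments, the labels, the whitespace tokenizer, and the function names `build_vocab`, `vectorize`, `train_bernoulli_nb`, and `predict` are all assumptions made for illustration, and real Marglish or Devanagari data would need its own preprocessing.

```python
import math
from collections import Counter

# Hypothetical toy data (invented for illustration, not from the paper).
TRAIN_DOCS = [
    "nice recipe tasty",
    "tasty and easy recipe",
    "bad recipe too salty",
    "too oily bad taste",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]


def build_vocab(docs):
    """Collect the sorted set of whitespace-separated, lowercased tokens."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def vectorize(doc, vocab):
    """Count-vectorize one document; tokens outside the vocabulary are ignored."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and per-token presence probabilities.

    Counts are binarized (count > 0 means "present") and smoothed with
    Laplace smoothing controlled by alpha.
    """
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(rows)
        probs = [
            (sum(1 for row in members if row[j] > 0) + alpha)
            / (len(members) + 2 * alpha)
            for j in range(len(rows[0]))
        ]
        model[cls] = (prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one count vector."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for count, p in zip(row, probs):
            # Bernoulli NB also scores the absence of a token.
            score += math.log(p) if count > 0 else math.log(1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


vocab = build_vocab(TRAIN_DOCS)
model = train_bernoulli_nb([vectorize(d, vocab) for d in TRAIN_DOCS], TRAIN_LABELS)
print(predict(model, vectorize("tasty recipe", vocab)))
```

On data of realistic size, a library implementation with held-out evaluation (such as the 10-fold cross-validation the abstract mentions) would be the more sensible choice; the sketch only shows how counts feed a presence/absence model.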

    Crisis translation: considering language needs in multilingual disaster settings

    Purpose: The purpose of this conceptual paper is to highlight the role that language translation can play in disaster prevention and management and to make the case for increased attention to language translation in crisis communication. Approach: The article draws on literature relating to disaster management to suggest that translation is a perennial issue in crisis communication. Findings: Although communication with multicultural and multilinguistic communities is seen as being in urgent need of attention, we find that the role of translation in enabling this is underestimated, if not unrecognised. Value: This article raises awareness of the need for scholars and practitioners to give urgent attention to the role of translation in crisis communication.

    FinTech, blockchain and Islamic finance : an extensive literature review

    Purpose: The paper aims to review the academic research done in the area of Islamic financial technology. The Islamic FinTech area is classified into three broad categories: Islamic FinTech opportunities and challenges, cryptocurrency/blockchain sharia compliance, and law/regulation. Finally, the study identifies and highlights what Islamic financial institutions can learn from conventional FinTech organizations across the world. Approach/Methodology/Design: The study collected 133 research studies (50 from the Social Science Research Network (SSRN), 30 from ResearchGate, 33 from Google Scholar and 20 from other sources) in the area of Islamic financial technology and presents a systematic review of them. Findings: The study classifies Islamic FinTech into three broad categories, namely Islamic FinTech opportunities and challenges, cryptocurrency/blockchain sharia compliance, and law/regulation. It identifies sharia compliance of cryptocurrency/blockchain as the biggest challenge that Islamic FinTech organizations are facing. The review also finds that Islamic financial institutions (IFIs) should regard Islamic FinTech organizations as partners rather than competitors. If Islamic financial institutions want to increase efficiency, transparency and customer satisfaction, they have to adopt FinTech and partner with FinTech companies. Practical Implications: The study will contribute positively to the understanding of Islamic FinTech for academia, industry, regulators, investors and other FinTech users. Originality/Value: The study is expected to contribute positively to the understanding of FinTech-based technologies such as cryptocurrency/blockchain from a sharia perspective. Peer-reviewed.

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT)

    Probing Multilingual BERT for Genetic and Typological Signals

    We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations. We 1) employ the language distances to infer and evaluate language trees, finding that they are close to the reference family tree in terms of quartet tree distance, 2) perform distance matrix regression analysis, finding that the language distances can be best explained by phylogenetic and worst by structural factors and 3) present a novel measure of diachronic meaning stability (based on cross-lingual representation variability) which correlates significantly with published ranked lists based on linguistic approaches. Our results contribute to the nascent field of typological interpretability of cross-lingual text representations. Comment: COLING 202

    DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text

    This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments. The dataset was annotated for sentiment analysis and offensive language identification for a total of more than 60,000 YouTube comments. The dataset consists of around 44,000 comments in Tamil-English, around 7,000 comments in Kannada-English, and around 20,000 comments in Malayalam-English. The data was manually annotated by volunteer annotators and shows high inter-annotator agreement as measured by Krippendorff's alpha. The dataset contains all types of code-mixing phenomena since it comprises user-generated content from a multilingual country. We also present baseline experiments to establish benchmarks on the dataset using machine learning methods. The dataset is available on GitHub (https://github.com/bharathichezhiyan/DravidianCodeMix-Dataset) and Zenodo (https://zenodo.org/record/4750858#.YJtw0SYo_0M). Comment: 36 pages
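
For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not taken from the paper: the function name `krippendorff_alpha_nominal`, the input layout (one tuple of labels per comment, with `None` for a missing rating), and the toy ratings are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: iterable of tuples, one per annotated item, holding the labels
    assigned by each annotator (None marks a missing rating).
    """
    # Coincidence matrix: each ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of ratings on that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pairing information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when fewer than two labels are used")
    return 1.0 - observed / expected


# Hypothetical toy ratings from two annotators on four comments.
toy = [("pos", "pos"), ("pos", "neg"), ("neg", "neg"), ("neg", "pos")]
print(krippendorff_alpha_nominal(toy))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance; ordinal or interval labels would need a different disagreement function than the nominal one used here.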

    Layer or representation space: what makes BERT-based evaluation metrics robust?

    The evaluation of recent embedding-based evaluation metrics for text generation is primarily based on measuring their correlation with human evaluations on standard benchmarks. However, these benchmarks are mostly from domains similar to those used for pretraining word embeddings. This raises concerns about the (lack of) generalization of embedding-based metrics to new and noisy domains that contain a different vocabulary than the pretraining data. In this paper, we examine the robustness of BERTScore, one of the most popular embedding-based metrics for text generation. We show that (a) an embedding-based metric that has the highest correlation with human evaluations on a standard benchmark can have the lowest correlation if the amount of input noise or unknown tokens increases, (b) taking embeddings from the first layer of pretrained models improves the robustness of all metrics, and (c) the highest robustness is achieved when using character-level embeddings, instead of token-based embeddings, from the first layer of the pretrained model.
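
The greedy-matching idea behind BERTScore can be sketched in a few lines, assuming token embeddings are already available. This is an illustration only: the function names and the two-dimensional toy vectors are hypothetical, no pretrained model is loaded, and the optional idf weighting and baseline rescaling of the full metric are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(reference, candidate):
    """Return (precision, recall, f1) from greedy cosine matching.

    reference, candidate: non-empty lists of token embedding vectors.
    Recall matches every reference token to its most similar candidate
    token; precision does the same in the other direction.
    """
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy embeddings: the candidate covers only one of the two
# reference tokens, so precision stays high while recall drops.
reference_vectors = [[1.0, 0.0], [0.0, 1.0]]
candidate_vectors = [[1.0, 0.0]]
print(greedy_match_score(reference_vectors, candidate_vectors))
```

Which layer or representation the vectors come from (first layer, last layer, token-level or character-level) is exactly the variable the abstract studies; the sketch is agnostic to that choice.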