68 research outputs found

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Proceedings - University of Groningen

Opinion-Mining on Marglish and Devanagari Comments of YouTube Cookery Channels Using Parametric and Non-Parametric Learning Models

Author: Kaushik Abhishek
Shah Janice
Shah Sonali
Sharma Shubham
Publication venue: Technological University Dublin
Publication date: 01/01/2020

YouTube is a boon: through it, people can educate, entertain, and express themselves on various topics. YouTube India currently has millions of active users, so the volume of data on the platform is correspondingly large. Because India is a very diverse country, many people are multilingual and express their opinions in code-mixed form, that is, by mixing two or more languages. Sentiment Analysis on code-mixed languages has become a necessity, as there is not much research on Indian code-mixed language data. In this paper, Sentiment Analysis (SA) is carried out on Marglish (Marathi + English) as well as Devanagari Marathi comments extracted via the YouTube API from top Marathi channels. Several machine-learning models are applied to the dataset along with three different vectorizing techniques. Multilayer Perceptron (MLP) with the Count vectorizer provides the best accuracy of 62.68% on the Marglish dataset, and Bernoulli Naïve Bayes with the Count vectorizer gives an accuracy of 60.60% on the Devanagari dataset. Multilayer Perceptron and Bernoulli Naïve Bayes are therefore considered the best-performing algorithms. 10-fold cross-validation and statistical testing were also carried out on the dataset to confirm the results.

Multidisciplinary Digital Publishing Institute

Arrow@TUDublin

A word sense disambiguation corpus for Urdu

Author: Nawab Rao Muhammad Adeel
Rayson Paul
Saeed Ali
Stevenson Mark
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2019

The aim of word sense disambiguation (WSD) is to correctly identify the meaning of a word in context. All natural languages exhibit word sense ambiguities, and these are often hard to resolve automatically. Consequently, WSD is considered an important problem in natural language processing (NLP). Standard evaluation resources are needed to develop, evaluate, and compare WSD methods. A range of initiatives have led to the development of benchmark WSD corpora for a wide range of languages from various language families. However, there is a lack of benchmark WSD corpora for South Asian languages including Urdu, despite there being over 300 million Urdu speakers and large amounts of Urdu digital text available online. To address that gap, this study describes a novel benchmark corpus for the Urdu Lexical Sample WSD task. The corpus contains 50 target words (30 nouns, 11 adjectives, and 9 verbs). A standard, manually crafted dictionary called Urdu Lughat is used as the sense inventory. Four baseline WSD approaches were applied to the corpus. The results show that the best performance was obtained using a simple Bag of Words approach. To encourage NLP research on the Urdu language, the corpus is freely available to the research community.

Lancaster E-Prints

A word sense disambiguation corpus for Urdu

Author: Ali Saeed
Mark Stevenson
Paul Rayson
Rao Muhammad Adeel Nawab
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/09/2019

Crossref

White Rose Research Online