
    Natural language understanding: instructions for (Present and Future) use

    In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine to be able to understand, and what are the key dimensions that require the attention of researchers to make this dream come true?

    Jumping Finite Automata for Tweet Comprehension

    Every day, over one billion social media text messages are generated worldwide, providing abundant information that can improve people's lives through evidence-based decision making. Twitter is rich in such data, but comprehending tweets poses a number of technical challenges, including the ambiguity of the language used in tweets, which is exacerbated in under-resourced languages. This paper presents an approach based on Jumping Finite Automata for the automatic comprehension of tweets. We construct a WordNet for the language of Kenya (WoLK) based on an analysis of tweet structure, formalize the space of tweet variation, and abstract that space as a finite automaton. In addition, we present a software tool called the Automata-Aided Tweet Comprehension (ATC) tool, which takes raw tweets as input, preprocesses them, recognises the syntax, and extracts semantic information with an 86% success rate.
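
    The abstract does not include code, but the core idea of a jumping finite automaton (an automaton that may consume input symbols in any order rather than strictly left to right) can be illustrated with a small brute-force simulator. The sketch below is a hypothetical Python illustration; the states, transitions, and hashtag example are invented and are not taken from the ATC tool.

```python
from collections import Counter

def jfa_accepts(delta, start, finals, tokens):
    """Brute-force check whether a jumping finite automaton accepts `tokens`.

    delta maps (state, symbol) to a set of next states; because the automaton
    may "jump" and consume any remaining occurrence of a symbol, only the
    multiset of tokens matters, so the search runs over a Counter.
    """
    def search(state, remaining):
        if sum(remaining.values()) == 0:
            return state in finals
        for symbol in [s for s, c in remaining.items() if c > 0]:
            for nxt in delta.get((state, symbol), ()):
                remaining[symbol] -= 1          # consume one occurrence anywhere
                ok = search(nxt, remaining)
                remaining[symbol] += 1          # backtrack
                if ok:
                    return True
        return False

    return search(start, Counter(tokens))

# Invented toy automaton: accept token streams containing '#nairobi' somewhere,
# regardless of position, with 'w' standing for any ordinary word token.
delta = {
    ('q0', 'w'): {'q0'},
    ('q0', '#nairobi'): {'q1'},
    ('q1', 'w'): {'q1'},
    ('q1', '#nairobi'): {'q1'},
}
print(jfa_accepts(delta, 'q0', {'q1'}, ['w', '#nairobi', 'w']))  # True
print(jfa_accepts(delta, 'q0', {'q1'}, ['w', 'w']))              # False
```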

    Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks

    This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment with several cross-lingual annotation projection methods using Recurrent Neural Network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target language. More precisely, our method has the following characteristics: (a) it does not use word alignment information; (b) it does not assume any knowledge about foreign languages, which makes it applicable to a wide range of resource-poor languages; (c) it provides truly multilingual taggers. We investigate both uni- and bi-directional RNN models and propose a method to include external information (for instance, low-level information from POS) in the RNN to train higher-level taggers (for instance, super-sense taggers). We demonstrate the validity and genericity of our model using parallel corpora (obtained by manual or automatic translation). Our experiments are conducted to induce cross-lingual POS and super-sense taggers.
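
    As a rough sketch of the bidirectional RNN tagger component described above (not the authors' implementation), the following PyTorch snippet builds a minimal BiLSTM sequence tagger. The multilingual word representation is replaced here by an ordinary embedding layer, and the vocabulary size, tag count, and dimensions are illustrative placeholders.

```python
import torch
import torch.nn as nn

class BiRNNTagger(nn.Module):
    """Minimal bidirectional RNN tagger: embeddings -> BiLSTM -> per-token logits."""
    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)   # 2x: forward + backward states

    def forward(self, token_ids):                        # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)                               # (batch, seq_len, num_tags)

# Tiny smoke test with random data (placeholder for projected annotations).
model = BiRNNTagger(vocab_size=1000, num_tags=17)        # e.g. 17 universal POS tags
tokens = torch.randint(0, 1000, (2, 6))                  # 2 sentences, 6 tokens each
logits = model(tokens)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 17), torch.randint(0, 17, (12,)))
loss.backward()
```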

    Unsupervised Machine Learning Approach for Tigrigna Word Sense Disambiguation

    All human languages have words that can mean different things in different contexts. Word sense disambiguation (WSD) is an open problem in natural language processing: the task of identifying which sense of a word (i.e. meaning) is used in a sentence when the word has multiple meanings (polysemy). We use unsupervised machine learning techniques to address the problem of automatically deciding the correct sense of an ambiguous word in Tigrigna texts based on its surrounding context. Due to the lack of sufficient training data, we report experiments on four selected ambiguous Tigrigna words: መደብ, read as “medeb”, has three different meanings (Program, Traditional bed and Grouping); ሓለፈ, read as “halefe”, has four dissimilar meanings (Pass, Promote, Boss and Pass away); ሃደመ, read as “hademe”, has two different meanings (Running and Building a house); and ከበረ, read as “kebere”, has two different meanings (Respecting and Expensive). Finally, we tested five clustering algorithms (simple k-means; hierarchical agglomerative with single, average and complete link; and Expectation Maximization) as implemented in the Weka 3.8.1 package. The “use training set” evaluation mode was selected to train the chosen algorithms on the preprocessed dataset. We evaluated the algorithms on the four ambiguous words and achieved the best accuracy, in the range of 67% to 83.3%, with Expectation Maximization, which is an encouraging result.
    Keywords: Attribute-Relation File Format, Cross Validation, Consonant Vowel, Machine Readable Dictionary, Natural Language Processing, System for Ethiopic Representation in ASCII, Word Sense Disambiguation
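
    The general recipe here (representing each occurrence of an ambiguous word by its surrounding context and clustering the occurrences into senses) can be sketched outside Weka as well. The snippet below is a minimal scikit-learn illustration, not the study's setup: the example contexts are invented English paraphrases standing in for Tigrigna sentences, and k-means and Gaussian-mixture EM stand in for the Weka implementations.

```python
# Cluster occurrences of an ambiguous word by their contexts; each cluster
# is then interpreted as one sense.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture   # EM clustering, analogous to Weka's EM

# Invented English stand-ins for contexts of the ambiguous word "medeb".
contexts = [
    "the radio medeb starts at eight in the evening",
    "she slept on the wooden medeb in the corner",
    "the new training medeb groups students by level",
    "the tv medeb was cancelled this week",
]

X = TfidfVectorizer().fit_transform(contexts)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=3, covariance_type="diag",
                            random_state=0).fit_predict(X.toarray())

print("k-means sense clusters:", kmeans_labels)
print("EM sense clusters:     ", em_labels)
```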

    A Word Sense Disambiguation Model for Amharic Words using Semi-Supervised Learning Paradigm

    The main objective of this research was to design a word sense disambiguation (WSD) prototype model for Amharic words using a semi-supervised learning method to extract training sets, which minimizes the amount of human intervention required and can produce a considerable improvement in learning accuracy. Due to the unavailability of an Amharic WordNet, only five words were selected: atena (አጠና), derese (ደረሰ), tenesa (ተነሳ), bela (በላ) and ale (አለ). Separate data sets for the five ambiguous words were prepared for the development of this Amharic WSD prototype. The final classification task was carried out on the fully labelled training set using the AdaBoost, bagging and ADTree classification algorithms in the WEKA package.
    Keywords: Ambiguity, Bootstrapping, Word Sense Disambiguation
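
    The bootstrapping idea (train on a small labelled seed set, then repeatedly add the unlabelled contexts the classifier labels most confidently and retrain) can be sketched as follows. This is a minimal scikit-learn illustration under assumed data, not the study's Weka configuration: the contexts are invented English stand-ins for Amharic sentences, and the confidence threshold is arbitrary.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier

# Invented contexts for two hypothetical senses of "atena".
seed_texts = ["he studied the lesson all night", "the wooden pole supports the roof"]
unlabelled = ["she studied hard for the exam",
              "they raised a tall pole in the yard",
              "reading and studying take time"]

vec = TfidfVectorizer().fit(seed_texts + unlabelled)
X_seed = vec.transform(seed_texts).toarray()
y_seed = np.array([0, 1])
X_unl = vec.transform(unlabelled).toarray()

for _ in range(3):                                    # a few bootstrapping rounds
    clf = AdaBoostClassifier(random_state=0).fit(X_seed, y_seed)
    if len(X_unl) == 0:
        break
    proba = clf.predict_proba(X_unl)
    confident = proba.max(axis=1) >= 0.8              # illustrative confidence threshold
    if not confident.any():
        break
    X_seed = np.vstack([X_seed, X_unl[confident]])    # grow the labelled set
    y_seed = np.concatenate([y_seed, proba[confident].argmax(axis=1)])
    X_unl = X_unl[~confident]

print("labelled examples after bootstrapping:", len(X_seed))
```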

    A Simple and Effective Method of Cross-Lingual Plagiarism Detection

    We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The presented approach leverages open multilingual thesauri for the candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. The method does not rely on machine translation or word sense disambiguation when in use, and is therefore suitable for a large number of languages, including under-resourced ones. The effectiveness of the proposed approach is demonstrated on several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian.
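
    A minimal sketch of the detailed-analysis step might look like the following, assuming a publicly available multilingual sentence-embedding model (here the sentence-transformers checkpoint paraphrase-multilingual-MiniLM-L12-v2, standing in for the paper's BERT-based models). Candidate retrieval via multilingual thesauri is omitted, and the example sentences and similarity threshold are invented.

```python
# Score cross-lingual candidate pairs with multilingual sentence embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

suspicious = "The cat sat quietly on the warm windowsill."   # e.g. a translated sentence
candidates = [
    "Le chat était assis tranquillement sur le rebord chaud de la fenêtre.",
    "Les résultats des élections ont été annoncés hier soir.",
]

emb_s = model.encode(suspicious, convert_to_tensor=True)
emb_c = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(emb_s, emb_c)[0]                       # cosine similarity per candidate

for sent, score in zip(candidates, scores):
    flag = "possible plagiarism" if float(score) > 0.7 else "ok"   # illustrative threshold
    print(f"{float(score):.2f}  {flag}  {sent}")
```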