7 research outputs found

    Contributions to language understanding: attention and alignment between n-grams for interpretable similarity and inference.

    148 p. Natural Language Processing makes it possible to improve intelligent systems in the field of education, substantially lightening the workload of students and teachers. In this thesis we study sentence-level language understanding and, through new proposals, increase the language understanding of intelligent systems, giving them the ability to interpret the user's sentences more precisely; the ability to interpret sentences at a fine-grained level makes it possible to generate feedback automatically. To develop the thesis we delved into language understanding, analysing the features and systems involved in semantic similarity and logical inference. In particular, we show that sentences can be modelled better by grouping the words within a sentence into chunks and aligning them. To that end, we implemented a state-of-the-art neural network system that aligns single words and adapted it to align arbitrary n-grams. Although word-level alignment has long been known, this thesis presents, for the first time, proposals for aligning arbitrary n-grams through an attention mechanism. In addition, to identify the similarities and differences between sentences precisely, to increase the interpretability of sentences and to give students detailed feedback, we created a new layer, iSTS, which combines semantic similarity and logical inference. With this layer we aligned chunks and showed, in two evaluation scenarios in the educational context, that we are able to give students detailed feedback. Together with this thesis, several systems and datasets have been published so that the scientific community can continue this line of research in the future.
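    A minimal sketch of the chunk-alignment idea summarised above, assuming pre-trained word vectors: chunks are embedded by averaging word vectors, greedily aligned by cosine similarity, and each alignment receives a coarse interpretable label. The function names, thresholds and labels are illustrative only, not the thesis implementation.

```python
# Illustrative chunk alignment with interpretable labels (not the thesis code).
import numpy as np

def embed_chunk(chunk, word_vectors, dim=50):
    """Average the word vectors of a chunk (zero vector for unknown words)."""
    vecs = [word_vectors.get(w, np.zeros(dim)) for w in chunk]
    return np.mean(vecs, axis=0)

def align_chunks(chunks_a, chunks_b, word_vectors, dim=50):
    """Greedily align each chunk of sentence A to its most similar chunk in sentence B."""
    emb_a = [embed_chunk(c, word_vectors, dim) for c in chunks_a]
    emb_b = [embed_chunk(c, word_vectors, dim) for c in chunks_b]
    alignments = []
    for i, va in enumerate(emb_a):
        sims = [
            float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-8))
            for vb in emb_b
        ]
        j = int(np.argmax(sims))
        # Map the similarity score to a coarse label (illustrative thresholds).
        label = "EQUI" if sims[j] > 0.9 else "SIMI" if sims[j] > 0.5 else "NOALI"
        alignments.append((chunks_a[i], chunks_b[j], round(sims[j], 2), label))
    return alignments
```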

    The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures

    In this paper, a word-level n-gram based approach is proposed to find the similarity between texts. The approach combines two separate and independent techniques: self-organizing maps (SOM) and text similarity measures. SOM's uniqueness is that the results of data clustering, as well as of dimensionality reduction, are presented in a visual form. Four measures have been evaluated: cosine, Dice, extended Jaccard, and overlap. First of all, the texts have to be converted into a numerical representation. For that purpose, each text is split into word-level n-grams and a bag of n-grams is created. The n-gram frequencies are calculated and the frequency matrix of the dataset is formed. Various filters are used when creating the bag of n-grams: stemming algorithms, number and punctuation removal, stop-word lists, etc. All experimental investigation has been carried out on a corpus of plagiarised short answers. This article belongs to the Special Issue Advances in Deep Learning.
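    The pipeline above can be illustrated with a minimal sketch: texts are split into word-level n-grams, a bag of n-grams is built, and two bags are compared with the four measures evaluated in the paper. Tokenisation and filtering are deliberately simplified (no stemming, stop-word or punctuation filters), and the measure definitions are common bag/set formulations rather than the authors' exact code.

```python
# Bag-of-n-grams construction plus the four similarity measures (simplified sketch).
from collections import Counter

def word_ngrams(text, n=2):
    """Split a text into word-level n-grams and count their frequencies."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[g] * b[g] for g in common)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def dice(a, b):
    return 2 * len(set(a) & set(b)) / (len(a) + len(b)) if (a or b) else 0.0

def extended_jaccard(a, b):
    common = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in common)
    den = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / den if den else 0.0

def overlap(a, b):
    smaller = min(len(a), len(b))
    return len(set(a) & set(b)) / smaller if smaller else 0.0
```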

    Extending Persian sentiment lexicon with idiomatic expressions for sentiment analysis

    Nowadays, it is important for buyers to know other customers' opinions to make informed decisions on buying a product or service. In addition, companies and organizations can exploit customer opinions to improve their products and services. However, the quintillions of bytes of opinions generated every day cannot be read and summarized manually. Sentiment analysis and opinion mining techniques offer a solution to automatically classify and summarize user opinions. However, current sentiment analysis research is mostly focused on English, with far fewer resources available for other languages such as Persian. In our previous work, we developed PerSent, a publicly available sentiment lexicon to facilitate lexicon-based sentiment analysis of texts in the Persian language. However, the PerSent-based sentiment analysis approach fails to classify real-world sentences that contain idiomatic expressions. Therefore, in this paper, we describe an extension of the PerSent lexicon with more than 1000 idiomatic expressions, along with their polarity, and propose an algorithm to accurately classify Persian text. Comparative experimental results show the usefulness of the extended lexicon for sentiment analysis compared to PerSent lexicon-based sentiment analysis as well as Persian-to-English translation-based approaches. The extended version of the lexicon will be made publicly available.
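    A hedged sketch of lexicon-based classification with idiom handling, in the spirit of the approach described above. It assumes a word lexicon and an idiom lexicon that map entries to polarity scores; the longest-match-first strategy and the data structures are illustrative assumptions, not the paper's exact algorithm.

```python
# Lexicon-based sentiment scoring that matches multi-word idioms before single words.
def classify_sentiment(text, word_lexicon, idiom_lexicon):
    """Return 'positive', 'negative' or 'neutral' for a whitespace-tokenized sentence.

    idiom_lexicon maps multi-word expressions (tuples of tokens) to polarity scores;
    word_lexicon maps single tokens to polarity scores.
    """
    tokens = text.lower().split()
    score, i = 0.0, 0
    max_idiom_len = max((len(k) for k in idiom_lexicon), default=1)
    while i < len(tokens):
        matched = False
        # Try the longest idiomatic expression starting at position i first.
        for n in range(min(max_idiom_len, len(tokens) - i), 1, -1):
            span = tuple(tokens[i:i + n])
            if span in idiom_lexicon:
                score += idiom_lexicon[span]
                i += n
                matched = True
                break
        if not matched:
            score += word_lexicon.get(tokens[i], 0.0)
            i += 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```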

    Attention in Natural Language Processing

    Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.
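    The unified view can be made concrete with a small sketch: a compatibility function scores each input vector against a query, a distribution function turns the scores into weights, and the output is a weighted sum of the values. The additive compatibility and softmax distribution used here are just one instantiation of the taxonomy, not the article's reference formulation.

```python
# One instantiation of the generic attention pattern: additive scoring + softmax.
import numpy as np

def additive_compatibility(keys, query, W_k, W_q, v):
    """Score each key vector against the query (Bahdanau-style additive scoring)."""
    return np.array([v @ np.tanh(W_k @ k + W_q @ query) for k in keys])

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attend(keys, values, query, W_k, W_q, v):
    scores = additive_compatibility(keys, query, W_k, W_q, v)  # compatibility function
    weights = softmax(scores)                                  # distribution function
    context = np.sum(weights[:, None] * np.asarray(values), axis=0)
    return context, weights
```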

    Word n-gram attention models for sentence similarity and inference

    Semantic Textual Similarity and Natural Language Inference are two popular natural language understanding tasks used to benchmark sentence representation models, where two sentences are paired. In such tasks sentences are represented as bags of words, sequences, trees or convolutions, but the attention model is based on word pairs. In this article we introduce the use of word n-grams in the attention model. Our results on five datasets show an error reduction of up to 41% with respect to the word-based attention model. The improvements are especially relevant in low-data regimes and, in the case of natural language inference, on the recently released hard subset of Natural Language Inference datasets. This research is supported by a doctoral grant from MINECO. We also thank the technical and human support provided by IZO-SGI SGIker of UPV/EHU and European funding (ERDF and ESF), and we gratefully acknowledge the support of NVIDIA Corporation with the donation of a Pascal Titan X GPU used for this research. This research was also partially supported by the UPV/EHU (excellence research group) and the Spanish Research Agency LIHLITH project (PCIN-2017-118 / AEI) in the framework of EU ERA-Net CHIST-ERA. Finally, we would like to thank Siva Reddy for his time, interest and fruitful comments on the present work.
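    A hedged sketch of the central idea, assuming pre-trained word vectors: word-pair attention is extended to pairs of word n-grams, here embedded by averaging word vectors and scored with a dot product. The actual model is a trained neural network, so this only illustrates where n-grams enter the attention computation.

```python
# Dot-product attention over all word n-grams of two sentences (illustrative only).
import numpy as np

def ngram_spans(tokens, max_n=3):
    """Enumerate all word n-grams of length 1..max_n."""
    return [tokens[i:i + n] for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def embed(span, word_vectors, dim=50):
    """Embed an n-gram as the average of its word vectors."""
    return np.mean([word_vectors.get(w, np.zeros(dim)) for w in span], axis=0)

def ngram_attention(tokens_a, tokens_b, word_vectors, max_n=3, dim=50):
    """Attention weights from every n-gram of sentence A over the n-grams of sentence B."""
    spans_a = ngram_spans(tokens_a, max_n)
    spans_b = ngram_spans(tokens_b, max_n)
    E_a = np.stack([embed(s, word_vectors, dim) for s in spans_a])
    E_b = np.stack([embed(s, word_vectors, dim) for s in spans_b])
    scores = E_a @ E_b.T                                        # dot-product compatibility
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # softmax over B's n-grams
    return spans_a, spans_b, weights
```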