7 research outputs found

    Contributions to language understanding: attention and alignment between n-grams for interpretable similarity and inference.

    148 p. Natural Language Processing makes it possible to improve intelligent systems in the field of education, substantially lightening the workload of students and teachers. In this thesis we study sentence-level language understanding and, through new proposals, increase the language understanding of intelligent systems, giving them the ability to interpret the user's sentences more precisely; the ability to interpret sentences at a fine-grained level makes it possible to generate feedback automatically. To develop the thesis we delved into language understanding, analysing the features and systems involved in semantic similarity and logical inference. In particular, we show that sentences can be modelled better by grouping the words within a sentence into chunks and aligning them. To that end, we implemented a state-of-the-art neural network system that aligns single words and adapted it to align arbitrary n-grams. Although word-level alignment has long been known, this thesis presents, for the first time, proposals for aligning arbitrary n-grams through an attention mechanism. In addition, to identify the similarities and differences between sentences precisely, to increase the interpretability of sentences and to give students detailed feedback, we created a new layer, iSTS, which combines semantic similarity and logical inference. With this layer we aligned chunks and showed, in two evaluation scenarios in the educational context, that we are able to give students detailed feedback. Together with this thesis, several systems and datasets have been published so that the scientific community can continue this line of research in the future.
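    A minimal sketch of the chunk-alignment idea summarised above, assuming pre-trained word vectors: chunks are embedded by averaging word vectors, greedily aligned by cosine similarity, and each alignment receives a coarse interpretable label. The function names, thresholds and labels are illustrative only, not the thesis implementation.

```python
# Illustrative chunk alignment with interpretable labels (not the thesis code).
import numpy as np

def embed_chunk(chunk, word_vectors, dim=50):
    """Average the word vectors of a chunk (zero vector for unknown words)."""
    vecs = [word_vectors.get(w, np.zeros(dim)) for w in chunk]
    return np.mean(vecs, axis=0)

def align_chunks(chunks_a, chunks_b, word_vectors, dim=50):
    """Greedily align each chunk of sentence A to its most similar chunk in sentence B."""
    emb_a = [embed_chunk(c, word_vectors, dim) for c in chunks_a]
    emb_b = [embed_chunk(c, word_vectors, dim) for c in chunks_b]
    alignments = []
    for i, va in enumerate(emb_a):
        sims = [
            float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-8))
            for vb in emb_b
        ]
        j = int(np.argmax(sims))
        # Map the similarity score to a coarse label (illustrative thresholds).
        label = "EQUI" if sims[j] > 0.9 else "SIMI" if sims[j] > 0.5 else "NOALI"
        alignments.append((chunks_a[i], chunks_b[j], round(sims[j], 2), label))
    return alignments
```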

    The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures

    In this paper, a word-level n-gram based approach is proposed to find the similarity between texts. The approach combines two separate and independent techniques: self-organizing maps (SOM) and text similarity measures. SOM's uniqueness is that the results of data clustering, as well as of dimensionality reduction, are presented in a visual form. Four measures have been evaluated: cosine, Dice, extended Jaccard, and overlap. First of all, the texts have to be converted into a numerical representation. For that purpose, each text is split into word-level n-grams and a bag of n-grams is created. The n-gram frequencies are calculated and the frequency matrix of the dataset is formed. Various filters are used when creating the bag of n-grams: stemming algorithms, number and punctuation removal, stop-word lists, etc. All experimental investigation has been carried out on a corpus of plagiarised short answers. This article belongs to the Special Issue Advances in Deep Learning.
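    The pipeline above can be illustrated with a minimal sketch: texts are split into word-level n-grams, a bag of n-grams is built, and two bags are compared with the four measures evaluated in the paper. Tokenisation and filtering are deliberately simplified (no stemming, stop-word or punctuation filters), and the measure definitions are common bag/set formulations rather than the authors' exact code.

```python
# Bag-of-n-grams construction plus the four similarity measures (simplified sketch).
from collections import Counter

def word_ngrams(text, n=2):
    """Split a text into word-level n-grams and count their frequencies."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[g] * b[g] for g in common)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def dice(a, b):
    return 2 * len(set(a) & set(b)) / (len(a) + len(b)) if (a or b) else 0.0

def extended_jaccard(a, b):
    common = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in common)
    den = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / den if den else 0.0

def overlap(a, b):
    smaller = min(len(a), len(b))
    return len(set(a) & set(b)) / smaller if smaller else 0.0
```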

    Extending Persian sentiment lexicon with idiomatic expressions for sentiment analysis

    Nowadays, it is important for buyers to know other customers' opinions to make informed decisions on buying a product or service. In addition, companies and organizations can exploit customer opinions to improve their products and services. However, the quintillions of bytes of opinions generated every day cannot be read and summarized manually. Sentiment analysis and opinion mining techniques offer a solution to automatically classify and summarize user opinions. However, current sentiment analysis research is mostly focused on English, with far fewer resources available for other languages such as Persian. In our previous work, we developed PerSent, a publicly available sentiment lexicon to facilitate lexicon-based sentiment analysis of texts in the Persian language. However, the PerSent-based sentiment analysis approach fails to classify real-world sentences that contain idiomatic expressions. Therefore, in this paper, we describe an extension of the PerSent lexicon with more than 1000 idiomatic expressions, along with their polarity, and propose an algorithm to accurately classify Persian text. Comparative experimental results show the usefulness of the extended lexicon for sentiment analysis compared to PerSent lexicon-based sentiment analysis as well as Persian-to-English translation-based approaches. The extended version of the lexicon will be made publicly available.
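    A hedged sketch of lexicon-based classification with idiom handling, in the spirit of the approach described above. It assumes a word lexicon and an idiom lexicon that map entries to polarity scores; the longest-match-first strategy and the data structures are illustrative assumptions, not the paper's exact algorithm.

```python
# Lexicon-based sentiment scoring that matches multi-word idioms before single words.
def classify_sentiment(text, word_lexicon, idiom_lexicon):
    """Return 'positive', 'negative' or 'neutral' for a whitespace-tokenized sentence.

    idiom_lexicon maps multi-word expressions (tuples of tokens) to polarity scores;
    word_lexicon maps single tokens to polarity scores.
    """
    tokens = text.lower().split()
    score, i = 0.0, 0
    max_idiom_len = max((len(k) for k in idiom_lexicon), default=1)
    while i < len(tokens):
        matched = False
        # Try the longest idiomatic expression starting at position i first.
        for n in range(min(max_idiom_len, len(tokens) - i), 1, -1):
            span = tuple(tokens[i:i + n])
            if span in idiom_lexicon:
                score += idiom_lexicon[span]
                i += n
                matched = True
                break
        if not matched:
            score += word_lexicon.get(tokens[i], 0.0)
            i += 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```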

    Attention in Natural Language Processing

    Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.
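    The unified view can be made concrete with a small sketch: a compatibility function scores each input vector against a query, a distribution function turns the scores into weights, and the output is a weighted sum of the values. The additive compatibility and softmax distribution used here are just one instantiation of the taxonomy, not the article's reference formulation.

```python
# One instantiation of the generic attention pattern: additive scoring + softmax.
import numpy as np

def additive_compatibility(keys, query, W_k, W_q, v):
    """Score each key vector against the query (Bahdanau-style additive scoring)."""
    return np.array([v @ np.tanh(W_k @ k + W_q @ query) for k in keys])

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attend(keys, values, query, W_k, W_q, v):
    scores = additive_compatibility(keys, query, W_k, W_q, v)  # compatibility function
    weights = softmax(scores)                                  # distribution function
    context = np.sum(weights[:, None] * np.asarray(values), axis=0)
    return context, weights
```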

    Word n-gram attention models for sentence similarity and inference

    Semantic Textual Similarity and Natural Language Inference are two popular natural language understanding tasks used to benchmark sentence representation models, where two sentences are paired. In such tasks sentences are represented as bags of words, sequences, trees or convolutions, but the attention model is based on word pairs. In this article we introduce the use of word n-grams in the attention model. Our results on five datasets show an error reduction of up to 41% with respect to the word-based attention model. The improvements are especially relevant in low-data regimes and, in the case of natural language inference, on the recently released hard subset of Natural Language Inference datasets. This research is supported by a doctoral grant from MINECO. We also thank the technical and human support provided by IZO-SGI SGIker of UPV/EHU and European funding (ERDF and ESF), and we gratefully acknowledge the support of NVIDIA Corporation with the donation of a Pascal Titan X GPU used for this research. This research was also partially supported by the UPV/EHU (excellence research group) and the Spanish Research Agency LIHLITH project (PCIN-2017-118 / AEI) in the framework of EU ERA-Net CHIST-ERA. Finally, we would like to thank Siva Reddy for his time, interest and fruitful comments on the present work.
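    A hedged sketch of the central idea, assuming pre-trained word vectors: word-pair attention is extended to pairs of word n-grams, here embedded by averaging word vectors and scored with a dot product. The actual model is a trained neural network, so this only illustrates where n-grams enter the attention computation.

```python
# Dot-product attention over all word n-grams of two sentences (illustrative only).
import numpy as np

def ngram_spans(tokens, max_n=3):
    """Enumerate all word n-grams of length 1..max_n."""
    return [tokens[i:i + n] for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def embed(span, word_vectors, dim=50):
    """Embed an n-gram as the average of its word vectors."""
    return np.mean([word_vectors.get(w, np.zeros(dim)) for w in span], axis=0)

def ngram_attention(tokens_a, tokens_b, word_vectors, max_n=3, dim=50):
    """Attention weights from every n-gram of sentence A over the n-grams of sentence B."""
    spans_a = ngram_spans(tokens_a, max_n)
    spans_b = ngram_spans(tokens_b, max_n)
    E_a = np.stack([embed(s, word_vectors, dim) for s in spans_a])
    E_b = np.stack([embed(s, word_vectors, dim) for s in spans_b])
    scores = E_a @ E_b.T                                        # dot-product compatibility
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # softmax over B's n-grams
    return spans_a, spans_b, weights
```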