Search CORE

851 research outputs found

BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer

Author: Charpentier Lucas Georges Gabriel
Rønningstad Egil
Samuel David
Wold Sondre
Publication venue
Publication date: 19/04/2023
Field of study

Retrieval-based language models are increasingly employed in question-answering tasks. These models search in a corpus of documents for relevant information instead of having all factual knowledge stored in its parameters, thereby enhancing efficiency, transparency, and adaptability. We develop the first Norwegian retrieval-based model by adapting the REALM framework and evaluating it on various tasks. After training, we also separate the language model, which we call the reader, from the retriever components, and show that this can be fine-tuned on a range of downstream tasks. Results show that retrieval augmented language modeling improves the reader's performance on extractive question-answering, suggesting that this type of training improves language models' general ability to use context and that this does not happen at the expense of other abilities such as part-of-speech tagging, dependency parsing, named entity recognition, and lemmatization. Code, trained models, and data are made publicly available.Comment: Accepted for NoDaLiDa 2023, main conferenc

arXiv.org e-Print Archive

mARC: Memory by Association and Reinforcement of Contexts

Author: Descourt Patrice
Rimoux Norbert
Publication venue
Publication date: 10/12/2013
Field of study

This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

arXiv.org e-Print Archive

CiteSeerX

Encoding Sequential Information in Semantic Space Models: Comparing Holographic Reduced Representation and Random Permutation

Author: Gabriel Recchia
Magnus Sahlgren
Michael N. Jones
Pentti Kanerva
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015
Field of study

Circular convolution and random permutation have each been proposed as neurally plausible binding operators capable of encoding sequential information in semantic memory. We perform several controlled comparisons of circular convolution and random permutation as means of encoding paired associates as well as encoding sequential information. Random permutations outperformed convolution with respect to the number of paired associates that can be reliably stored in a single memory trace. Performance was equal on semantic tasks when using a small corpus, but random permutations were ultimately capable of achieving superior performance due to their higher scalability to large corpora. Finally, “noisy” permutations in which units are mapped to other units arbitrarily (no one-to-one mapping) perform nearly as well as true permutations. These findings increase the neurological plausibility of random permutations and highlight their utility in vector space models of semantics

Crossref

Directory of Open Access Journals

PubMed Central

Improving Retrieval-Based Question Answering with Deep Inference Models

Author: Pirtoaca George-Sebastian
Rebedea Traian
Ruseti Stefan
Publication venue
Publication date: 06/05/2019
Field of study

Question answering is one of the most important and difficult applications at the border of information retrieval and natural language processing, especially when we talk about complex science questions which require some form of inference to determine the correct answer. In this paper, we present a two-step method that combines information retrieval techniques optimized for question answering with deep learning models for natural language inference in order to tackle the multi-choice question answering in the science domain. For each question-answer pair, we use standard retrieval-based models to find relevant candidate contexts and decompose the main problem into two different sub-problems. First, assign correctness scores for each candidate answer based on the context using retrieval models from Lucene. Second, we use deep learning architectures to compute if a candidate answer can be inferred from some well-chosen context consisting of sentences retrieved from the knowledge base. In the end, all these solvers are combined using a simple neural network to predict the correct answer. This proposed two-step model outperforms the best retrieval-based solver by over 3% in absolute accuracy.Comment: 8 pages, 2 figures, 8 tables, accepted at IJCNN 201

arXiv.org e-Print Archive

Crossref

Named entity recognition on flemish audio-visual and news-paper archives

Author: De Moor An
Deleu Johannes
Demeester Piet
Demeester Thomas
Vermeulen Brecht
Publication venue: Ghent University, Department of Information technology
Publication date: 01/01/2012
Field of study

Ghent University Academic Bibliography

Living Knowledge

Author: Baldry Anthony
Dutta Biswanath
Giunchiglia Fausto
Maltese Vincenzo
Publication venue: Università degli Studi di Trento
Publication date: 01/03/2012
Field of study

Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. With the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed in concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA) which operates from a social sciences perspective; Multimodal Genre Analysis (MGA) which operates from a semiotic perspective and Facet Analysis (FA) which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following

Unitn-eprints Research

A history and theory of textual event detection and recognition

Author: Chen Yanping
Ding Zehua
Huang Ruizhang
Qin Yongbin
Shah Nazaraf
Zheng Qinghua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/11/2020
Field of study

Coventry University Pure Portal