
    Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop

    The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematically manipulating the input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from the intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.
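
    One concrete instance of the second approach (decoding interpretable knowledge from intermediate representations) is a diagnostic probe: a simple classifier trained to predict a linguistic property from a network's hidden states. Below is a minimal sketch in Python, assuming the hidden states and part-of-speech labels have already been extracted; the arrays are random stand-ins, not data from any of the workshop papers.

```python
# Diagnostic probe sketch: can part-of-speech tags be decoded linearly
# from a model's hidden states? All arrays are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for real extractions: one 768-d hidden state per token,
# one coarse part-of-speech label per token.
hidden_states = rng.normal(size=(5000, 768))
pos_labels = rng.integers(0, 12, size=5000)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, pos_labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # deliberately simple classifier
probe.fit(X_train, y_train)

# Accuracy well above the majority-class baseline would suggest the
# property is linearly decodable from the representation; with random
# stand-in data it should sit near chance.
print("probe accuracy:", probe.score(X_test, y_test))
```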

    Learning with Latent Language

    The named concepts and compositional operators present in natural language provide a rich source of information about the kinds of abstractions humans use to navigate the world. Can this linguistic background knowledge improve the generality and efficiency of learned classifiers and control policies? This paper aims to show that using the space of natural language strings as a parameter space is an effective way to capture natural task structure. In a pretraining phase, we learn a language interpretation model that transforms inputs (e.g., images) into outputs (e.g., labels) given natural language descriptions. To learn a new concept (e.g., a classifier), we search directly in the space of descriptions to minimize the interpreter's loss on training examples. Crucially, our models do not require language data to learn these concepts: language is used only in pretraining to impose structure on subsequent learning. Results on image classification, text editing, and reinforcement learning show that, in all settings, models with a linguistic parameterization outperform those without.
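
    The core loop is easy to state: with the interpreter fixed after pretraining, learning a new concept reduces to a discrete search over strings. A minimal sketch follows, assuming a pool of candidate descriptions and a black-box interpreter; both are hypothetical stand-ins, and the paper proposes its own search procedure.

```python
from typing import Callable, Sequence, Tuple

def fit_concept(
    interpreter: Callable[[str, str], str],  # (input, description) -> label
    candidates: Sequence[str],               # candidate descriptions
    examples: Sequence[Tuple[str, str]],     # (input, label) pairs
) -> str:
    """Return the description with the lowest 0/1 loss on the examples.
    The description itself plays the role of the learned parameters."""
    def loss(description: str) -> float:
        errors = sum(interpreter(x, description) != y for x, y in examples)
        return errors / len(examples)
    return min(candidates, key=loss)

# Toy usage: a rule-based interpreter standing in for a pretrained model
# that executes descriptions of the form "contains the letter <c>".
def toy_interpreter(x: str, description: str) -> str:
    letter = description.split()[-1]
    return "yes" if letter in x else "no"

best = fit_concept(
    toy_interpreter,
    candidates=["contains the letter a", "contains the letter z"],
    examples=[("cat", "yes"), ("dog", "no"), ("bat", "yes")],
)
print("learned concept:", best)  # -> contains the letter a
```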

    Word contexts enhance the neural representation of individual letters in early visual cortex

    Visual context facilitates perception, but how this is neurally implemented remains unclear. One example of contextual facilitation is found in reading, where letters are more easily identified when embedded in a word. Bottom-up models explain this word advantage as a post-perceptual decision bias, while top-down models propose that word contexts enhance perception itself. Here, we arbitrate between these accounts by presenting words and nonwords and probing the representational fidelity of individual letters using functional magnetic resonance imaging. In line with top-down models, we find that word contexts enhance letter representations in early visual cortex. Moreover, we observe increased coupling between letter information in visual cortex and brain activity in key areas of the reading network, suggesting that these areas may be the source of the enhancement. Our results provide evidence for top-down representational enhancement in word recognition, demonstrating that word contexts can modulate perceptual processing even in the earliest visual regions.
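
    "Representational fidelity" in studies of this kind is typically quantified by how well a decoder can recover stimulus identity from voxel patterns. The sketch below illustrates that analysis style on simulated data, with the word-context condition given a slightly stronger letter signal; the data and effect size are invented for illustration, not taken from the study.

```python
# Cross-validated letter decoding from (simulated) voxel patterns,
# compared between word and nonword contexts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def decoding_accuracy(patterns: np.ndarray, letters: np.ndarray) -> float:
    """Mean cross-validated accuracy of letter decoding from voxels."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, patterns, letters, cv=5).mean()

n_trials, n_voxels = 200, 300
letters = rng.integers(0, 4, size=n_trials)       # four probed letters

# Simulated voxel responses: the word-context condition carries a
# stronger letter signal, mimicking the reported top-down enhancement.
letter_patterns = rng.normal(size=(4, n_voxels))  # one pattern per letter
signal = np.eye(4)[letters] @ letter_patterns
word_context = 1.0 * signal + rng.normal(size=(n_trials, n_voxels))
nonword_context = 0.3 * signal + rng.normal(size=(n_trials, n_voxels))

print("word context:   ", decoding_accuracy(word_context, letters))
print("nonword context:", decoding_accuracy(nonword_context, letters))
```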

    A Comparison of Cartographic and Toponymic Databases in a Multilingual Environment: A Methodology for Detecting Redundancies Using ETL and GIS Tools

    Toponymy, a discipline transversal to geography, linguistics, and history, finds one of its main supports in cartography. Due to its exhaustive coverage of the territory, cadastral cartography and its toponymy have ideal characteristics for developing systematic geographical analyses. Moreover, cadastre and geographical names are part of the geographic reference data according to Annex 1 of the INSPIRE directive. This work presents the design, implementation, and application of a methodology based on Geographic Information Systems and Extract, Transform, and Load (ETL) tools for detecting matches between the cadastral geoinformation and the official gazetteer of the province of Gipuzkoa, Spain. Methodologically, this study proposes a solution to the issues raised by bilingualism in the study area. This problem is addressed both a priori, during the preliminary data treatment, and a posteriori, by applying semantic criteria. The results show a match rate between the datasets of close to 40%. In this way, the uniqueness and richness of the analyzed source, and its notable potential contribution to the integration of the official toponymic corpus, are demonstrated.
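
    The matching step can be pictured as a small ETL pipeline: normalize names from both sources, apply the a-priori bilingual treatment, and count matches. Below is a minimal sketch with a toy Basque/Spanish variant table and purely illustrative place names; the study's actual ETL workflow and criteria are richer.

```python
import unicodedata

# Hypothetical a-priori treatment of bilingual (Basque/Spanish) variants.
BILINGUAL_VARIANTS = {
    "donostia": "san sebastian",
    "hondarribia": "fuenterrabia",
}

def normalize(name: str) -> str:
    """Lowercase, strip accents, collapse whitespace, map known variants."""
    text = unicodedata.normalize("NFKD", name.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = " ".join(text.split())
    return BILINGUAL_VARIANTS.get(text, text)

# Illustrative place names standing in for the two datasets.
cadastral = ["Donostia", "Hondarribia", "Irún", "Zarautz"]
gazetteer = ["San Sebastián", "Fuenterrabía", "Irun", "Getaria"]

gazetteer_index = {normalize(n) for n in gazetteer}
matches = [n for n in cadastral if normalize(n) in gazetteer_index]
print(f"{len(matches)}/{len(cadastral)} cadastral names matched:", matches)
```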

    A Unified Kernel Approach For Learning Typed Sentence Rewritings

    Many high-level natural language processing problems can be framed as determining whether two given sentences are rewritings of each other. In this paper, we propose a class of kernel functions, referred to as type-enriched string rewriting kernels, which, when used in kernel-based machine learning algorithms, make it possible to learn sentence rewritings. Unlike previous work, this method can be fed external lexical-semantic relations to capture a wider class of rewriting rules. It also does not assume preliminary syntactic parsing, yet still provides a unified framework for capturing syntactic structure and alignments between the two sentences. We experiment on three different natural sentence rewriting tasks and obtain state-of-the-art results on all of them.
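
    A full type-enriched string rewriting kernel is beyond a short example, but the role of the external lexical-semantic relations can be shown on a heavily simplified stand-in: represent each sentence pair by its bag of word substitutions, map words to coarse semantic types via a toy lexicon, and take a dot product. The lexicon and kernel below are illustrative assumptions, not the paper's construction, which also handles structure and alignment.

```python
from collections import Counter
from typing import Tuple

# Toy lexical-semantic relations: words sharing a coarse semantic type.
SEMANTIC_TYPE = {"buy": "ACQUIRE", "purchase": "ACQUIRE", "acquire": "ACQUIRE"}

def typed(word: str) -> str:
    return SEMANTIC_TYPE.get(word, word)

def substitutions(pair: Tuple[str, str]) -> Counter:
    """Bag of typed (removed, added) word substitutions in a rewriting."""
    src, tgt = (set(s.lower().split()) for s in pair)
    removed = {typed(w) for w in src - tgt}
    added = {typed(w) for w in tgt - src}
    return Counter((r, a) for r in removed for a in added)

def rewriting_kernel(p1: Tuple[str, str], p2: Tuple[str, str]) -> int:
    """Dot product of the two pairs' typed-substitution bags."""
    b1, b2 = substitutions(p1), substitutions(p2)
    return sum(b1[k] * b2[k] for k in b1)

pair_a = ("I buy a car", "I purchase a car")
pair_b = ("they acquire an automobile", "they buy an automobile")
# Nonzero only because "buy", "purchase", and "acquire" share a type;
# on literal words these two substitutions would not match.
print(rewriting_kernel(pair_a, pair_b))   # -> 1
```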

    Accurate Cardinality Estimation of Co-occurring Words Using Suffix Trees (Extended Version)

    Estimating the cost of a query plan is one of the hardest problems in query optimization. This includes cardinality estimates of string search patterns, in particular of multi-word strings like phrases or text snippets. At first sight, suffix trees address this problem. To curb the memory usage of a suffix tree, one often prunes the tree to a certain depth. But this pruning method "takes away" more information from long strings than from short ones. This problem is particularly severe for sets of long strings, the setting studied here. In this article, we propose pruning techniques tailored to this setting. Our approaches remove characters with low information value. The variants differ in how they determine a character's information value, e.g., by using conditional entropy with respect to previous characters in the string. Our experiments show that, in contrast to the well-known pruned suffix tree, our technique provides significantly better estimates when the tree size is reduced by 60% or less. Due to the redundancy of natural language, our pruning techniques yield hardly any error for tree-size reductions of up to 50%.
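
    A minimal sketch of the entropy-based idea: score each character occurrence by its information content given the preceding character, estimated from the corpus, and drop the least informative occurrences before indexing. The bigram model and threshold below are simplifications of the article's conditional-entropy criterion, not its actual variants.

```python
import math
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Character bigram counts: counts[prev][c] over all strings."""
    counts = defaultdict(Counter)
    for text in corpus:
        for prev, c in zip(text, text[1:]):
            counts[prev][c] += 1
    return counts

def info_content(counts, prev, c, alphabet_size=27):
    """-log2 p(c | prev) with add-one smoothing: high when c is
    surprising after prev, low when c is nearly determined by prev."""
    total = sum(counts[prev].values()) + alphabet_size
    return -math.log2((counts[prev][c] + 1) / total)

def prune(text, counts, threshold=2.5):
    """Keep the first character plus every character whose information
    content given its predecessor exceeds the threshold."""
    kept = [text[0]]
    for prev, c in zip(text, text[1:]):
        if info_content(counts, prev, c) > threshold:
            kept.append(c)
    return "".join(kept)

corpus = ["the quick brown fox", "the lazy dog", "the quick dog"]
counts = bigram_model(corpus)
# Highly predictable characters (e.g. the "h" after "t") are the first
# candidates for removal as the threshold rises.
print(prune("the quick brown fox", counts, threshold=3.5))
```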

    Querying and Efficiently Searching Large, Temporal Text Corpora
