Mimicking Word Embeddings using Subword RNNs
Word embeddings improve generalization over lexical features by placing each
word in a lower-dimensional space, using distributional information obtained
from unlabeled data. However, the effectiveness of word embeddings for
downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which
embeddings do not exist. In this paper, we present MIMICK, an approach to
generating OOV word embeddings compositionally, by learning a function from
spellings to distributional embeddings. Unlike prior work, MIMICK does not
require re-training on the original word embedding corpus; instead, learning is
performed at the type level. Intrinsic and extrinsic evaluations demonstrate
the power of this simple approach. On 23 languages, MIMICK improves performance
over a word-based baseline for tagging part-of-speech and morphosyntactic
attributes. It is competitive with (and complementary to) a supervised
character-based model in low-resource settings. Comment: EMNLP 2017.
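The core idea — learning a type-level function from spellings to pretrained vectors, with no re-training on the embedding corpus — can be sketched without the paper's character BiLSTM. The toy below (vocabulary, dimensions, and the linear character-bigram model are all illustrative stand-ins, not the paper's architecture) fits a map from bigram counts to embedding vectors by least squares, then composes a vector for an OOV spelling:

```python
import numpy as np

def char_bigrams(word):
    """Character bigrams with boundary markers, e.g. 'cat' -> ^c, ca, at, t$."""
    w = f"^{word}$"
    return [w[i:i + 2] for i in range(len(w) - 1)]

# Toy "pretrained" embeddings (in practice: word2vec/GloVe/Polyglot vectors).
# Training is type-level: one (spelling, vector) pair per vocabulary word.
rng = np.random.default_rng(0)
vocab = ["cat", "cats", "dog", "dogs", "run", "runs", "ran"]
dim = 8
pretrained = {w: rng.normal(size=dim) for w in vocab}

# Feature space: all bigrams observed in the vocabulary.
bigram_index = {b: i for i, b in
                enumerate(sorted({b for w in vocab for b in char_bigrams(w)}))}

def featurize(word):
    x = np.zeros(len(bigram_index))
    for b in char_bigrams(word):
        if b in bigram_index:      # bigrams unseen in training are dropped
            x[bigram_index[b]] += 1.0
    return x

X = np.stack([featurize(w) for w in vocab])    # (|V|, n_bigrams)
Y = np.stack([pretrained[w] for w in vocab])   # (|V|, dim)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)      # least-squares "mimicking" map

def mimic(word):
    """Compose an embedding for any spelling, including OOV words."""
    return featurize(word) @ W

oov_vec = mimic("doggy")   # OOV word: shares bigrams with 'dog'/'dogs'
print(oov_vec.shape)       # (8,)
```

The real model replaces the linear map with a character BiLSTM trained to minimize the distance to the pretrained vectors, but the type-level regression setup is the same.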
Assessment of timber extraction distance and skid road network in steep karst terrain
This study aims to define a simple and effective method for calculating skidding distances on steep karst terrain rich in ground obstacles (stoniness and rockiness), to support decision planning of the secondary and primary forest infrastructure network for timber extraction in productive selective-cut forests. Variations between geometrical extraction distances and actual distances were highlighted at the operational planning level (i.e., the compartment level) through GIS-based calculation models, focusing on cable-skidder timber extraction. Automation in defining geometrical and real extraction distances, as well as relative forest openness, was achieved by geo-processing workflows in a GIS environment. Because the extraction correction factor varies at the compartment level from a minimum of 1.19 to a maximum of 5.05 within the same management unit, it can be concluded that planning harvesting operations (timber extraction) at the operational level should not rely on correction factors previously obtained for entire terrain (topographical) categories, sub-categories or even management units.
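The correction factor in question is simply the ratio of the actual skidding distance along the road network to the straight-line geometrical distance. A hypothetical per-compartment calculation (compartment IDs and distances are invented for illustration, chosen to span the reported 1.19–5.05 range) shows why a single blanket factor per terrain category can mislead:

```python
# Hypothetical compartment data: geometrical (straight-line) distance vs.
# actual skidding distance along the skid road network, in metres.
# Values are illustrative only, not taken from the study.
compartments = {
    "12a": (180.0, 214.0),
    "12b": (150.0, 480.0),
    "13a": (220.0, 1111.0),
}

# Correction factor = actual / geometrical, computed per compartment.
factors = {cid: actual / geom for cid, (geom, actual) in compartments.items()}
for cid, k in sorted(factors.items()):
    print(f"compartment {cid}: correction factor {k:.2f}")
```

Applying one average factor to all three compartments would badly over- or under-estimate extraction distance in at least two of them, which is the study's argument for compartment-level GIS calculation.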
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong baselines, and can
also reveal code-switching.
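Schematically, such a hierarchical model builds word representations from characters and then combines them over the message. The toy below is only a schematic stand-in: hashed character trigrams replace the learned character network, averaging replaces the contextualizing word-level network, and nearest-prototype scoring replaces the trained classifier. All sample data is invented.

```python
import zlib
import numpy as np

DIM = 32

def word_rep(word):
    """Character level: hash character trigrams into a fixed-size vector."""
    w = f"^{word}$"
    v = np.zeros(DIM)
    for i in range(len(w) - 2):
        # Deterministic hashing via crc32 keeps runs reproducible.
        v[zlib.crc32(w[i:i + 3].encode()) % DIM] += 1.0
    return v / max(1.0, np.linalg.norm(v))

def message_rep(text):
    """Word level: combine word representations over the whole message."""
    return np.mean([word_rep(w) for w in text.lower().split()], axis=0)

# Toy per-language samples (illustrative, not the paper's corpora).
samples = {
    "en": ["the quick brown fox", "where are you going"],
    "nl": ["waar ga je naartoe", "de snelle bruine vos"],
}
prototypes = {lang: np.mean([message_rep(s) for s in texts], axis=0)
              for lang, texts in samples.items()}

def identify(text):
    m = message_rep(text)
    return max(prototypes, key=lambda lang: float(m @ prototypes[lang]))

print(identify("where is the fox"))
```

Because word representations are built from characters, unconventional spellings still land near known words, which is the property the hierarchical model exploits for short social media messages.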
Modeling of the Acute Toxicity of Benzene Derivatives by Complementary QSAR Methods
A data set containing acute toxicity values (96-h LC50) of 69 substituted benzenes for
fathead minnow (Pimephales promelas) was investigated with two Quantitative
Structure-Activity Relationship (QSAR) models, one using and one not using
molecular descriptors. Recursive Neural Networks (RNN) derive a QSAR by direct treatment of the
molecular structure, described through an appropriate graphical tool (variable-size labeled
rooted ordered trees) by defining suitable representation rules. The input trees are encoded by
an adaptive process able to learn, by tuning its free parameters, from a given set of
structure-activity training examples. Owing to the use of a flexible encoding approach, the model is
target invariant and does not need a priori definition of molecular descriptors. The results
obtained in this study were analyzed together with those of a model based on molecular
descriptors, i.e. a Multiple Linear Regression (MLR) model using CROatian MultiRegression
selection of descriptors (CROMRsel). The comparison revealed interesting similarities that
could lead to the development of a combined approach, exploiting the complementary
characteristics of the two methods.
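The descriptor-free branch works by folding a network over the labeled rooted ordered tree describing the molecule. The sketch below shows only the shape of that recursive encoding; the weights are fixed random stand-ins for parameters that would be learned from structure-activity examples, and the atom labels and toy tree are invented, not from the study:

```python
import numpy as np

rng = np.random.default_rng(42)
LABELS = ["C", "O", "N", "Cl"]     # toy atom/group labels (illustrative)
LABEL_DIM, HIDDEN = len(LABELS), 6
MAX_CHILDREN = 3                   # ordered trees with bounded arity

# Fixed random parameters stand in for weights tuned on training examples.
W_label = rng.normal(scale=0.5, size=(HIDDEN, LABEL_DIM))
W_child = rng.normal(scale=0.5, size=(MAX_CHILDREN, HIDDEN, HIDDEN))
w_out = rng.normal(size=HIDDEN)

def one_hot(label):
    v = np.zeros(LABEL_DIM)
    v[LABELS.index(label)] = 1.0
    return v

def encode(tree):
    """Recursively encode a (label, children) tree into a hidden vector."""
    label, children = tree
    h = W_label @ one_hot(label)
    for i, child in enumerate(children):   # ordered: each position has own weights
        h = h + W_child[i] @ encode(child)
    return np.tanh(h)

def predict_activity(tree):
    """Map the root encoding to a scalar readout (e.g. a toxicity estimate)."""
    return float(w_out @ encode(tree))

# A toy labeled rooted ordered tree: C with children O and (C with child Cl).
mol = ("C", [("O", []), ("C", [("Cl", [])])])
print(predict_activity(mol))
```

Because the encoding is computed directly from the tree, the same model is target-invariant: no descriptor set has to be chosen in advance, which is the complementarity with the descriptor-based CROMRsel MLR model noted above.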
In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology
This paper investigates the ability of neural network architectures to
effectively learn diachronic phonological generalizations in a multilingual
setting. We employ models using three different types of language embedding
(dense, sigmoid, and straight-through). We find that the Straight-Through model
outperforms the other two in terms of accuracy, but the Sigmoid model's
language embeddings show the strongest agreement with the traditional
subgrouping of the Slavic languages. The Straight-Through model has learned
coherent, semi-interpretable information about sound change, and we outline
directions for future research.
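The three embedding types differ mainly in how the language vector is treated in the forward and backward passes; the straight-through variant discretizes on the forward pass but lets gradients flow through unchanged. A minimal standalone sketch of that estimator (not the paper's full architecture; the example values are invented):

```python
import numpy as np

def straight_through_binarize(x, threshold=0.5):
    """Forward: hard 0/1 code. Backward: identity (gradient copied through)."""
    hard = (x > threshold).astype(float)
    def backward(upstream_grad):
        # Straight-through estimator: treat the non-differentiable threshold
        # as identity, so gradients reach the continuous parameters unchanged.
        return upstream_grad
    return hard, backward

# Continuous language-embedding parameters (illustrative values).
lang_emb = np.array([0.9, 0.2, 0.6, 0.4])
discrete, backward = straight_through_binarize(lang_emb)
print(discrete)                               # [1. 0. 1. 0.]
grad = backward(np.array([0.1, -0.2, 0.3, 0.0]))
```

The discrete code is what makes the learned embeddings resemble binary isogloss-style features, while the identity backward pass keeps the model trainable by gradient descent.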