Experimental Support for a Categorical Compositional Distributional Model of Meaning
Modelling compositional meaning for sentences using empirical distributional
methods has been a challenge for computational linguists. We implement the
abstract categorical model of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) using
data from the BNC and evaluate it. The implementation is based on unsupervised
learning of matrices for relational words and applying them to the vectors of
their arguments. The evaluation is based on the word disambiguation task
developed by Mitchell and Lapata (2008) for intransitive sentences, and on a
similar new experiment designed for transitive sentences. Our model matches the
results of its competitors in the first experiment, and betters them in the
second. The general improvement in results with increasing syntactic
complexity showcases the compositional power of our model.
Comment: 11 pages, to be presented at EMNLP 2011, to be published in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
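A minimal sketch of the composition step described in this abstract, in NumPy. The toy vectors, the single training pair, and the element-wise combination of the verb matrix with the subject-object outer product are illustrative assumptions, not the paper's exact setup or data.

```python
import numpy as np

# Toy 4-dimensional noun vectors; in the paper these come from BNC
# co-occurrence statistics, not hand-picked numbers.
subject = np.array([0.2, 0.9, 0.1, 0.4])   # e.g. "dog"
obj     = np.array([0.7, 0.1, 0.5, 0.3])   # e.g. "ball"

# A relational word (here a transitive verb) is learned unsupervised
# as a matrix, e.g. by summing outer products of the subject/object
# vector pairs it occurs with in the corpus.
training_pairs = [(subject, obj)]           # stand-in for corpus data
verb = sum(np.outer(s, o) for s, o in training_pairs)

# Composition: apply the verb matrix to its arguments. One simple
# instantiation combines it element-wise with the subject-object
# outer product, yielding a sentence meaning in a fixed space.
sentence = verb * np.outer(subject, obj)

# Because all sentence meanings live in one space, similarity between
# arbitrary sentences is just a normalized inner product.
def similarity(a, b):
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity(sentence, sentence))       # 1.0
```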
Multiple factor analysis of distributional data
In the framework of Symbolic Data Analysis (SDA), distribution-variables are
a particular case of multi-valued variables: each unit is represented by a set
of distributions (e.g. histograms, density functions or quantile functions),
one for each variable. Factor analysis (FA) methods are primary exploratory
tools for dimension reduction and visualization. In the present work, we use
the Multiple Factor Analysis (MFA) approach for the analysis of data described by
distributional variables. Each distributional variable induces a set of new
numeric variables related to the quantiles of each distribution. We call these
new variables \textit{quantile variables}, and the set of quantile variables
related to a distributional one forms a block in the MFA approach. Thus, MFA is
performed on juxtaposed tables of quantile variables. We show that the
criterion decomposed in the analysis is an approximation of the variability
based on a suitable metric between distributions: the squared
Wasserstein distance. Applications to simulated and real distributional data
corroborate the method. The interpretation of the results on the factorial
planes is supported by new interpretative tools related to several
characteristics of the distributions (location, scale and shape).
Comment: Accepted by STATISTICA APPLICATA: Italian Journal of Applied Statistics on 12/201
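A sketch of the quantile-variable construction and of the metric the MFA criterion approximates, assuming each distributional cell is given as raw samples and using equally spaced quantile levels; function names are illustrative. MFA itself would then be run on the juxtaposed quantile blocks.

```python
import numpy as np

def quantile_block(samples, levels=np.linspace(0.05, 0.95, 10)):
    """One distributional cell (raw samples) becomes a block of
    numeric quantile variables, one column per quantile level."""
    return np.quantile(samples, levels)

def squared_wasserstein(samples_a, samples_b, n=1000):
    """Squared 2-Wasserstein distance between univariate distributions,
    approximated as the mean squared difference of quantile functions:
    W_2^2 = integral over t in [0,1] of (Qa(t) - Qb(t))^2 dt."""
    t = np.linspace(0, 1, n)
    return float(np.mean((np.quantile(samples_a, t)
                          - np.quantile(samples_b, t)) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 10_000)
b = rng.normal(2.0, 1.0, 10_000)

print(quantile_block(a))           # one MFA block for one variable
print(squared_wasserstein(a, b))   # ~4.0: (2-0)^2 for two shifted normals
```

Working through quantile functions is what makes the L2-type criterion of a standard factor analysis line up with the Wasserstein geometry between distributions.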
Mathematical Foundations for a Compositional Distributional Model of Meaning
We propose a mathematical framework for a unification of the distributional
theory of meaning in terms of vector space models, and a compositional theory
for grammatical types, for which we rely on the algebra of Pregroups,
introduced by Lambek. This mathematical framework enables us to compute the
meaning of a well-typed sentence from the meanings of its constituents.
Concretely, the type reductions of Pregroups are `lifted' to morphisms in a
category, a procedure that transforms meanings of constituents into a meaning
of the (well-typed) whole. Importantly, meanings of whole sentences live in a
single space, independent of the grammatical structure of the sentence. Hence
the inner product can be used to compare the meanings of arbitrary sentences, as it
is for comparing the meanings of words in the distributional model. The
mathematical structure we employ admits a purely diagrammatic calculus which
exposes how the information flows between the words in a sentence in order to
make up the meaning of the whole sentence. A variation of our `categorical
model', which constrains the scalars of the vector spaces to the
semiring of Booleans, results in a Montague-style Boolean-valued semantics.
Comment: to appear
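A minimal sketch of the pregroup type reductions that the framework lifts to morphisms, encoding each basic type with an integer adjoint order; this encoding is an illustration of the reduction mechanics, not the paper's formalism.

```python
# Pregroup types as lists of (basic_type, adjoint_order) pairs:
# order 0 is the type itself, +1 its right adjoint, -1 its left adjoint.
NOUN = [("n", 0)]
TVERB = [("n", 1), ("s", 0), ("n", -1)]   # n^r · s · n^l

def reduce_type(seq):
    """Apply the contractions a·a^r -> 1 and a^l·a -> 1, i.e. delete any
    adjacent pair (a, z)(a, z+1), until no contraction applies."""
    seq = list(seq)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            (a, z1), (b, z2) = seq[i], seq[i + 1]
            if a == b and z2 == z1 + 1:
                del seq[i:i + 2]
                changed = True
                break
    return seq

# "subject verb object": n · (n^r s n^l) · n reduces to the sentence
# type s, witnessing that the sentence is well-typed.
assert reduce_type(NOUN + TVERB + NOUN) == [("s", 0)]
```

In the categorical model these deletions are not mere bookkeeping: each contraction is lifted to a morphism (an inner-product-like map) that routes information between word vectors.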
Incremental dimension reduction of tensors with random index
We present an incremental, scalable and efficient dimension reduction
technique for tensors that is based on sparse random linear coding. Data is
stored in a compactified representation with fixed size, which makes memory
requirements low and predictable. Component encoding and decoding are performed
on-line without computationally expensive re-analysis of the data set. The
range of tensor indices can be extended dynamically without modifying the
component representation. This idea originates from a mathematical model of
semantic memory and a method known as random indexing in natural language
processing. We generalize the random-indexing algorithm to tensors and present
signal-to-noise-ratio simulations for representations of vectors and matrices.
We also present a mathematical analysis of the approximate orthogonality of
high-dimensional ternary vectors, which is a property that underpins this and
other similar random-coding approaches to dimension reduction. To further
demonstrate the properties of random indexing we present results of a synonym
identification task. The method presented here has some similarities with
random projection and Tucker decomposition, but it performs well only at high
dimensionality (n > 10^3). Random indexing is useful for a range of complex
practical problems, e.g., in natural language processing, data mining, pattern
recognition, event detection, graph searching and search engines. Prototype
software is provided. It supports encoding and decoding of tensors of order >=
1 in a unified framework, i.e., vectors, matrices and higher-order tensors.
Comment: 36 pages, 9 figures
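A sketch of the random-indexing idea generalized to an order-2 tensor (a matrix), assuming sparse ternary index vectors and outer-product binding; the `encode`/`decode` names and parameters are illustrative, not the paper's prototype API.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternary_index_vector(dim, nnz):
    """Sparse random ternary vector with nnz/2 entries +1 and nnz/2
    entries -1; such high-dimensional vectors are nearly orthogonal."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nnz, replace=False)
    v[idx[: nnz // 2]] = 1.0
    v[idx[nnz // 2 :]] = -1.0
    return v

DIM, NNZ = 1000, 10           # compressed size, fixed in advance
row_codes, col_codes = {}, {}

def code(codes, i):
    # Index vectors are drawn on demand, so the range of tensor indices
    # can be extended dynamically without touching the stored state.
    if i not in codes:
        codes[i] = ternary_index_vector(DIM, NNZ)
    return codes[i]

state = np.zeros((DIM, DIM))  # fixed-size representation of the matrix

def encode(i, j, value):
    # Superpose value * outer(row code, column code) onto the state;
    # memory use stays constant no matter how many entries are stored.
    state[:] += value * np.outer(code(row_codes, i), code(col_codes, j))

def decode(i, j):
    # Project onto the same rank-one code; approximate orthogonality
    # of the codes makes crosstalk from other entries small.
    return code(row_codes, i) @ state @ code(col_codes, j) / NNZ**2

encode(42, 4999, 3.0)         # logical indices may far exceed DIM
encode(7, 123, -1.5)
print(round(decode(42, 4999), 2))   # ~3.0, plus small crosstalk noise
```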