mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems such as e-Discovery or contextual navigation. It can also be
formulated within the artificial-life framework, a.k.a. Conway's "Game of
Life" theory. In contrast to Conway's approach, the objects evolve in a
massively multidimensional space. In order to start evaluating the potential of
mARC, we have built a mARC-based Internet search engine demonstrator with
contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.
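The abstract does not spell out mARC's internal algorithm, so the following is only a loose, hypothetical Python sketch of the "association and reinforcement of contexts" idea it names: contexts are stored incrementally, co-occurring terms reinforce each other, and retrieval ranks terms by accumulated association strength. The class and method names (ContextMemory, observe, retrieve) are illustrative inventions, not mARC's actual interface.

from collections import defaultdict

class ContextMemory:
    """Toy incremental associative store: terms seen together reinforce each other."""
    def __init__(self):
        self.assoc = defaultdict(lambda: defaultdict(float))  # term -> term -> weight

    def observe(self, context_terms, reinforcement=1.0):
        """Incrementally reinforce associations between all terms of one context."""
        terms = set(context_terms)
        for t in terms:
            for u in terms - {t}:
                self.assoc[t][u] += reinforcement

    def retrieve(self, query_terms, top_k=5):
        """Rank candidate terms by accumulated association with the query."""
        scores = defaultdict(float)
        for q in query_terms:
            for term, weight in self.assoc[q].items():
                scores[term] += weight
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

memory = ContextMemory()
memory.observe(["search", "engine", "contextual", "navigation"])
memory.observe(["search", "retrieval", "contextual", "query"])
print(memory.retrieve(["contextual"]))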
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(broadly construed) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains, and compositionality.
Comment: 46 pages, 8 figures. Published in the Journal of Artificial Intelligence
Research
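To make the meaning conflation deficiency concrete, here is a small numpy sketch with hand-made toy vectors (not taken from the survey): a single word-level vector for "bank" sits between its financial and river senses, whereas separate sense vectors let the context select the intended reading.

import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d vectors, hand-made for illustration only.
money = np.array([1.0, 0.0, 0.0])
river = np.array([0.0, 1.0, 0.0])

# A single word-level vector conflates both senses of "bank":
bank_word = (money + river) / 2
# Sense-level vectors keep the meanings apart:
bank_fin, bank_geo = money * 0.9, river * 0.9

context = np.array([0.8, 0.1, 0.0])  # e.g. a "deposit money at the ..." context
print(cos(bank_word, context))                          # middling score for both readings
print(cos(bank_fin, context), cos(bank_geo, context))   # sense vectors disambiguate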
Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence-based definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard PolyBench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups of up to
2.21× on a quad-core machine by exploiting the multidimensional
reduction in the BiCG benchmark.
Comment: Presented at the IMPACT15 workshop
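The legality argument behind the paper is that associative and commutative reductions may be reordered and split. As a language-agnostic illustration of that property (not Polly's actual transformation, which rewrites polyhedral schedules inside LLVM), here is a small Python sketch that splits a BiCG-style dot-product reduction into independent partial sums and recombines them; the chunking scheme is an assumption for demonstration only.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dot_reduction(a, x):
    """Sequential reduction: each iteration depends on the previous partial sum."""
    s = 0.0
    for i in range(len(a)):
        s += a[i] * x[i]
    return s

def parallel_dot_reduction(a, x, chunks=4):
    """Because + is associative and commutative, partial sums over chunks can be
    computed independently and combined afterwards (the reordering Polly exploits)."""
    bounds = np.linspace(0, len(a), chunks + 1, dtype=int)

    def partial(k):
        lo, hi = bounds[k], bounds[k + 1]
        return dot_reduction(a[lo:hi], x[lo:hi])

    with ThreadPoolExecutor(max_workers=chunks) as pool:
        return sum(pool.map(partial, range(chunks)))

a, x = np.random.rand(10_000), np.random.rand(10_000)
assert np.isclose(dot_reduction(a, x), parallel_dot_reduction(a, x))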
Corpus specificity in LSA and Word2vec: the role of out-of-domain documents
Latent Semantic Analysis (LSA) and Word2vec are two of the most widely used
word-embedding techniques. Despite their popularity, the precise
mechanisms by which they acquire new semantic relations between words remain
unclear. In the present article we investigate whether the capacity of LSA and
Word2vec to identify relevant semantic dimensions increases with corpus size.
One intuitive hypothesis is that the capacity to identify relevant
dimensions should increase as the amount of data increases. However, if the
corpus grows with topics that are not specific to the domain of interest, the
signal-to-noise ratio may weaken. Here we set out to examine and distinguish
between these alternative hypotheses. To investigate the effect of corpus
specificity and size on word embeddings, we study two ways of progressively eliminating
documents: the elimination of random documents vs. the elimination of documents
unrelated to a specific task. We show that Word2vec can take advantage of all
the documents, obtaining its best performance when it is trained with the whole
corpus. On the contrary, the specialization (removal of out-of-domain
documents) of the training corpus, accompanied by a decrease of dimensionality,
can increase LSA word-representation quality while speeding up the processing
time. Furthermore, we show that the specialization without the decrease in LSA
dimensionality can produce a strong performance reduction in specific tasks.
From a cognitive-modeling point of view, we point out that LSA's word-knowledge
acquisitions may not be efficiently exploiting higher-order co-occurrences and
global relations, whereas Word2vec does.
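A hedged sketch of the kind of comparison the abstract describes, using gensim's Word2Vec and a tf-idf + truncated-SVD stand-in for LSA from scikit-learn; the dimensionalities, the keep counts, and the is_in_domain predicate are placeholders for illustration, not the authors' actual experimental setup.

import random
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def train_word2vec(token_docs, dim=100):
    # Word2vec trained on tokenised documents (full or filtered corpus).
    return Word2Vec(sentences=token_docs, vector_size=dim, window=5, min_count=2)

def train_lsa(raw_docs, dim=100):
    # "LSA" here: tf-idf term-document matrix followed by truncated SVD.
    tfidf = TfidfVectorizer()
    svd = TruncatedSVD(n_components=dim).fit(tfidf.fit_transform(raw_docs))
    return tfidf, svd

def shrink_corpus(docs, keep, is_in_domain=None):
    """The two elimination schemes from the abstract: drop documents at random,
    or drop out-of-domain documents first (specialisation)."""
    if is_in_domain is None:                                # random elimination
        return random.sample(docs, keep)
    ranked = sorted(docs, key=is_in_domain, reverse=True)   # in-domain docs first
    return ranked[:keep]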
A Context-theoretic Framework for Compositionality in Distributional Semantics
Techniques in which words are represented as vectors have proved useful in
many applications in computational linguistics; however, there is currently no
general semantic formalism for representing meaning in terms of vectors. We
present a framework for natural language semantics in which words, phrases and
sentences are all represented as vectors, based on a theoretical analysis which
assumes that meaning is determined by context.
In the theoretical analysis, we define a corpus model as a mathematical
abstraction of a text corpus. The meaning of a string of words is assumed to be
a vector representing the contexts in which it occurs in the corpus model.
Based on this assumption, we can show that the vector representations of words
can be considered as elements of an algebra over a field. We note that in
applications of vector spaces to representing meanings of words there is an
underlying lattice structure; we interpret the partial ordering of the lattice
as describing entailment between meanings. We also define the context-theoretic
probability of a string, and, based on this and the lattice structure, a degree
of entailment between strings.
We relate the framework to existing methods of composing vector-based
representations of meaning, and show that our approach generalises many of
these, including vector addition, component-wise multiplication, and the tensor
product.
Comment: Submitted to Computational Linguistics on 20th January 2010 for
review
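As a concrete reminder of the composition operations the framework generalises, here is a small numpy sketch of vector addition, component-wise multiplication, and the tensor (outer) product, together with a component-wise-minimum "meet" and a simple ratio as one possible reading of the lattice-based degree of entailment; the toy vectors and the entailment score are illustrative assumptions, not the paper's exact definitions.

import numpy as np

u = np.array([0.2, 0.5, 0.0, 0.3])   # toy context vector for one word
v = np.array([0.1, 0.4, 0.2, 0.0])   # toy context vector for another word

addition  = u + v                    # additive composition
pointwise = u * v                    # component-wise multiplication
tensor    = np.outer(u, v)           # tensor-product composition (a matrix)

# One simple lattice reading: the meet is the component-wise minimum, and a
# crude "degree of entailment" of u by v compares u with that meet.
meet = np.minimum(u, v)
degree_of_entailment = meet.sum() / u.sum()
print(addition, pointwise, tensor.shape, degree_of_entailment)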
A WSDL-Based Type System for WS-BPEL
We tackle the problem of providing rigorous formal foundations to current software engineering technologies for web services. We focus on two of the most widely used XML-based languages for web services: WSDL and WS-BPEL. To this end, we first select an expressive subset of WS-BPEL, with special concern for modeling the interactions among web service instances in a network context, and define its operational semantics. We call the resulting formalism ws-calculus. Then, we put forward a rigorous typing discipline that formalizes the relationship between ws-calculus terms and the associated WSDL documents and supports verification of their compliance. We prove that the type system and the operational semantics of ws-calculus are 'sound' and apply our approach to an example application involving three interacting web services.
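The paper's typing discipline is defined over ws-calculus terms; as a loose, much-simplified analogue of the compliance check it describes, the Python sketch below verifies an invocation's message parts against a hand-written, hypothetical WSDL-style operation signature. The Operation class, the type table, and the example operation are all illustrative assumptions, not the paper's formalism.

from dataclasses import dataclass

@dataclass(frozen=True)
class Operation:
    """Hypothetical, simplified stand-in for a WSDL operation: a name plus the
    expected message parts and their XSD-like types."""
    name: str
    parts: dict  # part name -> type name, e.g. {"amount": "xsd:int"}

PYTHON_TYPES = {"xsd:int": int, "xsd:string": str, "xsd:boolean": bool}

def complies(operation: Operation, message: dict) -> bool:
    """Check that a concrete message carries exactly the declared parts, each
    with a value of the declared type (the flavour of WSDL compliance the
    paper formalises for ws-calculus terms)."""
    if set(message) != set(operation.parts):
        return False
    return all(isinstance(message[p], PYTHON_TYPES[t])
               for p, t in operation.parts.items())

debit = Operation("debit", {"account": "xsd:string", "amount": "xsd:int"})
print(complies(debit, {"account": "acc-42", "amount": 100}))    # True
print(complies(debit, {"account": "acc-42", "amount": "100"}))  # False: wrong part type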