Onto.PT: Automatic Construction of a Lexical Ontology for Portuguese
This ongoing research presents an alternative to the manual creation of lexical resources and proposes an approach towards the automatic construction of a lexical ontology for Portuguese. Textual sources are exploited in order to obtain a lexical network based on terms and, after clustering and mapping, a wordnet-like lexical ontology is created. At the end of the paper, current results are shown.
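The clustering step described above, grouping a term-based synonymy network into wordnet-like synsets, can be sketched as connected components over synonymy pairs. This is a minimal illustration, not Onto.PT's actual algorithm; the term pairs and the connected-components choice are assumptions.

```python
from collections import defaultdict

def cluster_synonyms(pairs):
    """Group terms into synset-like clusters: connected components
    of the synonymy graph built from extracted term pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for term in graph:
        if term in seen:
            continue
        stack, component = [term], set()
        while stack:
            t = stack.pop()
            if t in component:
                continue
            component.add(t)
            stack.extend(graph[t] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Toy synonymy pairs (hypothetical Portuguese terms)
pairs = [("carro", "automóvel"), ("automóvel", "viatura"), ("casa", "lar")]
print(cluster_synonyms(pairs))
```

A real system would weight edges by extraction confidence and split noisy components rather than take them whole.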
On the Utility of Word Embeddings for Enriching OpenWordNet-PT
The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. In order to minimise this issue, we propose the exploitation of models of distributional semantics, namely word embeddings learned from corpora, in the automatic identification of relation instances missing in a wordnet. Analogy-solving methods are first used for learning a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-given answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances, as we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some are automatically invalidated.
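The analogy-solving step can be sketched with the classic vector-offset method over a toy embedding table. The words and their two-dimensional vectors below are hand-crafted assumptions for illustration; real embeddings would be learned from corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by ranking all other words
    against the offset vector b - a + c."""
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]
    candidates = {w: v for w, v in emb.items() if w not in {a, b, c}}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# Toy vectors crafted so hypernymy behaves like a consistent offset
emb = {
    "cão": [1.0, 0.0], "animal": [1.0, 1.0],
    "rosa": [0.0, 1.0], "planta": [0.0, 2.0],
    "pedra": [-1.0, -1.0],
}
print(solve_analogy(emb, "cão", "animal", "rosa"))
```

In the enrichment setting, the top-ranked answers become candidate relation instances for a human (or an automatic filter) to validate before insertion into the wordnet.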
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
We introduce a model for constructing vector representations of words by
composing characters using bidirectional LSTMs. Relative to traditional word
representation models that have independent vectors for each word type, our
model requires only a single vector per character type and a fixed set of
parameters for the compositional model. Despite the compactness of this model
and, more importantly, the arbitrary nature of the form-function relationship
in language, our "composed" word representations yield state-of-the-art results
in language modeling and part-of-speech tagging. Benefits over traditional
baselines are particularly pronounced in morphologically rich languages (e.g.,
Turkish).
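The open-vocabulary idea, one vector per character type composed into a word vector, can be illustrated with a deliberately simplified position-weighted sum in place of the paper's bidirectional LSTM. The dimension and the random character vectors are illustrative assumptions.

```python
import random

random.seed(0)
DIM = 8
# One vector per character type -- the model's entire inventory,
# independent of vocabulary size.
char_vecs = {}

def char_vec(ch):
    if ch not in char_vecs:
        char_vecs[ch] = [random.uniform(-1, 1) for _ in range(DIM)]
    return char_vecs[ch]

def compose(word):
    """Additive stand-in for BiLSTM composition: any string, including
    an unseen word form, gets a fixed-dimensional representation."""
    out = [0.0] * DIM
    # position-weighted sum so that anagrams do not collide
    for i, ch in enumerate(word, 1):
        v = char_vec(ch)
        for j in range(DIM):
            out[j] += v[j] / i
    return out

# An unseen, morphologically complex Turkish form still gets a vector
vec = compose("evlerinizden")
```

The actual model replaces this sum with forward and backward LSTMs over the character sequence, so composition is learned rather than fixed.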
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2 of 3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing. (Comment: To be published in EACL 2017, 13 pages)
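The structure such parsers produce can be sketched as a binary tree over elementary discourse units (EDUs), where each internal node carries a relation label and a nucleus/satellite marking. The example sentence and relation label below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RSTNode:
    """A node in an RST discourse tree: leaves hold an elementary
    discourse unit (EDU); internal nodes join two subtrees with a
    relation and record which child is the nucleus."""
    relation: Optional[str] = None      # e.g. "Elaboration", "Contrast"
    nuclearity: Optional[str] = None    # "NS", "SN", or "NN"
    children: Tuple["RSTNode", ...] = ()
    edu: Optional[str] = None           # text span, leaves only

    def edus(self):
        """Recover the document's EDUs in left-to-right order."""
        if self.edu is not None:
            return [self.edu]
        return [e for c in self.children for e in c.edus()]

# "It was raining, so we stayed in." -> satellite cause, nucleus result
tree = RSTNode(relation="Cause", nuclearity="SN",
               children=(RSTNode(edu="It was raining,"),
                         RSTNode(edu="so we stayed in.")))
print(tree.edus())
```

Harmonizing treebanks then amounts to mapping each corpus's relation inventory and segmentation conventions onto one shared scheme over trees of this shape.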
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods. (Comment: Some of the reference formats have been updated)
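One of the classic extractive approaches such surveys cover, scoring sentences by the corpus frequency of their content words and keeping the top-ranked ones, can be sketched as follows. The stop-word list and the scoring scheme are simplifying assumptions.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "in", "to", "is", "and", "it"}

def summarize(text, n=1):
    """Frequency-based extractive summarization: score each sentence
    by how frequent its content words are across the whole text and
    keep the top-n sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOP)

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    keep = sorted(ranked[:n])
    return " ".join(sentences[i] for i in keep)

print(summarize("Cats sleep a lot. Cats chase mice. Dogs bark."))
```

Abstractive methods, by contrast, generate new sentences rather than selecting existing ones, which is where most of the shortcomings discussed in such reviews arise.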
Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings
Models of acoustic word embeddings (AWEs) learn to map variable-length spoken
word segments onto fixed-dimensionality vector representations such that
different acoustic exemplars of the same word are projected nearby in the
embedding space. In addition to their speech technology applications, AWE
models have been shown to predict human performance on a variety of auditory
lexical processing tasks. Current AWE models are based on neural networks and
trained in a bottom-up approach that integrates acoustic cues to build up a
word representation given an acoustic or symbolic supervision signal.
Therefore, these models do not leverage or capture high-level lexical knowledge
during the learning process. In this paper, we propose a multi-task learning
model that incorporates top-down lexical knowledge into the training procedure
of AWEs. Our model learns a mapping between the acoustic input and a lexical
representation that encodes high-level information such as word semantics in
addition to bottom-up form-based supervision. We experiment with three
languages and demonstrate that incorporating lexical knowledge improves the
embedding space discriminability and encourages the model to better separate
lexical categories. (Comment: Accepted in INTERSPEECH 202)
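The embedding-space discriminability the paper improves can be illustrated as a contrast between mean same-word and different-word distances. The words and two-dimensional vectors below are toy stand-ins for real AWE model outputs.

```python
import math
from itertools import combinations

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def discriminability(awes):
    """Gap between mean different-word and mean same-word distances
    over all embedding pairs; a larger gap means acoustic exemplars
    of the same word are better separated from other words."""
    same, diff = [], []
    for (w1, v1), (w2, v2) in combinations(awes, 2):
        (same if w1 == w2 else diff).append(dist(v1, v2))
    return sum(diff) / len(diff) - sum(same) / len(same)

# Two toy acoustic exemplars per word
awes = [("cat", [0.0, 0.0]), ("cat", [0.1, 0.0]),
        ("dog", [3.0, 3.0]), ("dog", [3.1, 3.0])]
print(discriminability(awes))
```

The multi-task objective in the paper aims to widen exactly this gap by adding a top-down lexical signal to the usual bottom-up acoustic one.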
Relating folksonomies with Dublin Core
This article presents research carried out to continue the Kinds of Tags project,
which aims to identify the elements required for metadata originating from folksonomies.
It will provide information that intelligent applications may use to assign tags to
metadata elements. Despite the unquestionably high value of DC and DC Terms, the pilot study
revealed a significant number of tags for which no corresponding properties yet existed. A need
for new properties was determined. This article presents the problem, motivation and
methodology of the underlying research. It further presents and discusses the findings from the
pilot study.