Onto.PT: Automatic Construction of a Lexical Ontology for Portuguese
This ongoing research presents an alternative to the manual creation of lexical resources and proposes an approach towards the automatic construction of a lexical ontology for Portuguese. Textual sources are exploited in order to obtain a lexical network based on terms and, after clustering and mapping, a wordnet-like lexical ontology is created. At the end of the paper, current results are shown.
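The clustering step described above, grouping a term-based synonymy network into wordnet-like synsets, can be sketched as connected components over synonymy pairs. This is a minimal illustration, not Onto.PT's actual algorithm; the term pairs and the connected-components choice are assumptions.

```python
from collections import defaultdict

def cluster_synonyms(pairs):
    """Group terms into synset-like clusters: connected components
    of the synonymy graph built from extracted term pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for term in graph:
        if term in seen:
            continue
        stack, component = [term], set()
        while stack:
            t = stack.pop()
            if t in component:
                continue
            component.add(t)
            stack.extend(graph[t] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Toy synonymy pairs (hypothetical Portuguese terms)
pairs = [("carro", "automóvel"), ("automóvel", "viatura"), ("casa", "lar")]
print(cluster_synonyms(pairs))
```

A real system would weight edges by extraction confidence and split noisy components rather than take them whole.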
On the Utility of Word Embeddings for Enriching OpenWordNet-PT
The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. In order to minimise this issue, we propose the exploitation of models of distributional semantics, namely word embeddings learned from corpora, in the automatic identification of relation instances missing in a wordnet. Analogy-solving methods are first used for learning a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-given answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances, as we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some are automatically invalidated.
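The analogy-solving step can be sketched with the classic vector-offset method over a toy embedding table. The words and their two-dimensional vectors below are hand-crafted assumptions for illustration; real embeddings would be learned from corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by ranking all other words
    against the offset vector b - a + c."""
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]
    candidates = {w: v for w, v in emb.items() if w not in {a, b, c}}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# Toy vectors crafted so hypernymy behaves like a consistent offset
emb = {
    "cão": [1.0, 0.0], "animal": [1.0, 1.0],
    "rosa": [0.0, 1.0], "planta": [0.0, 2.0],
    "pedra": [-1.0, -1.0],
}
print(solve_analogy(emb, "cão", "animal", "rosa"))
```

In the enrichment setting, the top-ranked answers become candidate relation instances for a human (or an automatic filter) to validate before insertion into the wordnet.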
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
We introduce a model for constructing vector representations of words by
composing characters using bidirectional LSTMs. Relative to traditional word
representation models that have independent vectors for each word type, our
model requires only a single vector per character type and a fixed set of
parameters for the compositional model. Despite the compactness of this model
and, more importantly, the arbitrary nature of the form-function relationship
in language, our "composed" word representations yield state-of-the-art results
in language modeling and part-of-speech tagging. Benefits over traditional
baselines are particularly pronounced in morphologically rich languages (e.g.,
Turkish).
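The open-vocabulary idea, one vector per character type composed into a word vector, can be illustrated with a deliberately simplified position-weighted sum in place of the paper's bidirectional LSTM. The dimension and the random character vectors are illustrative assumptions.

```python
import random

random.seed(0)
DIM = 8
# One vector per character type -- the model's entire inventory,
# independent of vocabulary size.
char_vecs = {}

def char_vec(ch):
    if ch not in char_vecs:
        char_vecs[ch] = [random.uniform(-1, 1) for _ in range(DIM)]
    return char_vecs[ch]

def compose(word):
    """Additive stand-in for BiLSTM composition: any string, including
    an unseen word form, gets a fixed-dimensional representation."""
    out = [0.0] * DIM
    # position-weighted sum so that anagrams do not collide
    for i, ch in enumerate(word, 1):
        v = char_vec(ch)
        for j in range(DIM):
            out[j] += v[j] / i
    return out

# An unseen, morphologically complex Turkish form still gets a vector
vec = compose("evlerinizden")
```

The actual model replaces this sum with forward and backward LSTMs over the character sequence, so composition is learned rather than fixed.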
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2 of 3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing. (Comment: To be published in EACL 2017, 13 pages)
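The structure such parsers produce can be sketched as a binary tree over elementary discourse units (EDUs), where each internal node carries a relation label and a nucleus/satellite marking. The example sentence and relation label below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RSTNode:
    """A node in an RST discourse tree: leaves hold an elementary
    discourse unit (EDU); internal nodes join two subtrees with a
    relation and record which child is the nucleus."""
    relation: Optional[str] = None      # e.g. "Elaboration", "Contrast"
    nuclearity: Optional[str] = None    # "NS", "SN", or "NN"
    children: Tuple["RSTNode", ...] = ()
    edu: Optional[str] = None           # text span, leaves only

    def edus(self):
        """Recover the document's EDUs in left-to-right order."""
        if self.edu is not None:
            return [self.edu]
        return [e for c in self.children for e in c.edus()]

# "It was raining, so we stayed in." -> satellite cause, nucleus result
tree = RSTNode(relation="Cause", nuclearity="SN",
               children=(RSTNode(edu="It was raining,"),
                         RSTNode(edu="so we stayed in.")))
print(tree.edus())
```

Harmonizing treebanks then amounts to mapping each corpus's relation inventory and segmentation conventions onto one shared scheme over trees of this shape.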
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods. (Comment: Some of the reference formats have been updated)
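One of the classic extractive approaches such surveys cover, scoring sentences by the corpus frequency of their content words and keeping the top-ranked ones, can be sketched as follows. The stop-word list and the scoring scheme are simplifying assumptions.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "in", "to", "is", "and", "it"}

def summarize(text, n=1):
    """Frequency-based extractive summarization: score each sentence
    by how frequent its content words are across the whole text and
    keep the top-n sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOP)

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    keep = sorted(ranked[:n])
    return " ".join(sentences[i] for i in keep)

print(summarize("Cats sleep a lot. Cats chase mice. Dogs bark."))
```

Abstractive methods, by contrast, generate new sentences rather than selecting existing ones, which is where most of the shortcomings discussed in such reviews arise.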
Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings
Models of acoustic word embeddings (AWEs) learn to map variable-length spoken
word segments onto fixed-dimensionality vector representations such that
different acoustic exemplars of the same word are projected nearby in the
embedding space. In addition to their speech technology applications, AWE
models have been shown to predict human performance on a variety of auditory
lexical processing tasks. Current AWE models are based on neural networks and
trained in a bottom-up approach that integrates acoustic cues to build up a
word representation given an acoustic or symbolic supervision signal.
Therefore, these models do not leverage or capture high-level lexical knowledge
during the learning process. In this paper, we propose a multi-task learning
model that incorporates top-down lexical knowledge into the training procedure
of AWEs. Our model learns a mapping between the acoustic input and a lexical
representation that encodes high-level information such as word semantics in
addition to bottom-up form-based supervision. We experiment with three
languages and demonstrate that incorporating lexical knowledge improves the
embedding space discriminability and encourages the model to better separate
lexical categories. (Comment: Accepted in INTERSPEECH 202)
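The embedding-space discriminability the paper improves can be illustrated as a contrast between mean same-word and different-word distances. The words and two-dimensional vectors below are toy stand-ins for real AWE model outputs.

```python
import math
from itertools import combinations

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def discriminability(awes):
    """Gap between mean different-word and mean same-word distances
    over all embedding pairs; a larger gap means acoustic exemplars
    of the same word are better separated from other words."""
    same, diff = [], []
    for (w1, v1), (w2, v2) in combinations(awes, 2):
        (same if w1 == w2 else diff).append(dist(v1, v2))
    return sum(diff) / len(diff) - sum(same) / len(same)

# Two toy acoustic exemplars per word
awes = [("cat", [0.0, 0.0]), ("cat", [0.1, 0.0]),
        ("dog", [3.0, 3.0]), ("dog", [3.1, 3.0])]
print(discriminability(awes))
```

The multi-task objective in the paper aims to widen exactly this gap by adding a top-down lexical signal to the usual bottom-up acoustic one.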
Relating folksonomies with Dublin Core
This article presents research carried out to continue the Kinds of Tags project,
which aims to identify the elements required for metadata originating from folksonomies.
It will provide information that intelligent applications may use to assign tags to
metadata elements. Despite the unquestionably high value of DC and DC Terms, the pilot study
revealed a significant number of tags for which no corresponding properties yet existed. A need
for new properties was determined. This article presents the problem, motivation and
methodology of the underlying research. It further presents and discusses the findings from the
pilot study.