101 research outputs found

    A classification of Spanish psychological verbs

    This paper forms part of the research currently being carried out in the field of Computational Lexicography at the University of Barcelona Linguistics Department, in collaboration with the University of Maryland Computer Science Department, under the provisional name PIRAPIDES. The research deals with the study of verbal diathesis, subcategorization frames, θ-grids and the definition of a typology of θ-roles apt for the description of the argumental structure

    Comparing distributional semantic models for identifying groups of semantically related words

    Distributional Semantic Models (DSMs) are growing in popularity in Computational Linguistics. DSMs use corpora of language use to automatically induce formal representations of word meaning. This article focuses on one application of DSMs: identifying groups of semantically related words. We compare two models for obtaining formal representations: a well-known approach (CLUTO) and a more recently introduced one (Word2Vec). We compare the two models with respect to the PoS coherence and the semantic relatedness of the words within the obtained groups. We also propose a way to improve the results obtained by Word2Vec through corpus preprocessing. The results show that (a) CLUTO outperforms Word2Vec on both criteria for corpora of medium size, and (b) the preprocessing largely improves the results for Word2Vec with respect to both criteria
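As a rough illustration of what "inducing representations from a corpus" means, the sketch below builds count-based co-occurrence vectors and compares words by cosine similarity. It is neither CLUTO nor Word2Vec, and it is not the authors' setup: the toy corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are assumptions made here for illustration only.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus, already tokenized.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks milk".split(),
    "the cat eats fish".split(),
    "the dog eats fish".split(),
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(sim_cat_dog, sim_cat_milk)
```

In this toy corpus "cat" and "dog" occur in identical contexts, so they score higher with each other than "cat" does with "milk". A grouping step would then cluster words whose vectors are close.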

    WRPA: A system for relational paraphrase acquisition from Wikipedia

    In this paper we present WRPA, a system for Relational Paraphrase Acquisition from Wikipedia. WRPA extracts paraphrasing patterns that express a particular relation between two entities, taking advantage of Wikipedia's structure. What is new in this system is that its exploitation of Wikipedia goes beyond infoboxes, reaching itemized information embedded in Wikipedia pages. WRPA is language independent, provided that a Wikipedia and shallow linguistic tools exist for the language in question, and it is also independent of the relation addressed

    Polarity analysis of reviews based on the omission of asymmetric sentences

    In this paper, we present a novel approach to polarity analysis of product reviews which detects and removes sentences with the opposite polarity to that of the entire document (asymmetric sentences) as a preliminary step to identifying positive and negative reviews. We postulate that asymmetric sentences are morpho-syntactically more complex than symmetric ones (sentences with the same polarity as that of the entire document) and that it is possible to improve the detection of the polarity orientation of reviews by removing asymmetric sentences from the text. To validate this hypothesis, we measured the syntactic complexity of both types of sentences in a multi-domain corpus of product reviews and contrasted three relevant data configurations based on the inclusion and omission of asymmetric sentences from the reviews
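The omission step described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' system: the tiny lexicon `POLARITY_LEXICON`, the lexicon-based scorer `sentence_polarity`, and the function `drop_asymmetric` are hypothetical names introduced here, and the document polarity is assumed to be already known (for example, from a star rating).

```python
import re

# Hypothetical toy lexicon; a real system would use a proper sentiment resource.
POLARITY_LEXICON = {"great": 1, "excellent": 1, "love": 1,
                    "poor": -1, "terrible": -1, "broke": -1}

def sentence_polarity(sentence):
    """Return +1, -1 or 0 from the summed lexicon scores of the words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(POLARITY_LEXICON.get(w, 0) for w in words)
    return (score > 0) - (score < 0)

def drop_asymmetric(review, document_polarity):
    """Keep only sentences whose polarity does not oppose the document polarity."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    return [s for s in sentences if sentence_polarity(s) != -document_polarity]

review = "The screen is great. The battery is poor. I love this phone."
kept = drop_asymmetric(review, document_polarity=1)
print(kept)
```

Neutral sentences (score 0) are kept in this sketch; only sentences with the opposite sign to the document are treated as asymmetric and dropped.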

    Information theory-based compositional distributional semantics

    In the context of text representation, Compositional Distributional Semantics models aim to fuse the Distributional Hypothesis and the Principle of Compositionality. Text embedding is based on co-occurrence distributions and the representations are in turn combined by compositional functions taking into account the text structure. However, the theoretical basis of compositional functions is still an open issue. In this article we define and study the notion of Information Theory-based Compositional Distributional Semantics (ICDS): (i) we first establish formal properties for embedding, composition, and similarity functions based on Shannon's Information Theory; (ii) we analyze the existing approaches under this prism, checking whether or not they comply with the established desirable properties; (iii) we propose two parameterizable composition and similarity functions that generalize traditional approaches while fulfilling the formal properties; and finally (iv) we perform an empirical study on several textual similarity datasets that include sentences with a high and low lexical overlap, and on the similarity between words and their description. Our theoretical analysis and empirical results show that fulfilling formal properties positively affects the accuracy of text representation models in terms of correspondence (isometry) between the embedding and meaning spaces

    Colaboración entre información paradigmática y sintagmática en la Desambiguación Semántica Automática

    We propose an alternative method for Word Sense Disambiguation, based on the interaction between syntagmatic and paradigmatic information. The unit of the disambiguation process is an ambiguous occurrence integrated into a syntagmatic pattern. The strategy does not need a sense-annotated corpus, presupposes only prior morphosyntactic analysis and chunking, does not use statistical information, and has a wide disambiguating potential. We illustrate the two proposed implementations with concrete examples and study ways of refining the method

    Focus of negation: Its identification in Spanish

    This article describes the criteria for identifying the focus of negation in Spanish. This work involved an in-depth linguistic analysis of the focus of negation through which we identified some 10 different types of criteria that account for a wide variety of constructions containing negation. These criteria account for all the cases that appear in the NewsCom corpus and were assessed in the annotation of this corpus. The NewsCom corpus consists of 2955 comments posted in response to 18 different news articles from online newspapers. The NewsCom corpus contains 2965 negative structures with their corresponding negation marker, scope, and focus. This is the first corpus annotated with focus in Spanish and it is freely available. It is a valuable resource that can be used both for the training and evaluation of systems that aim to automatically detect the scope and focus of negation and for the linguistic analysis of negation grounded in real data