5 research outputs found

    Descubrimiento de Colocaciones Utilizando Semántica [Discovery of Collocations Using Semantics]

    Collocations are combinations of two lexically dependent elements, of which one (the base) is chosen freely for its meaning, while the choice of the other (the collocate) depends on the base. Collocations are difficult for language learners to master: even when learners know the meaning they want to express, they often struggle to choose the right collocate. Collocation dictionaries, in which collocates are grouped into semantic categories, are useful tools, but they are scarce because they require cost-intensive manual elaboration. In this paper, we present an algorithm for Spanish that, given a base and a semantic category, automatically retrieves the corresponding collocates. The present work has been funded by the Spanish Ministry of Economy and Competitiveness (MINECO) through a predoctoral grant (BES-2012-057036) in the framework of the project HARenES (FFI2011-30219-C02-02) and the Maria de Maeztu Excellence Program (MDM-2015-0502).
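    The abstract does not detail the algorithm itself, but one common approach to this task is to rank a base's candidate collocates by their similarity to the centroid of a semantic category's seed words in a vector space. The sketch below illustrates that idea only; the function names, toy 3-dimensional embeddings, and Spanish example words are all hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rank_collocates(candidates, category_seeds, embeddings, top_n=3):
    """Rank candidate collocates of a base by similarity to the
    centroid of a semantic category's seed words."""
    cat_vec = centroid([embeddings[w] for w in category_seeds])
    scored = [(w, cosine(embeddings[w], cat_vec)) for w in candidates]
    return sorted(scored, key=lambda x: -x[1])[:top_n]

# Toy embeddings (hypothetical; real systems would use corpus-trained vectors)
emb = {
    "profundo": [0.9, 0.1, 0.0],   # "deep"   -- intensity-like direction
    "enorme":   [0.8, 0.2, 0.1],   # "huge"
    "ligero":   [0.1, 0.9, 0.0],   # "slight" -- attenuation-like direction
    "intenso":  [0.95, 0.05, 0.0], # "intense" (category seed)
}
# Candidates for a base like "dolor" (pain), category seeded by "intenso"
print(rank_collocates(["profundo", "ligero", "enorme"], ["intenso"], emb))
```

    With these toy vectors, the intensity-like collocates rank above the attenuation-like one, which is the behavior a semantic grouping of collocates is meant to capture.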

    Aditza+izena unitate fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala [Verb+noun phraseological units from Spanish to Basque: analysis and computational treatment]

    277 p. + 156 p. (annexes). Phraseological units (PUs) are idiomatic word combinations that are specific to each language. For Natural Language Processing (NLP) tools to produce high-quality results, such units must be handled properly, but this task involves several difficulties, among them the lack of word-for-word translatability. In this thesis, we carry out a linguistic analysis of verb+noun PUs in order to help address two important problems they pose for NLP: first, the automatic identification of PUs in corpora, and second, the automatic translation of these PUs between Spanish and Basque. We use the information obtained from the linguistic analysis in both tasks and achieve very good results in both. In addition, this thesis makes two contributions to the creation of language resources. The first is the creation and online publication of Konbitzul, a database containing the PUs studied, their equivalents, and related linguistic information. The second is the annotation of Basque verbal PUs in a corpus, following the guidelines produced by the European PARSEME project; that corpus has also been made public, together with corpora for 19 other languages annotated according to the same guidelines.
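    The thesis's identification method is not described in this abstract; a minimal illustrative sketch of one baseline for the identification task is a lexicon lookup over a lemmatized, POS-tagged sentence, flagging a verb whose idiomatic noun partner occurs within a small window (in either order, since Basque verb+noun PUs such as "parte hartu" place the noun first). All names and the tiny example below are hypothetical.

```python
def find_vn_mwes(tagged, lexicon, max_gap=3):
    """Flag verb+noun idioms: for each VERB lemma, look for a NOUN
    lemma within max_gap tokens (either side) such that the
    (verb, noun) pair is in the idiom lexicon."""
    hits = []
    for i, (verb_lemma, pos) in enumerate(tagged):
        if pos != "VERB":
            continue
        lo = max(0, i - max_gap)
        hi = min(len(tagged), i + max_gap + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            noun_lemma, npos = tagged[j]
            if npos == "NOUN" and (verb_lemma, noun_lemma) in lexicon:
                hits.append((verb_lemma, noun_lemma, min(i, j), max(i, j)))
    return hits

# Illustrative lexicon entry: Basque "parte hartu" ("take part")
lexicon = {("hartu", "parte")}
# Lemmatized, POS-tagged toy sentence: "nik parte hartu dut"
sent = [("nik", "PRON"), ("parte", "NOUN"), ("hartu", "VERB"), ("dut", "AUX")]
print(find_vn_mwes(sent, lexicon))
```

    Real systems additionally need the morphological and syntactic information the thesis's linguistic analysis provides, since surface order and inflection vary far more than this toy window allows.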

    Semantic vector representations of senses, concepts and entities and their applications in natural language processing

    Representation learning lies at the core of Artificial Intelligence (AI) and Natural Language Processing (NLP). Most recent research has focused on developing representations at the word level. In particular, the representation of words in a vector space has been viewed as one of the most important successes of lexical semantics and NLP in recent years. The generalization power and flexibility of these representations have enabled their integration into a wide variety of text-based applications, where they have proved extremely beneficial. However, these representations suffer from an important limitation: they are unable to model different meanings of the same word. To deal with this issue, in this thesis we analyze and develop flexible semantic representations of meanings, i.e. senses, concepts and entities. This finer distinction enables us to model semantic information at a deeper level, which in turn is essential for dealing with ambiguity. In addition, we view these (vector) representations as a connecting bridge between lexical resources and textual data, encoding knowledge from both sources. We argue that these sense-level representations, much as word embeddings did at the word level, constitute a first step toward seamlessly integrating explicit knowledge into NLP applications. Their use not only addresses the inherent lexical ambiguity of language, but also represents a first step toward integrating background knowledge into NLP applications. Multilinguality is another key feature of these representations, as we explore the construction of language-independent and multilingual techniques that can be applied to arbitrary languages, and also across languages.
    We propose simple unsupervised and supervised frameworks that make use of these vector representations for word sense disambiguation, a key application in natural language understanding, as well as for downstream applications such as text categorization and sentiment analysis. Given the nature of the vectors, we also investigate their effectiveness for improving and enriching knowledge bases, by reducing the sense granularity of their sense inventories and extending them with domain labels, hypernyms and collocations.
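    The thesis's own frameworks are not spelled out in this abstract, but the simplest unsupervised use of sense vectors for word sense disambiguation is: average the embeddings of the context words, then pick the sense whose vector is closest to that context vector. The sketch below illustrates this baseline with toy 2-dimensional vectors; the names, vectors, and the classic "bank" example are assumptions for illustration, not the thesis's actual setup.

```python
import math

def cos(u, v):
    """Cosine similarity between two equal-length vectors."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return d / (nu * nv) if nu and nv else 0.0

def disambiguate(context, senses, word_vecs):
    """Pick the sense whose vector is closest to the averaged
    context vector -- a common unsupervised WSD baseline."""
    vecs = [word_vecs[w] for w in context if w in word_vecs]
    dim = len(next(iter(senses.values())))
    ctx = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return max(senses, key=lambda s: cos(senses[s], ctx))

# Toy word and sense embeddings (hypothetical, for illustration only)
word_vecs = {"money": [1.0, 0.0], "deposit": [0.9, 0.1],
             "river": [0.0, 1.0], "water": [0.1, 0.9]}
senses = {"bank_finance": [1.0, 0.1], "bank_river": [0.1, 1.0]}

print(disambiguate(["money", "deposit"], senses, word_vecs))  # -> bank_finance
print(disambiguate(["river", "water"], senses, word_vecs))    # -> bank_river
```

    The same comparison between context vectors and sense vectors also underlies the knowledge-base enrichment uses mentioned above, e.g. merging sense-inventory entries whose vectors are near-duplicates.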