Uvid u automatsko izlučivanje metaforičkih kolokacija (An insight into the automatic extraction of metaphorical collocations)
Collocations have been the subject of much scientific research over the years. This research focuses on one subset of collocations, metaphorical collocations, in which one of the components has undergone a semantic shift and takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of research on collocation extraction and of the methods, measures, and resources it uses. The existing research is classified according to approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. The various association measures and the existing ways of evaluating the results of automatic collocation extraction are also described. The methods, tools, and resources used in previous research that may prove useful for future work are highlighted. The insights gained serve as a first step in exploring the possibility of developing a method for the automatic extraction of metaphorical collocations.
A general framework for detecting metaphorical collocations
This paper aims to identify a specific set of collocations known as metaphorical collocations. In this type of collocation, a semantic shift has taken place in one of the components. Since an appropriate gold standard needs to be compiled before any serious attempt to extract metaphorical collocations automatically, this paper first presents the steps taken to compile it and then establishes an appropriate evaluation framework. The process of compiling the gold standard is illustrated on one of the most frequent Croatian nouns, which resulted in the preliminary relation significance set. To investigate whether the process can be facilitated, frequency, logDice, relation, and pretrained word embeddings are used as features in a classification task conducted on the logDice-based word sketch relation lists. Preliminary results are presented.
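The logDice score mentioned above as a feature can be sketched as follows. This is a minimal illustration of the standard logDice definition, 14 + log2(2 * f_xy / (f_x + f_y)), and not the paper's own code; the function name `log_dice` and its parameter names are assumptions introduced here.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return the logDice association score for a word pair.

    f_xy: co-occurrence frequency of the pair (hypothetical parameter name)
    f_x, f_y: individual frequencies of the two words
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    # The theoretical maximum is 14, reached when the two words
    # occur only together (f_xy == f_x == f_y).
    return 14 + math.log2(2 * f_xy / (f_x + f_y))
```

Because the score depends only on the frequencies of the two words and of the pair, not on corpus size, values are comparable across corpora of different sizes.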
What Makes Sense? : Searching for Strong WSD Predictors in Croatian
The goal of this research was to investigate and determine the position of strong predictors for word sense disambiguation of Croatian nouns. The research was conducted using supervised learning methods and a corpus of around 70 million words. We conclude that words in the immediate vicinity of an observed lexeme (1-5 words to the left and right) have the highest discriminative power. We also measured the applicability and accuracy of the one-sense-per-discourse method and found it to be very successful. Finally, we examined the impact of sentence boundaries, which proved not to be a good criterion for selecting strong predictors.
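The idea of using words within 1-5 positions of the observed lexeme as predictors can be sketched as a simple feature extractor. This is an illustrative sketch only, not the authors' implementation; the function name `window_features`, the feature-key format, and the toy sentence are assumptions.

```python
def window_features(tokens, index, size=5):
    """Collect positional context features around tokens[index].

    Returns a dict mapping a signed offset label such as "w[-1]" or
    "w[+2]" to the lowercased neighbouring token. Positions that fall
    outside the token list are skipped.
    """
    features = {}
    for offset in range(-size, size + 1):
        pos = index + offset
        if offset == 0 or pos < 0 or pos >= len(tokens):
            continue
        features[f"w[{offset:+d}]"] = tokens[pos].lower()
    return features


# Toy example (hypothetical sentence): features for the noun at index 0.
tokens = ["Banka", "je", "zatvorila", "račun"]
features = window_features(tokens, 0, size=2)
```

A dict of this shape could then be passed to any supervised classifier that accepts sparse categorical features. The window deliberately ignores sentence boundaries, in line with the finding that they are not a good criterion for selecting predictors.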
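Mrežni sintaksno-semantički okvir za izvlačenje leksičkih relacija determinističkim modelom prirodnog jezika (An online syntactic-semantic framework for extracting lexical relations with a deterministic natural language model)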
Given the extraordinary growth in online documents, methods for automated extraction of semantic relations became popular and, shortly after, necessary. This thesis proposes a new deterministic language model, with the associated artifact, which acts as an online Syntactic and Semantic Framework (SSF) for the extraction of morphosyntactic and semantic relations, while also providing many other linguistic functions. The model covers all fundamental linguistic fields: Morphology (formation, composition, and word paradigms), Lexicography (storing words and their features in network lexicons), Syntax (the composition of words into meaningful parts: phrases, sentences, and pragmatics), and Semantics (determining the meaning of phrases). To achieve this, a new tagging system with more complex structures was developed. Instead of the commonly used vectored systems, this new tagging system uses tree-like T-structures with hierarchical, grammatical Word of Speech (WOS), and Semantic of Word (SOW) tags. For relation extraction, it was necessary to develop a syntactic (sub)model of language, which ultimately is the foundation for performing semantic analysis. This was achieved by introducing a new "O-structure", which represents the union of WOS/SOW features from the T-structures of words and enables the creation of syntagmatic patterns. Such patterns are a powerful mechanism for the extraction of conceptual structures (e.g., metonymies, similes, or metaphors), breaking sentences into main and subordinate clauses, or detecting a sentence's main constituents (subject, predicate, and object). Since all program modules are developed as general and generative entities, SSF can be used for any of the Indo-European languages, although validation and network lexicons have been developed for the Croatian language only.
The SSF has three types of lexicons (morphs/syllables, words, and multi-word expressions), and the main words lexicon is included in the Global Linguistic Linked Open Data (LLOD) Cloud, allowing interoperability with all other world languages. The SSF model and its artifact represent a complete natural language model which can be used to extract lexical relations from single sentences, paragraphs, and also from large collections of documents.
A Computational Lexicon and Representational Model for Arabic Multiword Expressions
The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations.
This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representation system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions.
This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena.