The logic and linguistic model for automatic extraction of collocation similarity
The article discusses the automatic identification of collocation similarity. Semantic analysis is one of the most advanced and most difficult NLP tasks. The main problem of semantic processing is identifying the polysemy and synonymy of linguistic units, and the task becomes even more complicated for word collocations. The paper proposes a logical and linguistic model for automatically determining the semantic similarity between collocations in Ukrainian and English. The proposed model formalizes the semantic equivalence of collocations by means of the semantic and grammatical characteristics of their collocates. The basic idea of this approach is that the morphological, syntactic and semantic characteristics of lexical units must be taken into account to identify collocation similarity. The basic mathematical means of the model are the logical-algebraic equations of the algebra of finite predicates. The model examines verb-noun and noun-adjective collocations in Ukrainian and English, which consist of words belonging to the main parts of speech. It allows semantically equivalent collocations to be extracted from semi-structured and unstructured texts, so implementations of the model can recognize such collocations automatically. Using the model can increase the effectiveness of natural language processing tasks such as information extraction, ontology generation, sentiment analysis and others.
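To illustrate the idea of expressing collocation equivalence as a conjunction of predicates over finite feature domains, here is a minimal Python sketch. It is not the authors' model: the `Collocate` fields, the two predicates, the role-based alignment and the semantic class labels (`light_verb`, `mental_act`, and so on) are all hypothetical stand-ins for the morphological, syntactic and semantic characteristics the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Collocate:
    lemma: str
    pos: str        # part of speech: "VERB", "NOUN", "ADJ", ...
    role: str       # hypothetical role inside the collocation: "head", "object", ...
    sem_class: str  # hypothetical semantic class label


Predicate = Callable[[Collocate, Collocate], bool]


def same_pos(a: Collocate, b: Collocate) -> bool:
    return a.pos == b.pos


def same_sem_class(a: Collocate, b: Collocate) -> bool:
    return a.sem_class == b.sem_class


# The conjunction of these finite predicates plays the role of the equivalence condition.
PREDICATES: tuple[Predicate, ...] = (same_pos, same_sem_class)


def collocations_equivalent(c1: Sequence[Collocate], c2: Sequence[Collocate]) -> bool:
    """True when both collocations fill the same roles and every pair of
    collocates aligned by role satisfies every predicate."""
    by_role1 = {c.role: c for c in c1}
    by_role2 = {c.role: c for c in c2}
    if by_role1.keys() != by_role2.keys():
        return False
    return all(p(by_role1[r], by_role2[r]) for r in by_role1 for p in PREDICATES)


en = [Collocate("make", "VERB", "head", "light_verb"),
      Collocate("decision", "NOUN", "object", "mental_act")]
uk = [Collocate("приймати", "VERB", "head", "light_verb"),
      Collocate("рішення", "NOUN", "object", "mental_act")]
en_other = [Collocate("make", "VERB", "head", "creation"),
            Collocate("cake", "NOUN", "object", "food")]

print(collocations_equivalent(en, uk))        # True
print(collocations_equivalent(en, en_other))  # False
```

Because every predicate ranges over a finite set of labels, adding a further characteristic (for example grammatical number) only means appending another predicate to `PREDICATES`.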
Insight into the automatic extraction of metaphorical collocations (Uvid u automatsko izlučivanje metaforičkih kolokacija)
Collocations have been the subject of much scientific research over the years. This research focuses on a subset of collocations, namely metaphorical collocations, in which a semantic shift has taken place in one of the components, i.e., one component takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of existing research on collocation extraction, as well as an overview of existing methods, measures, and resources. The existing research is classified by approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. The methods, tools, and resources used in previous research that may prove useful for future work are highlighted. The insights gained from existing research serve as a first step in exploring the possibility of developing a method for the automatic extraction of metaphorical collocations.
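Statistical approaches to collocation extraction rank candidate pairs with association measures. As a rough sketch, assuming simple adjacent-bigram counts and using the token count `n` as the sample size for both unigram and bigram probabilities (a common simplification), three widely used measures can be computed as follows; the helper names `association_scores` and `bigram_scores` are illustrative.

```python
import math
from collections import Counter


def association_scores(f_xy: int, f_x: int, f_y: int, n: int) -> dict[str, float]:
    """Association scores for a pair (x, y) from raw frequencies.

    f_xy: co-occurrence count, f_x / f_y: individual counts, n: sample size.
    """
    expected = f_x * f_y / n  # co-occurrences expected if x and y were independent
    return {
        # pointwise mutual information: log2 of observed / expected
        "pmi": math.log2(f_xy / expected),
        # Dice coefficient: overlap relative to the two marginal counts
        "dice": 2 * f_xy / (f_x + f_y),
        # t-score: observed minus expected, scaled by an estimate of the standard deviation
        "t_score": (f_xy - expected) / math.sqrt(f_xy),
    }


def bigram_scores(tokens: list[str]) -> dict[tuple[str, str], dict[str, float]]:
    """Score every adjacent bigram in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {
        pair: association_scores(f, unigrams[pair[0]], unigrams[pair[1]], n)
        for pair, f in bigrams.items()
    }


tokens = "strong tea and strong coffee but powerful computers and strong tea".split()
ranked = sorted(bigram_scores(tokens).items(), key=lambda kv: kv[1]["t_score"], reverse=True)
for pair, scores in ranked[:3]:
    print(pair, round(scores["t_score"], 2))
```

PMI tends to favour rare pairs, which is one reason surveys compare several measures side by side rather than relying on one.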
Using conceptual vectors to get Magn collocations (and using contrastive properties to get their translations)
This paper presents a semi-automatic approach for extracting collocations from corpora that uses Conceptual Vectors as a semantic filter. First, the method estimates how likely each co-occurrence is to be a collocation, using a statistical measure based on the fact that collocations occur more often than would be expected by chance. The results are then filtered automatically with conceptual vectors to retain only one given semantic type of collocation. Finally, a further filtering step is performed based on manually entered data. Our evaluation on monolingual and bilingual experiments shows the value of combining automatic extraction with manual intervention to extract collocations (to fill multilingual lexical databases). In particular, it shows that using conceptual vectors to filter the candidates noticeably increases precision.
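The semantic filtering step can be pictured with angular distance (the arccosine of cosine similarity), which work on conceptual vectors commonly uses as its proximity measure. In this minimal sketch the vectors are tiny hand-made stand-ins for real conceptual vectors (which have far more dimensions); the reference vector standing for the Magn (intensifier) meaning, the concept dimensions and the threshold of π/4 are assumptions, and the candidate pairs are assumed to have already passed the statistical co-occurrence test.

```python
import math


def angular_distance(u: list[float], v: list[float]) -> float:
    """Angle in radians between u and v: arccos of their cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos)


def semantic_filter(candidates, vectors, reference, max_angle):
    """Keep (base, collocate) pairs whose collocate vector lies within
    max_angle radians of the reference vector."""
    return [
        (base, collocate)
        for base, collocate in candidates
        if angular_distance(vectors[collocate], reference) <= max_angle
    ]


# Toy 4-dimensional vectors over hypothetical concepts: (intensity, size, color, food)
vectors = {
    "heavy": [0.9, 0.3, 0.0, 0.0],
    "strong": [0.95, 0.1, 0.0, 0.1],
    "green": [0.0, 0.0, 1.0, 0.0],
}
magn_reference = [1.0, 0.0, 0.0, 0.0]  # hypothetical stand-in for the "intensifier" meaning
candidates = [("rain", "heavy"), ("tea", "strong"), ("tea", "green")]

print(semantic_filter(candidates, vectors, magn_reference, math.pi / 4))
# [('rain', 'heavy'), ('tea', 'strong')]
```

The manual filtering step would then operate on this shorter list.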
Architectures of Meaning, A Systematic Corpus Analysis of NLP Systems
This paper proposes a novel statistical corpus analysis framework targeted towards the interpretation of Natural Language Processing (NLP) architectural patterns at scale. The proposed approach combines saturation-based lexicon construction, statistical corpus analysis methods and graph collocations to induce a synthesized representation of NLP architectural patterns from corpora. The framework is validated on the full corpus of SemEval tasks and demonstrates coherent architectural patterns that can be used to answer architectural questions in a data-driven fashion, providing a systematic mechanism to interpret a highly dynamic and exponentially growing field.
Comment: 20 pages, 6 figures, 9 supplementary figures, Lexicon.txt in the appendix
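One way to picture the "graph collocations" ingredient is a co-occurrence graph over lexicon terms. The sketch below is an assumption about what such a graph might look like, not the paper's construction: each document is a token list, the lexicon is a hypothetical set of architecture terms, and an edge weight counts the documents in which both terms appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(documents: list[list[str]], lexicon: set[str]) -> Counter:
    """Map each unordered term pair (a, b), with a < b, to the number of
    documents in which both lexicon terms occur."""
    edges: Counter = Counter()
    for doc in documents:
        present = sorted(set(doc) & lexicon)  # lexicon terms in this document, deduplicated
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


lexicon = {"bert", "crf", "lstm", "attention"}  # hypothetical architecture lexicon
documents = [
    ["we", "fine-tune", "bert", "with", "a", "crf", "layer"],
    ["bert", "with", "extra", "attention", "heads"],
    ["an", "lstm", "crf", "tagger", "on", "bert", "embeddings"],
]
for (a, b), weight in cooccurrence_graph(documents, lexicon).most_common():
    print(a, b, weight)
```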
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
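A minimal sketch of distributional similarity over parser output, assuming the dependency analysis has already been reduced to (term, relation, word) triples of the kind a dependency parser such as Pro3Gres could yield. Cosine similarity is used here as just one of many possible measures, semantic type is predicted by copying the label of the most similar hand-labelled term (1-nearest neighbour), and the triples and type labels are invented for illustration rather than taken from GENIA.

```python
import math
from collections import Counter, defaultdict


def context_vectors(triples):
    """Build a sparse feature vector per term: counts of (relation, word) contexts."""
    vectors = defaultdict(Counter)
    for term, relation, word in triples:
        vectors[term][(relation, word)] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_type(term, vectors, labelled):
    """Return the semantic type of the labelled term most similar to `term`."""
    nearest = max(
        (t for t in labelled if t != term),
        key=lambda t: cosine(vectors[term], vectors[t]),
    )
    return labelled[nearest]


triples = [
    ("IL-2", "obj", "activate"), ("IL-2", "subj", "bind"),
    ("NF-kappaB", "obj", "activate"), ("NF-kappaB", "subj", "bind"),
    ("T cell", "obj", "stimulate"), ("T cell", "subj", "proliferate"),
]
labelled = {"NF-kappaB": "protein", "T cell": "cell_type"}  # placeholder type labels
vectors = context_vectors(triples)
print(predict_type("IL-2", vectors, labelled))  # protein
```

Swapping `cosine` for another measure (for example a Jaccard coefficient over the feature sets) leaves `predict_type` unchanged.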
- …