265 research outputs found
Uvid u automatsko izlučivanje metaforičkih kolokacija (An Insight into the Automatic Extraction of Metaphorical Collocations)
Collocations have been the subject of much scientific research over the years. This research focuses on a subset of collocations, namely metaphorical collocations, in which one of the components has undergone a semantic shift and takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of research on collocation extraction, together with the methods, measures, and resources it uses. The existing research is classified according to approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. The methods, tools, and resources that were used in previous research and may prove useful for future work are highlighted. The insights gained serve as a first step in exploring the possibility of developing a method for automatic extraction of metaphorical collocations.
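As a rough illustration of the statistical approach mentioned in the abstract above, candidate word pairs are often ranked with an association measure. The abstract does not say which measures the reviewed work uses, so the sketch below assumes pointwise mutual information (PMI), one widely used measure. The function name `pmi_scores`, the `min_count` parameter, and the toy corpus are all invented for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated from raw unigram and bigram counts. This is an illustrative
    sketch, not the method of any particular paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (invented): not data from the paper.
tokens = "heavy rain fell and heavy rain stopped and the rain fell".split()

# Rank candidate pairs that occur at least twice, highest PMI first.
ranked = sorted(pmi_scores(tokens, min_count=2).items(),
                key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

PMI tends to overrate rare pairs, which is why a frequency threshold such as `min_count` is commonly applied before ranking. Deciding whether a high-scoring pair is metaphorical would require a further step that this sketch does not attempt.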
Latent Dirichlet Allocation Utilization as a Text Mining Method to Elaborate Learning Effectiveness
A learning method is a way of explaining lesson material to students so that learning takes place and its goals are achieved. A learning method can be considered successful if students are physically, mentally, and socially active in the learning process, show high enthusiasm for learning, and have self-confidence. The purpose of this study is to classify the opinions of Indonesian students regarding existing learning methods and the learning methods they expect, and to evaluate existing learning methods using latent Dirichlet allocation. The data come from tweets posted by Twitter users between January and March 2022. The tweets were collected by scraping with the Python twisel library, totaling 3,778 items, and then preprocessed with the NLTK and Sastrawi libraries. The results show that student opinions can be classified into three major topics: effective learning methods, student difficulties with the applicable learning methods, and high cross-departmental interest.
Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue
In this paper we deal with the spatial distribution of 16 linguistic features known to vary between Bosnian, Croatian, Montenegrin, and Serbian. We perform our analyses on a dataset of geo-encoded Twitter status messages collected in the period from mid-2013 to the end of 2016. We perform two types of analyses. The first one finds boundaries in the spatial distribution of the linguistic variable levels through the kernel density estimation smoothing technique. These boundaries are then plotted over the state borders for a visual comparison. The second analysis deals with linguistic distance between the states. The groupings of linguistic variables and countries are calculated given the state borders and the Jensen-Shannon divergence between distributions of the 16 variables within each state. This analysis is completed with a measure of variable consistency for each country. These analyses are intended to show the extent to which current state borders correspond to linguistic boundaries. They suggest that Croatia and Serbia still represent the two extremes, reflecting a history of normative divergences, while Bosnia-Herzegovina and Montenegro, depending on the variable, lean to one or the other side.
Nikola Ljubešić; Maja Miličević Petrović; Tanja Samardžić
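The Jensen-Shannon divergence named in the abstract above can be sketched as follows. This is a minimal illustration of the standard definition (base-2 logarithm, so values fall between 0 and 1), not the authors' implementation. The function names and the example frequencies are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the
    pointwise average of p and q. It is symmetric and always finite.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical relative frequencies of the two levels of one linguistic
# variable in two states (invented numbers, not data from the paper).
state_a = [0.9, 0.1]
state_b = [0.2, 0.8]

print(js_divergence(state_a, state_b))
```

Because the divergence is symmetric and bounded, values computed per variable can be compared across state pairs or aggregated, for example as input to a clustering step. How the paper combines the 16 variables is not specified in the abstract.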
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 September 2008
Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora
Morphological analyzers are preprocessors for text analysis, and many text analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers, which require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer that depends mainly on linguistic information extracted from traditional Arabic grammar books and on prior-knowledge broad-coverage lexical resources, namely the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur'an with syllable and primary stress information, as well as for fine-grained morphological tagging
Automated Implementation Process of Machine Translation System for Related Languages
The paper presents an attempt to automate all data creation processes of a rule-based shallow-transfer machine translation system. The presented methods were tested on four fully functional translation systems, each pairing Slovenian with another language: Serbian, Czech, English and Estonian. An extensive range of evaluation tests was performed to assess the applicability of the methods
Terminology Integration in Statistical Machine Translation
The electronic version does not contain the appendices. The doctoral thesis describes methods and tools researched and developed by the author for bilingual terminology integration into statistical machine translation (SMT) systems. The author presents novel methods for terminology integration in SMT systems during training (through static integration) and during translation (through dynamic integration). The work focuses not only on the SMT integration techniques, but also on methods for acquiring the linguistic resources needed for the different tasks involved in workflows for terminology integration in SMT systems. The proposed methods have been evaluated using automatic and manual evaluation methods. The results show that both static and dynamic integration methods improve translation quality. The thesis also describes areas where the methods have been validated in research projects and applied in practice. Keywords: statistical machine translation, terminology, cross-lingual information extraction
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010