265 research outputs found

    Insight into the Automatic Extraction of Metaphorical Collocations (Uvid u automatsko izlučivanje metaforičkih kolokacija)

    Collocations have been the subject of much scientific research over the years. This research focuses on a subset of collocations, namely metaphorical collocations, in which one of the components has undergone a semantic shift, i.e., takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of research on collocation extraction, together with an overview of existing methods, measures, and resources. The existing research is classified by approach (statistical, hybrid, and based on distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. Methods, tools, and resources used in previous research that may prove useful for future work are highlighted. The insights gained serve as a first step toward developing a method for the automatic extraction of metaphorical collocations.
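
    Statistical collocation extraction typically scores candidate word pairs with an association measure and ranks them. As a minimal sketch, assuming pre-tokenized text and adjacent-word bigrams as candidates, the code below scores pairs with pointwise mutual information (PMI), one widely used association measure; it is an illustration, not the method of the paper above, and `pmi_bigrams` is a hypothetical helper.

```python
import math
from collections import Counter


def pmi_bigrams(tokens: list[str], min_count: int = 1) -> dict[tuple[str, str], float]:
    """Score adjacent word pairs by pointwise mutual information (log base 2).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated as relative frequencies of unigrams and adjacent bigrams.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores: dict[tuple[str, str], float] = {}
    for (w1, w2), c12 in bigrams.items():
        # Skip rare pairs: PMI is known to overrate low-frequency candidates.
        if c12 < min_count:
            continue
        p12 = c12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    return scores


if __name__ == "__main__":
    toks = "heavy rain fell . heavy rain again . light rain fell".split()
    for pair, score in sorted(pmi_bigrams(toks, min_count=2).items(), key=lambda x: -x[1]):
        print(pair, round(score, 3))
```

    The `min_count` threshold is a common practical choice because raw PMI favors pairs seen only once or twice; other measures covered in such reviews (e.g., log-likelihood or Dice) can be swapped in at the scoring line.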

    Latent Dirichlet Allocation Utilization as a Text Mining Method to Elaborate Learning Effectiveness

    A learning method is a way of presenting lesson material to students so that learning takes place as they work toward the learning goals. A learning method can be considered successful if students are physically, mentally, and socially active in the learning process, show high enthusiasm for learning, and display self-confidence. The purpose of this study is to classify the opinions of Indonesian students about existing learning methods and the learning methods they would like, and to evaluate existing learning methods using Latent Dirichlet Allocation (LDA). The data consist of tweets posted by Twitter users between January and March 2022. A total of 3,778 tweets were scraped with the help of the Python twisel library and then preprocessed using the NLTK and Sastrawi libraries. The analysis shows that student opinions fall into three major topics: effective learning methods, the difficulties students face with current learning methods, and high cross-departmental interest.
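
    To make the topic-modeling step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling on already-tokenized documents. It assumes preprocessing (scraping, NLTK/Sastrawi cleaning and stemming) has already happened; the functions `lda_gibbs` and `top_words` and the default hyperparameters are hypothetical and not taken from the study, which does not say which LDA implementation it used.

```python
import random


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200, seed: int = 0):
    """Fit LDA with collapsed Gibbs sampling; returns (vocab, phi, theta)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]    # word counts per topic
    nk = [0] * n_topics                         # total tokens per topic
    z: list[list[int]] = []                     # topic assignment of each token
    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates: topic-word distributions (phi) and document-topic mixtures (theta).
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(n_topics)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab: list[str], phi: list[list[float]], n: int = 5) -> list[list[str]]:
    """Return the n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    toy = [["online", "class", "online"], ["exam", "hard", "exam"],
           ["online", "class"], ["hard", "exam"]]
    vocab, phi, theta = lda_gibbs(toy, n_topics=2, n_iter=100)
    print(top_words(vocab, phi, n=2))
```

    Labeling the resulting topics (for example, as "effective learning methods") remains a manual interpretation of the top words per topic.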

    Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue

    In this paper we deal with the spatial distribution of 16 linguistic features known to vary between Bosnian, Croatian, Montenegrin, and Serbian. We perform our analyses on a dataset of geo-encoded Twitter status messages collected from mid-2013 to the end of 2016. We perform two types of analyses. The first finds boundaries in the spatial distribution of the linguistic variable levels through the kernel density estimation smoothing technique. These boundaries are then plotted over the state borders for visual comparison. The second analysis deals with linguistic distance between the states. The groupings of linguistic variables and countries are calculated from the state borders and the Jensen-Shannon divergence between the distributions of the 16 variables within each state. This analysis is complemented by a measure of variable consistency for each country. These analyses are intended to show the extent to which current state borders correspond to linguistic boundaries. They suggest that Croatia and Serbia still represent the two extremes, reflecting a history of normative divergence, while Bosnia-Herzegovina and Montenegro, depending on the variable, lean toward one side or the other.
    Authors: Nikola Ljubešić; Maja Miličević Petrović; Tanja Samardžić
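
    The Jensen-Shannon divergence used in the second analysis compares two probability distributions, here the relative frequencies of a variable's levels in two states. A minimal sketch follows; the counts in the usage example are hypothetical, and how the paper combines the 16 per-variable divergences into groupings is not shown here.

```python
import math


def normalize(counts: list[float]) -> list[float]:
    """Turn raw counts of variable levels into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded by [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical counts of two levels of one variable in two states.
    state_a = normalize([90, 10])
    state_b = normalize([20, 80])
    print(round(jensen_shannon(state_a, state_b), 4))
```

    Because the mixture `m` is positive wherever `p` or `q` is, the KL terms never divide by zero, which is one practical reason to prefer this measure over raw KL divergence when some levels are absent in a state.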

    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages

    The Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages contain 22 papers presented at the conference held in Dubrovnik, Croatia, 25-28 September 2008.

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis, and many text analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools, and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, so that Arabic text corpora of different domains, formats, and genres can be processed, in both vowelized and non-vowelized form. We want to morphologically tag our Arabic corpus, but an evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many other languages. The morphological analyzer should add the appropriate linguistic information to each part, or morpheme, of the word (proclitic, prefix, stem, suffix, and enclitic); in effect, instead of one tag per word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers that require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer that mainly depends on linguistic information extracted from traditional Arabic grammar books and on prior-knowledge broad-coverage lexical resources, namely the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding that captures long-established, traditional, fine-grained morphological features of Arabic in a notation intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography, for phonetically annotating the Qur'an with syllable and primary stress information, and for fine-grained morphological tagging.
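
    The "subtag for each part" idea can be pictured as a word tag composed from per-morpheme subtags in a fixed slot order. The sketch below is hypothetical: the `Morpheme` class, the `word_tag` function, the subtag labels, and the transliterated example (wa+bi+kitAb+i+him, roughly "and with their book") are illustrative only and do not reproduce the SALMA – Tag Set notation.

```python
from dataclasses import dataclass

# The five morpheme slots named in the abstract, in surface order.
SLOTS = ("proclitic", "prefix", "stem", "suffix", "enclitic")


@dataclass(frozen=True)
class Morpheme:
    form: str    # transliterated surface form of the segment
    slot: str    # one of SLOTS
    subtag: str  # hypothetical subtag label, not SALMA notation


def word_tag(morphemes: list[Morpheme]) -> str:
    """Compose a word-level tag from per-morpheme subtags, checking slot order."""
    # SLOTS.index raises ValueError for an unknown slot name.
    order = [SLOTS.index(m.slot) for m in morphemes]
    if order != sorted(order):
        raise ValueError("morphemes are not in proclitic-prefix-stem-suffix-enclitic order")
    return "+".join(f"{m.slot}:{m.subtag}" for m in morphemes)


if __name__ == "__main__":
    word = [
        Morpheme("wa", "proclitic", "conj"),
        Morpheme("bi", "proclitic", "prep"),
        Morpheme("kitAb", "stem", "noun"),
        Morpheme("i", "suffix", "gen"),
        Morpheme("him", "enclitic", "pron3mp"),
    ]
    print(word_tag(word))
```

    Keeping one record per morpheme, rather than one flat string per word, is what lets a fine-grained tag set attach distinct information to clitics and affixes.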

    Automated Implementation Process of Machine Translation System for Related Languages

    Get PDF
    The paper presents an attempt to automate all data creation processes of a rule-based shallow-transfer machine translation system. The presented methods were tested on four fully functional translation systems covering the language pairs Slovenian-Serbian, Slovenian-Czech, Slovenian-English, and Slovenian-Estonian. An extensive range of evaluation tests was performed to assess the applicability of the methods.

    Terminology Integration in Statistical Machine Translation

    Get PDF
    The doctoral thesis describes methods and tools researched and developed by the author for integrating bilingual terminology into statistical machine translation (SMT) systems. The author presents novel methods for terminology integration in SMT systems during training (static integration) and during translation (dynamic integration). The work focuses not only on SMT integration techniques but also on methods for acquiring the linguistic resources needed for the different tasks in terminology integration workflows for SMT systems. The proposed methods have been evaluated in automatic and manual evaluation experiments. The results show that both static and dynamic integration methods significantly improve translation quality. The results have been validated in several research projects and implemented in practical solutions. The electronic version does not include the appendices. Keywords: statistical machine translation, terminology, cross-lingual information extraction

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    Get PDF
    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers presented at the conference held in Dubrovnik, Croatia, 4-6 October 2010.