Search CORE

31 research outputs found

System theoretic approach to sustainable development problems

Author: Batanović Vladan
Guberinić Slobodan
Petrović Radivoj
Publication venue: University of Belgrade
Publication date: 01/01/2011

This paper shows that the concepts and methodology of systems theory and operations research are suitable for the planning and control of sustainable development. Sustainable development problems can be represented using state-space concepts, such as the transition of a system from a given initial state to a final state. It is shown that sustainable development represents a specific control problem, whose peculiarity is that the target is to keep the system within a prescribed feasible region of the state space. The analysis of planning and control problems in sustainable development has also shown that methods developed in operations research, such as multicriteria optimization, simulation of dynamic processes, and non-conventional treatment of uncertainty, provide an adequate and rigorous basis for solving these problems.
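
To make the idea of keeping a system inside a feasible region concrete, here is a minimal Python sketch, not the authors' formulation. It assumes a linear discrete-time model x[k+1] = A x[k] + B u[k] and a feasible region given as lower and upper bounds on each state variable; the function names and the toy resource model are hypothetical. It simulates a control sequence and reports whether every visited state stays inside the region.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def step(A: Matrix, B: Matrix, x: Vector, u: Vector) -> Vector:
    """One transition of the assumed linear model x[k+1] = A x[k] + B u[k]."""
    return [
        sum(a * xj for a, xj in zip(A[i], x)) + sum(b * uj for b, uj in zip(B[i], u))
        for i in range(len(x))
    ]


def in_region(x: Vector, lower: Vector, upper: Vector) -> bool:
    """True if every state variable lies within its [lower, upper] bound."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def stays_feasible(
    A: Matrix, B: Matrix, x0: Vector, controls: Sequence[Vector],
    lower: Vector, upper: Vector,
) -> Tuple[bool, List[Vector]]:
    """Simulate the controls from x0; report whether all states (x0 included) stay feasible."""
    trajectory = [list(x0)]
    for u in controls:
        trajectory.append(step(A, B, trajectory[-1], u))
    return all(in_region(x, lower, upper) for x in trajectory), trajectory


# Hypothetical example: one renewable resource stock that regenerates 5% per step,
# with the harvest u subtracted; the assumed feasible region is 50 <= stock <= 200.
A = [[1.05]]
B = [[-1.0]]
ok, _ = stays_feasible(A, B, [100.0], [[5.0]] * 10, [50.0], [200.0])
print("harvest 5:", ok)
ok, _ = stays_feasible(A, B, [100.0], [[20.0]] * 10, [50.0], [200.0])
print("harvest 20:", ok)
```

With a harvest of 5 units per step the stock stays at 100 and the check prints True; a harvest of 20 units drives the stock below 50 after four steps, so it prints False. Choosing the control sequence itself would be the planning problem proper.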

Crossref

Directory of Open Access Journals

Fuzzy logic based algorithms for maximum covering location problems

Author: Batanović Vladan
Petrovic Dobrila
Petrović Radivoj
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

Coventry University Pure Portal

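Kompiliranje korpusa u digitalnim humanističkim znanostima u jezicima s ograničenim resursima: o praksi kompiliranja tematskih korpusa iz digitalnih medija za srpski, hrvatski i slovenski (Corpus compilation in the digital humanities for languages with limited resources: on the practice of compiling thematic corpora from digital media for Serbian, Croatian and Slovenian)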

Author: Batanović Vuk
Bogetić Ksenija
Ljubešić Nikola
Publication venue: 'Hrvatsko filološko društvo (Croatian Philological Society)'
Publication date: 01/01/2022

The digital era has unlocked unprecedented possibilities for compiling corpora of social discourse, bringing corpus linguistic methods into closer interaction with other methods of discourse analysis and with the humanities. Even when no specific corpus linguistic techniques are used, empirically grounded social-scientific analysis increasingly draws on some kind of corpus (an approach sometimes dubbed ‘corpus-assisted discourse analysis’ or ‘corpus-based critical discourse analysis’, cf. Hardt-Mautner 1995; Baker 2016). In the post-Yugoslav space, recent corpus developments have brought transformative advantages to many areas of discourse research, along with an ongoing proliferation of corpora and tools. Still, linguists and discourse analysts who set out to collect specialized corpora for their own research face many open questions, partly because the background of these issues is changing fast, and partly because there is still a gap in corpus methods, and in guidelines for corpus compilation, when they are applied beyond anglophone contexts. In this paper we discuss some possible solutions to these difficulties by presenting a step-by-step account of a corpus-building procedure for Croatian, Serbian and Slovenian, using the example of a thematic corpus compiled from digital media sources (news articles and reader comments). Following an overview of corpus types, uses and advantages in the social sciences and digital humanities, we present the possibilities for corpus compilation in South Slavic language contexts, including data scraping options, permissions and ethical issues, the factors that facilitate or complicate automated collection, and the possibilities for corpus annotation and processing. The study shows expanding possibilities for work with these languages, but also some persistent grey areas where researchers need to make decisions based on their research expectations.
Overall, the paper aims to recapitulate our own corpus compilation experience in the wider context of South Slavic corpus linguistics and of corpus linguistic approaches in the humanities more generally.

HRČAK - Portal of Croatian Scientific and Professional Journals

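Metodologija rešavanja semantičkih problema u obradi kratkih tekstova napisanih na prirodnim jezicima sa ograničenim resursima (A methodology for solving semantic problems in the processing of short texts written in natural languages with limited resources)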

Author: Batanović Vuk
Publication venue: University of Belgrade, School of Electrical Engineering (Универзитет у Београду, Електротехнички факултет)
Publication date: 23/12/2020

Statistical approaches to natural language processing typically require considerable amounts of labeled data, and often various auxiliary language tools as well, which limits their applicability in resource-limited settings. This thesis presents a methodology for developing statistical solutions for the semantic processing of natural languages with limited resources. In such languages, not only are existing language resources limited, but so are the capabilities for developing new datasets and dedicated tools and algorithms. The proposed methodology focuses on short texts because of their prevalence in digital communication and the greater complexity of their semantic processing. The methodology encompasses all phases in the creation of statistical solutions, from the collection of textual content, through data annotation, to the formulation, training, and evaluation of machine learning models. Its use is illustrated in detail on two semantic tasks: sentiment analysis and semantic textual similarity. Serbian serves as the example of a language with limited resources, but the proposed methodology can also be applied to other languages in this category. Beyond the general methodology, the contributions of this thesis include a new, flexible short-text sentiment annotation system, a new annotation cost-effectiveness metric, and several new semantic textual similarity models.
The thesis results also include the first publicly available annotated datasets of short texts in Serbian for sentiment analysis and semantic textual similarity, the development and evaluation of numerous models on these tasks, and the first comparative evaluation of multiple morphological normalization tools on short texts in Serbian.
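
The semantic textual similarity task can be illustrated with a deliberately simple baseline, which is not one of the models proposed in the thesis. This minimal Python sketch assumes plain lowercased word tokens and scores two short texts by the cosine of their bag-of-words vectors; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split on Unicode word characters (covers č, ć, š, ž, đ)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(round(cosine_similarity("Film je odličan", "Film je loš"), 3))
print(cosine_similarity("Gledao sam film", "Gledali smo filmove"))
```

The first pair shares two of three tokens and scores about 0.667. The second pair ("I watched the film" / "We watched films") shares no surface tokens and scores 0.0 despite its similar meaning, which hints at why morphological normalization matters for a highly inflected language such as Serbian.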

National Repository of Dissertations in Serbia (NaRDuS)

hr500k – A Reference Training Corpus of Croatian.

Author: Agić Željko
Batanović Vuk
Erjavec Tomaž
Klubička Filip
Ljubešić Nikola
Publication venue: Dublin Institute of Technology
Publication date: 20/09/2018

In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmented at the document, sentence and word level, and annotated for morphosyntax, lemmas, dependency syntax, named entities, and semantic roles. We present each annotation layer via basic label statistics and describe the final encoding of the resource in CoNLL and TEI formats. We also describe the rather turbulent history of the resource and give insights into the topic and genre distribution in the corpus. Finally, we discuss further enrichments of the corpus with additional layers, which are already underway.

Arrow@TUDublin

Otvoreni resursi i tehnologije za obradu srpskog jezika (Open resources and technologies for processing the Serbian language)

Author: Batanović Vuk
Ljubešić Nikola
Miličević Petrović Maja
Samardžić Tanja
Publication venue: Belgrade
Publication date: 21/10/2020

Open language resources and tools are very important for improving the quality and accelerating the development of natural language processing technologies. This paper presents a set of open resources available for processing the Serbian language. We describe several manually annotated corpora, as well as a range of computational models, including a web service designed to facilitate their use.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Approximating Pareto frontier using a hybrid line search approach

Author: Ajith Abraham
Batanović
Chen
Crina Grosan
Das
Deb
Ehrgott
Grosan
Ho
Huband
Jahn
Lewis
Messac
Miettinen
Okabe
Ortiz
Protter
Rueda
Ruzika
Sanchis
Sayın
Tripathi
Veldhuizen
Wang
Publication venue: 'Elsevier BV'
Publication date: 24/02/2010

This is the post-print version of the final paper published in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2010 Elsevier B.V.
The aggregation of objectives is one of the simplest and most widely used approaches in multiple criteria programming. However, it is well known that this technique sometimes fails, in various respects, to determine the Pareto frontier. This paper proposes a new approach to multicriteria optimization, which aggregates the objective functions and uses a line search method to locate an approximate efficient point. Once the first Pareto solution is obtained, a simplified version of the same procedure is used in the context of Pareto dominance to obtain a set of efficient points, ensuring a thorough distribution of solutions on the Pareto frontier. In its current form, the proposed technique is well suited to problems with multiple objectives (it is not limited to bi-objective problems) and requires the objective functions to be continuous and twice differentiable. To assess the effectiveness of this approach, experiments were performed and compared with two recent, well-known population-based metaheuristics, ParEGO and NSGA-II. Compared with ParEGO and NSGA-II, the proposed approach not only converges better to the Pareto frontier but also yields a good distribution of solutions. From a computational point of view, both stages of the line search converge quickly (on average about 150 ms for the first stage and about 20 ms for the second). The proposed technique is also simple, easy to implement, and easy to use for solving multiobjective problems.
CNCSIS IDEI 2412, Romani
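
The two generic ingredients this abstract builds on, weighted-sum aggregation of objectives and Pareto dominance, can be sketched in a few lines of Python. This is not the paper's hybrid line search method; it only assumes that every objective is minimized and that a finite set of candidate points is given, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (minimization): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Aggregate several objectives into one scalar with non-negative weights."""
    return sum(w * f for w, f in zip(weights, objectives))


def best_by_weights(points: Sequence[Point], weights: Sequence[float]) -> Point:
    """The point a plain weighted-sum method would select for the given weights."""
    return min(points, key=lambda p: weighted_sum(p, weights))


# Hypothetical bi-objective points (both objectives minimized).
points = [(0.0, 4.0), (2.5, 2.5), (4.0, 0.0), (3.0, 4.0)]
print(pareto_front(points))

# (2.5, 2.5) is Pareto-optimal but lies in a non-convex part of the front,
# so no choice of weights makes the weighted sum select it.
chosen = {best_by_weights(points, (w / 100, 1 - w / 100)) for w in range(101)}
print((2.5, 2.5) in chosen)
```

This prints the three non-dominated points and then False. It illustrates one well-known way in which plain aggregation can fail: weighted sums cannot reach Pareto points on non-convex parts of the frontier, however the weights are swept.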

CiteSeerX

Crossref

Brunel University Research Archive

Sentiment Classification of Documents in Serbian: The Effects of Morphological Normalization and Word Embeddings

Author: B. Nikolić
V. Batanović
Publication venue: 'Centre for Evaluation in Education and Science (CEON/CEES)'
Publication date: 01/11/2017

An open issue in the sentiment classification of texts written in Serbian is the effect of different forms of morphological normalization and the usefulness of leveraging large amounts of unlabeled text. In this paper, we assess the impact of lemmatizers and stemmers for Serbian on classifiers trained and evaluated on the Serbian Movie Review Dataset. We also consider the effectiveness of using word embeddings, generated from a large unlabeled corpus, as classification features.
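
The two factors studied in this paper, morphological normalization and word-embedding features, can be combined in a minimal Python sketch that represents a document by the average of its word vectors. Everything here is an assumption for illustration: the prefix-truncation "stemmer" stands in for the real lemmatizers and stemmers for Serbian that the paper evaluates, the two-dimensional toy vectors stand in for embeddings trained on a large corpus, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Embeddings = Dict[str, List[float]]


def truncate_stem(token: str, length: int) -> str:
    """Crude normalization: lowercase and keep only the first `length` characters."""
    return token.lower()[:length]


def document_vector(
    tokens: Sequence[str],
    embeddings: Embeddings,
    normalize: Callable[[str], str] = str.lower,
) -> Optional[List[float]]:
    """Average the embeddings of all (normalized) tokens found in the vocabulary."""
    found = [embeddings[key] for key in map(normalize, tokens) if key in embeddings]
    if not found:
        return None  # no known tokens: no feature vector for this document
    dim = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


# Hypothetical toy embeddings keyed by 4-character prefixes; real embeddings would be
# trained on a large unlabeled corpus and keyed by whatever normalization was applied.
toy = {"odli": [1.0, 0.0], "film": [0.0, 1.0]}
tokens = ["Odlični", "filmovi"]  # "excellent films"
print(document_vector(tokens, toy))
print(document_vector(tokens, toy, normalize=lambda t: truncate_stem(t, 4)))
```

Without normalization, the inflected forms miss the vocabulary and the document gets no vector (None); truncating to four characters maps them to known keys and yields [0.5, 0.5]. Such vectors could then be passed to any standard classifier as features.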

Directory of Open Access Journals

Annotated corpus of Serbian language-related news articles MetaLangNEWS-Sr

Author: Batanović Vuk
Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of news articles on the topic of language, published in major Serbian daily newspapers and news portals over the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, and language policy and planning, as well as on the specific contemporary debates on language defining, naming, and standardisation that are ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Serbian. It is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. MetaLangNEWS-Sr is complemented by a separate corpus of citizen metalanguage comments, i.e. online comments on the news articles, available as MetaLangNEWS-COMMENTS-Sr (http://hdl.handle.net/11356/1372). Parallel versions from Slovenia (http://hdl.handle.net/11356/1360) and Croatia (http://hdl.handle.net/11356/1369) are also available.
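
Since the tagged version of the corpus is distributed in CoNLL-U, a minimal Python reader may help. It assumes the standard CoNLL-U layout (ten tab-separated columns per token line, comment lines starting with #, a blank line after each sentence) and skips multiword-token ranges such as 1-2 and empty nodes such as 1.1. The sample sentence and the function name read_conllu are invented for illustration and are not taken from MetaLangNEWS-Sr.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of {field: value} dicts, one dict per token line."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level metadata such as sent_id or text
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # multiword-token range or empty node
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # tolerate input without a trailing blank line
        yield sentence


# Invented sample ("Jezik se menja", 'Language changes'); unused columns are "_".
sample = (
    "# text = Jezik se menja.\n"
    "1\tJezik\tjezik\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "2\tse\tse\tPRON\t_\t_\t_\t_\t_\t_\n"
    "3\tmenja\tmenjati\tVERB\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)
sentences = list(read_conllu(sample.splitlines()))
lemma_counts = Counter(tok["lemma"] for sent in sentences for tok in sent)
print(len(sentences), lemma_counts.most_common(2))
```

The same function accepts an open text file, for example one opened with encoding="utf-8", because iterating over a file yields lines. Counting lemmas rather than word forms is a common first step for metalanguage studies in an inflected language.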

Common Language Resources and Technology Infrastructure - Slovenia