47 research outputs found
Kako bojimo svijet riječima (How We Color the World with Words)
This paper presents a computational approach to the automatic detection of language patterns,
specifically those dealing with expressing colors in the Croatian language. It investigates different
lexicalization patterns of color terms, mainly compounds and multiword units, in order to classify
them and prepare them for usage in the design of an algorithm that will automatically recognize
and annotate these expressions in Croatian text. The paper also presents a comparative analysis of
different classes of color terms found in a corpus built from books intended for younger (CLC) and
older (ALC) populations. Finally, the research data is presented through a dictionary of three types
of color terms categorized as multiword expressions.
The paper gives a comprehensive overview of the different patterns used in color terminology in the Croatian language that have so far been described in published research in this area. The focus is on the computational approach to the automatic detection of lexical patterns. The purpose of the research is to define the existing models for building color expressions in Croatian, with particular emphasis on compounds and multiword expressions, and to implement the recognized models in computational language processing.
The analysis and definition of the different models, based on the existing literature on colors in Croatian, aimed to classify them and prepare them for use in computational language processing. At this stage, 4 basic patterns with their subclasses were defined. These lexicalization patterns were used within the NooJ language-processing tool, where they enabled the creation of (a) a digital dictionary listing the basic colors and describing their derivations, and (b) a computational algorithm for automatically recognizing and annotating colors in Croatian together with their class labels.
The paper additionally presents a comparative analysis of the different classes of color expressions found in a corpus built from literary works intended for younger (CLC) and older (ALC) populations, in order to gain further insight into how the use of a given pattern depends on the text sample being analyzed. The research data are also given in a tabular overview of three types of color expressions in the class of multiword expressions. The prepared resources open up the possibility of further analyses of texts from other domains and with new research interests involving colors in computational language processing.
Data Quality in the Context of Longitudinal Research Studies
This paper discusses the concept of data quality in the context of longitudinal research. By deconstructing the quality assurance process and data collection strategies through a case study of the "Croatian Birth Cohort Study", we try to define the causes and sources of poor data quality in longitudinal studies. Besides the problems discussed throughout the known literature (panel conditioning, sample attrition, recall bias, temporal and financial demands), we introduce single-source problems, multi-source problems, security problems, questionnaire design problems and QA workflow problems as important possible sources of errors. Additionally, we propose models for eliminating the errors through prevention and detection in order to improve data quality
Story of a 'Storyline Visualization' in High School Readings
Storyline visualization, the process of illustrating data that has a course of events via a visual medium, has been used in film making for a very long time. Fairly recently it has moved from paper to the digital world, allowing for wider usage. In this paper we propose its use as a teaching tool for literature reading in Croatian class (primary language). We conducted preliminary research in five Croatian high schools of different profiles to see how storyline visualization, and visualization of school materials in general, affects students' understanding of the material being studied. Each school participated with two groups of students: one group was exposed to a storyline visualization of the novel Prokleta avlija by Ivo Andrić [N=103 in total] during the reading period, and the other read without the visualization [N=93 in total]. We will present our results taking into account students' gender and type of school
Hrvatski poredbeni idiomi: MWU pristup (Croatian Comparative Idioms: An MWU Approach)
This article presents work aiming to describe comparative idioms in the Croatian language for computational processing using the NooJ linguistic environment. As part of a larger project concentrated on annotating and extracting different Croatian idioms as multi-word units (MWUs), this work aims to present automated comparative idiom search in any Croatian text. Using the NooJ environment, a user can find any comparative structure in a text and use it for translation, language learning or research purposes
Big Data: how we got to the BigData and where are they taking us
The amount of information produced over a span of roughly 1200 years, from the founding of Constantinople to the invention of Gutenberg's printing press, doubled only after a further 50 years. Today we double the existing amount of information every 3 years and already measure it in exabytes. Such large quantities of data have changed the way we both use and process data. We can say with certainty that we are in the midst of a new great revolution with its own fitting name, Big Data (Croatian: Veliki podatci). Although the term was coined by scientists in fields such as astronomy and genomics, Big Data is everywhere. It is at once a resource and a tool whose main task is to inform. But however much it can help us better understand the world around us, depending on how it is managed and who manages it, it can also take us in a different direction. Although the figures associated with Big Data may seem enormous at this moment, we must be aware that the amount we can collect and process will always be only a fraction of the information that actually exists in the world (and around it). Still, we have to start somewhere
Invasive species of algae in the Adriatic sea
Over the last few decades the Mediterranean Sea, and with it the Adriatic, has been threatened by the arrival of new invasive species. After the Suez Canal was dug, many organisms gained a route into the Mediterranean and used it to populate new habitats and reach new food sources. This paper covers three species of algae that are spreading rapidly across the Adriatic seabed: Caulerpa taxifolia, Caulerpa racemosa and Womersleyella setacea. For each alga it describes the habitus, the mode of reproduction, the areas where it occurs and its effect on other organisms. It also outlines the idea of biological control, which scientists have been studying for the past few years and which will take about as long again to put into practice. All Mediterranean countries need to join forces to stop the negative impact of these algae and find an effective way to reduce their spread. In this way the native communities of the Mediterranean and the Adriatic would be preserved
Molecular phylogenetic and phylogeographic analysis of Ancylus fluviatilis O. F. Müller, 1774 (Gastropoda: Planorbidae) in Croatia
Ancylus fluviatilis O. F. Müller, 1774 is a freshwater snail of the family Planorbidae that is widespread in Croatia. A. fluviatilis most likely represents a complex of four genetically and reproductively isolated cryptic species that have not yet been formally described. The aim of this study was to use two genetic markers, the mitochondrial genes COI and 16S rRNA, to determine which cryptic species of the A. fluviatilis complex are present in Croatia and how they are distributed in Croatian watercourses, and to establish the molecular phylogenetic relationships and genetic distances between populations as well as the phylogeographic pattern of the complex in this area. At least three species of the A. fluviatilis complex were found to inhabit Croatia: the widespread Ancylus sp. B, and the locally present Ancylus sp. C and A. fluviatilis sensu stricto, whose distribution agrees with the general phylogeographic picture of the complex. The phylogeographic pattern of Ancylus sp. B in Croatia mostly cannot be explained by natural dispersal and geographical barriers to gene flow, but rather by passive transport and the climatic features of the areas it inhabits
Scholarly reference trees
In this paper, we propose, explain and implement a bibliometric data analysis and visualization model in a web environment. We use NLP syntactic grammars for pattern recognition of references used in scholarly publications. The extracted information is used for visualizing author egocentric data via a tree-like structure. The ultimate goal of this work is to use the egocentric trees for comparisons of two authors and to build networks or forests of different trees depending on the forest's attributes. We encountered many different problems, ranging from exceptions in citation style structures to optimization of the visualization model for a good user experience. We will give a summary of our grammars' restrictions and will provide some ideas for possible future work that could improve the overall user experience. The proposed trees can function by themselves, or they can be implemented in digital repositories of libraries and different types of citation databases
Building Scholarly Data Forest
In this paper, we will demonstrate syntactic analysis and visualization of scientific data, namely references from scientific papers. Our main goal is to build a parser that can extract references from scientific papers, convert them to XML format, send them to a custom visualization algorithm and present them in a web interface as a ReferenceTree for a single author. For this process, we use several different technologies, such as the NLP software NooJ and the programming languages PHP and JavaScript in combination with HTML5. Our main problem was the dissimilarity in reference styles between articles. Thus, our parser was designed to recognize different reference sources (book, paper, web page) in the APA, MLA and Chicago reference styles. As for the visualization idea, we have chosen the concept of presenting an author as a tree, the publication years as the main branches, the articles/books as twigs and the references used in each article/book as the leaves. The books are grouped on the left side of the tree while the articles are grouped on the right side. In the final output, every processed author should have a unique tree (preferences of references) that can be compared with the rest of the scientific forest
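The tree described in this abstract can be pictured with a small data-structure sketch. This is only an illustration in Python, not the authors' PHP/JavaScript implementation: the names `Publication` and `build_reference_tree` and the dictionary layout are assumptions made for the example, and the references are assumed to be already parsed.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Publication:
    """A twig: one book or article plus the references it cites (the leaves)."""
    title: str
    kind: str  # "book" or "article" (hypothetical labels)
    year: int
    references: list[str] = field(default_factory=list)


def build_reference_tree(author: str, publications: list[Publication]) -> dict:
    """Group publications by year (branches), books on the left, articles on the right."""
    branches = defaultdict(lambda: {"left": [], "right": []})
    for pub in publications:
        side = "left" if pub.kind == "book" else "right"
        branches[pub.year][side].append(
            {"title": pub.title, "leaves": list(pub.references)}
        )
    # Sort the branches chronologically so the tree reads from oldest to newest year.
    return {"author": author, "branches": dict(sorted(branches.items()))}
```

Two authors could then be compared by walking their trees side by side, for example by counting leaves per branch.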
Improving Students' Language Performance Through Consistent Use of E-Learning: An Empirical Study in Japanese, Korean, Hindi and Sanskrit
This paper describes the backing theories, methodology, and results of a two-semester case study of the application of technology in teaching four Asian languages (Japanese, Korean, Hindi, and Sanskrit) to Croatian students. We developed e-learning materials that follow the curriculum in Croatia and deployed them in Asian language classrooms. Students who agreed to participate in the study were tested before using the materials and after each semester, and their progress was surveyed. In the case of students of Japanese (N=53), we thoroughly monitored their usage and compared the progress of students who diligently studied vocabulary and grammar using our materials on Memrise with that of students who neglected their studies. Usage was measured through their scores on Memrise, which reflect the user's activity. Their progress was also measured using standardized tests designed to resemble the Japanese Language Proficiency Test. We found that frequent users progressed by 20.3% on average after each semester, while non-frequent users progressed by only 11.6%, which shows that progress with this method is linked to steady and consistent use of the e-materials