47 research outputs found

    Kako bojimo svijet riječima [How we color the world with words]

    This paper presents a computational approach to the automatic detection of language patterns, specifically those that express colors in the Croatian language. It investigates different lexicalization patterns of color terms, mainly compounds and multiword units, in order to classify them and prepare them for use in the design of an algorithm that will automatically recognize and annotate these expressions in Croatian text. The paper also presents a comparative analysis of the different classes of color terms found in a corpus built from books intended for younger (CLC) and older (ALC) populations. Finally, the research data is presented through a dictionary of three types of color terms categorized as multiword expressions.

    The paper gives a comprehensive overview of the different patterns used in color terminology in Croatian that have been described so far in published research in this area, with a focus on the computational approach to the automatic detection of lexical patterns. The purpose of the research is to define the existing models for building color expressions in Croatian, with particular emphasis on compounds and multiword expressions, and to implement the recognized models in computational language processing. The analysis and definition of the different models, based on the existing literature on colors in Croatian, aimed at classifying them and preparing them for use in computational language processing. At this stage, 4 basic patterns with their subclasses were defined. The lexicalized patterns defined in this way were used within the NooJ language processing tool, where they enabled the creation of (a) a digital dictionary listing the basic colors and describing their derivations and (b) a computational algorithm for automatically recognizing and annotating colors in Croatian, together with the corresponding class labels. The comparative analysis of the CLC and ALC corpora was carried out to gain further insight into how the use of a given pattern depends on the text sample analyzed. The research data are also given in tabular form for three types of color expressions in the class of multiword expressions. The prepared resources open the possibility of further analyses of texts from other domains and with new research interests that involve colors in computational language processing.

    Data Quality in the Context of Longitudinal Research Studies

    This paper discusses the concept of data quality in the context of longitudinal research. By deconstructing the quality assurance process and data collection strategies through a case study of the "Croatian Birth Cohort Study", we try to define the causes and sources of poor data quality in longitudinal studies. Besides the problems discussed throughout the known literature (panel conditioning, sample attrition, recall bias, temporal and financial demands), we introduce single-source problems, multi-source problems, security problems, questionnaire design problems and QA workflow problems as important possible sources of error. Additionally, we propose models for eliminating the errors through prevention and detection in order to improve data quality.

    Story of a 'Storyline Visualization' in High School Readings

    Storyline visualization, the process of illustrating data that follows a course of events through a visual medium, has long been used in filmmaking. It has recently moved from paper to the digital world, allowing for wider use. In this paper we propose its use as a teaching tool for literature reading in the Croatian class (primary language). We conducted preliminary research in five Croatian high schools of different profiles to see how storyline visualization, and the visualization of school materials in general, affects students' understanding of the material being studied. Each school participated with two groups of students: one group was exposed to a storyline visualization of the novel Prokleta avlija by Ivo Andrić during the reading period [N=103 in total], and the other read without the visualization [N=93 in total]. We present our results taking into account students' gender and type of school.

    Hrvatski poredbeni idiomi: MWU pristup [Croatian comparative idioms: an MWU approach]

    This article presents work that describes comparative idioms in the Croatian language for computational processing using the NooJ linguistic environment. As part of a larger project focused on annotating and extracting different Croatian idioms as multi-word units (MWUs), this work presents an automated search for comparative idioms in any Croatian text. Using the NooJ environment, a user can find any comparative structure in a text and use it for translation, language learning or research purposes.

    Big Data: how we got to the BigData and where are they taking us

    The amount of information produced over roughly 1,200 years, from the founding of Constantinople to the invention of Gutenberg's printing press, doubled only after 50 years. Today we double the existing amount of information every 3 years, and we already measure it in exabytes. Such large quantities of data have changed both the way we use data and the way we process it. We can say with certainty that we are in the midst of a new great revolution, one that has a fitting name of its own: Big Data. Although the term was coined by scientists in fields such as astronomy and genomics, Big Data is everywhere. It is at once a resource and a tool whose main task is to inform. But however much it can help us better understand the world around us, depending on how it is managed and who manages it, it can also lead us in another direction. Although the figures associated with Big Data may seem enormous at the moment, we must be aware that the amount we can collect and process will always be only a fraction of the information that actually exists in the world (and around it). Still, we have to start somewhere.

    Invasive species of algae in the Adriatic sea

    Over the last few decades the Mediterranean Sea, and with it the Adriatic, has been threatened by the arrival of new invasive species. After the Suez Canal was dug, many organisms gained a route to the Mediterranean and used it to colonize new habitats and reach new food sources. This paper covers three species of algae that are spreading rapidly across the Adriatic seabed: Caulerpa taxifolia, Caulerpa racemosa and Womersleyella setacea. For each alga, the habitus, mode of reproduction, areas where it is found and effect on other organisms are described. The idea of biological control, which scientists have been studying for the past few years, is also presented; its realization will take about as many years again. All Mediterranean countries need to join forces to put an end to the negative impact of the algae and to find an effective solution for reducing their spread. In this way the native communities of the Mediterranean, and of the Adriatic, would be preserved.

    Molecular phylogenetic and phylogeographic analysis of Ancylus fluviatilis O. F. Müller, 1774 (Gastropoda: Planorbidae) in Croatia

    Ancylus fluviatilis O. F. Müller, 1774 is a freshwater snail of the family Planorbidae that is widespread in Croatia. A. fluviatilis most likely represents a complex of four genetically and reproductively isolated cryptic species that have not yet been formally described. The aim of this study was to determine, based on the analysis of two genetic markers, the mitochondrial genes for COI and 16S rRNA, which cryptic species of the A. fluviatilis complex are present in Croatia and how they are distributed in Croatian watercourses, and to establish the molecular phylogenetic relationships and genetic distances between populations as well as the phylogeographic pattern of the complex in the area. Croatia was found to be inhabited by at least three species of the A. fluviatilis complex: the widespread Ancylus sp. B, and the locally present Ancylus sp. C and A. fluviatilis sensu stricto, whose distribution is consistent with the general phylogeographic picture of the complex. The phylogeographic pattern of Ancylus sp. B in Croatia mostly cannot be explained by natural dispersal and geographical barriers to gene flow, but rather by passive transport and the climatic features of the areas it inhabits.

    Scholarly reference trees

    In this paper, we propose, explain and implement a bibliometric data analysis and visualization model in a web environment. We use NLP syntactic grammars for pattern recognition of references used in scholarly publications. The extracted information is used to visualize author-egocentric data via a tree-like structure. The ultimate goal of this work is to use the egocentric trees to compare two authors and to build networks or forests of different trees depending on the forest's attributes. We encountered many different problems, ranging from exceptions in citation style structures to optimizing the visualization model for the user experience. We summarize our grammars' restrictions and provide some ideas for future work that could improve the overall user experience. The proposed trees can function by themselves, or they can be implemented in digital repositories of libraries and different types of citation databases.

    Building Scholarly Data Forest

    In this paper, we demonstrate syntactic analysis and visualization of scientific data, namely references from scientific papers. Our main goal is to build a parser that can extract references from scientific papers, convert them to XML format, send them to a custom visualization algorithm and present them in a web interface as a ReferenceTree for a single author. For this process, we use several technologies: the NLP software NooJ and the programming languages PHP and JavaScript in combination with HTML5. Our main problem was the dissimilarity in reference styles between articles. Thus, our parser was designed to recognize different reference sources (book, paper, web page) in the APA, MLA and Chicago reference styles. As for the visualization, we chose the concept of presenting an author as a tree, with the publication years as the main branches, the articles/books as twigs and the references used in each article/book as the leaves. The books are grouped on the left side of the tree, while the articles are grouped on the right side. In the final output, every processed author should have a unique tree (preferences of references) that can be compared with the rest of the scientific forest.
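
    The tree layout described in this abstract (years as branches, publications as twigs, references as leaves, books on the left and articles on the right) can be sketched as a small data structure. This is only an illustrative sketch in Python, not the authors' PHP/JavaScript implementation; the names `Publication` and `build_reference_tree` and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Publication:
    """A twig: one article or book, with its cited references as leaves."""
    title: str
    year: int
    kind: str  # "book" or "article" (assumed labels, not from the source)
    references: list[str] = field(default_factory=list)


def build_reference_tree(author: str, publications: list[Publication]) -> dict:
    """Group an author's publications by year (branches) and by tree side."""
    branches: dict[int, dict[str, list[Publication]]] = defaultdict(
        lambda: {"left": [], "right": []}
    )
    for pub in publications:
        # Books go on the left side of the tree, articles on the right.
        side = "left" if pub.kind == "book" else "right"
        branches[pub.year][side].append(pub)
    # Order the branches chronologically by publication year.
    return {"author": author, "branches": dict(sorted(branches.items()))}


# Hypothetical sample data for illustration only.
pubs = [
    Publication("Later Article", 2014, "article", []),
    Publication("Example Book", 2012, "book", ["Ref A", "Ref B"]),
    Publication("Example Article", 2012, "article", ["Ref C"]),
]
tree = build_reference_tree("Hypothetical Author", pubs)
```

    Keeping each year's publications in separate "left" and "right" lists mirrors the book/article split in the description, so a renderer could draw each side independently.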

    Improving Students' Language Performance Through Consistent Use of E-Learning: An Empirical Study in Japanese, Korean, Hindi and Sanskrit

    This paper describes the underlying theories, methodology, and results of a two-semester case study of the application of technology in teaching four Asian languages (Japanese, Korean, Hindi, and Sanskrit) to Croatian students. We developed e-learning materials that follow the curriculum in Croatia and deployed them in Asian language classrooms. Students who agreed to participate in the study were tested before using the materials and after each semester, and their progress was surveyed. In the case of the students of Japanese (N=53), we thoroughly monitored their usage and compared the progress of students who diligently studied vocabulary and grammar using our materials on Memrise with that of students who neglected their studies. Usage was measured through their Memrise scores, which show the user's activity. Their progress was also measured using standardized tests designed to resemble the Japanese Language Proficiency Test. We found that frequent users progressed by 20.3% on average after each semester, while non-frequent users progressed by only 11.6%, indicating that the effectiveness of this method is tied to steady, consistent use of the e-materials.