14 research outputs found
Dataset preparation for swimmer detection
The large amount of data created every day can be used to develop artificial intelligence algorithms in the domain of computer vision that solve tasks such as image classification, person detection and action recognition. The datasets used for this purpose are most often built from videos and images taken from television channels or from YouTube, and are collected and prepared for a specific task. We were interested in the task of swimmer detection, so that the model could be used to recognize and improve swimming techniques. Although there are huge open image databases such as COCO and ImageNet, prepared for supervised machine learning, and sports scene databases such as the Olympic Sports Dataset, UCF Action Sport dataset or Sport-1M, which include images of the more popular (most watched) sports, none of them includes images that could be used to build our swimmer detection model. Therefore, this paper describes the process of recording and collecting video material and of preparing the UNIRI-SWM image dataset for swimmer detection. The dataset includes footage of swimmers in real training and competition conditions, filmed with action cameras from different shooting angles. The paper presents the results of swimmer detection using the deep convolutional neural networks Mask R-CNN and YOLOv3, trained on a general image dataset, before and after training on UNIRI-SWM. The results show that after adapting the model to an appropriate set of images from the swimming domain, very good swimmer detection results can be achieved.
The Possibility of Using Speech in Location-Based Computer Games (Mogućnost primjene govora u računalnim igrama temeljenim na lokaciji)
Although speech in computer-synthesized form became part of computer games as early as 1978, its application, especially in the genre of location-based computer games, has been poorly researched. This paper presents an overview of the implementation of speech in computer games created between 1978 and 2018 and the experience gained so far in its use. The focus is on analysing the possibilities of using speech technologies in location-based computer games. The conclusion answers the question of whether, given the specifics of the genre and the current state of technology, it makes sense to introduce speech technologies into location-based games, and what the prerequisites for doing so are.
Knowledge-based System for Annotation, Interpretation and Image Retrieval
Systems for image retrieval aim to be intuitive and as simple to use as possible. Images can be retrieved by their visual content or by the keywords they are annotated with. Automatic image annotation has been developed as an alternative approach to image retrieval that uses both visual and textual information. In order to match the results of automatic annotation with the keywords that users intuitively use when retrieving images, it is necessary to include in the annotation concepts that are more abstract than the classes whose instances appear in the image. Image annotation that includes concepts associated with the image at different levels of abstraction is called multilayered image interpretation. In this dissertation, a knowledge-based system for automatic image annotation and multilayered image interpretation is proposed. Factual and uncertain knowledge of outdoor image scenes is represented using hierarchically arranged knowledge representation schemes based on the fuzzy Petri net formalism (the KRFPN1 and KRFPN2 schemes). The KRFPN1 scheme represents the knowledge used when annotating image segments, and the KRFPN2 scheme, on the higher hierarchical level, is used for multilayered image interpretation. Given that knowledge of concepts is often incomplete and imprecise, an important property of these schemes is the ability to express the probability or reliability of concepts and relations using values associated with tokens and transitions. Since automatically segmented images of outdoor scenes were available, those were used for automatic annotation and multilayered interpretation. Each image segment was associated with a low-level feature vector and keywords from the controlled vocabulary.
Therefore, that specific knowledge for a given context is included in the knowledge base represented by the KRFPN1 scheme. Elements of the KRFPN1 scheme are elementary classes that correspond to the keywords used to annotate image segments, attribute values that correspond to code words of feature vector components, as well as relations between elementary classes and their attribute values and pseudo-spatial relationships between elementary classes. In addition to the domain-specific, more abstract concepts such as scene classes and aggregation relationships between scenes and elementary classes, the knowledge base represented by the KRFPN2 scheme includes general knowledge that is relevant to the concepts of interest, such as generalization classes, derived classes and their relationships. The basic idea was to define a mapping that would classify the code words of the feature vector components obtained from the image segments into the corresponding elementary class used to annotate that image segment. Since some information is lost in the process of quantization, three algorithms for designing the codebook were used: the k-means algorithm, the generalized Lloyd algorithm and the EM algorithm. For classification of code words, a fuzzy recognition algorithm on the inverse KRFPN1 scheme was used for inference. The inference was based on relationships between the elementary classes and characteristic values of their attributes. Additionally, for the purpose of comparison, code words were classified using the Naïve Bayes and k-nearest neighbour algorithms. In comparison to the results of automatic image annotation given by the Naïve Bayes algorithm and the k-nearest neighbour algorithm on the same set of images, automatic image annotation obtained by the proposed system based on the KRFPN1 scheme achieved the best results for each method of quantization.
Further, the results showed that quantization affects the results of automatic annotation (classification), and that the best results were achieved by the proposed system based on the KRFPN1 scheme when the feature vectors were quantized using the generalized Lloyd algorithm. Moreover, the results were better than those of automatic image annotation achieved by the models dInd [Duygulu et al. 2002] and dMRF [Carbonetto et al. 2004], whose results on the same set of images were published in [Carbonetto et al. 2004]. Elementary classes obtained during automatic annotation of image segments can be evaluated according to the probable context, taking into account the pseudo-spatial relationships defined in the knowledge base between elementary classes. Depending on the chosen strategy, an obtained elementary class may be rejected as inconsistent, or substituted by an elementary class that is more appropriate to the context and shares the most properties with the “inconsistent” class according to the fuzzy intersection algorithm on the KRFPN1 scheme. The union of the elementary classes obtained by annotating image segments through inference on the KRFPN1 scheme is the input to the KRFPN2 scheme. The KRFPN2 scheme is used to infer more abstract concepts, such as scene classes, their generalizations or derived classes, which are implicitly associated with the image. Given that the schemes are independent, it is possible to use the results of image annotation produced by some other method, such as a Naïve Bayes classifier, as input to the KRFPN2 scheme. The scene class that best suits the given elementary classes is inferred using the fuzzy recognition algorithm on the inverse KRFPN2 scheme. In addition, by using the fuzzy inheritance algorithm on the KRFPN2 scheme, the scene classes can be generalized to more abstract classes that are closer to the user's interpretation of the images. The same model was used for image retrieval.
As the required concept can be at a different level of abstraction, from an elementary class to a scene or its generalization, the ability of the inheritance reasoning algorithm to represent knowledge at different levels of abstraction becomes significant. The proposed system, which is based on the KRFPN formalism and used for annotation and multilayered interpretation of images in the domain of outdoor scenes, can be used as a template to describe the concepts of another domain. The methodology of acquiring knowledge about concepts at multiple semantic levels is extensible and adaptable to the acquisition of knowledge about the appearance of an object of interest in a particular context. Specifically, the KRFPN scheme provides a formal framework for explicit, machine-processable image interpretation, on the basis of which new knowledge can be inferred from a set of rules and existing knowledge. As a consequence, this approach could enable the interpretation of images from different domains, provided the knowledge base has at its disposal the relevant facts about the objects in the images. Further research will focus on adapting and applying this model to images of another domain. Furthermore, as the formalism of Petri nets has been successfully used to represent sequential, parallel and synchronized events, it is expected that the KRFPN scheme used for multilayered image interpretation could be modified and used for the interpretation of videos.
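The codebook design step mentioned in the abstract can be illustrated with a small sketch. It is not the dissertation's implementation: it shows only a plain Lloyd-style iteration (nearest-codeword assignment followed by a centroid update, the alternation that k-means and the generalized Lloyd algorithm share) with Euclidean distance. The function names `lloyd_codebook` and `quantize`, the random initialisation and the iteration limit are assumptions made for illustration.

```python
import numpy as np


def lloyd_codebook(features, n_codewords, n_iter=50, seed=0):
    """Design a codebook by alternating assignment and centroid updates.

    features: array of shape (n_samples, n_dims).
    Returns an array of shape (n_codewords, n_dims).
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise code words from distinct random samples.
    start = rng.choice(len(features), size=n_codewords, replace=False)
    codebook = features[start].copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest code word for every sample.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each code word to the mean of its members.
        new_codebook = codebook.copy()
        for k in range(n_codewords):
            members = features[labels == k]
            if len(members) > 0:  # keep an empty cell's code word unchanged
                new_codebook[k] = members.mean(axis=0)
        if np.allclose(new_codebook, codebook):
            break  # converged
        codebook = new_codebook
    return codebook


def quantize(features, codebook):
    """Map every feature vector to the index of its nearest code word."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

In a pipeline like the one described, the indices returned by `quantize` would play the role of the code words that are subsequently classified into elementary classes; that classification step is not shown here.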
Mask R-CNN and Optical Flow Based Method for Detection and Marking of Handball Actions
To build a successful supervised learning model for action recognition, a large amount of training data needs to be labeled first. Labeling is normally done manually and is a tedious and time-consuming task, especially in the case of video footage, where each individual athlete performing a given action has to be labeled. To minimize the manual labor, we propose a method based on Mask R-CNN and optical flow that determines which of the players present in the scene are active, that is, performing a given action. Mask R-CNN is a deep learning object recognition method used here for player detection, while optical flow measures player activity. Combining both methods enables tracking and labeling of active players in handball video sequences. The method was successfully tested on a dataset of handball practice videos recorded in the wild.
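One way to picture the combination of detection masks and optical flow is sketched below. This is an illustrative assumption, not the method from the paper: it takes a dense flow field and per-player binary masks as given inputs (produced elsewhere by an optical flow estimator and a detector such as Mask R-CNN), scores each player by the mean flow magnitude inside the mask, and marks players above a hypothetical threshold as active. The names `player_activity`, `active_players` and the `threshold` parameter are invented for this sketch.

```python
import numpy as np


def player_activity(flow, masks):
    """Mean optical-flow magnitude inside each player mask.

    flow:  array of shape (H, W, 2) with per-pixel displacements (dx, dy).
    masks: boolean array of shape (N, H, W), one mask per detected player.
    Returns an array of N activity scores.
    """
    flow = np.asarray(flow, dtype=float)
    masks = np.asarray(masks, dtype=bool)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion magnitude
    scores = np.zeros(len(masks))
    for i, mask in enumerate(masks):
        if mask.any():  # an empty mask keeps a score of zero
            scores[i] = magnitude[mask].mean()
    return scores


def active_players(flow, masks, threshold=1.0):
    """Indices of players whose activity score reaches the threshold."""
    scores = player_activity(flow, masks)
    return [i for i, s in enumerate(scores) if s >= threshold]
```

The threshold would have to be tuned to the footage, since the measured flow also includes camera motion and depends on how far the players are from the camera.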
3D Pose Estimation and Tracking in Handball Actions Using a Monocular Camera
Player pose estimation is particularly important for sports because it provides more accurate monitoring of athlete movements and performance, recognition of player actions, analysis of techniques, and evaluation of action execution accuracy. All of these tasks are extremely demanding and challenging in sports that involve rapid movements of athletes with inconsistent speed and position changes, at varying distances from the camera and with frequent occlusions, especially in team sports where there are more players on the field. A prerequisite for recognizing the player’s actions in video footage and comparing their poses during the execution of an action is the detection of the player’s pose in each element of an action or technique. First, a 2D pose of the player is determined in each video frame and converted into a 3D pose; then, using a tracking method, all the player poses are grouped into a sequence to construct a series of elements of a particular action. Considering that action recognition and comparison depend significantly on the accuracy of the methods used to estimate and track player pose in real-world conditions, the paper provides an overview and analysis of the methods that can be used for player pose estimation and tracking using a monocular camera, along with evaluation metrics, on the example of handball scenarios. We have evaluated the applicability and robustness of 12 selected two-stage deep learning methods for 3D pose estimation on a public and a custom dataset of handball jump shots for which they have not been trained and where never-before-seen poses may occur. Furthermore, this paper proposes methods for retargeting and smoothing the 3D sequence of poses that have experimentally shown a performance improvement for all tested models. Additionally, we evaluated the applicability and robustness of five state-of-the-art tracking methods on a public and a custom dataset of handball training recorded with a monocular camera.
The paper ends with a discussion highlighting the shortcomings of the pose estimation and tracking methods, which are reflected in problems with locating key skeletal points and in generated poses that do not follow feasible human body structures, and which consequently reduce the overall accuracy of action recognition.
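The abstract does not say which smoothing method the paper proposes, so the following is only a generic sketch of what smoothing a 3D pose sequence can look like. It assumes a centered moving average over time, applied independently to every joint coordinate; the function name `smooth_poses` and the `window` parameter are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) of one skeletal key point
Pose = List[Joint]                  # all joints of one player in one frame


def smooth_poses(poses: List[Pose], window: int = 3) -> List[Pose]:
    """Centered moving average over time, per joint and per axis.

    Near the start and end of the sequence the window is truncated,
    so fewer frames contribute to the average there.
    """
    half = window // 2
    smoothed: List[Pose] = []
    for t in range(len(poses)):
        lo, hi = max(0, t - half), min(len(poses), t + half + 1)
        frames = poses[lo:hi]
        pose: Pose = []
        for j in range(len(poses[t])):
            x, y, z = (sum(f[j][axis] for f in frames) / len(frames) for axis in range(3))
            pose.append((x, y, z))
        smoothed.append(pose)
    return smoothed


# Toy sequence with a single joint that jitters in x in the middle frame.
sequence = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
result = smooth_poses(sequence, window=3)
```

A moving average suppresses frame-to-frame jitter at the cost of blurring fast motion, which matters for rapid actions such as jump shots; the window length is therefore a trade-off rather than a fixed value.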
Modification of arcade games Space Invaders and Super Mario into educational versions for learning mathematics and Croatian language
New trends in education seek to take advantage of students' habit of playing regularly on various platforms and to apply the motivational elements of digital games in education, in order to increase students' motivation, interest and focus when learning and revising teaching content. Computer games used in an educational context are called educational games and are designed in line with learning outcomes to help students learn specific material, broaden their knowledge of certain concepts, or acquire them more easily. The high level of student engagement that occurs during gameplay is considered desirable for creating a fertile learning environment; however, educational games have proven to be neither as interesting nor as fun for students as commercial games, and they do not provide the same emotion, adrenaline, involvement, interaction and motivation to play. To bridge the gap between educational and commercial games, it is proposed to adapt well-known games that students are used to playing into educational versions. This paper presents prototypes of educational versions of the well-known games Super Mario and Space Invaders, intended for lower-grade primary school students to practise simple tasks in mathematics or the Croatian language.
Building a labeled dataset for recognition of handball actions using mask R-CNN and STIPS
Building successful machine learning models depends on large amounts of training data that often need to be labeled manually. We propose a method to efficiently build an action recognition dataset in the handball domain, focusing on minimizing the manual labor required to label the individual players performing the chosen actions. The method uses existing deep learning object recognition methods for player detection and combines the obtained location information with a player activity measure based on spatio-temporal interest points (STIPs) to track the players performing the currently relevant action, here called active players. The method was successfully used on a challenging dataset of real-world handball practice videos, where the leading active player was correctly tracked and labeled in 84 % of cases.
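One simple way to turn interest points into a per-player activity measure is to count how many of them fall inside each detected bounding box. The sketch below shows only that counting step and is an assumption about how such a measure could look, not the measure defined in the paper; STIP detection itself is taken as given, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # image position of one interest point


def count_points_in_box(points: List[Point], box: Box) -> int:
    """Number of interest points that lie inside the bounding box."""
    x0, y0, x1, y1 = box
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)


def leading_active_player(points: List[Point], boxes: List[Box]) -> Optional[int]:
    """Index of the box containing the most interest points, or None if no box has any."""
    counts = [count_points_in_box(points, box) for box in boxes]
    if not counts or max(counts) == 0:
        return None
    return counts.index(max(counts))


# Two detected players; most interest points fall on the second one.
player_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
stips = [(1, 1), (21, 2), (25, 5)]
leader = leading_active_player(stips, player_boxes)
```

Counts accumulated over several consecutive frames would give a more stable score than a single frame, at the price of reacting more slowly when the active player changes.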
SAR-DAG_raycast DATASET
The dataset is used to train and define a system that can precisely geolocate persons automatically detected in offline-processed images recorded during a search and rescue (SAR) mission. It contains data from 3D simulations of several real-world terrains of different configurations and complexity, produced with a custom-made 3D terrain generator and raycaster, along with person detections obtained with the YOLO deep neural network on the real terrain. The collected data is used to define a raycasting-based method for geolocating detected persons, which allows low-cost commercial drones with a monocular camera to be used in SAR missions.
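The core of raycasting-based geolocation is finding where a ray from the camera meets the terrain. The sketch below is a simplified illustration under several assumptions that are not taken from the dataset description: a local metric coordinate frame, a terrain given as a height function `terrain_height(x, y)`, a viewing direction already derived from the drone pose and the pixel of the detection, and fixed-step ray marching with linear interpolation at the crossing.

```python
from math import sqrt
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def raycast_to_terrain(
    origin: Vec3,
    direction: Vec3,
    terrain_height: Callable[[float, float], float],
    step: float = 1.0,
    max_range: float = 1000.0,
) -> Optional[Vec3]:
    """March along the ray and return the first point where it reaches the terrain."""
    norm = sqrt(sum(c * c for c in direction))
    dx, dy, dz = (c / norm for c in direction)
    ox, oy, oz = origin

    prev_t = 0.0
    prev_gap = oz - terrain_height(ox, oy)  # height of the ray above the ground
    if prev_gap <= 0.0:
        return origin  # camera is already at or below the terrain surface

    t = step
    while t <= max_range:
        x, y, z = ox + t * dx, oy + t * dy, oz + t * dz
        gap = z - terrain_height(x, y)
        if gap <= 0.0:
            # Interpolate between the last sample above ground and this one.
            frac = prev_gap / (prev_gap - gap)
            t_hit = prev_t + frac * (t - prev_t)
            return (ox + t_hit * dx, oy + t_hit * dy, oz + t_hit * dz)
        prev_t, prev_gap = t, gap
        t += step
    return None  # no intersection within max_range


# Drone 100 m above flat ground, looking 45 degrees downward along the x axis.
hit = raycast_to_terrain((0.0, 0.0, 100.0), (1.0, 0.0, -1.0), lambda x, y: 0.0)
```

The step size trades accuracy for speed: a coarse step can skip over narrow ridges, so on rugged terrain it would have to be chosen relative to the resolution of the elevation model.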