15 research outputs found

    Training deep neural networks for stereo vision

    We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for learning a similarity measure on image patches: the first is faster than the second, but produces disparity maps that are slightly less accurate. In both cases, the input to the network is a pair of small image patches and the output is a measure of similarity between them. Both architectures contain a trainable feature extractor that represents each image patch with a feature vector, and the similarity between patches is measured on the feature vectors instead of the raw image intensity values. The fast architecture uses a fixed similarity measure to compare the two feature vectors, while the accurate architecture attempts to learn a good similarity measure on feature vectors. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.
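    The fast variant described above can be read as a siamese design: a shared convolutional feature extractor maps each patch to a vector, and a fixed cosine measure compares the two vectors. The following is a minimal sketch of that idea, not the authors' released code; the 9x9 patch size, layer count, and channel width are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchFeatureExtractor(nn.Module):
    """Shared ("siamese") feature extractor applied to both patches."""
    def __init__(self, channels=64):
        super().__init__()
        # Four 3x3 convs without padding shrink a 9x9 patch to a 1x1 map.
        layers, in_ch = [], 1
        for _ in range(4):
            layers += [nn.Conv2d(in_ch, channels, kernel_size=3), nn.ReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers[:-1])  # no ReLU after the last conv

    def forward(self, patch):                   # patch: (B, 1, 9, 9)
        return self.net(patch).flatten(1)       # feature vector: (B, channels)

def matching_cost(extractor, left_patch, right_patch):
    """Fast-architecture similarity: fixed cosine measure on feature vectors.
    Lower cost = more similar; this initializes the stereo matching cost."""
    fl = F.normalize(extractor(left_patch), dim=1)
    fr = F.normalize(extractor(right_patch), dim=1)
    return 1.0 - (fl * fr).sum(dim=1)           # (B,)
```

    The accurate architecture would replace the fixed cosine measure with additional trainable layers operating on the concatenated feature vectors.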

    Pareto Optimized Large Mask Approach for Efficient and Background Humanoid Shape Removal

    The purpose of automated video object removal is not only to detect and remove the object of interest automatically, but also to utilize background context to inpaint the foreground area. Video inpainting requires filling spatiotemporal gaps in a video with convincing material, necessitating both temporal and spatial consistency: the inpainted part must seamlessly integrate into the background in a variety of scenes, and it must maintain a consistent appearance in subsequent frames even if its surroundings change noticeably. We introduce a deep learning-based methodology for removing unwanted human-like shapes in videos. The method uses Pareto-optimized Generative Adversarial Network (GAN) technology, which is a novel contribution. The system automatically selects the Region of Interest (ROI) for each humanoid shape and uses a skeleton detection module to determine which humanoid shape to retain. The semantic masks of human-like shapes are created using a semantic-aware, occlusion-robust model that has four primary components: feature extraction, and local, global, and semantic branches. The global branch encodes occlusion-aware information to make the extracted features resistant to occlusion, while the local branch retrieves fine-grained local characteristics. A modified large-mask inpainting approach is employed to eliminate a person from the image, leveraging fast Fourier convolutions and utilizing polygonal chains and rectangles with unpredictable aspect ratios. The inpainter network takes the input image and the mask to create an output image excluding the background humanoid shapes. The generator uses an encoder-decoder structure with skip connections to recover spatial information, and dilated convolutions and squeeze-and-excitation blocks to make the regions behind the humanoid shapes consistent with their surroundings. The discriminator penalizes dissimilar structure at the patch scale, and the refiner network captures features around the boundaries of each background humanoid shape. Efficiency was assessed using the Learned Perceptual Image Patch Similarity (LPIPS), Fréchet Inception Distance (FID), and Structural Similarity Index Measure (SSIM) metrics, showing promising results on the fully automated background person removal task. The method is evaluated on two video object segmentation datasets (DAVIS, yielding an LPIPS of 0.02, FID of 5.01, and SSIM of 0.79; and YouTube-VOS, yielding 0.03, 6.22, and 0.78, respectively), as well as a database of 66 distinct video sequences of people behind a desk in an office environment (0.02, 4.01, and 0.78, respectively).
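    The generator description above mentions squeeze-and-excitation blocks among the components that keep inpainted regions consistent with their surroundings. Below is a minimal sketch of a standard squeeze-and-excitation block, not the paper's implementation; the reduction ratio and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Channel attention: globally pool ("squeeze"), gate ("excite"), rescale."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                      # squeeze: (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # per-channel gates in (0, 1)
        return x * w                                # channel-wise recalibration
```

    In an encoder-decoder inpainter, such blocks let the network emphasize feature channels that describe the surrounding background when filling the masked region.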

    BIOLOGICALLY INSPIRED OBJECT RECOGNITION SYSTEM

    Object recognition has been a field of interest to many researchers; in fact, it has been referred to as the most important problem in machine or computer vision. Researchers have developed many machine-vision-motivated algorithms to solve the object recognition problem. On the other hand, biology has motivated researchers to study the visual systems of humans and animals such as monkeys and to map them into computational models. Some of these models are based on the feed-forward mechanism of information communication in the cortex, in which information is communicated between the different visual areas from the lower areas to the top areas in a feed-forward manner; however, the performance of these models degrades considerably as clutter in the scene and occlusion increase. Another mechanism of information processing in the cortex is the feedback mechanism, in which information from the top areas of the visual system is communicated to the lower areas; this mechanism has also been mapped into computational models. All of these feed-forward- or feedback-based models have shown promising results. However, during testing, certain issues affect their performance, such as occlusion, which prevents objects from being fully visible, and highly cluttered scenes containing many objects. In fact, performance has been reported to drop to 74% when systems based on these models are subjected to one or both of these issues. The human visual system naturally utilizes both feed-forward and feedback mechanisms when perceiving the surrounding environment, and it integrates them in a way that makes it outperform any state-of-the-art system. In this research, a proposed model of object recognition based on the integration of the feed-forward and feedback mechanisms of the human visual system is presented.
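    As a rough illustration of the integration idea only (all module names and sizes below are hypothetical, not the dissertation's model), a feed-forward pass can produce a coarse class hypothesis that is fed back to modulate lower-level features before a refined second pass, which is one common way to make recognition more robust to clutter and occlusion.

```python
import torch
import torch.nn as nn

class FeedForwardFeedbackModel(nn.Module):
    def __init__(self, feat_dim=128, n_classes=10):
        super().__init__()
        self.lower = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU())
        self.upper = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(feat_dim, n_classes))
        # Feedback path: class hypothesis -> per-channel gains on lower features.
        self.feedback = nn.Sequential(nn.Linear(n_classes, feat_dim), nn.Sigmoid())

    def forward(self, image, n_iters=2):
        feats = self.lower(image)                  # feed-forward sweep
        logits = self.upper(feats)                 # coarse hypothesis
        for _ in range(n_iters - 1):
            gain = self.feedback(logits.softmax(dim=1))       # (B, feat_dim)
            feats = feats * gain.unsqueeze(-1).unsqueeze(-1)  # top-down modulation
            logits = self.upper(feats)                        # refined hypothesis
        return logits
```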

    A Peer Reviewed Newspaper About Research Refusal

    This publication presents the outcome of an online workshop (organized by the Digital Aesthetics Research Centre, Aarhus University; the Centre for the Study of the Networked Image, London South Bank University; and the transmediale festival, Berlin) with the participation of nine different groups located in different geographical locations, some inside and some outside the academy. Each group was selected on the basis of an open call and has taken part in a shared mailing list, creating a common list of references and discussing strategies of refusal and how these might relate to practices of research and its infrastructures: what might be refused, and in what ways; how academic autonomy might be preserved in the context of capitalist tech development, especially perhaps in the present context of online delivery and the need for alternatives to corporate platforms (e.g. Zoom, Teams, Skype, and the like); and how to refuse research itself, in its instrumental form? Following the workshop, each group was asked to produce a section of this newspaper that in different ways represents the group's abstractions on the subject. The design has been developed by Open Source Publishing, a collective renowned for a practice that questions the influence and affordances of digital tools in graphic design, and which works exclusively with free and open source software. The intention behind this publication has, in this way, been to explore the expanded possibilities of acting, sharing, and making differently, beyond the normative production of research and its dissemination. Importantly, it has also been a means to allow emerging researchers to present their ideas to the wider community of the transmediale festival in an accessible form. The newspaper will be distributed at the festival's various physical events in Berlin in the coming weeks, and it is available for download here and on the Digital Aesthetics Research Center website. Extended versions of the participants' research can be found in APRJA, an open-access research journal that addresses digital culture.

    Comparative study of AR versus video tutorials for minor maintenance operations

    Augmented Reality (AR) has become a mainstream technology in the development of solutions for repair and maintenance operations. Although most AR solutions are still limited to specific contexts in industry, some consumer electronics companies have started to offer pre-packaged AR solutions as an alternative to video-based tutorials (VT) for minor maintenance operations. In this paper, we present a comparative study of the acquired knowledge and user perception achieved with AR and VT solutions in maintenance tasks for IT equipment. The results indicate that both systems help users acquire knowledge in various aspects of equipment maintenance. Although no statistically significant differences were found between the AR and VT solutions, users scored higher with the AR version in all cases. Moreover, users explicitly preferred the AR version when evaluating three different usability and satisfaction criteria. For the AR version, a strong and significant correlation was found between satisfaction and achieved knowledge. Since the AR solution achieved similar learning results with higher usability scores than the video-based tutorials, these results suggest that AR solutions are the most effective approach for substituting the typical paper-based instructions in consumer electronics. This work has been supported by the Spanish MINECO and EU ERDF programs under grant RTI2018-098156-B-C55.
    Morillo, P.; García García, I.; Orduña, J.M.; Fernández, M.; Juan, M. (2020). Comparative study of AR versus video tutorials for minor maintenance operations. Multimedia Tools and Applications 79(11-12):7073-7100. https://doi.org/10.1007/s11042-019-08437-9
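    A minimal sketch of the kind of analysis such a between-groups comparison typically involves (the specific tests and variable names below are assumptions based on the statistical methods the paper cites, not its published analysis scripts): a normality check, a non-parametric test for the AR-versus-VT knowledge difference, and a rank correlation between satisfaction and achieved knowledge.

```python
from scipy import stats

def compare_groups(ar_scores, vt_scores, satisfaction, knowledge):
    # Normality check (Shapiro-Wilk); small samples often fail it, which
    # motivates the non-parametric test below.
    _, p_norm = stats.shapiro(ar_scores)

    # Between-group difference in acquired knowledge (AR vs VT),
    # Wilcoxon-Mann-Whitney style.
    u_stat, p_diff = stats.mannwhitneyu(ar_scores, vt_scores,
                                        alternative="two-sided")

    # Correlation between satisfaction and achieved knowledge (e.g. within
    # the AR group); Spearman's rho suits ordinal questionnaire data.
    rho, p_corr = stats.spearmanr(satisfaction, knowledge)
    return p_norm, p_diff, rho, p_corr
```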

    Visual grasp point localization, classification and state recognition in robotic manipulation of cloth: an overview

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Cloth manipulation by robots is gaining popularity among researchers because of its relevance, mainly (but not only) in domestic and assistive robotics. The required science and technologies are beginning to mature enough for the challenges posed by the manipulation of soft materials, and many contributions have appeared in recent years. This survey provides a systematic review of existing techniques for the basic perceptual tasks of grasp point localization, state estimation, and classification of cloth items, from the perspective of their manipulation by robots. This choice is grounded in the fact that any manipulative action requires instructing the robot where to grasp, and most garment handling activities depend on correctly recognizing the type to which a particular cloth item belongs and its state. The high inter- and intra-class variability of garments, the continuous nature of the possible deformations of cloth, and the evident difficulty of predicting their localization and extension on the garment piece are challenges that have encouraged researchers to propose a plethora of methods to confront these problems, with some promising results. The present review constitutes a first effort to furnish a structured framework of these works, with the aim of helping future contributors gain both insight and perspective on the subject.
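    As a purely illustrative sketch, not drawn from the survey itself, the two perceptual tasks it pairs are often handled by one backbone: grasp point localization posed as per-pixel heatmap regression, with garment-type classification as a second head. All layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class GraspPointNet(nn.Module):
    def __init__(self, n_garment_classes=5):
        super().__init__()
        # Shared encoder over an RGB view of the garment.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Head 1: per-pixel grasp-point heatmap (localization).
        self.heatmap = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))
        # Head 2: garment type (classification).
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_garment_classes))

    def forward(self, rgb):                 # rgb: (B, 3, H, W)
        z = self.encoder(rgb)
        return self.heatmap(z), self.classifier(z)
```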

    A PYRAMIDAL APPROACH FOR DESIGNING DEEP NEURAL NETWORK ARCHITECTURES

    Developing an intelligent system capable of learning discriminative high-level features from high-dimensional data lies at the core of solving many computer vision (CV) and machine learning (ML) tasks. Scene and human action recognition from videos is an important topic in CV and ML, with applications including video surveillance, robotics, human-computer interaction, and video retrieval. Several bio-inspired, hand-crafted feature extraction systems have been proposed for processing temporal data. However, recent deep learning techniques have dominated CV and ML through their good performance on large-scale datasets. One of the most widely used deep learning techniques is the convolutional neural network (CNN) and its variations, e.g. ConvNet, 3DCNN, and C3D. The CNN kernel scheme reduces the number of parameters with respect to fully connected neural networks. Recent deep CNNs have more layers and more kernels per layer than early CNNs and consequently have a large number of parameters. In addition, they violate the plausible pyramidal architecture of biological neural networks, because the number of filters increases at each higher layer, which makes convergence during training difficult. In this dissertation, we address three main questions central to pyramidal structure and deep neural networks: 1) Is it worthwhile to utilize a pyramidal architecture for a generalized recognition system? 2) How can the pyramidal neural network (PyraNet) be enhanced for recognizing actions and dynamic scenes in videos? 3) What is the impact of imposing a pyramidal structure on a deep CNN? In the first part of the thesis, we provide a brief review of the work done on action and dynamic scene recognition using traditional computer vision and machine learning approaches. In addition, we give a historical and current overview of pyramidal neural networks and how deep learning emerged. In the second part, we introduce a strictly pyramidal deep architecture for dynamic scene and human action recognition, based on the 3DCNN model and the image pyramid concept. We introduce a new 3D weighting scheme that offers a simple connection scheme with lower computational and memory costs and fewer learnable parameters than other neural networks. 3DPyraNet extracts features from both spatial and temporal dimensions while keeping a biological structure, and it is thereby able to capture the motion information encoded in multiple adjacent frames. The 3DPyraNet model is extended with three modifications: 1) changing the input image size; 2) changing the receptive field and overlap size in the correlation layers; and 3) adding a linear classifier at the end to classify the learned features. The result is a discriminative approach for spatiotemporal feature learning in action and dynamic scene recognition. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods in one-vs-all accuracy on three video benchmark datasets (KTH, Weizmann, and Maryland), and it gives competitive accuracy on a fourth dataset (YUPENN). In the last part of the thesis, we investigate to what extent a CNN may take advantage of the pyramidal structure typical of biological neural systems. A generalized statement over the convolutional layers, from the input up to the fully connected layer, is introduced that further helps in understanding and designing a successful deep network. It reduces ambiguity, the number of parameters, and their size on disk without degrading overall accuracy. It also yields a general guideline for modeling a deep architecture by keeping a certain ratio of filters in the early layers versus the deeper layers. Competitive results are achieved compared to similar well-engineered deeper architectures on four benchmark datasets. The same approach is further applied to person re-identification, where less ambiguity in the features increases Rank-1 performance and yields results better than or comparable to state-of-the-art deep models.
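    A minimal sketch of the pyramidal design principle the dissertation advocates (the decay ratio, layer count, and widths below are illustrative assumptions, not 3DPyraNet itself): filter counts shrink with depth instead of growing, which cuts parameters and model size while keeping a biologically plausible narrowing toward the output.

```python
import torch.nn as nn

def pyramidal_cnn(in_ch=3, base_filters=128, n_layers=4, ratio=0.75, n_classes=10):
    """Build a CNN whose filter count decreases layer by layer (a pyramid)."""
    layers, ch, filters = [], in_ch, base_filters
    for _ in range(n_layers):
        layers += [nn.Conv2d(ch, filters, 3, stride=2, padding=1), nn.ReLU()]
        # Shrink the next layer's width instead of growing it.
        ch, filters = filters, max(8, int(filters * ratio))
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, n_classes)]
    return nn.Sequential(*layers)
```

    With these assumed settings the widths run 128, 96, 72, 54, the inverse of the usual doubling pattern, which is the "certain ratio of filters in the early layers versus the deeper layers" that the guideline refers to.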