4,993 research outputs found

    Deep Feature-based Face Detection on Mobile Devices

    We propose a deep feature-based face detector for mobile devices to detect the user's face acquired by the front-facing camera. The proposed method is able to detect faces in images containing extreme pose and illumination variations as well as partial faces. The main challenge in developing deep feature-based algorithms for mobile devices is the constrained nature of the mobile platform and the non-availability of CUDA-enabled GPUs on such devices. Our implementation takes into account the special nature of the images captured by the front-facing camera of mobile devices and exploits the GPUs present in mobile devices, which do not support CUDA-based frameworks, to meet these challenges. Comment: ISBA 201
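    The abstract leaves the detector's internals unspecified. As a rough illustration of the general approach, the sketch below scores sliding windows with a face/non-face head on top of pooled deep features; the MobileNetV2 backbone, window size, and untrained linear head are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of deep feature-based face detection, not the paper's model.
import torch
import torchvision.models as models

# A lightweight backbone stands in for the deep feature extractor; MobileNetV2
# is picked here only because it suits GPU-constrained mobile devices.
backbone = models.mobilenet_v2(weights="DEFAULT").features.eval()
classifier = torch.nn.Linear(1280, 2)  # face / non-face head (untrained placeholder)

def detect_faces(image, win=128, stride=64, threshold=0.5):
    """Scan an image tensor (3, H, W) and return candidate face windows."""
    boxes = []
    _, H, W = image.shape
    for top in range(0, H - win + 1, stride):
        for left in range(0, W - win + 1, stride):
            patch = image[:, top:top + win, left:left + win].unsqueeze(0)
            with torch.no_grad():
                feat = backbone(patch).mean(dim=(2, 3))  # global average pooling
                score = torch.softmax(classifier(feat), dim=1)[0, 1].item()
            if score > threshold:
                boxes.append((left, top, win, win, score))
    return boxes
```

    In practice the head would be trained on face/non-face patches, and the per-window scan replaced by a single feature-map pass to avoid redundant computation on constrained hardware.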

    An Immersive Telepresence System using RGB-D Sensors and Head Mounted Display

    We present a tele-immersive system that enables people to interact with each other in a virtual world using body gestures in addition to verbal communication. Beyond the obvious applications, including general online conversations and gaming, we hypothesize that our proposed system would be particularly beneficial to education by offering rich visual content and interactivity. One distinct feature is the integration of egocentric pose recognition, which allows participants to use their gestures to demonstrate and manipulate virtual objects simultaneously. This functionality enables the instructor to effectively and efficiently explain and illustrate complex concepts or sophisticated problems in an intuitive manner. The highly interactive and flexible environment can capture and sustain more student attention than the traditional classroom setting and thus delivers a compelling experience to the students. Our main focus here is to investigate possible solutions for the system design and implementation and to devise strategies for fast, efficient computation suitable for visual data processing and network transmission. We describe the technique and experiments in detail and provide quantitative performance results, demonstrating that our system runs comfortably and reliably in different application scenarios. Our preliminary results are promising and demonstrate the potential for more compelling directions in cyberlearning. Comment: IEEE International Symposium on Multimedia 201
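    The abstract emphasizes fast visual data processing and network transmission without giving specifics. As one hedged illustration of that concern, the sketch below packs an RGB-D frame into a compact payload for streaming; the frame shapes and the zlib-based codec are assumptions, not the system's actual pipeline.

```python
# Hypothetical RGB-D frame packing for network transmission (illustrative codec).
import struct
import zlib

import numpy as np

def pack_rgbd(color: np.ndarray, depth: np.ndarray) -> bytes:
    """Compress one frame (color: HxWx3 uint8, depth: HxW uint16) into bytes."""
    assert color.shape[:2] == depth.shape
    h, w = depth.shape
    payload = zlib.compress(color.tobytes() + depth.tobytes(), level=1)  # fast level for real time
    return struct.pack("!HHI", h, w, len(payload)) + payload

def unpack_rgbd(buf: bytes):
    """Inverse of pack_rgbd: recover the color and depth arrays."""
    h, w, n = struct.unpack("!HHI", buf[:8])
    raw = zlib.decompress(buf[8:8 + n])
    color = np.frombuffer(raw[:h * w * 3], np.uint8).reshape(h, w, 3)
    depth = np.frombuffer(raw[h * w * 3:], np.uint16).reshape(h, w)
    return color, depth

# Round trip with synthetic data standing in for a sensor frame.
c = np.zeros((240, 320, 3), np.uint8)
d = np.zeros((240, 320), np.uint16)
c2, d2 = unpack_rgbd(pack_rgbd(c, d))
assert (c2 == c).all() and (d2 == d).all()
```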

    Active User Authentication for Smartphones: A Challenge Data Set and Benchmark Results

    In this paper, automated user verification techniques for smartphones are investigated. A unique non-commercial dataset, the University of Maryland Active Authentication Dataset 02 (UMDAA-02), for multi-modal user authentication research is introduced. This paper focuses on three sensors, the front camera, touch sensor, and location service, while providing a general description of the other modalities. Benchmark results for face detection, face verification, touch-based user identification and location-based next-place prediction are presented, which indicate that more robust methods fine-tuned to the mobile platform are needed to achieve satisfactory verification accuracy. The dataset will be made available to the research community to promote additional research. Comment: 8 pages, 12 figures, 6 tables. Best poster award at BTAS 201
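    The abstract does not say which models the benchmarks use. As one simple, hedged baseline for the next-place prediction task, the sketch below fits a first-order Markov model over a user's visit history; the place labels and the model choice are illustrative, not the paper's benchmark method.

```python
# Hypothetical next-place prediction baseline: first-order Markov transitions.
from collections import Counter, defaultdict

def fit_transitions(visits):
    """Count place-to-place transitions in an ordered visit history."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(visits, visits[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, current):
    """Return the most frequently observed successor of the current place."""
    if current not in counts:
        return None  # unseen place: no prediction
    return counts[current].most_common(1)[0][0]

history = ["home", "office", "gym", "home", "office", "cafe", "home"]  # toy data
model = fit_transitions(history)
print(predict_next(model, "home"))  # -> "office"
```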

    Design and implementation of a mobile phone application to help people with visual dysfunction visually inspect their surrounding spaces

    This project consists of the development of software that helps people with visual impairment move around and orient themselves in indoor spaces, most likely their personal and domestic surroundings. The software allows its user to take a photo of the surrounding environment and receive an oral response describing some characteristics of the picture, thereby defining the space the person wants to analyse. Furthermore, the user can tell the application what in particular they want to examine graphically. The user runs the mobile phone application each time they want to use it, operating it through voice commands. To detect, recognize and inspect the surrounding objects and environments, deep learning and cloud technologies provide the computational effort and communications. The accuracy and robustness of the neural networks were evaluated as they were developed, in order to design and implement solutions that make them more reliable. Programming languages for the creation of software applications and communication protocols were successfully employed to develop the fully functional software.
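    The abstract describes a photo-in, spoken-description-out loop backed by cloud processing. A minimal client-side sketch of that loop appears below; the endpoint URL and response schema are hypothetical placeholders, not the project's API, and a text-to-speech engine would voice the printed result.

```python
# Hypothetical client sketch: upload a photo to a cloud recognition service
# and report its textual description. Endpoint and schema are placeholders.
import requests

ENDPOINT = "https://example.org/describe"  # assumed service URL

def describe_photo(path: str) -> str:
    """Send one image to the (hypothetical) cloud service, return its description."""
    with open(path, "rb") as f:
        resp = requests.post(ENDPOINT, files={"image": f}, timeout=30)
    resp.raise_for_status()
    return resp.json()["description"]  # assumed response field

if __name__ == "__main__":
    text = describe_photo("room.jpg")
    print(text)  # a text-to-speech engine would speak this for the user
```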

    Advanced Capsule Networks via Context Awareness

    Capsule Networks (CNs) offer a new architecture for the Deep Learning (DL) community. Though their effectiveness has been demonstrated on the MNIST and smallNORB datasets, the networks still face challenges on other datasets containing images with distinct contexts. In this research, we improve the design of CNs (vector version): we add more pooling layers to filter image backgrounds and more reconstruction layers to improve image restoration. Additionally, we perform experiments to compare the accuracy and speed of CNs versus DL models. Among DL models, we utilize Inception V3 and DenseNet V201 for powerful computers, and NASNet, MobileNet V1 and MobileNet V2 for small and embedded devices. We evaluate our models on a fingerspelling alphabet dataset from American Sign Language (ASL). The results show that CNs perform comparably to DL models while dramatically reducing training time. We also provide a demonstration and a link for the purpose of illustration. Comment: 12 page
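    The core modification the abstract names, extra pooling layers in front of the capsules, can be sketched concretely. The PyTorch snippet below is a simplified, hypothetical rendering: the layer sizes are guesses, and the routing and reconstruction stages of a full capsule network are omitted.

```python
# Hypothetical sketch: extra pooling before the primary capsules, as the
# abstract describes; sizes and the routing-free forward pass are simplified.
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1):
    """Standard capsule squashing non-linearity."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1 + norm2)) * s / (norm2.sqrt() + 1e-8)

class PooledPrimaryCaps(nn.Module):
    def __init__(self, caps_dim=8, n_maps=32):
        super().__init__()
        self.conv = nn.Conv2d(3, 256, kernel_size=9)
        # The added pooling stages meant to filter background detail.
        self.pool1 = nn.MaxPool2d(2)
        self.pool2 = nn.MaxPool2d(2)
        self.primary = nn.Conv2d(256, n_maps * caps_dim, kernel_size=9, stride=2)
        self.caps_dim = caps_dim

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = self.pool2(self.pool1(x))  # suppress background clutter
        x = self.primary(x)            # (B, n_maps * caps_dim, H', W')
        return squash(x.view(x.size(0), -1, self.caps_dim))  # capsule vectors

caps = PooledPrimaryCaps()
out = caps(torch.randn(1, 3, 96, 96))
print(out.shape)  # torch.Size([1, 1568, 8]) for this input size
```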