
    DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving

    Full text link
    Today, there are two major paradigms for vision-based autonomous driving systems: mediated perception approaches that parse an entire scene to make a driving decision, and behavior reflex approaches that directly map an input image to a driving action by a regressor. In this paper, we propose a third paradigm: a direct perception approach to estimate the affordance for driving. We propose to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving. Our representation provides a compact yet complete description of the scene that enables a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, we argue that our direct perception representation provides the right level of abstraction. To demonstrate this, we train a deep Convolutional Neural Network using recordings from 12 hours of human driving in a video game and show that our model can drive a car well in a very diverse set of virtual environments. We also train a model for car distance estimation on the KITTI dataset. Results show that our direct perception approach can generalize well to real driving images. Source code and data are available on our project website.
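    The direct perception idea (regress a handful of affordance indicators from the image, then hand them to a simple controller) can be sketched as follows; the indicator names and controller gains are illustrative, not the paper's exact outputs:

```python
# Hypothetical affordance indicators a CNN would regress from an image
# (illustrative names, not the paper's exact output set):
#   angle          - angle between car heading and road tangent (radians)
#   dist_center    - lateral offset from the lane centre (metres)
#   dist_preceding - distance to the car ahead (metres)

def simple_controller(angle, dist_center, dist_preceding,
                      k_angle=2.0, k_offset=0.5, safe_gap=20.0):
    """Toy controller: steer to cancel heading error and lateral offset,
    lift the throttle when the preceding car is too close."""
    steering = -k_angle * angle - k_offset * dist_center
    throttle = 1.0 if dist_preceding > safe_gap else 0.0
    return steering, throttle

steer, throttle = simple_controller(angle=0.1, dist_center=-0.5,
                                    dist_preceding=35.0)
```

    The point of the paradigm is that once the scene is reduced to a few such numbers, the controller itself can stay this simple.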

    Vision Based Extraction of Nutrition Information from Skewed Nutrition Labels

    Get PDF
    An important component of a healthy diet is the comprehension and retention of nutritional information and an understanding of how different food items and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which can be found on all packaged food products. However, even health-conscious consumers can find it challenging to use all the information available in these NLs: they may not be familiar with nutritional terms, or may find it difficult to integrate nutritional data collection into their daily activities for lack of time, motivation, or training. It is therefore essential to automate this data collection and interpretation process with computer-vision-based algorithms that extract nutritional information from NLs, because automation improves the user’s ability to engage in continuous nutritional data collection and analysis. To make nutritional data collection more manageable and enjoyable for users, we present a Proactive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information processing, and context-sensitive nutrition decision support. PNUTS consists of two modules. The first is a barcode scanning module that runs on smartphones and is capable of vision-based localization of One Dimensional (1D) Universal Product Code (UPC) and International Article Number (EAN) barcodes with relaxed pitch, roll, and yaw camera alignment constraints. The algorithm localizes barcodes in images by computing Dominant Orientations of Gradients (DOGs) of image segments and grouping smaller segments with similar DOGs into larger connected components. Connected components that pass given morphological criteria are marked as potential barcodes. The algorithm is implemented in a distributed, cloud-based system.
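    The core of the DOG computation can be sketched in a few lines of NumPy; this is a minimal illustration of the idea, not the paper's implementation:

```python
import numpy as np

def dominant_orientation(block, n_bins=8):
    """Dominant Orientation of Gradients (DOG) for one image segment:
    histogram the gradient directions, weighted by gradient magnitude,
    and return the centre angle of the strongest bin (radians)."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientations in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return (hist.argmax() + 0.5) * np.pi / n_bins

# Vertical stripes (a barcode-like pattern) produce horizontal gradients,
# so the dominant orientation is close to 0 radians.
stripes = np.tile([0, 0, 255, 255], 16).reshape(8, 8)
theta = dominant_orientation(stripes)
```

    Segments whose dominant orientations agree can then be merged into the larger connected components the abstract describes.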
The system’s front end is a smartphone application that runs on Android smartphones with Android 4.2 or higher. The system’s back end is deployed on a five-node Linux cluster where images are processed. The algorithm was evaluated on a corpus of 7,545 images extracted from 506 videos of bags, bottles, boxes, and cans in a supermarket. The DOG algorithm was coupled to our in-place scanner for 1D UPC and EAN barcodes. The scanner receives from the DOG algorithm the rectangular planar dimensions of a connected component and the component’s dominant gradient orientation angle, referred to as the skew angle. The scanner draws several scan lines at that skew angle within the component to recognize the barcode in place without any rotations. The scanner coupled to the localizer was tested on the same corpus of 7,545 images. Laboratory experiments indicate that the system can localize and scan barcodes of any orientation in the yaw plane, of up to 73.28 degrees in the pitch plane, and of up to 55.5 degrees in the roll plane. The videos have been made public for all interested research communities to replicate our findings or to use them in their own research. The front end Android application is available for free download at Google Play under the title NutriGlass. This module is also coupled to a comprehensive NL database from which nutritional information can be retrieved on demand. Currently our NL database consists of more than 230,000 products. The second module of PNUTS is an algorithm whose objective is to determine the text skew angle of an NL image without constraining the angle’s magnitude. The horizontal, vertical, and diagonal matrices of the Two-Dimensional (2D) Haar Wavelet Transform are used to identify 2D points with significant intensity changes. The set of points is bounded with a minimum-area rectangle whose rotation angle is the text’s skew.
The algorithm’s performance is compared with the performance of five text skew detection algorithms on 1001 U.S. nutrition label images and 2200 single- and multi-column document images in multiple languages. To ensure the reproducibility of the reported results, the source code of the algorithm and the image data have been made publicly available. If the skew angle is estimated correctly, optical character recognition (OCR) techniques can be used to extract nutrition information.
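    The second module's pipeline (Haar detail coefficients locate the intensity changes, then a fit to the resulting point set reads off the skew) can be sketched as follows. As a simplification, the principal axis of the detail points stands in for the paper's minimum-area rectangle:

```python
import numpy as np

def haar_detail_points(img):
    """One level of the 2D Haar transform; return coordinates of points
    whose combined horizontal/vertical/diagonal detail response is large."""
    a = img.astype(float)
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    tl, tr = a[0::2, 0::2], a[0::2, 1::2]
    bl, br = a[1::2, 0::2], a[1::2, 1::2]
    lh = (tl - tr + bl - br) / 4                 # horizontal detail
    hl = (tl + tr - bl - br) / 4                 # vertical detail
    hh = (tl - tr - bl + br) / 4                 # diagonal detail
    detail = np.abs(lh) + np.abs(hl) + np.abs(hh)
    ys, xs = np.nonzero(detail > detail.mean() + detail.std())
    return xs.astype(float), ys.astype(float)

def skew_angle(img):
    """Skew estimate in degrees from the principal axis of the detail
    points (a PCA stand-in for the paper's minimum-area rectangle)."""
    xs, ys = haar_detail_points(img)
    pts = np.stack([xs - xs.mean(), ys - ys.mean()])
    w, v = np.linalg.eigh(pts @ pts.T)
    angle = np.degrees(np.arctan2(v[1, -1], v[0, -1]))
    if angle > 90:
        angle -= 180
    if angle <= -90:
        angle += 180
    return angle

# Synthetic "text line": a thin band rotated by about 10 degrees.
img = np.zeros((64, 64))
for x in range(64):
    y = int(10 + np.tan(np.radians(10)) * x)
    img[y - 1 : y + 2, x] = 255
angle = skew_angle(img)
```

    On real label images the minimum-area rectangle is more robust than a principal-axis fit when the text block is short and wide, which is why the paper uses it.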

    Driver Distraction Identification with an Ensemble of Convolutional Neural Networks

    Get PDF
    The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide, and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly, cell phone usage), and unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves a 90% accuracy. The system consists of a genetically-weighted ensemble of convolutional neural networks; we show that a weighted ensemble of classifiers using a genetic algorithm yields better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localizations, and skin segmentation. Finally, we present a thinned version of our ensemble that could achieve 84.64% classification accuracy and operate in a real-time environment.
    Comment: arXiv admin note: substantial text overlap with arXiv:1706.0949
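    The genetic weighting of an ensemble can be sketched with a toy setup: simulated per-classifier class probabilities are fused by a weighted average, and a small genetic algorithm searches for the weight vector that maximizes accuracy. Everything below (the simulated classifiers, population size, mutation scale) is illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the CNNs' softmax outputs: 3 classifiers, 200 samples,
# 4 classes, with per-classifier accuracies of roughly 0.9, 0.7 and 0.5.
n_clf, n_samples, n_classes = 3, 200, 4
labels = rng.integers(0, n_classes, n_samples)
probs = np.full((n_clf, n_samples, n_classes), 0.1)
for i, acc in enumerate([0.9, 0.7, 0.5]):
    correct = rng.random(n_samples) < acc
    pred = np.where(correct, labels, rng.integers(0, n_classes, n_samples))
    probs[i, np.arange(n_samples), pred] += 0.6

def fitness(w):
    """Accuracy of the ensemble fused with (normalized) weights w."""
    fused = np.tensordot(w / w.sum(), probs, axes=1)
    return (fused.argmax(axis=1) == labels).mean()

# A minimal genetic algorithm over the ensemble weights:
# keep the fittest half, averaging crossover, Gaussian mutation.
pop = rng.random((20, n_clf))
best_w, best_acc = None, -1.0
for _ in range(30):
    fit = np.array([fitness(w) for w in pop])
    if fit.max() > best_acc:
        best_acc, best_w = fit.max(), pop[fit.argmax()].copy()
    parents = pop[np.argsort(fit)[-10:]]
    children = (parents[rng.integers(0, 10, 20)] +
                parents[rng.integers(0, 10, 20)]) / 2
    children += rng.normal(0, 0.05, children.shape)
    pop = np.clip(children, 1e-6, None)
```

    The design choice here mirrors the abstract's claim: a learned weighting lets a strong classifier dominate the fusion while weaker ones still contribute where they agree.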

    PILOT: Password and PIN Information Leakage from Obfuscated Typing Videos

    Full text link
    This paper studies leakage of user passwords and PINs based on observations of typing feedback on screens or from projectors in the form of masked characters that indicate keystrokes. To this end, we developed an attack called Password and PIN Information Leakage from Obfuscated Typing Videos (PILOT). Our attack extracts inter-keystroke timing information from videos of password masking characters displayed when users type their password on a computer, or their PIN at an ATM. We conducted several experiments in various attack scenarios. Results indicate that, while in some cases leakage is minor, it is quite substantial in others. By leveraging inter-keystroke timings, PILOT recovers 8-character alphanumeric passwords in as few as 19 attempts. When guessing PINs, PILOT significantly improved on both random guessing and the attack strategy adopted in our prior work [4]. In particular, we were able to guess about 3% of the PINs within 10 attempts. This corresponds to a 26-fold improvement compared to random guessing. Our results strongly indicate that secure password masking GUIs must consider the information leakage identified in this paper.
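    The heart of such an attack, ranking candidate secrets by how well their predicted inter-keystroke delays match the delays observed in the video, can be sketched with a toy timing model. The linear distance-to-delay model and the word list below are illustrative; the real attack learns its timing model from data:

```python
import math
import random

random.seed(1)
POS = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

def expected_delay(a, b):
    """Toy model: inter-keystroke delay (ms) grows with key distance."""
    return 80 + 10 * abs(POS[a] - POS[b])

def log_likelihood(candidate, observed, sigma=15.0):
    """Gaussian log-likelihood of a candidate password given the
    observed inter-keystroke delays (constant terms dropped)."""
    score = 0.0
    for (a, b), d in zip(zip(candidate, candidate[1:]), observed):
        mu = expected_delay(a, b)
        score += -((d - mu) ** 2) / (2 * sigma ** 2)
    return score

# Simulate the delays an attacker would measure from the video.
secret = "orange"
observed = [expected_delay(a, b) + random.gauss(0, 5)
            for a, b in zip(secret, secret[1:])]

candidates = ["orange", "banana", "cherry", "grapes", "melons"]
ranked = sorted(candidates, key=lambda c: -log_likelihood(c, observed))
```

    Guessing then proceeds down the ranked list, which is why a few dozen attempts can suffice when the timing signal is strong.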

    PRHOLO: 360º Interactive Public Relations

    Get PDF
    In the globalized world, possessing good products may not be enough to reach potential clients unless creative marketing strategies are well delineated. In this context, public relations are also important when it comes to capturing the client’s attention, making the first contact between the clients and the company’s products, while being persuasive enough to make them confident that the company has the right products to fit their needs. Three virtual public relations installations are proposed in this chapter, combining technology with a human-like public relations ability, capable of interacting with potential clients located in front of the installation at angles of up to 57º, 180º and 360º, respectively. From one to several Microsoft Kinect sensors were used to develop the three interaction models, which allow tracking and recognition of users’ gestures and positions (heat map), sound sources, voice commands, and face and body extraction of the user interacting with the installation.

    Forensic Authentication of WhatsApp Messenger Using the Information Retrieval Approach

    Get PDF
    Telecommunications have developed very rapidly since internet-based instant messaging services spread to Indonesia. WhatsApp is the most popular instant messaging application: according to the Statista website, WhatsApp showed significant growth in 2018, reaching 1.5 billion monthly active users (MAU), an increase of 14 percent over the 1.3 billion MAU of July 2017. Daily active users (DAU) number around one billion, and WhatsApp handles more than 60 billion message exchanges between users around the world. This growth is predicted to continue as internet penetration widens. WhatsApp updates have embedded various features in the application, including Web-based WhatsApp for computers, which makes it easier for users to share data and synchronize between their smartphone and computer. Besides its positive aspects, WhatsApp also presents a security gap for user privacy, one example being the tapping of conversations involving both smartphone and computer devices. The handling of crimes involving digital devices needs to be emphasized so that the judicial process can account for their effects. Mobile forensics investigation plays a part in curbing misuse of WhatsApp's instant messaging features, including investigating cases involving WhatsApp conversations through a series of standard steps that follow digital forensics procedures. The exploration of digital evidence from WhatsApp conversations serves as a reference for the crime of telecommunication tapping, after which a forensic investigation report is prepared covering evidence from the victim's smartphone and computer. Keywords: Authentication, Mobile Forensics, Instant Messenger, WhatsApp Messenger

    Real Time Vehicle License Plate Recognition on Mobile Devices

    Get PDF
    Automatic license plate recognition is useful in many contexts such as parking control, law enforcement and vehicle background checking. The high cost and low portability of commercial systems make them inaccessible to the majority of end users. However, current mobile devices now have processors and cameras that make image processing and recognition applications feasible on them. This thesis investigates high-accuracy real-time license plate recognition on a smartphone, taking into account device limitations. It first explores how, using minimal image processing and simple configurable heuristics based on plate geometry, license plates and their characters can be detected in an image. Then, using minimal training data, it shows that a character recognition package can achieve high levels of accuracy. This approach accurately recognized 99 percent of plates appearing in a test set of videos of vehicles with New Zealand license plates.
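    The geometric heuristics can be sketched as a filter over candidate bounding boxes: keep boxes with character-like aspect ratios, then accept as a plate a group of boxes that share a baseline and height. All boxes and thresholds below are illustrative, not the thesis's tuned values:

```python
def is_char_box(box, min_ar=0.3, max_ar=0.9, min_h=10):
    """Character-like: tall-ish aspect ratio and a minimum height."""
    x, y, w, h = box
    return h >= min_h and min_ar <= w / h <= max_ar

def group_plate(boxes, y_tol=5, h_tol=0.2, min_chars=5):
    """Accept a left-to-right run of character boxes that share a
    baseline (within y_tol px) and height (within h_tol fraction)."""
    chars = sorted((b for b in boxes if is_char_box(b)), key=lambda b: b[0])
    if not chars:
        return []
    ref = chars[0]
    row = [b for b in chars
           if abs(b[1] - ref[1]) <= y_tol
           and abs(b[3] - ref[3]) <= h_tol * ref[3]]
    return row if len(row) >= min_chars else []

# Six aligned character-sized boxes (x, y, w, h) plus two noise regions.
boxes = [(12, 40, 8, 18), (24, 41, 9, 18), (36, 39, 8, 19),
         (48, 40, 8, 18), (60, 41, 9, 18), (72, 40, 8, 18),
         (5, 5, 60, 12), (90, 70, 4, 4)]
plate = group_plate(boxes)
```

    Once a plate region is isolated this way, each character box can be cropped and handed to an OCR package, as the thesis describes.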