ReSTiNet: On improving the performance of Tiny-YOLO-based CNN architecture for applications in human detection
Human detection is a special application of object recognition and is considered one of the greatest challenges in computer vision. It is the starting point of a number of applications, including public safety and security surveillance around the world. Human detection technologies have advanced significantly in recent years due to the rapid development of deep learning techniques. Despite these advances, we still need network-design practices that enable compact sizes, deep designs, and fast training times while maintaining high accuracy. In this article, we propose ReSTiNet, a novel compressed convolutional neural network that addresses the issues of size, detection speed, and accuracy. Following SqueezeNet, ReSTiNet adopts fire modules, examining their number and placement within the model to reduce the number of parameters and thus the model size. Residual connections are incorporated within the fire modules and carefully constructed to improve feature propagation and ensure the largest possible information flow through the model, further improving detection speed and accuracy. The proposed algorithm downsizes the previously popular Tiny-YOLO model and offers the following improvements: (1) faster detection speed; (2) a more compact model size; (3) reduced overfitting; and (4) higher mAP than other lightweight models such as MobileNet and SqueezeNet. The proposed model was trained and tested using the MS COCO and Pascal VOC datasets. The resulting ReSTiNet model is 10.7 MB in size (almost five times smaller than Tiny-YOLO), yet it achieves an mAP of 63.74% on Pascal VOC and 27.3% on MS COCO using a Tesla K80 GPU.
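To see why fire modules shrink a model, it helps to count parameters. The sketch below compares a plain 3x3 convolution against a SqueezeNet-style fire module (a 1x1 "squeeze" layer feeding parallel 1x1 and 3x3 "expand" layers). The channel sizes are illustrative assumptions, not values from the paper; a residual skip connection of the kind ReSTiNet adds contributes no extra parameters when input and output widths match.

```python
def conv_params(c_in, c_out, k):
    # Weights plus biases for a k x k convolution layer.
    return c_out * (c_in * k * k + 1)

def fire_module_params(c_in, s1, e1, e3):
    # SqueezeNet-style fire module: a 1x1 "squeeze" layer with s1
    # filters feeding parallel 1x1 (e1 filters) and 3x3 (e3 filters)
    # "expand" layers whose outputs are concatenated.
    squeeze = conv_params(c_in, s1, 1)
    expand = conv_params(s1, e1, 1) + conv_params(s1, e3, 3)
    return squeeze + expand

# Hypothetical layer sizes (not taken from the paper): replace a
# 3x3 conv with 256 input and 256 output channels by a fire module
# with s1=32, e1=128, e3=128, so the output width (e1 + e3 = 256)
# matches and a parameter-free residual addition is possible.
plain = conv_params(256, 256, 3)       # 590,080 parameters
fire = fire_module_params(256, 32, 128, 128)  # 49,440 parameters
print(plain, fire, round(plain / fire, 1))
```

Under these assumed sizes the fire module uses roughly a twelfth of the parameters of the plain convolution, which is the mechanism behind the compact model sizes the abstract reports.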
Unmediated Interaction: Communicating with Computers and Embedded Devices as If They Are Not There
Although computers are smaller and more readily accessible today than they have ever been, I believe that we have barely scratched the surface of what computers can become. When we use computing devices today, we spend much of our time navigating to particular functions or commands, using devices on their terms rather than executing those commands immediately. In this dissertation, I explore what I call unmediated interaction: the notion of people using computers as if the computers are not there and as if the people are using their own abilities or powers instead. I argue that facilitating unmediated interaction via personalization, new input modalities, and improved text entry can reduce both input overhead and output overhead, which are the burden of providing inputs to and receiving outputs from the intermediate device, respectively. I introduce three computational methods for reducing input overhead and one for reducing output overhead. First, I show how input data mining can eliminate the need for user inputs altogether. Specifically, I develop a method for mining controller inputs to gain deep insights about a player's playing style, their preferences, and the nature of the video games they are playing, all of which can be used to personalize their experience without any explicit input on their part. Next, I introduce gaze locking, a method for sensing eye contact from an image that allows people to interact with computers, devices, and other objects just by looking at them. Third, I introduce computationally optimized keyboard designs for touchscreen manual input that allow people to type on smartphones faster and with far fewer errors than currently possible. Last, I introduce the racing auditory display (RAD), an audio system that makes it possible for people who are blind to play the same types of racing games that sighted players can play, with a similar speed and sense of control.
The RAD shows how we can reduce output overhead to provide user interface parity between people with and without disabilities. Together, I hope that these systems open the door to even more efforts in unmediated interaction, with the goal of making computers less like devices that we use and more like abilities or powers that we have.
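One way to make "computationally optimized keyboard designs" concrete is as a combinatorial search: assign letters to key positions so as to minimize expected finger travel under a bigram-frequency model. The toy below is an illustrative stand-in, not the dissertation's actual objective or data; the three-key layout, positions, and bigram frequencies are made up for the example.

```python
import itertools
import math

# Three key slots on a horizontal line, one unit apart (assumed geometry).
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

# Hypothetical bigram frequencies for a three-letter "alphabet".
bigram_freq = {("a", "b"): 0.5, ("b", "c"): 0.3, ("a", "c"): 0.2}

def expected_travel(assignment):
    # `assignment` maps each letter to a slot index in `positions`;
    # the score is frequency-weighted Euclidean travel between keys.
    cost = 0.0
    for (u, v), f in bigram_freq.items():
        (x1, y1) = positions[assignment[u]]
        (x2, y2) = positions[assignment[v]]
        cost += f * math.hypot(x2 - x1, y2 - y1)
    return cost

# Exhaustive search over all letter-to-slot assignments (feasible only
# for toy sizes; real layout optimizers use heuristic search instead).
best = min(
    (dict(zip("abc", perm)) for perm in itertools.permutations(range(3))),
    key=expected_travel,
)
```

As expected, the search places "b" in the middle slot, since it appears in the two most frequent bigrams; real touchscreen-keyboard optimization adds error models and many more keys, but the shape of the problem is the same.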