
    Deep-Learning-Based Computer Vision Approach For The Segmentation Of Ball Deliveries And Tracking In Cricket

    There has been a significant increase in the adoption of technology in cricket in recent years. This trend has led to duplicated effort across similar computer-vision research projects. Our research addresses one of these problems by segmenting ball deliveries in a cricket broadcast using the deep learning models MobileNet and YOLO, enabling other researchers to use our output as a dataset for their own work. Cricket coaches and players can also use this output to analyze the deliveries bowled during a match. This paper presents an approach for segmenting and extracting video shots in which only the ball is being delivered; a video shot is a series of continuous frames that make up a single scene of the video. Object detection models are applied to extract these shots with a high level of accuracy. A proof of concept for building large datasets of ball-delivery video shots is proposed, paving the way for further processing of those shots to extract semantics. Ball tracking in these video shots is also performed using a separate RetinaNet model, as a demonstration of the usefulness of the proposed dataset. The position on the cricket pitch where the ball lands is extracted by tracking the ball along the y-axis, and the video shot is then classified as a full-pitched, good-length, or short-pitched delivery.
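    As an illustration of the final classification step, the following is a minimal sketch (not the authors' code) that maps the tracked ball's pitch-landing y-coordinate to a delivery length class; the threshold values, coordinate convention, and function name are assumptions made for this example.

```python
# Minimal sketch: classify a delivery from the y-coordinate (in pixels,
# measured from the batter's end of the pitch) at which the tracked ball
# first lands. The threshold values are illustrative assumptions, not
# figures taken from the paper.

FULL_PITCH_MAX_Y = 150    # lands close to the batter -> full-pitched
GOOD_LENGTH_MAX_Y = 300   # intermediate landing zone -> good length

def classify_delivery_length(landing_y: float) -> str:
    """Map the pitch-landing y-coordinate of the ball to a delivery class."""
    if landing_y <= FULL_PITCH_MAX_Y:
        return "full-pitched"
    if landing_y <= GOOD_LENGTH_MAX_Y:
        return "good-length"
    return "short-pitched"

# Example: a ball tracked by the RetinaNet model lands at y = 220 px.
print(classify_delivery_length(220))  # -> "good-length"
```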

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption, and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, including the supported applications, architectural choices, design space exploration methods, and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete, and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201
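    To make the idea of a uniform evaluation concrete, the sketch below normalizes hypothetical throughput and power figures into performance-per-watt, one common axis along which FPGA toolflows are compared; the toolflow names and all numbers are placeholders, not results from the survey.

```python
# Hypothetical sketch of a normalized comparison across CNN-to-FPGA toolflows.
# Names and figures below are placeholders, not measurements from the paper.

reported = {
    # toolflow: (throughput in GOP/s, board power in W)
    "ToolflowA": (120.0, 9.6),
    "ToolflowB": (310.0, 25.0),
}

for name, (gops, watts) in reported.items():
    # Performance-per-watt as one axis of a uniform comparison.
    print(f"{name}: {gops:.1f} GOP/s, {gops / watts:.1f} GOP/s/W")
```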

    Perception architecture exploration for automotive cyber-physical systems

    In emerging autonomous and semi-autonomous vehicles, accurate environmental perception by automotive cyber-physical platforms is critical for achieving safety and driving-performance goals. An efficient perception solution capable of high-fidelity environment modeling can improve Advanced Driver Assistance System (ADAS) performance and reduce the number of lives lost to traffic accidents caused by human driving errors. Enabling robust perception for vehicles with ADAS requires solving multiple complex problems related to the selection and placement of sensors, object detection, and sensor fusion. Current methods address these problems in isolation, which leads to inefficient solutions. For instance, there is an inherent accuracy-versus-latency trade-off between one-stage and two-stage object detectors, which makes selecting a suitable object detector from a diverse range of choices difficult. Further, even if a perception architecture were equipped with an ideal object detector performing high-accuracy, low-latency inference, the relative position and orientation of the selected sensors (e.g., cameras, radars, lidars) determine whether static or dynamic targets fall inside the field of view of each sensor, or inside the combined field of view of the sensor configuration. If the combined field of view is too small or contains redundant overlap between individual sensors, important events and obstacles can go undetected. Conversely, if the combined field of view is too large, the number of false-positive detections will be high in real time, and appropriate sensor fusion algorithms are required for filtering. Sensor fusion algorithms also enable tracking of non-ego vehicles in situations where traffic is highly dynamic or there are many obstacles on the road. Position and velocity estimation using sensor fusion algorithms has a lower margin for error when the trajectories of other vehicles in traffic are in the vicinity of the ego vehicle, as incorrect measurements can cause accidents. Due to the various complex inter-dependencies between design decisions, constraints, and optimization goals, building a framework capable of synthesizing perception solutions for automotive cyber-physical platforms is not trivial. We present a novel perception architecture exploration framework for automotive cyber-physical platforms capable of global co-optimization of deep learning and sensing infrastructure. The framework is capable of exploring the synthesis of heterogeneous sensor configurations towards achieving vehicle autonomy goals. As our first contribution, we propose a novel optimization framework called VESPA that explores the design space of sensor placement locations and orientations to find the optimal sensor configuration for a vehicle. We demonstrate how our framework can obtain optimal sensor configurations for heterogeneous sensors deployed across two contemporary real vehicles. We then utilize VESPA to create a comprehensive perception architecture synthesis framework called PASTA. This framework enables robust perception for vehicles with ADAS, requiring solutions to multiple complex problems related not only to the selection and placement of sensors but also to object detection and sensor fusion. Experimental results with the Audi TT and BMW Mini Cooper vehicles show how PASTA can intelligently traverse the perception design space to find robust, vehicle-specific solutions.
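    The following is a minimal sketch, under assumed geometry and parameter values, of the kind of sensor-placement design-space exploration a framework like VESPA performs: enumerate candidate mounting orientations and score the combined field-of-view coverage of each configuration. It is an illustrative simplification, not the thesis's actual formulation.

```python
# Sketch of sensor-placement design-space exploration: pick yaw angles for two
# cameras so their combined field of view covers a set of target bearings.
# Geometry, parameter values, and scoring are illustrative assumptions.
from itertools import product

SENSOR_FOV_DEG = 60.0                   # assumed horizontal FOV per camera
TARGET_BEARINGS = range(-90, 91, 10)    # directions (deg) we want covered
CANDIDATE_YAWS = [-60, -30, 0, 30, 60]  # candidate mounting orientations

def covered(bearing: float, yaws: tuple) -> bool:
    """A bearing is covered if it falls inside any sensor's field of view."""
    return any(abs(bearing - yaw) <= SENSOR_FOV_DEG / 2 for yaw in yaws)

def coverage_score(yaws: tuple) -> float:
    """Fraction of target bearings inside the combined field of view."""
    hits = sum(covered(b, yaws) for b in TARGET_BEARINGS)
    return hits / len(TARGET_BEARINGS)

# Exhaustive search over two-camera configurations (a real framework would
# also co-optimize sensor type, position, cost, and detector latency).
best = max(product(CANDIDATE_YAWS, repeat=2), key=coverage_score)
print(best, coverage_score(best))
```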

    Automated Deployment of an End-to-End Pipeline on Amazon Web Services for Real-Time Visual Inspection using Fast Streaming High-Definition Images

    This thesis investigates various degrees of freedom and deployment challenges in building an end-to-end intelligent visual inspection system for use in automotive manufacturing. Current methods of fault detection in automotive assembly are highly manual and labor intensive, and thus prone to errors. An automated process can potentially be fast enough to operate within the real-time constraints of the assembly line and can reduce errors. In automotive manufacturing, the components of the end-to-end pipeline include capturing a large set of high-definition images from a camera setup at the assembly location, transferring and storing the images as needed, executing object detection within a given time frame before the next car arrives on the assembly line, and notifying a human operator when a fault is detected. As inference with object detection models is typically very compute- and memory-intensive, meeting the time, memory, and resource constraints requires careful consideration of the choice of object detection model and model parameters, along with adequate hardware and environmental support. Some automotive manufacturing plants lack the floor space to set up the entire pipeline on an edge platform. Thus, we have developed a template for Amazon Web Services (AWS) in Python, using the BOTO3 library, that can deploy the entire end-to-end scalable infrastructure in any AWS region. In this thesis, we design, develop, and experimentally evaluate the performance of the system components, including the throughput and latency of uploading high-definition images to an AWS cloud server, the time required by the AWS components in the pipeline, and the trade-offs of inference time, memory, and accuracy for twenty-four popular object detection models on four hardware platforms.
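    As a small illustration of one pipeline component described above, the snippet below times the upload of a single high-definition image to an S3 bucket with the boto3 client; the bucket name, object key, file name, and region are placeholders rather than values from the thesis.

```python
# Sketch: time the upload of one inspection image to S3 with boto3.
# Bucket, key, file name, and region below are placeholders.
import time
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def timed_upload(local_path: str, bucket: str, key: str) -> float:
    """Upload one image and return the elapsed upload time in seconds."""
    start = time.perf_counter()
    s3.upload_file(local_path, bucket, key)
    return time.perf_counter() - start

latency = timed_upload("frame_0001.png", "example-inspection-bucket",
                       "frames/frame_0001.png")
print(f"upload latency: {latency:.3f} s")
```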

    Chimpanzee face recognition from videos in the wild using deep learning

    Video recording is now ubiquitous in the study of animal behavior, but its analysis on a large scale is prohibited by the time and resources needed to manually process large volumes of data. We present a deep convolutional neural network (CNN) approach that provides a fully automated pipeline for face detection, tracking, and recognition of wild chimpanzees from long-term video records. In a 14-year dataset yielding 10 million face images from 23 individuals over 50 hours of footage, we obtained an overall accuracy of 92.5% for identity recognition and 96.2% for sex recognition. Using the identified faces, we generated co-occurrence matrices to trace changes in the social network structure of an aging population. The tools we developed enable easy processing and annotation of video datasets, including those from other species. Such automated analysis unveils the future potential of large-scale longitudinal video archives to address fundamental questions in behavior and conservation. Funding agencies and grant numbers: Engineering & Physical Sciences Research Council (EPSRC) EP/M013774/1; Cooperative Research Program of the Primate Research Institute, Kyoto University; Google; Clarendon Fund; Boise Trust Fund; Wolfson College, University of Oxford; Leverhulme Trust PLP-2016-114; Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT); Japan Society for the Promotion of Science 16H06283; Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT); Japan Society for the Promotion of Science LGP-U04.
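    As a minimal sketch of the co-occurrence step described in the abstract, the code below accumulates a symmetric matrix counting how often pairs of recognized individuals appear in the same frame; the individual names and per-frame detections are made-up placeholders, not data from the study.

```python
# Minimal sketch: build a co-occurrence matrix from per-frame identity labels.
# The identities and frame contents below are placeholders for illustration.
from itertools import combinations
import numpy as np

individuals = ["chimp_a", "chimp_b", "chimp_c"]      # assumed identities
index = {name: i for i, name in enumerate(individuals)}

frames = [                                           # per-frame recognized faces
    {"chimp_a", "chimp_b"},
    {"chimp_b", "chimp_c"},
    {"chimp_a", "chimp_b", "chimp_c"},
]

cooccurrence = np.zeros((len(individuals), len(individuals)), dtype=int)
for present in frames:
    for a, b in combinations(sorted(present), 2):
        cooccurrence[index[a], index[b]] += 1
        cooccurrence[index[b], index[a]] += 1        # keep the matrix symmetric

print(cooccurrence)
```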