
    TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture

    Agricultural vehicles such as tractors and harvesters have for decades been able to navigate automatically and more efficiently using commercially available products such as auto-steering and tractor-guidance systems. However, a human operator is still required inside the vehicle to ensure the safety of the vehicle and especially of its surroundings, such as humans and animals. For fully autonomous vehicles to be certified for farming, computer vision algorithms and sensor technologies must detect obstacles with performance equivalent to or better than human level. Furthermore, detection must run in real time to allow vehicles to actuate and avoid collisions.

    This thesis proposes a detection system (TractorEYE), a dataset (FieldSAFE), and procedures to fuse information from multiple sensor technologies to improve obstacle detection and to generate a map. TractorEYE is a multi-sensor detection system for autonomous vehicles in agriculture. The system consists of three hardware-synchronized and registered sensors (stereo camera, thermal camera, and multi-beam lidar) mounted on/in a ruggedized, water-resistant casing. Algorithms have been developed to run a total of six detection algorithms (four for the RGB camera, one for the thermal camera, and one for the multi-beam lidar) and to fuse detection information in a common format using either 3D positions or inverse sensor models. A GPU-powered computational platform runs the detection algorithms online. For the RGB camera, a deep-learning algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded, and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly detects humans better and at longer ranges (45-90 m) in an agricultural use case, with a smaller memory footprint and 7.3-times faster processing. The low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU.

    FieldSAFE is a multi-modal dataset for the detection of static and moving obstacles in agriculture. The dataset includes synchronized recordings from an RGB camera, stereo camera, thermal camera, 360-degree camera, lidar, and radar. Precise localization and pose are provided using IMU and GPS. Ground truth for static and moving obstacles (humans, mannequin dolls, barrels, buildings, vehicles, and vegetation) is available as an annotated orthophoto and as GPS coordinates for moving obstacles. Detection information from multiple detection algorithms and sensors is fused into a map using inverse sensor models and occupancy grid maps.

    This thesis presents many scientific contributions to the state of the art in perception for autonomous tractors, including a dataset, a sensor platform, detection algorithms, and procedures for multi-sensor fusion. Furthermore, important engineering contributions to autonomous farming vehicles are presented, such as easily applicable, open-source software packages and algorithms that have been demonstrated in an end-to-end real-time detection system. The contributions of this thesis demonstrate, address, and solve critical issues in utilizing camera-based perception systems, which are essential to making autonomous vehicles in agriculture a reality.
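    The fusion mechanism named above, inverse sensor models feeding an occupancy grid, can be illustrated with a minimal sketch. The grid size, hit probabilities, and the toy single-cell inverse sensor model below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

# Minimal occupancy-grid fusion sketch: each sensor contributes an
# inverse sensor model that maps a detection to per-cell occupancy
# probabilities; fusion accumulates log-odds so independent sensor
# evidence combines multiplicatively.

GRID_SHAPE = (200, 200)          # grid cells (illustrative size)
log_odds = np.zeros(GRID_SHAPE)  # 0 -> p(occupied) = 0.5 (unknown)

def inverse_sensor_model(grid_shape, cell, p_hit):
    """Toy inverse sensor model: the detected cell gets probability
    p_hit of being occupied; all other cells stay at 0.5 (no info)."""
    p = np.full(grid_shape, 0.5)
    p[cell] = p_hit
    return p

def fuse(log_odds, p_occupied):
    """Standard log-odds update for an occupancy grid."""
    return log_odds + np.log(p_occupied / (1.0 - p_occupied))

# A human detected by the thermal camera and the lidar in the same cell.
log_odds = fuse(log_odds, inverse_sensor_model(GRID_SHAPE, (50, 80), 0.8))
log_odds = fuse(log_odds, inverse_sensor_model(GRID_SHAPE, (50, 80), 0.7))

p_map = 1.0 - 1.0 / (1.0 + np.exp(log_odds))  # back to probabilities
print(p_map[50, 80])  # ~0.90: two moderate detections reinforce each other
```

    Working in log-odds keeps each update a simple addition, so independent evidence from, say, the thermal camera and the lidar reinforces or cancels cell by cell.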

    Visual Crowd Analysis: Open Research Problems

    Over the last decade, there has been a remarkable surge of interest in automated crowd monitoring within the computer vision community. Modern deep-learning approaches have made it possible to develop fully automated vision-based crowd-monitoring applications. However, despite the magnitude of the issue at hand, the significant technological advancements, and the consistent interest of the research community, numerous challenges still need to be overcome. In this article, we delve into six major areas of visual crowd analysis, emphasizing the key developments in each. We outline the crucial unresolved issues that must be tackled in future work to ensure that the field of automated crowd monitoring continues to progress and thrive. Several surveys related to this topic have been conducted in the past. Nonetheless, this article presents a more intuitive categorization of works and depicts the latest breakthroughs within the field, concisely incorporating studies carried out within the last few years. By carefully choosing prominent works with significant contributions in terms of novelty or performance gains, this paper presents a more comprehensive exposition of advancements in the current state of the art.

    Comment: Accepted in AI Magazine, published by Wiley Periodicals LLC on behalf of the Association for the Advancement of Artificial Intelligence.

    How Cover Images Represent Video Content: A Case Study of Bilibili

    User-generated videos are the most prevalent online products on social media platforms today. In this context, thumbnails (or cover images) serve the important role of representing the video content and attracting viewers' attention. In this study, we conducted a content analysis of cover images on the Bilibili video-sharing platform, the Chinese counterpart to YouTube, where content creators can upload videos and design their own cover images rather than using automatically generated thumbnails. We identified four components that content creators use most often in cover images: snapshot, background, text overlay, and face. We found that the use of different components and their combinations varies across cover images for videos of different durations. The study sheds light on human input into video representation and addresses a gap in the literature, as video thumbnails have previously been studied mainly as the output of automatic generation by algorithms.
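    As an aside, one of the four components, the face, is straightforward to flag automatically at scale. The sketch below uses OpenCV's bundled Haar cascade; the input file name is hypothetical, and this is not the coding procedure used in the study.

```python
import cv2

# Illustrative sketch: flag whether a cover image contains a face, one of
# the four components identified in the study. Uses OpenCV's stock Haar
# cascade; "cover.jpg" is a hypothetical input file.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("cover.jpg")
if img is None:
    raise FileNotFoundError("cover.jpg not found")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

print(f"faces detected: {len(faces)}")  # > 0 -> 'face' component present
```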

    Highly efficient low-level feature extraction for video representation and retrieval.

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current content-based video indexing and retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval, in order to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on prediction information extracted directly from compressed-domain features and on robust, scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining the high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking video clips with a limited lexicon of related keywords.
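    The thesis extracts its cues from compressed-domain prediction information, which is not reproduced here. Purely as an illustration of temporal analysis and key-frame extraction in general, the sketch below declares a shot boundary whenever the colour-histogram distance between consecutive frames exceeds a threshold. The file name and threshold are hypothetical.

```python
import cv2

# Sketch of temporal analysis via colour-histogram differences between
# consecutive frames. The thesis operates on compressed-domain prediction
# data; this pixel-domain version only illustrates the general idea of
# detecting shot boundaries and picking key frames.
cap = cv2.VideoCapture("clip.mp4")
prev_hist, key_frames, idx = None, [], 0
THRESHOLD = 0.4  # histogram distance above which we declare a shot cut

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    if prev_hist is not None:
        # Bhattacharyya distance: 0 = identical, 1 = maximally different
        d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
        if d > THRESHOLD:
            key_frames.append(idx)  # first frame of a new shot
    prev_hist, idx = hist, idx + 1

cap.release()
print("key frames at:", key_frames)
```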

    Video metadata extraction in a videoMail system

    Currently the world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television alone. Video is used for different purposes, such as entertainment, information, education, and communication. The rapid growth of today's video archives, with only sparse editorial data available, makes retrieval a significant problem. Humans see a video as a complex interplay of cognitive concepts. As a result, there is a need to build a bridge between numeric values and semantic concepts, establishing a connection that will facilitate the retrieval of videos by humans. The critical aspect of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is tedious, subjective, and expensive; therefore, automatic annotation is being actively studied. In this thesis we focus on the automatic annotation of multimedia content, namely the use of information retrieval analysis techniques to automatically extract metadata from video in a videoMail system, and on the identification of text, people, actions, spaces, and objects, including animals and plants. This makes it possible to align multimedia content with the text presented in the email message and to create applications for semantic video database indexing and retrieval.
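    One of the listed cues, overlaid text, lends itself to a short sketch: sample frames, run OCR, and keep candidate keywords for indexing. This is a hedged illustration, not the thesis's pipeline; pytesseract requires a local Tesseract installation, and the file name and sampling rate are assumptions.

```python
import cv2
import pytesseract

# Sketch: pull overlaid text out of sampled video frames with OCR so it
# can be aligned with the e-mail body and indexed. "message.mp4" and the
# once-per-30-frames sampling rate are hypothetical choices.
cap = cv2.VideoCapture("message.mp4")
keywords = set()

frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_no % 30 == 0:  # sample roughly once per second at 30 fps
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        text = pytesseract.image_to_string(rgb)
        keywords.update(w.lower() for w in text.split() if w.isalpha())
    frame_no += 1

cap.release()
print(sorted(keywords))  # candidate terms for semantic indexing
```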

    Extreme Compression for Novelty Detection in Bandwidth Constrained Environments

    Many applications have been built around images; more than two hundred thousand images are uploaded to Facebook alone every minute. Accordingly, the memory and bandwidth requirements for uploading images keep increasing. In this digital era, image storage and the transmission of images over a channel have become more frequent. In addition, many environments have limited memory and low bandwidth capacity. Applications like the Mars Curiosity rover have extreme constraints on their memory and bandwidth, yet applications of this sort require a great deal of image processing and transmission. The problem is to look for novelty in unmanned places. To detect novelty, we compare all of the images captured by the machine with each other. Comparison between extremely large images, such as those taken by scientific instruments on remote planets, takes a great deal of processing time, and transmitting those images back to Earth-based facilities is an even greater challenge. To avoid this problem, we propose a solution of extreme compression that forms a signature, solving both the storage and the transmission problem. These signatures are then used to detect novelty, yielding almost the same comparison accuracy as before compression. The approach reduces redundant data transfer by concentrating on novel data and on the available bandwidth.
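    The abstract does not specify the signature scheme, so the following sketch stands in a generic 64-bit difference hash as the extreme-compression signature, with a Hamming-distance threshold deciding novelty. The file names and threshold are hypothetical.

```python
import numpy as np
from PIL import Image

# Sketch of an extreme-compression signature: a 64-bit difference hash
# per image, compared by Hamming distance. This is a generic stand-in,
# not the thesis's actual compression scheme.

def dhash(path, size=8):
    """Reduce an image to a size*size-bit gradient signature."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = np.asarray(img, dtype=np.int16)
    return (px[:, 1:] > px[:, :-1]).flatten()  # horizontal gradients

def is_novel(signature, archive, threshold=10):
    """Novel if no archived signature is within `threshold` bits."""
    return all(np.count_nonzero(signature ^ s) > threshold for s in archive)

archive = [dhash(p) for p in ["seen1.png", "seen2.png"]]  # hypothetical files
new_sig = dhash("new_capture.png")
if is_novel(new_sig, archive):
    archive.append(new_sig)  # only novel captures need full transmission
```

    Because each archived image is now 64 bits, comparing a new capture against the whole archive is cheap, and only captures judged novel need to be transmitted in full.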

    PKFSKC: PCA Based Key Frame Similarity Kernel Clustering Algorithm

    In content-based video retrieval research, a PCA-based key-frame similarity kernel clustering algorithm is proposed to compute the similarity of key-frame feature matrices with different dimensions. First, feature vectors are extracted and feature matrices of different dimensions are constructed for video clips with arbitrary numbers of key frames. Second, a dimension-reduction matrix is computed via SVD based on PCA, and a key-frame similarity kernel clustering retrieval algorithm is designed by combining matrix computation with the kernel method; an improved weighted form is also given. Finally, experiments on a standard test video database and on artificial video clips show that the algorithm improves the efficiency of video retrieval.

    Funding: Fujian Provincial Soft Science Program; State Key Laboratory of Virtual Reality Technology and Systems project (2016); Xiamen University research project.
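    The two ingredients the abstract names, SVD-based reduction of key-frame feature matrices and a kernel similarity, can be sketched as follows. Summarising each reduced clip by its mean row and the RBF gamma are illustrative choices, not the paper's weighted formulation.

```python
import numpy as np

# Sketch of the ingredients named in the abstract: SVD-based dimension
# reduction of a key-frame feature matrix, and an RBF kernel similarity
# between the reduced representations of two clips with different
# numbers of key frames.

def reduce_keyframes(F, k=4):
    """F: (n_keyframes, d) feature matrix; project onto top-k axes."""
    Fc = F - F.mean(axis=0)
    U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:k].T  # (n_keyframes, k) reduced representation

def rbf_similarity(A, B, gamma=0.1):
    """Kernel similarity between clips with different key-frame counts."""
    a, b = A.mean(axis=0), B.mean(axis=0)  # summarise each clip
    return np.exp(-gamma * np.sum((a - b) ** 2))

clip1 = reduce_keyframes(np.random.rand(12, 64))  # 12 key frames
clip2 = reduce_keyframes(np.random.rand(7, 64))   # 7 key frames
print(rbf_similarity(clip1, clip2))
```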