291 research outputs found

    Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

    The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this paper, we propose a model-free multi-object tracking approach that uses a category-agnostic image segmentation method to track objects. We present an efficient segmentation mask-based tracker which associates pixel-precise masks reported by the segmentation. Our approach can utilize semantic information whenever it is available for classifying objects at the track level, while retaining the capability to track generic unknown objects in the absence of such information. We demonstrate experimentally that our approach achieves performance comparable to state-of-the-art tracking-by-detection methods for popular object categories such as cars and pedestrians. Additionally, we show that the proposed method can discover and robustly track a large variety of other objects. Comment: ICRA'18 submission
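
The mask association step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' tracker: it assumes masks are plain sets of (row, col) pixels and matches them greedily by intersection-over-union, and the names `mask_iou`, `associate` and the `min_iou` threshold are hypothetical choices made for illustration.

```python
def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate(tracks, masks, min_iou=0.3):
    """Greedily match existing tracks to new-frame masks by descending IoU.

    tracks: dict mapping track id -> last observed mask (assumed layout).
    masks:  list of masks reported for the new frame.
    Returns (matches, unmatched): matches maps track id -> mask index,
    unmatched lists mask indices that could start new tracks.
    """
    pairs = sorted(
        ((mask_iou(t, m), tid, j)
         for tid, t in tracks.items()
         for j, m in enumerate(masks)),
        reverse=True,
    )
    matches, used = {}, set()
    for iou, tid, j in pairs:
        if iou < min_iou:
            break  # all remaining pairs overlap too little
        if tid in matches or j in used:
            continue  # each track and each mask is used at most once
        matches[tid] = j
        used.add(j)
    unmatched = [j for j in range(len(masks)) if j not in used]
    return matches, unmatched


tracks = {1: {(0, 0), (0, 1), (1, 0), (1, 1)}, 2: {(5, 5), (5, 6)}}
masks = [{(5, 5), (5, 6), (5, 7)}, {(0, 1), (1, 1), (0, 2), (1, 2)}, {(9, 9)}]
matches, unmatched = associate(tracks, masks)
print(matches, unmatched)
```

Greedy matching is used here only because it is short; an optimal assignment (for example the Hungarian method) would be a drop-in alternative for the same IoU scores.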

    Understanding egocentric human actions with temporal decision forests

    Understanding human actions is a fundamental task in computer vision with a wide range of applications including pervasive health-care, robotics and game control. This thesis focuses on the problem of egocentric action recognition from RGB-D data, wherein the world is viewed through the eyes of the actor whose hands describe the actions. The main contributions of this work are its findings regarding egocentric actions as described by hands in two application scenarios and a proposal of a new technique that is based on temporal decision forests. The thesis first introduces a novel framework to recognise fingertip writing in mid-air in the context of human-computer interaction. This framework detects whether the user is writing and tracks the fingertip over time to generate spatio-temporal trajectories that are recognised by using a Hough forest variant that encourages temporal consistency in prediction. A problem with using such a forest approach for action recognition is that the learning of temporal dynamics is limited to hand-crafted temporal features and temporal regression, which may break the temporal continuity and lead to inconsistent predictions. To overcome this limitation, the thesis proposes transition forests. Besides any temporal information that is encoded in the feature space, the forest automatically learns the temporal dynamics during training, and it is exploited in inference in an online and efficient manner achieving state-of-the-art results. The last contribution of this thesis is its introduction of the first RGB-D benchmark to allow for the study of egocentric hand-object actions with both hand and object pose annotations. This study conducts an extensive evaluation of different baselines, state-of-the-art approaches and temporal decision forest models using colour, depth and hand pose features.
Furthermore, it extends the transition forest model to incorporate data from different modalities and demonstrates the benefit of using hand pose features to recognise egocentric human actions. The thesis concludes by discussing and analysing the contributions and proposing a few ideas for future work.

    08291 Abstracts Collection -- Statistical and Geometrical Approaches to Visual Motion Analysis

    From 13.07.2008 to 18.07.2008, the Dagstuhl Seminar 08291 "Statistical and Geometrical Approaches to Visual Motion Analysis" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general.

    Analysis of the hands in egocentric vision: A survey

    Egocentric vision (a.k.a. first-person vision, FPV) applications have thrived over the past few years, thanks to the availability of affordable wearable cameras and large annotated datasets. The position of the wearable camera (usually mounted on the head) allows recording exactly what the camera wearers have in front of them, in particular hands and manipulated objects. This intrinsic advantage enables the study of the hands from multiple perspectives: localizing hands and their parts within the images; understanding what actions and activities the hands are involved in; and developing human-computer interfaces that rely on hand gestures. In this survey, we review the literature that focuses on the hands using egocentric vision, categorizing the existing approaches into: localization (where are the hands or parts of them?); interpretation (what are the hands doing?); and application (e.g., systems that used egocentric hand cues for solving a specific problem). Moreover, a list of the most prominent datasets with hand-based annotations is provided.

    Real-time motion planning methods for autonomous on-road driving: state-of-the-art and future research directions

    Currently, autonomous or self-driving vehicles are at the heart of academic and industry research because of their multi-faceted advantages, which include improved safety, reduced congestion, lower emissions and greater mobility. Software is the key driving factor underpinning autonomy, within which planning algorithms that are responsible for mission-critical decision making hold a significant position. While transporting passengers or goods from a given origin to a given destination, motion planning methods incorporate searching for a path to follow, avoiding obstacles and generating the best trajectory that ensures safety, comfort and efficiency. A range of different planning approaches have been proposed in the literature. The purpose of this paper is to review existing approaches and then compare and contrast different methods employed for the motion planning of autonomous on-road driving that consists of (1) finding a path, (2) searching for the safest manoeuvre and (3) determining the most feasible trajectory. Methods developed by researchers in each of these three levels exhibit varying levels of complexity and performance accuracy. This paper presents a critical evaluation of each of these methods, in terms of their advantages/disadvantages, inherent limitations, feasibility, optimality, handling of obstacles and testing operational environments. Based on a critical review of existing methods, research challenges to address current limitations are identified and future research directions are suggested so as to enhance the performance of planning algorithms at all three levels. Some promising areas of future focus have been identified as the use of vehicular communications (V2V and V2I) and the incorporation of transport engineering aspects in order to improve the look-ahead horizon of current sensing technologies that are essential for planning with the aim of reducing the total cost of driverless vehicles.
This critical review of planning techniques, along with the associated discussions on their constraints and limitations, seeks to assist researchers in accelerating development in the emerging field of autonomous vehicle research.

    A new integrated collision risk assessment methodology for autonomous vehicles

    Real-time risk assessment of autonomous driving at tactical and operational levels is extremely challenging since both contextual and circumferential factors should concurrently be considered. Recent methods have started to simultaneously treat the context of the traffic environment along with vehicle dynamics. In particular, interaction-aware motion models that take inter-vehicle dependencies into account by utilizing Bayesian inference are employed to mutually control multiple factors. However, communications between vehicles are often assumed and the developed models require many parameters to be tuned. Consequently, they are computationally very demanding. Even in the cases where these desiderata are fulfilled, current approaches cannot cope with a large volume of sequential data from organically changing traffic scenarios, especially in highly complex operational environments such as dense urban areas with heterogeneous road users. To overcome these limitations, this paper develops a new risk assessment methodology that integrates a network-level collision estimate with a vehicle-based risk estimate in real-time under the joint framework of interaction-aware motion models and Dynamic Bayesian Networks (DBN). Following the formulation and explanation of the required functions, machine learning classifiers were utilized for the real-time network-level collision prediction and the results were then incorporated into the integrated DBN model for predicting collision probabilities in real-time. Results indicated an enhancement of the interaction-aware model by up to 10% when traffic conditions are deemed collision-prone. Hence, it was concluded that a well-calibrated collision prediction classifier provides a crucial hint for better risk perception by autonomous vehicles.

    The Interplay of Architecture and Correlated Variability in Neuronal Networks

    This much is certain: neurons are coupled, and they exhibit covariations in their output. The extent of each does not have a single answer. Moreover, the strength of neuronal correlations, in particular, has been a subject of hot debate within the neuroscience community over the past decade, as advancing recording techniques have made available a lot of new, sometimes seemingly conflicting, datasets. The impact of connectivity and the resulting correlations on the ability of animals to perform necessary tasks is even less well understood. In order to answer relevant questions in these categories, novel approaches must be developed. This work focuses on three somewhat distinct, but inseparably coupled, crucial avenues of research within the broader field of computational neuroscience. First, there is a need for tools which can be applied, both by experimentalists and theorists, to understand how networks transform their inputs. In turn, these tools will allow neuroscientists to tease apart the structure which underlies network activity. The Generalized Thinning and Shift framework, presented in Chapter 4, addresses this need. Next, taking for granted a general understanding of network architecture as well as some grasp of the behavior of its individual units, we must be able to reverse the activity-to-structure relationship, and understand instead how network structure determines dynamics. We achieve this in Chapters 5 through 7 where we present an application of linear response theory yielding an explicit approximation of correlations in integrate-and-fire neuronal networks. This approximation reveals the explicit relationship between correlations, structure, and marginal dynamics. Finally, we must strive to understand the functional impact of network dynamics and architecture on the tasks that a neural network performs. This need motivates our analysis of a biophysically detailed model of the blow fly visual system in Chapter 8.
Our hope is that the work presented here represents significant advances in multiple directions within the field of computational neuroscience.

    Computational Modeling of Human Dorsal Pathway for Motion Processing

    Reliable motion estimation in videos is of crucial importance for background identification, object tracking, action recognition, event analysis, self-navigation, etc. Reconstructing the motion field in the 2D image plane is very challenging, due to variations in image quality, scene geometry, lighting condition, and most importantly, camera jittering. Traditional optical flow models assume consistent image brightness and a smooth motion field, assumptions which are violated by the unstable illumination and motion discontinuities that are common in real-world videos. To recognize observer (or camera) motion robustly in complex, realistic scenarios, we propose a biologically-inspired motion estimation system to overcome issues posed by real-world videos. The bottom-up model is inspired by the infrastructure as well as the functionalities of the human dorsal pathway, and the hierarchical processing stream can be divided into three stages: 1) spatio-temporal processing for local motion, 2) recognition of global motion patterns (camera motion), and 3) preemptive estimation of object motion. To extract effective and meaningful motion features, we apply a series of steerable, spatio-temporal filters to detect local motion at different speeds and directions, in a way that is selective of motion velocity. The intermediate response maps are calibrated and combined to estimate dense motion fields in local regions, and then local motions along two orthogonal axes are aggregated for recognizing planar, radial and circular patterns of global motion. We evaluate the model with an extensive, realistic video database that was collected by hand with a mobile device (iPad), in which the video content varies in scene geometry, lighting condition, view perspective and depth. We achieved high-quality results and demonstrated that this bottom-up model is capable of extracting high-level semantic knowledge regarding self motion in realistic scenes.
Once the global motion is known, we segment objects from moving backgrounds by compensating for camera motion. For videos captured with non-stationary cameras, we consider global motion as a combination of camera motion (background) and object motion (foreground). To estimate foreground motion, we exploit the corollary discharge mechanism of biological systems and estimate motion preemptively. Since background motions for each pixel are collectively introduced by camera movements, we apply spatial-temporal averaging to estimate the background motion at pixel level, and the initial estimation of foreground motion is derived by comparing global motion and background motion at multiple spatial levels. The real frame signals are compared with those derived by forward predictions, refining estimations for object motion. This motion detection system is applied to detect objects with cluttered, moving backgrounds and proves to be efficient in locating independently moving, non-rigid regions. The core contribution of this thesis is the invention of a robust motion estimation system for complicated real-world videos, with challenges posed by real sensor noise, complex natural scenes, variations in illumination and depth, and motion discontinuities. The overall system demonstrates biological plausibility and holds great potential for other applications, such as camera motion removal, heading estimation, obstacle avoidance, route planning, and vision-based navigational assistance.
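
The idea of separating foreground from background motion by compensating for camera motion can be sketched as follows. This is a rough illustration, not the thesis's method: it assumes a dense motion field is already available as a grid of (u, v) vectors, and it replaces the spatio-temporal averaging at multiple spatial levels with a single frame-wide mean. The name `foreground_mask` and the `threshold` value are hypothetical.

```python
import math


def foreground_mask(flow, threshold=1.0):
    """Flag pixels whose motion deviates from the estimated background motion.

    flow: 2D grid (list of rows) of (u, v) motion vectors, one per pixel.
    The background (camera-induced) motion is approximated by the mean
    vector over the whole frame, which is a simplifying assumption.
    """
    vectors = [vec for row in flow for vec in row]
    n = len(vectors)
    bg_u = sum(u for u, _ in vectors) / n
    bg_v = sum(v for _, v in vectors) / n
    # Residual motion after subtracting the background estimate.
    return [
        [math.hypot(u - bg_u, v - bg_v) > threshold for u, v in row]
        for row in flow
    ]


# A 3x3 field panning right, with one independently moving pixel in the centre.
flow = [[(2.0, 0.0)] * 3 for _ in range(3)]
flow[1][1] = (2.0, 5.0)
mask = foreground_mask(flow)
print(mask)
```

A frame-wide mean only suits near-planar camera motion; radial or circular global patterns of the kind the abstract mentions would need a local or parametric background model instead.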