8,072 research outputs found

    Fusion of Head and Full-Body Detectors for Multi-Object Tracking

    In order to track all persons in a scene, the tracking-by-detection paradigm has proven to be a very effective approach. Yet, relying solely on a single detector is a major limitation, as useful image information might be ignored. Consequently, this work demonstrates how to fuse two detectors into a tracking system. To obtain the trajectories, we propose to formulate tracking as a weighted graph labeling problem, resulting in a binary quadratic program. As such problems are NP-hard, the solution can only be approximated. Based on the Frank-Wolfe algorithm, we present a new solver that is crucial for handling such difficult problems. Evaluation on pedestrian tracking is provided for multiple scenarios, showing superior results over single-detector tracking and standard QP solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and 1st on the new MOT17 benchmark, outperforming over 90 trackers. Comment: 10 pages, 4 figures; Winner of the MOT17 challenge; CVPRW 201
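    To make the optimization step more concrete, the following minimal Python sketch runs a Frank-Wolfe loop on the box relaxation of a generic binary quadratic program; the function name, the diminishing step size, and the final rounding are illustrative assumptions, not the paper's actual solver or its graph labeling constraints.

```python
import numpy as np

def frank_wolfe_box_qp(Q, c, n_iters=200):
    """Minimize f(x) = x^T Q x + c^T x over the box [0, 1]^n.

    Illustrative sketch of a Frank-Wolfe loop on the continuous relaxation
    of a binary quadratic program (not the paper's solver).
    """
    n = c.shape[0]
    x = np.full(n, 0.5)                      # start in the interior of the box
    for t in range(n_iters):
        grad = (Q + Q.T) @ x + c             # gradient of the quadratic objective
        s = (grad < 0).astype(float)         # linear oracle over the box: 0 or 1 per coordinate
        gamma = 2.0 / (t + 2.0)              # standard diminishing step size
        x = x + gamma * (s - x)              # move toward the oracle vertex
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((6, 6))
    c = rng.standard_normal(6)
    x_relaxed = frank_wolfe_box_qp(Q, c)
    labels = (x_relaxed > 0.5).astype(int)   # round the relaxed solution to a binary labeling
    print(labels)
```

    The property exploited here is that the linear subproblem over the box decomposes per coordinate, which keeps each Frank-Wolfe iteration cheap even when the quadratic objective is large.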

    Machine Understanding of Human Behavior

    A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that are human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far we are from enabling computers to understand human behavior.

    Large-Scale Neural Systems for Vision and Cognition

    Consideration of how people respond to the question "What is this?" has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar (This is a car), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system (This is a car, that is a vehicle, over there is a sedan). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network's learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes, without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations which exploit the model's intrinsic capacity for one-to-many learning (This is a car and a vehicle and a sedan) as well as many-to-one learning (Each of those vehicles is a car). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab. Defense Advanced Research Projects Agency (Hewlett-Packard Company, DARPA HR0011-09-3-0001; HRL Laboratories LLC subcontract 801881-BS under prime contract HR0011-09-C-0011); Science of Learning Centers program of the National Science Foundation (SBE-0354378).
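    As a loose illustration of the kind of rule discovery described above (and not of ARTMAP itself), the short Python sketch below infers implication rules such as "car -> vehicle" from observations in which each exemplar receives only one label at a time; the data layout and object identifiers are invented for the example.

```python
from collections import defaultdict

# Toy sketch: each observation is (object_id, label), with one label per
# observation; labels seen for the same object are reconciled into
# implication rules such as "car -> vehicle".
observations = [
    (1, "car"), (1, "vehicle"), (2, "sedan"), (2, "car"),
    (2, "vehicle"), (3, "truck"), (3, "vehicle"),
]

labels_per_object = defaultdict(set)
for obj, label in observations:
    labels_per_object[obj].add(label)

all_labels = {l for labels in labels_per_object.values() for l in labels}

rules = []
for a in all_labels:
    for b in all_labels - {a}:
        objects_with_a = [s for s in labels_per_object.values() if a in s]
        if objects_with_a and all(b in s for s in objects_with_a):
            rules.append((a, b))     # every object seen as 'a' was also seen as 'b'

for a, b in sorted(rules):
    print(f"{a} -> {b}")             # e.g. car -> vehicle, sedan -> car, truck -> vehicle
```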

    A mask-based approach for the geometric calibration of thermal-infrared cameras

    Accurate and efficient thermal-infrared (IR) camera calibration is important for advancing computer vision research within the thermal modality. This paper presents an approach for geometrically calibrating individual and multiple cameras in both the thermal and visible modalities. The proposed technique can be used to correct for lens distortion and to simultaneously reference both visible and thermal-IR cameras to a single coordinate frame. The most popular existing approach for the geometric calibration of thermal cameras uses a printed chessboard heated by a flood lamp and is comparatively inaccurate and difficult to execute. Additionally, software toolkits provided for calibration either are unsuitable for this task or require substantial manual intervention. A new geometric mask with high thermal contrast and not requiring a flood lamp is presented as an alternative calibration pattern. Calibration points on the pattern are then accurately located using a clustering-based algorithm which utilizes the maximally stable extremal region detector. This algorithm is integrated into an automatic end-to-end system for calibrating single or multiple cameras. The evaluation shows that using the proposed mask achieves a mean reprojection error up to 78% lower than that using a heated chessboard. The effectiveness of the approach is further demonstrated by using it to calibrate two multiple-camera multiple-modality setups. Source code and binaries for the developed software are provided on the project Web site.
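    For a sense of the pipeline, the sketch below shows a generic OpenCV route one could take once the mask's blobs are visible in an image: detect MSER regions, reduce them to centroids, and, after the 2D-3D correspondence with the known mask geometry has been established (omitted here), run a standard camera calibration and read off the reprojection error. The function names and the missing clustering/ordering step are assumptions, not the authors' toolkit.

```python
import cv2
import numpy as np

def detect_mask_points(gray):
    """Detect candidate calibration-target centroids in a thermal image.

    Sketch only: MSER regions are reduced to centroids; the paper additionally
    clusters and orders them against the known mask geometry (omitted here).
    """
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    centroids = np.array([r.mean(axis=0) for r in regions], dtype=np.float32)
    return centroids

def calibrate(object_points, image_points, image_size):
    """Single-camera calibration once 2D-3D correspondences exist.

    object_points / image_points: lists of per-view float32 arrays already put
    into matching order (the correspondence step is assumed); image_size is
    (width, height). Returns the RMS reprojection error in pixels plus the
    intrinsic matrix and distortion coefficients.
    """
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    return rms, K, dist
```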

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
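    To make the event representation concrete, here is a small Python sketch of an event stream as (timestamp, x, y, polarity) records, together with the common trick of accumulating events over a time window into a frame that conventional algorithms can consume; the field names, sensor resolution, and window length are illustrative assumptions.

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate signed events into an image over a time window.

    events: structured array with fields 't' (seconds), 'x', 'y', 'p' (p in {-1, +1}).
    """
    frame = np.zeros((height, width), dtype=np.int32)
    window = events[(events['t'] >= t_start) & (events['t'] < t_end)]
    np.add.at(frame, (window['y'], window['x']), window['p'])  # signed event count per pixel
    return frame

if __name__ == "__main__":
    dtype = [('t', 'f8'), ('x', 'i4'), ('y', 'i4'), ('p', 'i4')]
    rng = np.random.default_rng(0)
    n = 1000
    events = np.zeros(n, dtype=dtype)
    events['t'] = np.sort(rng.uniform(0.0, 0.01, n))   # microsecond-scale timestamps
    events['x'] = rng.integers(0, 240, n)
    events['y'] = rng.integers(0, 180, n)
    events['p'] = rng.choice([-1, 1], n)
    print(events_to_frame(events, 180, 240, 0.0, 0.005))
```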

    Fine-To-Coarse Global Registration of RGB-D Scans

    RGB-D scanning of indoor environments is important for many applications, including real estate, interior design, and virtual reality. However, it is still challenging to register RGB-D images from a hand-held camera over a long video sequence into a globally consistent 3D model. Current methods can often lose tracking or drift and thus fail to reconstruct salient structures in large environments (e.g., parallel walls in different rooms). To address this problem, we propose a "fine-to-coarse" global registration algorithm that leverages robust registrations at finer scales to seed detection and enforcement of new correspondence and structural constraints at coarser scales. To test global registration algorithms, we provide a benchmark with 10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset. During experiments with this benchmark, we find that our fine-to-coarse algorithm registers long RGB-D sequences better than previous methods.
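    A correspondence benchmark of this kind can be used in a straightforward way: map each manually clicked point pair into the world frame with the estimated per-frame poses and measure the residual distance. The Python sketch below assumes 4x4 camera-to-world matrices and per-frame 3D points, which is one plausible layout rather than the benchmark's actual file format.

```python
import numpy as np

def correspondence_error(poses_a, poses_b, points_a, points_b):
    """Score a global registration with manually clicked correspondences.

    poses_a[i], poses_b[i]: 4x4 camera-to-world matrices estimated for the two
    frames of correspondence i; points_a[i], points_b[i]: the clicked 3D point
    in each frame's camera coordinates (layout is an assumption). Returns the
    per-correspondence distance after mapping both points into the world frame.
    """
    errors = []
    for Ta, Tb, pa, pb in zip(poses_a, poses_b, points_a, points_b):
        wa = Ta @ np.append(pa, 1.0)   # homogeneous point mapped into the world frame
        wb = Tb @ np.append(pb, 1.0)
        errors.append(np.linalg.norm(wa[:3] - wb[:3]))
    return np.array(errors)
```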

    Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection

    To assure that an autonomous car is driving safely on public roads, its object detection module should not only work correctly, but show its prediction confidence as well. Previous object detectors driven by deep learning do not explicitly model uncertainties in the neural network. We tackle this problem by presenting practical methods to capture uncertainties in a 3D vehicle detector for Lidar point clouds. The proposed probabilistic detector represents reliable epistemic uncertainty and aleatoric uncertainty in classification and localization tasks. Experimental results show that the epistemic uncertainty is related to the detection accuracy, whereas the aleatoric uncertainty is influenced by vehicle distance and occlusion. The results also show that we can improve the detection performance by 1%-5% by modeling the aleatoric uncertainty. Comment: Accepted for presentation at the 21st IEEE International Conference on Intelligent Transportation Systems (ITSC 2018).
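    The two uncertainty types mentioned above are commonly captured with (a) a predicted log-variance term in the regression loss for aleatoric uncertainty and (b) Monte Carlo dropout at test time for epistemic uncertainty. The PyTorch sketch below illustrates both mechanisms on a toy regression head; the layer sizes, dropout rate, and function names are assumptions, not the paper's detector architecture.

```python
import torch
import torch.nn as nn

class BoxRegressorWithUncertainty(nn.Module):
    """Toy head that predicts a box offset together with its log-variance."""
    def __init__(self, in_dim, out_dim=7):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(p=0.2))
        self.mean = nn.Linear(64, out_dim)
        self.log_var = nn.Linear(64, out_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mean(h), self.log_var(h)

def aleatoric_loss(pred, log_var, target):
    # Heteroscedastic regression loss: residuals are down-weighted where the
    # network predicts high variance, with a penalty for claiming uncertainty.
    return (0.5 * torch.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var).mean()

def epistemic_uncertainty(model, x, n_samples=20):
    # MC dropout: keep dropout active at test time and measure prediction spread.
    model.train()
    with torch.no_grad():
        samples = torch.stack([model(x)[0] for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)
```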

    Tracking icebergs with time-lapse photography and sparse optical flow, LeConte Bay, Alaska, 2016–2017

    We present a workflow to track icebergs in proglacial fjords using oblique time-lapse photos and the Lucas-Kanade optical flow algorithm. We employ the workflow at LeConte Bay, Alaska, where we ran five time-lapse cameras between April 2016 and September 2017, capturing more than 400 000 photos at frame rates of 0.5–4.0 min⁻¹. Hourly to daily average velocity fields in map coordinates illustrate dynamic currents in the bay, with dominant downfjord velocities (exceeding 0.5 m s⁻¹ intermittently) and several eddies. Comparisons with simultaneous Acoustic Doppler Current Profiler (ADCP) measurements yield best agreement for the uppermost ADCP levels (∼ 12 m and above), in line with prevalent small icebergs that trace near-surface currents. Tracking results from multiple cameras compare favorably, although cameras with lower frame rates (0.5 min⁻¹) tend to underestimate high flow speeds. Tests to determine requisite temporal and spatial image resolution confirm the importance of high image frame rates, while spatial resolution is of secondary importance. Application of our procedure to other fjords will be successful if iceberg concentrations are high enough and if the camera frame rates are sufficiently rapid (at least 1 min⁻¹ for conditions similar to LeConte Bay). This work was funded by the U.S. National Science Foundation (OPP-1503910, OPP-1504288, OPP-1504521 and OPP-1504191).
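    A minimal version of the feature-tracking step can be written with OpenCV's pyramidal Lucas-Kanade implementation, as sketched below; Shi-Tomasi corners stand in for iceberg features, the window and pyramid parameters are illustrative, and the conversion from pixel displacement to map-coordinate velocity is reduced to a single scale factor, which the actual workflow replaces with proper georeferencing of the oblique views.

```python
import cv2
import numpy as np

def track_between_frames(prev_gray, next_gray, dt_seconds, pixels_per_meter=1.0):
    """Track features between two time-lapse frames with pyramidal Lucas-Kanade.

    Sketch only: returns an approximate speed (m/s) for each successfully
    tracked feature, using a single image-to-ground scale factor.
    """
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                 qualityLevel=0.01, minDistance=7)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None,
                                             winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    displacement_px = (p1[good] - p0[good]).reshape(-1, 2)
    speeds = np.linalg.norm(displacement_px, axis=1) / pixels_per_meter / dt_seconds
    return speeds
```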

    Robust Techniques for Feature-based Image Mosaicing

    Over the last few decades, image mosaicing for real-time applications has been a challenging field for image processing experts. It has wide applications in video conferencing, 3D image reconstruction, satellite imaging, and several medical as well as computer vision fields. It can also be used for mosaic-based localization, motion detection and tracking, augmented reality, resolution enhancement, generating a large field of view (FOV), etc. In this research work, a feature-based image mosaicing technique using image fusion is proposed. Image mosaicing algorithms fall into two broad categories: direct methods and feature-based methods. Direct methods need a good initial estimate, whereas feature-based methods do not require initialization during registration. Feature-based techniques typically follow four steps: feature detection, feature matching, transformation model estimation, and image resampling and transformation. SIFT and SURF are feature-detection algorithms commonly used for image mosaicing, but each has its own limitations as well as advantages depending on the application. The proposed method employs these two feature-based image mosaicing techniques to generate an output image that works around the limitations of both in terms of image quality. The developed robust algorithm handles the combined effects of rotation, illumination, noise variation, and other minor variations. Initially, the input images are stitched together using the popular stitching algorithms, i.e., the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF). To extract the best features from the stitching results, blending is performed by means of the Discrete Wavelet Transform (DWT) using the maximum selection rule for both approximation and detail components.
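    A minimal sketch of the DWT blending step with the maximum selection rule is given below, assuming two already-aligned, equally sized grayscale mosaics (for example, the SIFT and SURF stitching results); the wavelet choice and function name are illustrative, and PyWavelets is used for the transform.

```python
import numpy as np
import pywt

def fuse_dwt_max(img_a, img_b, wavelet="db2"):
    """Fuse two aligned, same-size grayscale mosaics with a single-level DWT.

    Sketch of the maximum selection rule: at every coefficient position the
    larger-magnitude coefficient (approximation and detail alike) is kept,
    then the fused image is reconstructed with the inverse DWT.
    """
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(img_a.astype(np.float32), wavelet)
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(img_b.astype(np.float32), wavelet)

    def pick_max(a, b):
        return np.where(np.abs(a) >= np.abs(b), a, b)

    fused = (pick_max(cA_a, cA_b),
             (pick_max(cH_a, cH_b), pick_max(cV_a, cV_b), pick_max(cD_a, cD_b)))
    return pywt.idwt2(fused, wavelet)
```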