4,297 research outputs found

    Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform

    Full text link
    Detecting hand actions from ego-centric depth sequences is a practically challenging problem, owing mostly to the complex and dexterous nature of hand articulations as well as non-stationary camera motion. We address this problem via a Hough transform based approach coupled with a discriminatively learned error-correcting component to tackle the well known issue of incorrect votes from the Hough transform. In this framework, local parts vote collectively for the start &\& end positions of each action over time. We also construct an in-house annotated dataset of 300 long videos, containing 3,177 single-action subsequences over 16 action classes collected from 26 individuals. Our system is empirically evaluated on this real-life dataset for both the action recognition and detection tasks, and is shown to produce satisfactory results. To facilitate reproduction, the new dataset and our implementation are also provided online

    cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

    Full text link
    The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading the ALL 602 conference papers presented at the CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we are proposing "DeepSurvey" as a mechanism embodying the entire process from the reading through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape

    A Learning-Based Visual Saliency Prediction Model for Stereoscopic 3D Video (LBVS-3D)

    Full text link
    Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and emphasizing only a few salient objects in a scene. A new fovea-based model of spatial distance between the image regions is adopted for considering local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method to one single saliency map that is highly correlated with the eye-fixation data, a random forest based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment, which involved 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database as well as the eye-tracking data are publicly available along with this paper. Experiment results show that the proposed saliency prediction method achieves competitive performance compared to the state-of-the-art approaches

    Monotonic Calibrated Interpolated Look-Up Tables

    Full text link
    Real-world machine learning applications may require functions that are fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of the learned function can be critical to user trust. We propose meeting these goals for low-dimensional machine learning problems by learning flexible, monotonic functions using calibrated interpolated look-up tables. We extend the structural risk minimization framework of lattice regression to train monotonic look-up tables by solving a convex problem with appropriate linear inequality constraints. In addition, we propose jointly learning interpretable calibrations of each feature to normalize continuous features and handle categorical or missing data, at the cost of making the objective non-convex. We address large-scale learning through parallelization, mini-batching, and propose random sampling of additive regularizer terms. Case studies with real-world problems with five to sixteen features and thousands to millions of training samples demonstrate the proposed monotonic functions can achieve state-of-the-art accuracy on practical problems while providing greater transparency to users.Comment: To appear (with minor revisions), Journal Machine Learning Research 201

    Advances in Human Action Recognition: A Survey

    Full text link
    Human action recognition has been an important topic in computer vision due to its many applications such as video surveillance, human machine interaction and video retrieval. One core problem behind these applications is automatically recognizing low-level actions and high-level activities of interest. The former is usually the basis for the latter. This survey gives an overview of the most recent advances in human action recognition during the past several years, following a well-formed taxonomy proposed by a previous survey. From this state-of-the-art survey, researchers can view a panorama of progress in this area for future research

    Imaging and Classification Techniques for Seagrass Mapping and Monitoring: A Comprehensive Survey

    Full text link
    Monitoring underwater habitats is a vital part of observing the condition of the environment. The detection and mapping of underwater vegetation, especially seagrass has drawn the attention of the research community as early as the nineteen eighties. Initially, this monitoring relied on in situ observation by experts. Later, advances in remote-sensing technology, satellite-monitoring techniques and, digital photo- and video-based techniques opened a window to quicker, cheaper, and, potentially, more accurate seagrass-monitoring methods. So far, for seagrass detection and mapping, digital images from airborne cameras, spectral images from satellites, acoustic image data using underwater sonar technology, and digital underwater photo and video images have been used to map the seagrass meadows or monitor their condition. In this article, we have reviewed the recent approaches to seagrass detection and mapping to understand the gaps of the present approaches and determine further research scope to monitor the ocean health more easily. We have identified four classes of approach to seagrass mapping and assessment: still image-, video data-, acoustic image-, and spectral image data-based techniques. We have critically analysed the surveyed approaches and found the research gaps including the need for quick, cheap and effective imaging techniques robust to depth, turbidity, location and weather conditions, fully automated seagrass detectors that can work in real-time, accurate techniques for estimating the seagrass density, and the availability of high computation facilities for processing large scale data. For addressing these gaps, future research should focus on developing cheaper image and video data collection techniques, deep learning based automatic annotation and classification, and real-time percentage-cover calculation.Comment: 36 pages, 14 figures, 8table

    Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation

    Full text link
    Recently there has been an increasing trend to use deep learning frameworks for both 2D consumer images and for 3D medical images. However, there has been little effort to use deep frameworks for volumetric vascular segmentation. We wanted to address this by providing a freely available dataset of 12 annotated two-photon vasculature microscopy stacks. We demonstrated the use of deep learning framework consisting both 2D and 3D convolutional filters (ConvNet). Our hybrid 2D-3D architecture produced promising segmentation result. We derived the architectures from Lee et al. who used the ZNN framework initially designed for electron microscope image segmentation. We hope that by sharing our volumetric vasculature datasets, we will inspire other researchers to experiment with vasculature dataset and improve the used network architectures.Comment: 23 pages, 10 figure

    An Iterative Spanning Forest Framework for Superpixel Segmentation

    Full text link
    Superpixel segmentation has become an important research problem in image processing. In this paper, we propose an Iterative Spanning Forest (ISF) framework, based on sequences of Image Foresting Transforms, where one can choose i) a seed sampling strategy, ii) a connectivity function, iii) an adjacency relation, and iv) a seed pixel recomputation procedure to generate improved sets of connected superpixels (supervoxels in 3D) per iteration. The superpixels in ISF structurally correspond to spanning trees rooted at those seeds. We present five ISF methods to illustrate different choices of its components. These methods are compared with approaches from the state-of-the-art in effectiveness and efficiency. The experiments involve 2D and 3D datasets with distinct characteristics, and a high level application, named sky image segmentation. The theoretical properties of ISF are demonstrated in the supplementary material and the results show that some of its methods are competitive with or superior to the best baselines in effectiveness and efficiency

    An Efficient Approach to Communication-aware Path Planning for Long-range Surveillance Missions undertaken by UAVs

    Full text link
    While using drones for remote surveillance missions, it is mandatory to do path planning of the vehicle since these are pilot-less vehicles. Path planning, whether offline or online, entails setting up the path as a sequence of locations in the 3D Euclidean space, whose coordinates happen to be latitude, longitude and altitude. For the specific application of remote surveillance of long linear infrastructures in non-urban terrain, the continuous 3D-ESP problem practically entails two important scalar costs. The first scalar cost is the distance traveled along the planned path. Since drones are battery operated, hence it is needed that the path length between fixed start and goal locations of a mission should be minimal at all costs. The other scalar cost is the cost of transmitting the acquired video during the mission of remote surveillance, via a camera mounted in the drone's belly. Because of the length of surveillance target which is long linear infrastructure, the amount of video generated is very high and cannot be generally stored in its entirety, on board. If the connectivity is poor along certain segments of a naive path, to boost video transmission rate, the transmission power of the signal is kept high, which in turn dissipates more battery energy. Hence a path is desired that simultaneously also betters what is known as communication cost. These two costs trade-off, and hence Pareto optimization is needed for this 3D bi-objective Euclidean shortest path problem. In this report, we study the mono-objective offline path planning problem, based on the distance cost, while posing the communication cost as an upper-bounded constraint. The bi-objective path planning solution is sketched out towards the end.Comment: 46 pages. One part of this thesis, handling the turn constrained route planning, has been published at ECMR'1

    Geo-Supervised Visual Depth Prediction

    Full text link
    We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual 3D reconstruction. We test the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the power of gravity as a supervisory signal.Comment: ICRA 2019, RA-L 201
    • …
    corecore