
    Robust Subspace Estimation Using Low-Rank Optimization: Theory and Applications in Scene Reconstruction, Video Denoising, and Activity Recognition

    In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time.

    Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories, etc.). If the assumption that these observations are drawn from a linear subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the ℓ1 norm. The robust estimation has a two-fold advantage: first, the obtained basis better represents the actual subspace because it does not include contributions from the outliers; second, the detected outliers are often themselves of specific interest in many applications, as we show throughout this thesis. We demonstrate these formulations in four applications of low-rank optimization.

    First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by water waves. The main drawback of most previous attempts to tackle this problem is that they depend heavily on modelling the waves, which is in fact ill-posed since the actual behavior of the waves, together with the imaging process, is complicated and includes several noise components; therefore, their results are not satisfactory. In contrast, we propose a novel approach which outperforms the state of the art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot be detected directly using low-rank optimization. Therefore, we propose a data-driven, two-stage approach, where the first stage “sparsifies” the noise and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high-quality mean and a better-structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage in which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms the state of the art on all testing sequences.
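    To make the low-rank plus sparse machinery of the second stage concrete, below is a minimal sketch of robust PCA via the inexact augmented Lagrange multiplier (ALM) method, a standard solver for this type of objective; the function names and parameter defaults are illustrative assumptions, not code from the dissertation.

```python
import numpy as np

def shrink(X, tau):
    """Soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500):
    """Decompose D into low-rank A plus sparse E:
        min ||A||_* + lam*||E||_1   s.t.  D = A + E
    using inexact ALM (illustrative parameter choices)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # common default weight for RPCA
    norm_D = np.linalg.norm(D, 'fro')
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)  # dual init
    mu = 1.25 / np.linalg.norm(D, 2)     # penalty parameter
    rho = 1.5                            # penalty growth factor
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svd_threshold(D - E + Y / mu, 1.0 / mu)  # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)         # sparse update
        R = D - A - E                                # constraint residual
        Y = Y + mu * R                               # dual ascent
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(R, 'fro') / norm_D < tol:
            break
    return A, E
```

    In this setting, each frame of the registered sequence would be vectorized into a column of D; E then collects the sparse errors, while A recovers the denoised, linearly correlated frames.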
    Second, we consider a closely related situation in which an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present in addition to independently moving targets. Typical approaches to turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which are often of great interest. Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object. We reduce this extremely difficult problem to a minimization involving the nuclear norm, the Frobenius norm, and the ℓ1 norm. Our method is based on two observations: first, the turbulence causes dense, Gaussian-like noise and therefore can be captured by the Frobenius norm, while the moving objects are sparse and thus can be captured by the ℓ1 norm; second, since the object’s motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted by atmospheric turbulence and include extremely tiny moving objects.
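    The following is a minimal sketch of one way such a three-term decomposition can be solved with ADMM-style proximal updates; it omits the dissertation's additional turbulence-model constraint on the object term, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def three_term_decomposition(D, lam1=0.01, lam2=1.0, mu=0.1, rho=1.2,
                             max_iter=300, tol=1e-7):
    """Split D into background B (low rank), object O (sparse), and
    turbulence E (dense, small-magnitude):
        min ||B||_* + lam1*||O||_1 + (lam2/2)*||E||_F^2  s.t.  D = B + O + E
    via proximal updates on the augmented Lagrangian."""
    B = np.zeros_like(D)
    O = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                 # dual variable
    norm_D = np.linalg.norm(D, 'fro')
    for _ in range(max_iter):
        B = svt(D - O - E + Y / mu, 1.0 / mu)        # low-rank background
        O = shrink(D - B - E + Y / mu, lam1 / mu)    # sparse moving object
        E = mu / (lam2 + mu) * (D - B - O + Y / mu)  # dense turbulence (ridge step)
        R = D - B - O - E
        Y = Y + mu * R                               # dual ascent
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(R, 'fro') / norm_D < tol:
            break
    return B, O, E
```

    As with the two-term case, frames are vectorized into the columns of D; the turbulence update has a closed form because the squared Frobenius penalty is a simple ridge term.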
    In addition to robustly estimating the subspace underlying the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use the resulting trajectories to estimate the camera-motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicates the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories, a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in the frames of the unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object-motion trajectories, we compute a compact set of chaotic invariant features which capture the characteristics of the trajectories. An SVM is then employed to learn and recognize the human actions from the computed motion features. We performed extensive experiments on multiple benchmark datasets and obtained promising results.

    Finally, we consider a more challenging problem referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically poses significant challenges because it involves videos of highly variable content, noise, length, frame size, etc. In this extremely challenging task, high-level features have recently shown a promising direction, as in [53, 129], where core low-level events referred to as concepts are annotated and modelled using a portion of the training data, and each event is then described by its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation which combines the precisely annotated videos used to train the concepts with the rich high-level features. Our approach finds a new representation for each event which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminability of the high-level features by a significant margin.
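    As a rough illustration of the flavor of this kind of low-rank formulation (a deliberately simplified objective, not the dissertation's actual one), the sketch below refines a noisy event-by-concept feature matrix toward an annotation-derived matrix while encouraging low rank; the minimizer of the simplified objective is a single singular-value-thresholding step.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def refine_event_representation(F, A, tau=1.0):
    """Simplified objective (illustrative only):
        min_X  tau*||X||_*  +  1/2*||X - F||_F^2  +  1/2*||X - A||_F^2
    F: noisy high-level (concept) features, A: concept annotations,
    both (n_events, n_concepts). The two quadratic terms combine into a
    single distance to their average, so the solution is one SVT step."""
    return svt((F + A) / 2.0, tau / 2.0)

# Hypothetical usage:
# X = refine_event_representation(F, A, tau=1.0)   # denoised representation
```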

    Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers

    Keywords: Databases; Imaging techniques; Spinal cord diseases

    In a companion paper by Cohen-Adad et al. we introduce the spine generic quantitative MRI protocol that provides valuable metrics for assessing spinal cord macrostructural and microstructural integrity. This protocol was used to acquire a single-subject dataset across 19 centers and a multi-subject dataset across 42 centers (for a total of 260 participants), spanning the three main MRI manufacturers: GE, Philips and Siemens. Both datasets are publicly available via git-annex. Data were analysed using the Spinal Cord Toolbox to produce normative values as well as inter/intra-site and inter/intra-manufacturer statistics. Reproducibility for the spine generic protocol was high across sites and manufacturers, with an average inter-site coefficient of variation of less than 5% for all the metrics. Full documentation and results can be found at https://spine-generic.rtfd.io/. The datasets and analysis pipeline will help pave the way towards accessible and reproducible quantitative MRI in the spinal cord.
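    For readers who want to reproduce the headline statistic, here is a minimal sketch of an inter-site coefficient of variation computation, assuming a tidy table of per-participant metric values; the column and file names are hypothetical and not taken from the spine-generic analysis pipeline.

```python
import pandas as pd

def inter_site_cov(df, metric_col="metric", site_col="site"):
    """Inter-site coefficient of variation (%) of a metric:
    standard deviation of the per-site means divided by the grand mean."""
    site_means = df.groupby(site_col)[metric_col].mean()
    return 100.0 * site_means.std(ddof=1) / site_means.mean()

# Hypothetical usage with columns: site, participant, metric
# df = pd.read_csv("csa_t1.csv")
# print(f"inter-site CoV: {inter_site_cov(df):.1f}%")
```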


    Generic acquisition protocol for quantitative MRI of the spinal cord

    Quantitative spinal cord (SC) magnetic resonance imaging (MRI) presents many challenges, including a lack of standardized imaging protocols. Here we present a prospectively harmonized quantitative MRI protocol, which we refer to as the spine generic protocol, for users of 3T MRI systems from the three main manufacturers: GE, Philips and Siemens. The protocol provides guidance for assessing SC macrostructural and microstructural integrity: T1-weighted and T2-weighted imaging for SC cross-sectional area computation, multi-echo gradient echo for gray matter cross-sectional area, and magnetization transfer and diffusion-weighted imaging for assessing white matter microstructure. In a companion paper from the same authors, the spine generic protocol was used to acquire data across 42 centers in 260 healthy subjects. The key details of the spine generic protocol are also available in an open-access document at https://github.com/spine-generic/protocols. The protocol will serve as a starting point for researchers and clinicians implementing new SC imaging initiatives so that, in the future, inclusion of the SC in neuroimaging protocols will be more common. The protocol can be implemented by any trained MR technician or by a researcher or clinician familiar with MRI acquisition.
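    To make the cross-sectional area (CSA) computation step concrete, here is a heavily simplified sketch of per-slice CSA from a binary cord segmentation mask; it is not the Spinal Cord Toolbox implementation (which, among other refinements, corrects for cord angulation), and the file name is hypothetical.

```python
import nibabel as nib  # assumed available for reading NIfTI volumes

def cord_csa_per_slice(seg_path):
    """Cross-sectional area (mm^2) per axial slice of a binary cord mask."""
    img = nib.load(seg_path)
    mask = img.get_fdata() > 0.5              # binarize the segmentation
    dx, dy = img.header.get_zooms()[:2]       # in-plane voxel size in mm
    voxel_area = dx * dy
    # Sum in-plane voxels for each slice along the inferior-superior axis.
    return mask.sum(axis=(0, 1)) * voxel_area

# Hypothetical usage:
# csa = cord_csa_per_slice("t1_seg.nii.gz")
# print(csa[csa > 0].mean())   # mean CSA over slices containing cord
```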

    Cloud point labelling in optical motion capture systems

    This thesis deals with the task of point labelling involved in the overall workflow of optical motion capture systems. Human motion capture by optical sensors produces at each frame a snapshot of the motion as a cloud of points that need to be labelled in order to carry out the ensuing motion analysis. The labelling problem is tackled as a classification problem, using machine learning techniques such as AdaBoost or genetic search to train a set of weak classifiers, gathered in turn into an ensemble of partial solvers. The result is used to feed an online algorithm able to provide a marker labelling at a target detection accuracy and at a reduced computational cost. Moreover, in contrast to other approaches, the use of misleading temporal correlations has been discarded, strengthening the process against failures due to occasional labelling errors. The effectiveness of the approach is demonstrated on a real dataset obtained from measurements of the gait motion of persons, for which the ground-truth labelling has been verified manually. In addition, the thesis provides the reader with a broad overview of the field of motion capture and its optical branch: description, composition, state of the art and related work. This serves as a framework to highlight the importance, and ease the understanding, of the point labelling task.
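    As an illustration of casting marker labelling as classification (a sketch under assumed inputs, not the thesis's actual pipeline), one can train an AdaBoost ensemble of shallow decision trees over per-point features and predict a label for every point in a frame:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed inputs: X is an (n_points, n_features) array of per-point features
# (e.g. distances to the cloud centroid and to the k nearest points), and
# y holds the ground-truth marker label of each point. Synthetic stand-ins:
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))                 # placeholder features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # placeholder two-marker labels

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),  # weak learner
    n_estimators=100,
)
clf.fit(X, y)
print(clf.predict(X[:5]))   # labels for the first five points of a frame
```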

    Time-Resolved Quantification of Centrosomes by Automated Image Analysis Suggests Limiting Component to Set Centrosome Size in C. elegans Embryos

    The centrosome is a dynamic organelle found in all animal cells that serves as a microtubule organizing center during cell division. Most of the centrosome components have been identified by genetic screens over the last decade, but little is known about how these components interact with each other to form a functional centrosome. Towards a better understanding of the molecular organization of the centrosome, we investigated the mechanism that regulates the size of the centrosome in the early C. elegans embryo. For this, we monitored fluorescently labeled centrosomes in living embryos and developed a suite of image analysis algorithms to quantify the centrosomes in the resulting 3D time-lapse images. In particular, we developed a novel algorithm involving a two-stage linking process for tracking centrosomes, which is a multi-object tracking task. This fully automated analysis pipeline enabled us to acquire time-resolved data on centrosome growth in a large number of embryos, and could detect subtle phenotypes that were missed by previous assays based on manual image analysis. In a first set of experiments, we quantified centrosome size over development in wild-type embryos and made three essential observations: first, centrosome volume scales proportionately with cell volume; second, beginning at the 4-cell stage, when cells are small, centrosome size plateaus during the cell cycle; third, the total centrosome volume that the embryo gives rise to at any one cell stage is approximately constant. Based on these observations, we propose a ‘limiting component’ model in which centrosome size is limited by the amounts of maternally derived centrosome components. In a second set of experiments, we tested this hypothesis by varying cell size, centrosome number and microtubule-mediated pulling forces. We then manipulated the amounts of several centrosomal proteins and found that the conserved centriolar and pericentriolar material protein SPD-2 is one such component that determines centrosome size.
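    To illustrate the frame-to-frame linking step underlying such a multi-object tracking task (a minimal sketch with assumed inputs, not the paper's two-stage algorithm), detections in consecutive frames can be matched by solving a distance-based assignment problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def link_detections(prev_pts, curr_pts, max_dist=5.0):
    """Match 3D detections between consecutive frames.
    Returns (i, j) index pairs; pairs farther apart than max_dist are
    rejected (e.g. a centrosome disappearing or a new one appearing)."""
    cost = cdist(prev_pts, curr_pts)            # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)    # Hungarian algorithm
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_dist]

# Hypothetical usage with two frames of detected centrosome centroids (um):
prev_pts = np.array([[10.0, 5.0, 3.0], [22.0, 8.0, 4.0]])
curr_pts = np.array([[10.4, 5.1, 3.2], [21.7, 8.3, 4.1]])
print(link_detections(prev_pts, curr_pts))   # [(0, 0), (1, 1)]
```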

    Vision-Based 2D and 3D Human Activity Recognition


    Visual Recognition and Synthesis of Human-Object Interactions

    The ability to perceive and understand people's actions enables humans to communicate and collaborate efficiently in society. Endowing machines with such ability is an important step toward building assistive and socially aware robots. Despite its significance, the problem poses a great challenge, and the current state of the art is still nowhere close to human-level performance. This dissertation drives progress on visual action understanding in the scope of human-object interactions (HOI), a major branch of human actions that dominates our everyday life. Specifically, we address the challenges of two important tasks: visual recognition and visual synthesis.

    The first part of this dissertation considers the recognition task. The main bottleneck of current research is the lack of a proper benchmark, since existing action datasets contain only a small number of categories with limited diversity. To this end, we set out to construct a large-scale benchmark for HOI recognition. We first tackle the problem of establishing a vocabulary for human-object interactions, by investigating a variety of automatic approaches as well as a crowdsourcing approach that collects human-labeled categories. Given the vocabulary, we then construct a large-scale image dataset of human-object interactions by annotating web images through online crowdsourcing. The new "HICO" dataset surpasses prior datasets in terms of both the number of images and the number of action categories by one order of magnitude. The introduction of HICO enables us to benchmark state-of-the-art recognition approaches and also sheds light on new challenges in the realm of large-scale HOI recognition. We further discover that the visual features of humans, objects, and their spatial relations all play a central role in the representation of interactions, and that the combination of the three improves the recognition outcome.

    The second part of this dissertation considers the synthesis task, focusing particularly on the synthesis of body motion. The central goal is: given an image of a scene, synthesize the course of an action conditioned on the observed scene. Such capability can predict possible actions afforded by the scene, and will facilitate efficient reactions in human-robot interactions. We investigate two types of synthesis tasks: semantic-driven synthesis and goal-directed synthesis. For semantic-driven synthesis, we study the forecasting of human dynamics from a static image; we propose a novel deep neural network architecture that extracts semantic information from the image and uses it to predict future body movement. For goal-directed synthesis, we study the synthesis of motion defined by human-object interactions, focusing on one particular class of interactions: a person sitting down on a chair. To ensure realistic motion arising from physical interactions, we leverage a physics-simulated environment that contains a humanoid and a chair model. We propose a novel reinforcement learning framework, and show that the synthesized motion can generalize to different initial human-chair configurations. At the end of this dissertation, we also contribute a new approach to temporal action localization, an essential task in video action understanding: we address the shortcomings of prior Faster R-CNN based approaches, and show state-of-the-art performance on standard benchmarks.
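    As a sketch of the observation that human, object, and spatial-relation features combine to improve HOI recognition (an illustrative late-fusion architecture with assumed dimensions, not the dissertation's model):

```python
import torch
import torch.nn as nn

class HOILateFusion(nn.Module):
    """Score HOI categories from three streams: human appearance, object
    appearance, and a spatial encoding of the human/object boxes.
    Per-stream scores are summed (late fusion)."""
    def __init__(self, feat_dim=1024, spatial_dim=64, n_classes=600):
        super().__init__()                       # 600 HOI categories as in HICO
        self.human = nn.Linear(feat_dim, n_classes)
        self.object = nn.Linear(feat_dim, n_classes)
        self.spatial = nn.Sequential(
            nn.Linear(spatial_dim, 256), nn.ReLU(), nn.Linear(256, n_classes)
        )

    def forward(self, h_feat, o_feat, sp_feat):
        return self.human(h_feat) + self.object(o_feat) + self.spatial(sp_feat)

# Hypothetical usage with a batch of 2 human-object pairs:
model = HOILateFusion()
scores = model(torch.randn(2, 1024), torch.randn(2, 1024), torch.randn(2, 64))
print(scores.shape)   # torch.Size([2, 600])
```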
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/150045/1/ywchao_1.pd