Cascade R-CNN: Delving into High Quality Object Detection
In object detection, an intersection over union (IoU) threshold is required
to define positives and negatives. An object detector trained with a low IoU
threshold, e.g. 0.5, usually produces noisy detections. However, detection
performance tends to degrade as the IoU threshold increases. Two main
factors are responsible for this: 1) overfitting during training, due to
exponentially vanishing positive samples, and 2) inference-time mismatch
between the IoUs for which the detector is optimal and those of the input
hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is
proposed to address these problems. It consists of a sequence of detectors
trained with increasing IoU thresholds, to be sequentially more selective
against close false positives. The detectors are trained stage by stage,
leveraging the observation that the output of a detector is a good distribution
for training the next higher quality detector. The resampling of progressively
improved hypotheses guarantees that all detectors have a positive set of
examples of equivalent size, reducing the overfitting problem. The same cascade
procedure is applied at inference, enabling a closer match between the
hypotheses and the detector quality of each stage. A simple implementation of
the Cascade R-CNN is shown to surpass all single-model object detectors on the
challenging COCO dataset. Experiments also show that the Cascade R-CNN is
widely applicable across detector architectures, achieving consistent gains
independently of the baseline detector strength. The code will be made
available at https://github.com/zhaoweicai/cascade-rcnn
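The role of the IoU threshold described above can be illustrated with a small sketch. It is not taken from the Cascade R-CNN code base: the box format `(x1, y1, x2, y2)`, the helper names `iou` and `count_positives`, and the toy proposals are assumptions made for illustration. It only shows how the set of positives shrinks as the threshold rises.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, gt, threshold):
    """Number of proposals labelled positive for one ground-truth box."""
    return sum(1 for p in proposals if iou(p, gt) >= threshold)


# Toy data (hypothetical): one ground-truth box and proposals shifted to the right.
gt = (0, 0, 10, 10)
proposals = [(0, 0, 10, 10), (1, 0, 11, 10), (2, 0, 12, 10),
             (3, 0, 13, 10), (5, 0, 15, 10)]

for t in (0.5, 0.6, 0.7):
    print(t, count_positives(proposals, gt, t))
```

With these toy boxes, fewer proposals qualify as positives at each higher threshold. This is the effect the abstract attributes to training a single detector at a high IoU threshold.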
Towards Universal Object Detection
Object detection is one of the most important and challenging research topics in computer vision. It plays an important role in everyday life and has many applications, e.g. surveillance, autonomous driving, robotics, drones, medical imaging, etc. The ultimate goal of object detection is a universal object detector that works well in any case and under any condition, like the human visual system. However, there are multiple challenges to the universality of object detection, e.g. scale variance, high-quality requirements, domain shift, computational constraints, etc. These prevent the object detector from being widely used for various scales of objects, critical applications requiring extremely accurate localization, scenarios with changing domain priors, and diverse hardware settings. To address these challenges, multiple solutions are proposed in this thesis. These include an efficient multi-scale architecture to achieve scale-invariant detection, a robust multi-stage framework effective for high-quality requirements, a cross-domain solution to extend the universality over various domains, and a design of complexity-aware cascades and a novel low-precision network to enhance the universality under different computational constraints. All these efforts have substantially improved the universality of object detection, and the advanced object detector can be applied to broader environments.
Fast catheter segmentation and tracking based on x-ray fluoroscopic and echocardiographic modalities for catheter-based cardiac minimally invasive interventions
X-ray fluoroscopy and echocardiography imaging (ultrasound, US) are two imaging modalities that are widely used in cardiac catheterization. For these modalities, a fast, accurate and stable algorithm for the detection and tracking of catheters is required to allow clinicians to observe the catheter location in real-time. Currently X-ray fluoroscopy is routinely used as the standard modality in catheter ablation interventions. However, it lacks the ability to visualize soft tissue and uses harmful radiation. US does not have these limitations but often contains acoustic artifacts and has a small field of view. These make the detection and tracking of the catheter in US very challenging.
The first contribution in this thesis is a framework which combines a Kalman filter and discrete optimization for multiple catheter segmentation and tracking in X-ray images. The Kalman filter is used to identify the whole catheter from a single point detected on the catheter in the first frame of a sequence of X-ray images. An energy-based formulation is developed that can be used to track the catheters in the following frames. We also propose a discrete optimization for minimizing the energy function in each frame of the X-ray image sequence. Our approach is robust to tangential motion of the catheter and combines the tubular and salient feature measurements into a single robust and efficient framework.
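For readers unfamiliar with Kalman filtering, the predict/update cycle can be sketched with a generic textbook scalar filter. This is not the thesis's catheter model, whose state and measurement definitions are not given in the abstract. The function name `kalman_1d`, the random-walk state model, and the noise values `q` and `r` are all assumptions chosen for illustration.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=1e-4, r=0.1):
    """Scalar Kalman filter with a random-walk state model.

    x0, p0: initial state estimate and its variance (assumed values)
    q, r:   process and measurement noise variances (assumed values)
    Returns the list of filtered estimates and the final variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to stay put, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates, p


# Repeated measurements of a constant value pull the estimate towards it.
estimates, final_var = kalman_1d([5.0] * 50)
print(estimates[-1], final_var)
```

Each new measurement moves the estimate by an amount set by the gain `k`, while the variance `p` tracks how much the current estimate is trusted.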
The second contribution is an algorithm for catheter extraction in 3D ultrasound images based on (a) the registration between the X-ray and ultrasound images and (b) the segmentation of the catheter in X-ray images. The search space for the catheter extraction in the ultrasound images is constrained to lie on or close to a curved surface in the ultrasound volume. The curved surface corresponds to the back-projection of the extracted catheter from the X-ray image to the ultrasound volume. Blob-like features are detected in the US images and organized in a graphical model. The extracted catheter is modelled as the optimal path in this graphical model.
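The idea of extracting the catheter as an optimal path through detected blobs can be sketched with a standard shortest-path search. The abstract does not specify the graph construction, the edge costs, or the solver, so Dijkstra's algorithm, the node names, and the weights below are hypothetical stand-ins rather than the thesis's method.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict {node: [(neighbour, cost), ...]}.

    Returns (path, total_cost), or (None, inf) if target is unreachable.
    Edge costs must be non-negative.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Hypothetical blob graph: a low edge cost means two blobs plausibly lie
# on the same catheter segment.
blobs = {
    "tip": [("a", 1.0), ("b", 4.0)],
    "a": [("c", 2.0), ("end", 5.0)],
    "b": [("c", 0.5)],
    "c": [("end", 1.0)],
}
path, cost = shortest_path(blobs, "tip", "end")
print(path, cost)
```

In this toy graph the cheapest chain of blobs from `tip` to `end` is selected even though a more direct but costlier edge exists.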
Both contributions allow the use of ultrasound imaging for the improved visualization of soft tissue. However, X-ray imaging is still required for each ultrasound frame and the amount of X-ray exposure has not been reduced. The final contribution in this thesis is a system that can track the catheter in ultrasound volumes automatically without the need for X-ray imaging during the tracking. Instead, X-ray imaging is only required for the system initialization and for recovery from tracking failures. This allows a significant reduction in the amount of X-ray exposure for patients and clinicians.
Context-driven Object Detection and Segmentation with Auxiliary Information
One fundamental problem in computer vision and robotics is to
localize objects of interest in an image. The task can either be
formulated as an object detection problem if the objects are
described by a set of pose parameters, or an object segmentation
one if we recover object boundary precisely. A key issue in
object detection and segmentation concerns exploiting the spatial
context, as local evidence is often insufficient to determine
object pose in the presence of heavy occlusions or large object
appearance variations. This thesis addresses the object detection
and segmentation problem in such adverse conditions with
auxiliary depth data provided by RGBD cameras. We focus on four
main issues in context-aware object detection and segmentation:
1) what are the effective context representations? 2) how can we
work with limited and imperfect depth data? 3) how to design
depth-aware features and integrate depth cues into conventional
visual inference tasks? 4) how to make use of unlabeled data to
relax the labeling requirements for training data?
We discuss three object detection and segmentation scenarios
based on varying amounts of available auxiliary information. In
the first case, depth data are available for model training but
not available for testing. We propose a structured Hough voting
method for detecting objects with heavy occlusion in indoor
environments, in which we extend the Hough hypothesis space to
include both the object's location, and its visibility pattern.
We design a new score function that accumulates votes for object
detection and occlusion prediction. In addition, we explore the
correlation between objects and their environment, building a
depth-encoded object-context model based on RGBD data. In the
second case, we address the problem of localizing glass objects
with noisy and incomplete depth data. Our method integrates the
intensity and depth information from a single view point, and
builds a Markov Random Field that predicts glass boundary and
region jointly. In addition, we propose a nonparametric,
data-driven label transfer scheme for local glass boundary
estimation. A weighted voting scheme based on a joint feature
manifold is adopted to integrate depth and appearance cues, and
we learn a distance metric on the depth-encoded feature manifold.
In the third case, we make use of unlabeled data to relax the
annotation requirements for object detection and segmentation,
and propose a novel data-dependent margin distribution learning
criterion for boosting, which utilizes the intrinsic geometric
structure of datasets. One key aspect of this method is that it
can seamlessly incorporate unlabeled data by including a graph
Laplacian regularizer. We demonstrate the performance of our
models and compare with baseline methods on several real-world
object detection and segmentation tasks, including indoor object
detection, glass object segmentation and foreground segmentation
in video.
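The basic voting step that underlies Hough-style detection can be sketched as follows. This shows only plain centre voting: the structured extension described above, where the hypothesis space also includes a visibility pattern, is not reproduced here. The feature positions, the offsets, and the function name `hough_vote` are hypothetical.

```python
from collections import Counter


def hough_vote(features):
    """Accumulate votes for an object centre.

    Each feature is ((x, y), (dx, dy)): an image position and an offset
    to the object centre (in practice learned; here given by hand).
    Returns the winning centre and its vote count.
    """
    votes = Counter()
    for (x, y), (dx, dy) in features:
        votes[(x + dx, y + dy)] += 1
    centre, count = votes.most_common(1)[0]
    return centre, count


# Three features agree on the centre (5, 5); one outlier votes elsewhere.
features = [
    ((2, 3), (3, 2)),
    ((7, 5), (-2, 0)),
    ((5, 8), (0, -3)),
    ((1, 1), (1, 1)),
]
centre, count = hough_vote(features)
print(centre, count)
```

Because each feature votes independently, the peak can survive when some features are missing, which is one reason voting schemes are often used for occluded objects.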
Tightly-coupled manipulation pipelines: Combining traditional pipelines and end-to-end learning
Traditionally, robot manipulation tasks are solved by engineering solutions in a modular fashion, typically consisting of object detection, pose estimation, grasp planning, motion planning, and finally a control algorithm that executes the planned motion. This traditional approach to robot manipulation separates the hard problem of manipulation into several self-contained stages, which can be developed independently, and gives interpretable outputs at each stage of the pipeline. However, this approach comes with a plethora of issues, most notably limited generalisability to a broad range of tasks; it is common that as tasks get more difficult, the systems become increasingly complex.
To combat the flaws of these systems, recent trends have seen robots visually learning to predict actions and grasp locations directly from sensor input in an end-to-end manner using deep neural networks, without the need to explicitly model the in-between modules. This thesis investigates a sample of methods, which fall somewhere on a spectrum from pipelined to fully end-to-end, which we believe to be more advantageous for developing a general manipulation system; one that could eventually be used in highly dynamic and unpredictable household environments.
The investigation starts at the far end of the spectrum, where we explore learning an end-to-end controller in simulation and then transferring to the real world by employing domain randomisation, and finishes at the other end, with a new pipeline, where the individual modules bear little resemblance to the "traditional" ones. The thesis concludes with a proposition of a new paradigm: Tightly-coupled Manipulation Pipelines (TMP). Rather than learning all modules implicitly in one large, end-to-end network or conversely, having individual, pre-defined modules that are developed independently, TMPs suggest taking the best of both worlds by tightly coupling actions to observations, whilst still maintaining structure via an undefined number of learned modules, which do not have to bear any resemblance to the modules seen in "traditional" systems.
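Domain randomisation, mentioned above as the means of sim-to-real transfer, amounts to drawing new simulator settings for every training episode. The abstract does not say which properties are randomised, so the parameter names and ranges in this minimal sketch are hypothetical and no real simulator API is used.

```python
import random

# Hypothetical parameter ranges, chosen only for illustration.
RANGES = {
    "light_intensity": (0.3, 1.5),
    "camera_offset_cm": (-5.0, 5.0),
    "table_hue": (0.0, 1.0),
}


def sample_scene(rng):
    """Draw one randomised scene configuration, uniformly per parameter."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


# A seeded generator keeps the sampled scenes reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(3)]
for scene in scenes:
    print(scene)
```

Training across many such variations is intended to make the real world look like just another sample from the simulated distribution.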
Brain and Human Body Modeling
This open access book describes modern applications of computational human modeling, with specific emphasis on the areas of neurology and neuroelectromagnetics, depression and cancer treatments, radio-frequency studies and wireless communications. Special consideration is also given to the use of human modeling for the computational assessment of relevant regulatory and safety requirements. Readers working on applications that may expose human subjects to electromagnetic radiation will benefit from this book’s coverage of the latest developments in computational modelling and human phantom development to assess a given technology’s safety and efficacy in a timely manner. Describes construction and application of computational human models including anatomically detailed and subject specific models; Explains new practices in computational human modeling for neuroelectromagnetics, electromagnetic safety, and exposure evaluations; Includes a survey of modern applications for which computational human models are critical; Describes cellular-level interactions between the human body and electromagnetic fields.