CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation
The advancement of computer vision has pushed visual analysis tasks from
still images to the video domain. In recent years, video instance segmentation,
which aims to track and segment multiple objects in video frames, has drawn
much attention for its potential applications in various emerging areas such as
autonomous driving, intelligent transportation, and smart retail. In this
paper, we propose an effective framework for instance-level visual analysis on
video frames, which can simultaneously conduct object detection, instance
segmentation, and multi-object tracking. The core idea of our method is
collaborative multi-task learning which is achieved by a novel structure, named
associative connections among detection, segmentation, and tracking task heads
in an end-to-end learnable CNN. These additional connections allow information
propagation across multiple related tasks, so as to benefit these tasks
simultaneously. We evaluate the proposed method extensively on KITTI MOTS and
MOTS Challenge datasets and obtain encouraging results.
Instance Flow Based Online Multiple Object Tracking
We present a method to perform online Multiple Object Tracking (MOT) of known
object categories in monocular video data. Current Tracking-by-Detection MOT
approaches build on top of 2D bounding box detections. In contrast, we exploit
state-of-the-art instance aware semantic segmentation techniques to compute 2D
shape representations of target objects in each frame. We predict position and
shape of segmented instances in subsequent frames by exploiting optical flow
cues. We define an affinity matrix between instances of subsequent frames which
reflects locality and visual similarity. The instance association is solved by
applying the Hungarian method. We evaluate different configurations of our
algorithm using the MOT 2D 2015 train dataset. The evaluation shows that our
tracking approach is able to track objects with high relative motions. In
addition, we provide results of our approach on the MOT 2D 2015 test set for
comparison with previous works. We achieve a MOTA score of 32.1.
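The association step above can be sketched as follows. The affinity values are illustrative, and for the small instance counts of a toy example an exhaustive search stands in for the Hungarian method (a real implementation would use an O(n³) Hungarian solver such as `scipy.optimize.linear_sum_assignment`):

```python
from itertools import permutations

def associate(affinity):
    """Assign each previous-frame instance (row) to a current-frame
    instance (column) so that total affinity is maximised.

    affinity[i][j] combines locality (predicted position/shape overlap
    after optical-flow warping) and visual similarity between previous
    instance i and current instance j.  Exhaustive search over
    permutations; it returns the same optimum as the Hungarian method
    for a square affinity matrix, but is feasible only for a handful of
    instances.
    """
    n = len(affinity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(affinity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(best_perm), best_score
```

In practice one would also gate assignments below a minimum affinity to handle births and deaths of tracks; that bookkeeping is omitted here.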
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video
Multiple existing benchmarks involve tracking and segmenting objects in video
e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and
Segmentation (MOTS), but there is little interaction between them due to the
use of disparate benchmark datasets and metrics (e.g. J&F, mAP, sMOTSA). As a
result, published works usually target a particular benchmark and are not
easily comparable to one another. We believe that the development of
generalized methods that can tackle multiple tasks requires greater cohesion
among these research sub-communities. In this paper, we aim to facilitate this
by proposing BURST, a dataset which contains thousands of diverse videos with
high-quality object masks, and an associated benchmark with six tasks involving
object tracking and segmentation in video. All tasks are evaluated using the
same data and comparable metrics, which enables researchers to consider them in
unison, and hence, more effectively pool knowledge from different methods
across different tasks. Additionally, we demonstrate several baselines for all
tasks and show that approaches for one task can be applied to another with a
quantifiable and explainable performance difference. Dataset annotations and
evaluation code are available at: https://github.com/Ali2500/BURST-benchmark
ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking
The Associating Objects with Transformers (AOT) framework has exhibited
exceptional performance in a wide range of complex scenarios for video object
tracking and segmentation. In this study, we convert the bounding boxes to
masks in reference frames with the help of the Segment Anything Model (SAM) and
Alpha-Refine, and then propagate the masks to the current frame, transforming
the task from Video Object Tracking (VOT) to video object segmentation (VOS).
Furthermore, we introduce MSDeAOT, a variant of the AOT series that
incorporates transformers at multiple feature scales. MSDeAOT efficiently
propagates object masks from previous frames to the current frame using two
feature scales of 16 and 8. As a testament to the effectiveness of our design,
we achieved 1st place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.
Comment: Top 1 solution for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking. arXiv admin note: text overlap with arXiv:2307.0201
ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation
The Associating Objects with Transformers (AOT) framework has exhibited
exceptional performance in a wide range of complex scenarios for video object
segmentation. In this study, we introduce MSDeAOT, a variant of the AOT series
that incorporates transformers at multiple feature scales. Leveraging the
hierarchical Gated Propagation Module (GPM), MSDeAOT efficiently propagates
object masks from previous frames to the current frame using a feature scale
with a stride of 16. Additionally, we employ GPM in a more refined feature
scale with a stride of 8, leading to improved accuracy in detecting and
tracking small objects. Through the implementation of test-time augmentations
and model ensemble techniques, we achieve the top-ranking position in the
EPIC-KITCHEN VISOR Semi-supervised Video Object Segmentation Challenge.
Comment: Top 1 solution for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation
Practical Uses of A Semi-automatic Video Object Extraction System
Object-based technology is important for computer vision applications including gesture understanding, image recognition, augmented reality, etc. However, extracting the shape information of semantic objects from video sequences is a very difficult task, since this information is not explicitly provided within the video data. An application for extracting semantic video objects is therefore indispensable for many advanced applications.

We have developed an algorithm for a semi-automatic video object extraction system. We present performance measures for the system, including evaluation against ground truth with an error metric, followed by some practical uses of our video object extraction system.
The principle underlying the semi-automatic object extraction technique is user interaction during certain stages of the segmentation process, whereby the semantic information is provided directly by the user. After the user provides the initial segmentation of the semantic video objects, a tracking mechanism follows their temporal transformation in the subsequent frames, thus propagating the semantic information.

Since tracking tends to introduce boundary errors, the semantic information can be refreshed by the user at certain key frame locations in the video sequence. The tracking mechanism can also operate in the forward or backward direction of the video sequence.

The performance analysis of the results covers single and multiple key frames, the Mean Error and "Last_Error" metrics, and forward and backward extraction. To achieve the best performance, results from forward and backward extraction can be merged.
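The forward/backward merging step can be sketched as below. The rule used here, trusting whichever propagation direction has travelled fewer frames from its key frame, is an assumed heuristic for illustration, not the paper's exact merging rule:

```python
def merge_forward_backward(fwd_masks, bwd_masks):
    """Merge forward- and backward-propagated binary masks.

    fwd_masks[t] is propagated from a key frame at t=0, bwd_masks[t]
    from a key frame at t=T-1.  Each mask is a 2D list of 0/1.  Since
    boundary errors accumulate with propagation distance, each frame
    keeps the mask that travelled the shorter distance from its key
    frame (assumption: error grows monotonically with distance).
    """
    T = len(fwd_masks)
    merged = []
    for t in range(T):
        # Trust the direction whose key frame is closer to frame t.
        use_fwd = t <= (T - 1 - t)
        src = fwd_masks[t] if use_fwd else bwd_masks[t]
        merged.append([row[:] for row in src])
    return merged
```

A finer-grained merge could vote per pixel with distance-based weights; the per-frame switch above is the simplest version of the idea.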
Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing
In this paper, we present a new method for detecting road users in an urban
environment which leads to an improvement in multiple object tracking. Our
method takes a foreground image as input and improves the object detection
and segmentation. The resulting image can be used as input to trackers that use
foreground blobs from background subtraction. The first step is to create
foreground images for all the frames in an urban video. Then, starting from the
original blobs of the foreground image, we merge the blobs that are close to
one another and that have similar optical flow. The next step is extracting the
edges of the different objects to detect multiple objects that might be very
close (and be merged in the same blob) and to adjust the size of the original
blobs. At the same time, we use the optical flow to detect occlusion of objects
that are moving in opposite directions. Finally, we make a decision on which
information we keep in order to construct a new foreground image with blobs
that can be used for tracking. The system is validated on four videos of an
urban traffic dataset. Our method improves recall and precision on the object
detection task compared to vanilla background subtraction, and improves the
CLEAR MOT metrics on the tracking task for most videos.
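The blob-merging criterion described above can be sketched as follows. The distance and flow-similarity thresholds are illustrative assumptions, not the paper's tuned values:

```python
import math

def merge_blobs(blobs, max_dist=20.0, max_flow_diff=2.0):
    """Greedily group foreground blobs that are close together and share
    similar optical flow, so that fragments of one road user end up in
    one group while nearby objects moving differently stay separate.

    Each blob is a dict with 'centroid' (x, y) and 'flow' (dx, dy),
    where 'flow' is the blob's mean optical-flow vector.
    Returns a list of groups (lists of blob indices).
    """
    def close(a, b):
        d = math.dist(a["centroid"], b["centroid"])
        f = math.dist(a["flow"], b["flow"])
        return d <= max_dist and f <= max_flow_diff

    groups = []
    for i, blob in enumerate(blobs):
        for group in groups:
            if any(close(blob, blobs[j]) for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

The same flow comparison, applied with an opposite-direction test instead of a similarity test, would flag the occlusions between objects moving in opposite directions that the paper mentions.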
Video Object Tracking Using Motion Estimation
Real-time object tracking is a critical application. Object tracking is one of the most necessary steps for surveillance, augmented reality, smart rooms, perceptual user interfaces, object-based video compression, and driver assistance. While traditional segmentation methods using thresholding, background subtraction, and background estimation give satisfactory results for detecting single objects, noise is produced in the case of multiple objects and in poor lighting conditions.
Using the segmentation technique we can locate a target in the current frame. By minimizing the distance or maximizing the similarity coefficient, we can find the exact location of the target in the current frame. Target localization in the current frame was computationally complex in conventional algorithms: the search for an object starts from its location in the previous frame, within a basin of attraction roughly the square of the target area, computing a weighted average at every iteration and comparing similarity coefficients for each new location.
To overcome these difficulties, a new method is proposed for detecting and tracking multiple moving objects in night-time lighting conditions. The method integrates a wavelet-based contrast change detector with a locally adaptive thresholding scheme. In the initial stage, local contrast change over time is used to detect potential moving objects. Motion prediction and spatial nearest-neighbour data association are used to suppress false alarms. A change detector (CD) mechanism is implemented to detect changes in a video sequence and to divide the sequence into scenes to be encoded independently; it is efficient enough to detect abrupt cuts and helps split the video file into sequences. With this we get sufficiently good output with little noise, but in some cases noise remains prominent. Hence a correlation step is used, which relates two consecutive frames that differ enough to serve as current and previous frames. This gives a much better result in poor lighting conditions and with multiple moving objects.
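The correlation step between consecutive frames can be sketched as a normalized cross-correlation; treating a grayscale frame as a flat list of intensities is a simplification for illustration:

```python
import math

def frame_correlation(prev_frame, curr_frame):
    """Normalized cross-correlation between two grayscale frames, given
    as equal-length flat lists of pixel intensities.  Returns a value
    in [-1, 1]: values near 1 mean the frames are nearly identical,
    while low values flag a significant change between them.
    """
    n = len(prev_frame)
    mean_p = sum(prev_frame) / n
    mean_c = sum(curr_frame) / n
    num = sum((p - mean_p) * (c - mean_c)
              for p, c in zip(prev_frame, curr_frame))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in prev_frame) *
                    sum((c - mean_c) ** 2 for c in curr_frame))
    return num / den if den else 0.0
```

Because the measure is normalized by each frame's own mean and variance, it is less sensitive to global brightness changes than a raw pixel difference, which is what makes it useful in poor lighting.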
Design of networked visual monitoring systems
We design and implement a networked visual monitoring system for surveillance. Instead of the usual periodic monitoring, the proposed system has an auto-tracking feature which captures the important characteristics of intruders. We integrate two schemes, namely image segmentation and histogram comparison, to accomplish auto-tracking. The developed image segmentation scheme is able to separate moving objects from the background in real time. Next, the corresponding object centroid and boundary are computed. This information is used to guide the motion of the tracking camera to follow the intruders and then to take a series of shots, following a predetermined pattern. We have also developed a multiple-object tracking scheme, based on object colour histogram comparison, to overcome object occlusion and disocclusion issues. The designed system can track multiple intruders or follow any particular intruder automatically. To achieve efficient transmission and storage, the captured video is compressed in the H.263 format. Queries based on time as well as events are provided. Users can access the system from web browsers to view the monitored site or manipulate the tracking camera over the Internet. These features are of importance and value to surveillance.
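The histogram-comparison scheme above can be sketched with histogram intersection, a common similarity measure for colour histograms (the choice of intersection over alternatives such as chi-square, and the matching threshold, are assumptions here):

```python
def histogram_intersection(h1, h2):
    """Similarity between two normalized colour histograms (equal-length
    lists of bin frequencies, each summing to 1).  Returns a value in
    [0, 1], where 1 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def match_object(query_hist, candidate_hists, threshold=0.5):
    """Re-identify a tracked object after occlusion: return the index of
    the candidate blob whose histogram best matches the query, or None
    if no candidate is similar enough (the object is still occluded or
    has left the scene)."""
    scores = [histogram_intersection(query_hist, h)
              for h in candidate_hists]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

Colour histograms are robust to the pose changes and partial occlusions that break centroid-only tracking, which is why they suit the disocclusion handling described in the abstract.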