Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation
We propose a new method to analyze the impact of errors in algorithms for
multi-instance pose estimation and a principled benchmark that can be used to
compare them. We define and characterize three classes of errors -
localization, scoring, and background - study how they are influenced by
instance attributes and their impact on an algorithm's performance. Our
technique is applied to compare the two leading methods for human pose
estimation on the COCO Dataset and to measure the sensitivity of pose estimation with
respect to instance size, type and number of visible keypoints, clutter due to
multiple instances, and the relative score of instances. The performance of
algorithms, and the types of error they make, are highly dependent on all these
variables, but mostly on the number of keypoints and the clutter. The analysis
and software tools we propose offer a novel and insightful approach for
understanding the behavior of pose estimation algorithms and an effective
method for measuring their strengths and weaknesses. Comment: Project page available at
http://www.vision.caltech.edu/~mronchi/projects/PoseErrorDiagnosis/; Code
available at https://github.com/matteorr/coco-analyze; published at ICCV 1
MSMG-Net: Multi-scale Multi-grained Supervised Networks for Multi-task Image Manipulation Detection and Localization
With the rapid advances of image editing techniques in recent years, image
manipulation detection has attracted considerable attention owing to the
increasing security risks posed by tampered images. To address these
challenges, a novel multi-scale multi-grained deep network (MSMG-Net) is
proposed to automatically identify manipulated regions. In our MSMG-Net, a
parallel multi-scale feature extraction structure is used to extract
multi-scale features. Multi-grained feature learning is then used to
perceive object-level semantic relations among multi-scale features by introducing
shunted self-attention. To fuse multi-scale multi-grained features, a global
and local feature fusion block is designed for manipulated region segmentation
by a bottom-up approach, and a multi-level feature aggregation block is designed
for edge artifact detection by a top-down approach. Thus, MSMG-Net can
effectively perceive the object-level semantics and encode the edge artifact.
Experimental results on five benchmark datasets justify the superior
performance of the proposed method, outperforming state-of-the-art manipulation
detection and localization methods. Extensive ablation experiments and feature
visualization demonstrate that multi-scale multi-grained learning can present
effective visual representations of manipulated regions. In addition, MSMG-Net
shows better robustness when images are further manipulated by various
post-processing methods.
CT-Mapper: Mapping Sparse Multimodal Cellular Trajectories using a Multilayer Transportation Network
Mobile phone data have recently become an attractive source of information
about mobility behavior. Since cell phone data can be captured in a passive way
for a large user population, they can be harnessed to collect well-sampled
mobility information. In this paper, we propose CT-Mapper, an unsupervised
algorithm that enables the mapping of mobile phone traces over a multimodal
transport network. One of the main strengths of CT-Mapper is its capability to
map noisy sparse cellular multimodal trajectories over a multilayer
transportation network whose layers have different physical properties, rather
than only trajectories associated with a single layer. Such a network is
modeled by a large multilayer graph in which the nodes correspond to
metro/train stations or road intersections and edges correspond to connections
between them. The mapping problem is modeled by an unsupervised HMM where the
observations correspond to sparse user mobile trajectories and the hidden
states to the multilayer graph nodes. The HMM is unsupervised as the transition
and emission probabilities are inferred from, respectively, the physical
transportation properties and the information on the spatial coverage of
antenna base stations. To evaluate CT-Mapper we collected cellular traces with
their corresponding GPS trajectories for a group of volunteer users in Paris
and vicinity (France). We show that CT-Mapper is able to accurately retrieve
the real cell phone user paths despite the sparsity of the observed trace
trajectories. Furthermore, our transition probability model is up to 20% more
accurate than other naive models. Comment: Under revision in Computer Communication Journa
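The HMM formulation in the abstract above (sparse antenna observations, hidden states that are graph nodes) can be illustrated with Viterbi decoding. This is a minimal sketch, not the CT-Mapper implementation: the three-node graph, the node coordinates, the Gaussian-style emission score, the stay/neighbor transition weights, and the names `viterbi`, `log_trans`, and `log_emit` are all assumptions made for illustration, since the abstract does not give the actual probability models.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: best log-score of any path that ends in state s at step t
    best = [{s: log_start[s] + log_emit(s, observations[0]) for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans(p, s))
            scores[s] = best[-1][prev] + log_trans(prev, s) + log_emit(s, obs)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy network: three nodes on a line, A - B - C.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0)}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
states = list(positions)

def log_trans(p, s):
    # Assumed weights: staying is likelier than moving; no jumps between
    # unconnected nodes.
    if p == s:
        return math.log(0.5)
    if s in neighbors[p]:
        return math.log(0.25)
    return float("-inf")

def log_emit(s, obs, sigma=0.5):
    # Assumed emission: unnormalized Gaussian score on the squared distance
    # between the node and the observed antenna position.
    dx, dy = positions[s][0] - obs[0], positions[s][1] - obs[1]
    return -(dx * dx + dy * dy) / (2 * sigma * sigma)

log_start = {s: 0.0 for s in states}  # uniform start (unnormalized)
observations = [(0.1, 0.0), (0.9, 0.2), (2.1, 0.0)]
path = viterbi(observations, states, log_start, log_trans, log_emit)
print(path)  # ['A', 'B', 'C']
```

Working in log space keeps the scores numerically stable on long traces, and the `-inf` transition score is one simple way to encode that a user cannot jump between unconnected nodes in a single step.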
Toward Global Localization of Unmanned Aircraft Systems using Overhead Image Registration with Deep Learning Convolutional Neural Networks
Global localization, in which an unmanned aircraft system (UAS) estimates its unknown current location without access to its take-off location or other locational data from its flight path, is a challenging problem. This research brings together aspects from the remote sensing, geoinformatics, and machine learning disciplines by framing the global localization problem as a geospatial image registration problem in which overhead aerial and satellite imagery serve as a proxy for UAS imagery. A literature review is conducted covering the use of deep learning convolutional neural networks (DLCNN) with global localization and other related geospatial imagery applications. Differences between geospatial imagery taken from the overhead perspective and terrestrial imagery are discussed, as well as difficulties in using geospatial overhead imagery for image registration due to a lack of suitable machine learning datasets. Geospatial analysis is conducted to identify suitable areas for future UAS imagery collection. One of these areas, Jerusalem northeast (JNE), is selected as the area of interest (AOI) for this research. Multi-modal, multi-temporal, and multi-resolution geospatial overhead imagery is aggregated from a variety of publicly available sources and processed to create a controlled image dataset called Jerusalem northeast rural controlled imagery (JNE RCI). JNE RCI is tested with handcrafted feature-based methods SURF and SIFT and a non-handcrafted feature-based pre-trained fine-tuned VGG-16 DLCNN on coarse-grained image registration. Both handcrafted and non-handcrafted feature-based methods had difficulty with the coarse-grained registration process. The format of JNE RCI is determined to be unsuitable for the coarse-grained registration process with DLCNNs, and the process to create a new supervised machine learning dataset, Jerusalem northeast machine learning (JNE ML), is covered in detail.
A multi-resolution grid-based approach is used, where each grid cell ID is treated as the supervised training label for that respective resolution. Pre-trained fine-tuned VGG-16 DLCNNs, two custom architecture two-channel DLCNNs, and a custom chain DLCNN are trained on JNE ML for each spatial resolution of subimages in the dataset. All DLCNNs used could more accurately coarsely register the JNE ML subimages compared to the pre-trained fine-tuned VGG-16 DLCNN on JNE RCI. This shows the process for creating JNE ML is valid and is suitable for using machine learning with the coarse-grained registration problem. All custom architecture two-channel DLCNNs and the custom chain DLCNN were able to more accurately coarsely register the JNE ML subimages compared to the fine-tuned pre-trained VGG-16 approach. Both the two-channel custom DLCNNs and the chain DLCNN were able to generalize well to new imagery that these networks had not previously trained on. Through the contributions of this research, a foundation is laid for future work to be conducted on the UAS global localization problem within the rural forested JNE AOI.
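The idea of using a grid cell ID as the training label at each resolution can be sketched as follows. This is an illustrative assumption only, not the procedure used to build JNE ML: the square extent, the row-major numbering, the grid sizes, and the function names `grid_cell_id` and `labels_per_resolution` are hypothetical.

```python
def grid_cell_id(x, y, extent, cells_per_side):
    """Map a point to a row-major cell ID in a regular grid over `extent`."""
    min_x, min_y, max_x, max_y = extent
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("point lies outside the extent")
    # Clamp so that points on the upper boundary fall in the last cell.
    col = min(int((x - min_x) / (max_x - min_x) * cells_per_side),
              cells_per_side - 1)
    row = min(int((y - min_y) / (max_y - min_y) * cells_per_side),
              cells_per_side - 1)
    return row * cells_per_side + col

def labels_per_resolution(x, y, extent, resolutions=(2, 4, 8)):
    """Return one class label (cell ID) per grid resolution for a subimage center."""
    return {n: grid_cell_id(x, y, extent, n) for n in resolutions}

# Hypothetical area of interest in arbitrary map units.
extent = (0.0, 0.0, 100.0, 100.0)
labels = labels_per_resolution(30.0, 70.0, extent)
print(labels)  # {2: 2, 4: 9, 8: 42}
```

Under this scheme a classifier trained at a coarser grid has fewer, larger classes, while finer grids give more precise location labels at the cost of many more classes.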