Analysis of single‑ and dual‑dictionary strategies in pedestrian classification
Sparse coding has recently become a hot topic in visual tasks in image processing and computer vision, bringing benefits
both in reconstruction-like tasks and in classification-like tasks. However, for binary classification
problems, there are several choices for learning and using dictionaries that have not been studied. In particular, how
single-dictionary and dual-dictionary approaches compare in terms of classification performance is largely unexplored. We
compare three single-dictionary strategies and two dual-dictionary strategies for the problem of pedestrian classification
(“pedestrian” vs. “background” images). In each of these five cases, images are represented as the sparse coefficients induced
by the respective dictionaries, and these coefficients are the input to a regular classifier both for training and for the subsequent
classification of novel unseen instances. Experimental results with the INRIA pedestrian dataset suggest, on the one hand,
that dictionaries learned from only one of the classes, even the background class, are enough to obtain competitive
classification performance. On the other hand, while better performance is generally obtained when instances of both
classes are used for dictionary learning, the representation induced by a single dictionary learned from a set of instances
from both classes provides comparable or even superior performance to the representations induced by two dictionaries
learned separately from the pedestrian and background classes.
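The pipeline this abstract describes can be sketched in miniature: images are coded as sparse coefficients over a dictionary, and those coefficients become the classifier input. The sketch below, a rough illustration only, stands in for learned dictionaries with a fixed toy dictionary and uses plain matching pursuit as the sparse coder; all names and values are assumptions, not taken from the paper.

```python
# Hypothetical single-dictionary coding step: represent a signal as
# sparse coefficients over a (here, hand-made) dictionary via greedy
# matching pursuit. In the paper's setting, the coefficient vector
# would then be fed to a regular classifier.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_nonzero=2):
    """Greedily approximate `signal` with at most `n_nonzero` atoms.

    Returns a dense coefficient vector (one entry per atom) with at most
    `n_nonzero` non-zero entries -- the sparse code used as classifier input.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a
                    for r, a in zip(residual, dictionary[best])]
    return coeffs

# toy unit-norm atoms; a real system would learn these from image patches
D = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
code = matching_pursuit([0.5, 0.0, 2.0, 0.0], D)
print(code)  # [0.5, 0.0, 2.0] -- atoms 0 and 2 explain the signal
```

A dual-dictionary variant would run the same coder against two dictionaries (one per class) and concatenate the two coefficient vectors before classification.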
Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification
Person re-identification (re-ID) continues to pose a significant challenge,
particularly in scenarios involving occlusions. Prior approaches to
handling occlusions have predominantly focused on aligning physical body
features with the help of external semantic cues. However, these
methods tend to be intricate and susceptible to noise. To address the
aforementioned challenges, we present an innovative end-to-end solution known
as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model
effectively distinguishes human body information from occlusions automatically
and dynamically, eliminating the need for external detectors or precise image
alignment. Specifically, we introduce a dynamic patch token selection module
(DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify
informative occlusion-free tokens. These tokens are then selected for deriving
subsequent local part features. To facilitate the seamless integration of
global classification features with the finely detailed local features selected
by DPSM, we introduce a novel feature blending module (FBM). FBM enhances
feature representation through the complementary nature of information and the
exploitation of part diversity. Furthermore, to ensure that DPSM and the entire
DPEFormer can effectively learn with only identity labels, we also propose a
Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages the
recent advances in the Segment Anything Model (SAM). As a result, it generates
occlusion images that closely resemble real-world occlusions, greatly enhancing
the subsequent contrastive learning process. Experiments on occluded and
holistic re-ID benchmarks signify a substantial advancement of DPEFormer over
existing state-of-the-art approaches. The code will be made publicly available.
Comment: 12 pages, 6 figures
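The core idea behind the patch selection module, as the abstract describes it, is to use a proxy token to identify informative, occlusion-free patch tokens. A minimal, hypothetical sketch of that idea follows: each patch token is scored by similarity to the proxy and the top-k tokens are kept for the local branch. In the real DPSM the proxy is label-guided and learned; here tokens and proxy are fixed toy vectors, and all names are illustrative.

```python
# Proxy-guided token selection in the spirit of DPSM (illustrative only):
# score each patch token against a proxy token and keep the k most
# similar, presumably occlusion-free, tokens.

def select_informative_tokens(tokens, proxy, k):
    """Return indices of the k tokens most similar to the proxy token."""
    def similarity(t):
        return sum(a * b for a, b in zip(t, proxy))  # dot product score
    ranked = sorted(range(len(tokens)),
                    key=lambda i: similarity(tokens[i]), reverse=True)
    return sorted(ranked[:k])  # restore spatial order of selected patches

tokens = [
    [0.9, 0.1],   # body patch (well aligned with proxy)
    [0.0, 1.0],   # occluder patch
    [0.8, 0.2],   # body patch
    [-0.5, 0.9],  # background patch
]
proxy = [1.0, 0.0]
print(select_informative_tokens(tokens, proxy, k=2))  # [0, 2]
```

The selected tokens would then feed the local part features that FBM blends with the global classification feature.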
Built environment assessment: Multidisciplinary perspectives.
Context: As obesity has become increasingly widespread, scientists seek better ways to assess and modify built and social environments to positively impact health. The applicable methods and concepts draw on multiple disciplines and require collaboration and cross-learning. This paper describes the results of an expert team's analysis of how key disciplinary perspectives contribute to environmental context-based assessment related to obesity, identifies gaps, and suggests opportunities to encourage effective advances in this arena.
Evidence acquisition: A team of experts representing diverse disciplines convened in 2013 to discuss the contributions of their respective disciplines to assessing built environments relevant to obesity prevention. The disciplines include urban planning, public health nutrition, exercise science, physical activity research, public health and epidemiology, behavioral and social sciences, and economics. Each expert identified key concepts and measures from their discipline, and applications to built environment assessment and action. A selective review of published literature and internet-based information was conducted in 2013 and 2014.
Evidence synthesis: The key points that are highlighted in this article were identified in 2014-2015 through discussion, debate, and consensus-building among the team of experts. Results focus on the various disciplines' perspectives and tools, recommendations, progress, and gaps.
Conclusions: There has been significant progress in collaboration across key disciplines that contribute to studies of built environments and obesity, but important gaps remain. Using lessons from interprofessional education and team science, along with appreciation of and attention to other disciplines' contributions, can promote more effective cross-disciplinary collaboration in obesity prevention.
Multimodal Data at Signalized Intersections: Strategies for Archiving Existing and New Data Streams to Support Operations and Planning & Fusion and Integration of Arterial Performance Data
There is a growing interest in arterial system management due to the increasing amount of travel on arterials and a growing emphasis on multimodal transportation. The benefits of archiving arterial-related data are numerous. This research report describes our efforts to assemble and develop a multimodal archive for the Portland-Vancouver region. There is coverage of data sources from all modes in the metropolitan region; however, given the preliminary nature of the archiving process, some of the data are incomplete or consist only of samples. The arterial data sources available in the Portland-Vancouver region that are covered in this report include data from various local agencies (City of Portland, Clark County, WA, TriMet, and C-TRAN) covering vehicle, transit, pedestrian, and bicycle modes. We provide detailed descriptions of each data source and a spatial and temporal classification. The report describes the conceptual framework for an archive and the data collection and archival process, including the process for extracting the data from the agency systems and transferring these data to our multimodal database. Data can be made more useful through the use of improved visualization techniques. Thus, as part of the project, a number of novel, online visualizations were created and implemented. These graphs and displays are summarized in this report, and example visualizations are shown. As with any automated sensor system, data quality and completeness are important issues, and the challenge of automating data quality assurance is large. Preliminary efforts to validate and monitor data quality and automate data quality processing are explored. Finally, the report presents efforts to combine transit and travel time data, as well as signal timing and vehicle count data, to generate some sample congestion measures.
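One common arterial congestion measure that could be derived from archived travel-time data of the kind described above is the travel time index (observed travel time divided by free-flow travel time). The sketch below is a generic illustration of that measure, not the report's actual computation; the function name and example values are assumptions.

```python
# Travel time index (TTI): a standard congestion measure computable from
# archived corridor travel times. TTI = 1.0 means free-flow conditions;
# TTI = 1.5 means trips take 50% longer than free flow.

def travel_time_index(observed_minutes, free_flow_minutes):
    """Ratio of observed to free-flow travel time (>= 1.0 under congestion)."""
    if free_flow_minutes <= 0:
        raise ValueError("free-flow travel time must be positive")
    return observed_minutes / free_flow_minutes

# e.g. a corridor with a 6-minute free-flow time observed at 9 minutes
print(travel_time_index(9.0, 6.0))  # 1.5
```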
Unsupervised Learning of Long-Term Motion Dynamics for Videos
We present an unsupervised representation learning approach that compactly
encodes the motion dependencies in videos. Given a pair of images from a video
clip, our framework learns to predict the long-term 3D motions. To reduce the
complexity of the learning framework, we propose to describe the motion as a
sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent
Neural Network based Encoder-Decoder framework to predict these sequences of
flows. We argue that in order for the decoder to reconstruct these sequences,
the encoder must learn a robust video representation that captures long-term
motion dependencies and spatial-temporal relations. We demonstrate the
effectiveness of our learned temporal representations on activity
classification across multiple modalities and datasets such as NTU RGB+D and
MSR Daily Activity 3D. Our framework is generic to any input modality, i.e.,
RGB, Depth, and RGB-D videos.
Comment: CVPR 201
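The abstract describes motion as a sequence of "atomic 3D flows". One hypothetical reading of that idea, sketched below, is to quantize each per-frame 3D flow vector to the nearest atom in a small codebook, so a clip becomes a discrete sequence an encoder-decoder can predict. The codebook and flow values are toy assumptions, not taken from the paper.

```python
# Quantizing 3D flow vectors to a codebook of "atomic" flows, so a video
# clip's motion becomes a short discrete sequence (illustrative sketch).
import math

def nearest_atom(flow, codebook):
    """Index of the codebook atom closest (Euclidean) to a 3D flow vector."""
    def dist(atom):
        return math.sqrt(sum((f - a) ** 2 for f, a in zip(flow, atom)))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

# toy codebook: still, right, up, forward
CODEBOOK = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

clip_flows = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.1, 0.8)]
sequence = [nearest_atom(f, CODEBOOK) for f in clip_flows]
print(sequence)  # [0, 1, 3] -- roughly: still, rightward, forward
```

A sequence model such as the RNN encoder-decoder the abstract mentions could then be trained to predict such sequences from pairs of input frames.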