5,222 research outputs found

    Analysis of single‑ and dual‑dictionary strategies in pedestrian classification

    Sparse coding has recently become a hot topic in image processing and computer vision, bringing benefits to reconstruction-like and classification-like tasks alike. However, for binary classification problems, several choices of how to learn and use dictionaries have not been studied. In particular, how single-dictionary and dual-dictionary approaches compare in terms of classification performance is largely unexplored. We compare three single-dictionary strategies and two dual-dictionary strategies for the problem of pedestrian classification (“pedestrian” vs. “background” images). In each of these five cases, images are represented by the sparse coefficients induced from the respective dictionaries, and these coefficients are the input to a standard classifier, both for training and for the subsequent classification of novel unseen instances. Experimental results on the INRIA pedestrian dataset suggest, on the one hand, that dictionaries learned from only one of the classes, even the background class, are enough to obtain competitive classification performance. On the other hand, while better performance is generally obtained when instances of both classes are used for dictionary learning, the representation induced by a single dictionary learned from instances of both classes provides performance comparable or even superior to that of the representations induced by two dictionaries learned separately from the pedestrian and background classes.
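The core operation the abstract relies on, representing an image as the sparse coefficients induced by a dictionary, can be illustrated with a minimal orthogonal matching pursuit coder. This is a hedged sketch, not the paper's pipeline: the dictionary here is random rather than learned, the signal is synthetic, and the function name `omp` is ours.

```python
import numpy as np

def omp(D, x, k):
    """Minimal orthogonal matching pursuit: sparse-code x over dictionary D
    (columns = unit-norm atoms) using at most k atoms."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit of the coefficients on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D @ coef
    return coef

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))          # illustrative dictionary: 128 atoms
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 40]      # signal built from two known atoms
c = omp(D, x, k=2)
print(np.flatnonzero(c))                # recovers atoms 3 and 40
```

In the paper's setting the vector `c` (one per image, per dictionary) would then be the feature fed to an ordinary classifier for the pedestrian-vs-background decision.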

    Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

    Person re-identification (re-ID) continues to pose a significant challenge, particularly in scenarios involving occlusions. Prior approaches aimed at tackling occlusions have predominantly focused on aligning physical body features through the utilization of external semantic cues. However, these methods tend to be intricate and susceptible to noise. To address the aforementioned challenges, we present an innovative end-to-end solution known as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model automatically and dynamically distinguishes human body information from occlusions, eliminating the need for external detectors or precise image alignment. Specifically, we introduce a dynamic patch token selection module (DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify informative occlusion-free tokens. These tokens are then selected for deriving subsequent local part features. To facilitate the seamless integration of global classification features with the finely detailed local features selected by DPSM, we introduce a novel feature blending module (FBM). FBM enhances feature representation through the complementary nature of information and the exploitation of part diversity. Furthermore, to ensure that DPSM and the entire DPEFormer can effectively learn with only identity labels, we also propose a Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages recent advances in the Segment Anything Model (SAM). As a result, it generates occlusion images that closely resemble real-world occlusions, greatly enhancing the subsequent contrastive learning process. Experiments on occluded and holistic re-ID benchmarks demonstrate a substantial advancement of DPEFormer over existing state-of-the-art approaches. The code will be made publicly available. Comment: 12 pages, 6 figures
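The DPSM idea, scoring patch tokens against a label-guided proxy and keeping only the informative ones, can be caricatured with plain cosine similarity and a top-k rule. This is a sketch under our own assumptions, not DPEFormer's actual module: the function name, the top-k selection, and the toy features are all invented for illustration.

```python
import numpy as np

def select_patch_tokens(tokens, proxy, k):
    """Illustrative proxy-guided selection: score each patch token by cosine
    similarity to a proxy token and keep the top-k as the 'occlusion-free' set."""
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    p = proxy / np.linalg.norm(proxy)
    scores = t @ p                        # cosine similarity per token
    keep = np.argsort(scores)[::-1][:k]   # indices of the k best tokens
    return np.sort(keep), scores

rng = np.random.default_rng(1)
proxy = rng.normal(size=16)
body = proxy + 0.1 * rng.normal(size=(6, 16))   # tokens close to the proxy
occl = rng.normal(size=(4, 16))                 # unrelated "occlusion" tokens
tokens = np.vstack([body, occl])
keep, scores = select_patch_tokens(tokens, proxy, k=6)
print(keep)   # the six body-token indices 0..5 survive selection
```

In the real model the proxy is learned from identity labels and the selected tokens go on to form local part features; the point of the sketch is only the select-by-similarity mechanism.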

    Built environment assessment: Multidisciplinary perspectives.

    Context: As obesity has become increasingly widespread, scientists seek better ways to assess and modify built and social environments to positively impact health. The applicable methods and concepts draw on multiple disciplines and require collaboration and cross-learning. This paper describes the results of an expert team's analysis of how key disciplinary perspectives contribute to environmental context-based assessment related to obesity, identifies gaps, and suggests opportunities to encourage effective advances in this arena. Evidence acquisition: A team of experts representing diverse disciplines convened in 2013 to discuss the contributions of their respective disciplines to assessing built environments relevant to obesity prevention. The disciplines include urban planning, public health nutrition, exercise science, physical activity research, public health and epidemiology, behavioral and social sciences, and economics. Each expert identified key concepts and measures from their discipline, and applications to built environment assessment and action. A selective review of published literature and internet-based information was conducted in 2013 and 2014. Evidence synthesis: The key points that are highlighted in this article were identified in 2014-2015 through discussion, debate and consensus-building among the team of experts. Results focus on the various disciplines' perspectives and tools, recommendations, progress and gaps. Conclusions: There has been significant progress in collaboration across key disciplines that contribute to studies of built environments and obesity, but important gaps remain. Using lessons from interprofessional education and team science, along with appreciation of and attention to other disciplines' contributions, can promote more effective cross-disciplinary collaboration in obesity prevention.

    Multimodal Data at Signalized Intersections: Strategies for Archiving Existing and New Data Streams to Support Operations and Planning & Fusion and Integration of Arterial Performance Data

    There is a growing interest in arterial system management due to the increasing amount of travel on arterials and a growing emphasis on multimodal transportation. The benefits of archiving arterial-related data are numerous. This research report describes our efforts to assemble and develop a multimodal archive for the Portland-Vancouver region. There is coverage of data sources from all modes in the metropolitan region; however, given the preliminary nature of the archiving process, some of the data are incomplete or consist only of samples. The arterial data sources covered in this report include data from various local agencies (the City of Portland; Clark County, WA; TriMet; and C-TRAN) covering vehicle, transit, pedestrian, and bicycle modes. We provide detailed descriptions of each data source and a spatial and temporal classification. The report describes the conceptual framework for an archive and the data collection and archival process, including the process for extracting the data from the agency systems and transferring these data to our multimodal database. Data can be made more useful through the use of improved visualization techniques. Thus, as part of the project, a number of novel, online visualizations were created and implemented. These graphs and displays are summarized in this report and example visualizations are shown. As with any automated sensor system, data quality and completeness are an important issue, and the challenge of automating data quality checks is large. Preliminary efforts to validate and monitor data quality and automate data quality processing are explored. Finally, the report presents efforts to combine transit and travel time data, and signal timing and vehicle count data, to generate some sample congestion measures.
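One common congestion measure that archived travel-time data of this kind can support is the travel time index (mean observed travel time divided by free-flow travel time). The sketch below uses fabricated corridor names and numbers; none of these figures come from the report.

```python
import numpy as np

# Hypothetical archived records: corridor travel times in seconds, standing in
# for data pulled from transit AVL and signal-system feeds.
free_flow = {"corridor_A": 120.0, "corridor_B": 300.0}
observed = {
    "corridor_A": [150.0, 180.0, 132.0],
    "corridor_B": [330.0, 600.0, 450.0],
}

def travel_time_index(obs, ff):
    """Travel Time Index: mean observed travel time / free-flow travel time.
    TTI = 1.0 means free-flow conditions; 1.5 means trips take 50% longer."""
    return float(np.mean(obs)) / ff

for name in free_flow:
    print(name, round(travel_time_index(observed[name], free_flow[name]), 2))
```

A measure like this is easy to compute per corridor and per time-of-day bin once the archive normalizes records from the different agency systems into a common schema.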

    Unsupervised Learning of Long-Term Motion Dynamics for Videos

    We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with the RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, Depth, and RGB-D videos. Comment: CVPR 201
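The notion of describing motion as a sequence of atomic flows can be illustrated by quantizing flow vectors against a small codebook of direction-magnitude bins. This is a 2D toy under our own assumptions; the paper's atomic 3D flows are defined differently, and the codebook here is invented for illustration.

```python
import numpy as np

def build_codebook(mags=(1.0, 2.0), n_dirs=8):
    """Toy codebook of 'atomic' 2D flows: a zero-motion atom plus
    n_dirs directions crossed with a few magnitudes."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    atoms = [np.zeros(2)]
    for m in mags:
        atoms.extend(m * dirs)
    return np.array(atoms)

def quantize(flow, codebook):
    """Assign each flow vector in an (N, 2) array to its nearest atom id,
    turning a continuous motion field into a discrete sequence."""
    d = np.linalg.norm(flow[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

cb = build_codebook()                     # atom 0 = no motion, then 16 atoms
flow = np.array([[0.05, 0.0],             # near-zero motion
                 [1.10, 0.0],             # ~unit motion to the right
                 [0.00, 2.2]])            # ~2x motion upward
ids = quantize(flow, cb)
print(ids)
```

Discretizing the target motion this way is what makes sequence prediction tractable for an encoder-decoder: the decoder emits atom ids per step instead of regressing dense continuous flow.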