
    Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming

    In this paper, we consider the problem of human pose estimation in real-world videos of swimmers. Swimming channels allow filming swimmers simultaneously above and below the water surface with a single stationary camera. These recordings can be used to quantitatively assess the athletes' performance. So far, the quantitative evaluation requires manual annotation of body parts in each video frame. We therefore apply convolutional neural networks (CNNs) to automatically infer the required pose information. Starting with an off-the-shelf architecture, we develop extensions to leverage activity information (in our case, the swimming style of an athlete) and the continuous nature of the video recordings. Our main contributions are threefold: (a) we apply and evaluate a fine-tuned Convolutional Pose Machine architecture as a baseline in our very challenging aquatic environment and discuss its error modes, (b) we propose an extension to input swimming style information into the fully convolutional architecture, and (c) we modify the architecture for continuous pose estimation in videos. With these additions, we achieve reliable pose estimates with up to 16% more correct body joint detections compared to the baseline architecture. Comment: 10 pages, 9 figures, accepted at WACV 201

    How scaling of the disturbance set affects robust positively invariant sets for linear systems

    This paper presents new results on robust positively invariant (RPI) sets for linear discrete-time systems with additive disturbances. In particular, we study how RPI sets change with scaling of the disturbance set. More precisely, we show that many properties of RPI sets crucially depend on a unique scaling factor which determines the transition from nonempty to empty RPI sets. We characterize this critical scaling factor, present an efficient algorithm for its computation, and analyze it for a number of examples from the literature.
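
    To make the idea of a critical scaling factor concrete, the following is a minimal sketch for a scalar special case only. It assumes a system x+ = a*x + w with |a| < 1, a disturbance interval |w| <= alpha * w_max, and a state constraint |x| <= x_max. These sets, the function names, and the closed-form expression are illustrative assumptions; this is not the algorithm from the paper, which treats general linear systems.

```python
def critical_scaling(a, x_max, w_max):
    """Scalar case (assumed setup): largest alpha such that x+ = a*x + w,
    |w| <= alpha*w_max, still admits an RPI set inside |x| <= x_max.
    The minimal RPI set is the interval of radius alpha*w_max / (1 - |a|),
    so it fits inside the constraint iff alpha <= (1 - |a|) * x_max / w_max."""
    assert abs(a) < 1, "the scalar system must be stable"
    return (1 - abs(a)) * x_max / w_max


def worst_case_bound(a, alpha, w_max, steps=200):
    """Iterate the worst-case state magnitude starting from x = 0:
    r+ = |a| * r + alpha * w_max. It converges to the minimal RPI radius."""
    r = 0.0
    for _ in range(steps):
        r = abs(a) * r + alpha * w_max
    return r


alpha_star = critical_scaling(a=0.5, x_max=1.0, w_max=1.0)
below = worst_case_bound(0.5, alpha_star, 1.0)        # stays within x_max
above = worst_case_bound(0.5, alpha_star + 0.1, 1.0)  # exceeds x_max
```

    In this toy setting, scaling the disturbance set just beyond `alpha_star` pushes the worst-case trajectory outside the constraint, so no RPI set remains.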

    Synchronized audio-visual frames with fractional positional encoding for transformers in video-to-text translation

    Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can, for instance, help visually impaired people understand scenes of a YouTube video. Transformer architectures have shown great performance in both machine translation and image captioning, but a straightforward and reproducible application to VTT is lacking. Moreover, there is no comprehensive study on different strategies and advice for video description generation, including exploiting the accompanying audio with fully self-attentive networks. Thus, we explore promising approaches from image captioning and video processing and apply them to VTT by developing a straightforward Transformer architecture. Additionally, we present a novel way of synchronizing audio and video features in Transformers, which we call Fractional Positional Encoding (FPE). We run multiple experiments on the VATEX dataset to determine a configuration that is applicable to unseen datasets and helps describe short video clips in natural language. This configuration improves the CIDEr and BLEU-4 scores by 37.13 and 12.83 points compared to a vanilla Transformer network and achieves state-of-the-art results on the MSR-VTT and MSVD datasets. Also, FPE helps increase the CIDEr score by a relative factor of 8.6%.
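
    The abstract does not spell out how FPE is computed. One plausible reading, sketched below purely as an assumption, is that audio features sampled at a different rate are assigned real-valued (fractional) positions on the video frame axis, and the standard sinusoidal encoding is evaluated at those positions. The function names and the linear alignment are hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_encoding(position, d_model=8):
    """Standard sinusoidal positional encoding, evaluated at a position
    that may be fractional (d_model is assumed to be even)."""
    enc = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc


def fractional_positions(n_video, n_audio):
    """Assumed alignment: spread n_audio feature indices linearly over
    the time axis spanned by n_video frames."""
    scale = n_video / n_audio
    return [j * scale for j in range(n_audio)]


video_pos = list(range(4))                              # 0, 1, 2, 3
audio_pos = fractional_positions(n_video=4, n_audio=8)  # 0.0, 0.5, 1.0, ...
video_enc = [sinusoidal_encoding(p) for p in video_pos]
audio_enc = [sinusoidal_encoding(p) for p in audio_pos]
```

    Under this assumption, an audio feature and a video frame that cover the same instant receive the same encoding, even though their sequence indices differ.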

    Uplift and upsample: efficient 3D human pose estimation with uplifting transformers

    The state-of-the-art for monocular 3D human pose estimation in videos is dominated by the paradigm of 2D-to-3D pose uplifting. While the uplifting methods themselves are rather efficient, the true computational complexity depends on the per-frame 2D pose estimation. In this paper, we present a Transformer-based pose uplifting scheme that can operate on temporally sparse 2D pose sequences but still produce temporally dense 3D pose estimates. We show how masked token modeling can be utilized for temporal upsampling within Transformer blocks. This allows to decouple the sampling rate of input 2D poses and the target frame rate of the video and drastically decreases the total computational complexity. Additionally, we explore the option of pre-training on large motion capture archives, which has been largely neglected so far. We evaluate our method on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP. With an MPJPE of 45.0 mm and 46.9 mm, respectively, our proposed method can compete with the state-of-the-art while reducing inference time by a factor of 12. This enables real-time throughput with variable consumer hardware in stationary and mobile applications. We release our code and models at https://github.com/goldbricklemon/uplift-upsample-3dhp
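
    For readers unfamiliar with the reported metric, MPJPE (mean per-joint position error) is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints. The sketch below shows the computation for a single pose; the function name and the toy coordinates are illustrative, and benchmark-specific details such as root alignment or averaging over frames are omitted.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error for one pose: the average Euclidean
    distance between corresponding predicted and ground-truth joints."""
    assert len(pred) == len(target) and len(pred) > 0
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists)


# Toy pose with two joints, coordinates in millimetres (made-up values).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mpjpe(pred, gt)  # joint errors are 0 mm and 5 mm
```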

    Revenue Management and Demand Fulfillment: Matching Applications, Models, and Software

    Recent years have seen great successes of revenue management, notably in the airline, hotel, and car rental business. Currently, an increasing number of industries, including manufacturers and retailers, are exploring ways to adopt similar concepts. Software companies are taking an active role in promoting the broadening range of applications. Technological advances, including smart shelves and radio frequency identification (RFID), are also removing many of the barriers to extended revenue management. The rapid developments in Supply Chain Planning and Revenue Management software solutions, scientific models, and industry applications have created a complex picture, which appears not yet to be well understood. It is not evident which scientific models fit which industry applications and which aspects are still missing. The relation between available software solutions and applications as well as scientific models appears equally unclear. The goal of this paper is to help overcome this confusion. To this end, we structure and review three dimensions, namely applications, models, and software. Subsequently, we relate these dimensions to each other and highlight commonalities and discrepancies. This comparison also provides a basis for identifying future research needs.

    A Stochastic Dynamic Programming Approach to Revenue Management in a Make-to-Stock Production System

    In this paper, we consider a make-to-stock production system with known exogenous replenishments and multiple customer classes. The objective is to maximize profit over the planning horizon by deciding whether to accept or reject a given order, in anticipation of more profitable future orders. What distinguishes this setup from classical airline revenue management problems is the explicit consideration of past and future replenishments and the integration of inventory holding and backlogging costs. If stock is on-hand, orders can be fulfilled immediately, backlogged or rejected. In shortage situations, orders can be either rejected or backlogged to be fulfilled from future arriving supply. The described decision problem occurs in many practical settings, notably in make-to-stock production systems, in which production planning is performed on a mid-term level, based on aggregated demand forecasts. In the short term, acceptance decisions about incoming orders are then made according to stock on-hand and scheduled production quantities. We model this problem as a stochastic dynamic program and characterize its optimal policy. It turns out that the optimal fulfillment policy has a relatively simple structure and is easy to implement. We evaluate this policy numerically and find that it systematically outperforms common current fulfillment policies, such as first-come-first-served and deterministic optimization.
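
    The accept-or-reject logic of such a stochastic dynamic program can be illustrated with a deliberately simplified toy model. The sketch below assumes unit-size orders, at most one order per period, known replenishments at the start of each period, a linear holding cost, and no backlogging. All of these modelling choices, along with the function and parameter names, are assumptions made for illustration; the paper's model additionally covers backlogging and is not reproduced here.

```python
from functools import lru_cache


def solve(horizon, replenish, classes, holding_cost):
    """Toy order-acceptance DP (assumed model, no backlogging).
    replenish[t]: known supply arriving at the start of period t.
    classes: list of (arrival probability, revenue) per customer class.
    Returns V(t, x): expected profit-to-go with x units before period t."""

    @lru_cache(maxsize=None)
    def V(t, x):
        if t == horizon:
            return 0.0
        x += replenish[t]  # scheduled replenishment arrives

        def end_period(y):
            # Pay holding cost on leftover stock, then continue.
            return -holding_cost * y + V(t + 1, y)

        p_none = 1.0 - sum(p for p, _ in classes)
        value = p_none * end_period(x)
        for p, revenue in classes:
            reject = end_period(x)
            accept = revenue + end_period(x - 1) if x > 0 else float("-inf")
            value += p * max(accept, reject)  # accept only if it pays off
        return value

    return V


# Two periods, one unit of supply, a low-value and a high-value class.
V = solve(horizon=2, replenish=[1, 0],
          classes=[(0.5, 1.0), (0.5, 10.0)], holding_cost=0.0)
value = V(0, 0)
```

    In this example the low-value order is rejected in the first period because keeping the unit for a possible high-value order later is worth more in expectation.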

    NISTT: A Non-Intrusive SystemC-TLM 2.0 Tracing Tool

    The increasing complexity of systems-on-a-chip requires the continuous development of electronic design automation tools. Nowadays, the simulation of systems-on-a-chip using virtual platforms is common. Virtual platforms enable hardware/software co-design to shorten the time to market, offer insights into the models, and allow debugging of the simulated hardware. Profiling tools are required to improve the usability of virtual platforms. During simulation, these tools capture data that are evaluated afterward. Those data can reveal information about the simulation itself and the software executed on the platform. This work presents the tracing tool NISTT that can profile SystemC-TLM-2.0-based virtual platforms. NISTT is implemented in a completely non-intrusive way. That means no changes in the simulation are needed, the source code of the simulation is not required, and the traced simulation does not need to contain debug symbols. The standardized SystemC application programming interface guarantees the compatibility of NISTT with other simulations. The strengths of NISTT are demonstrated in a case study. Here, NISTT is connected to a virtual platform and traces the boot process of Linux. After the simulation, the database created by NISTT is evaluated, and the results are visualized. Furthermore, the overhead of NISTT is quantified. It is shown that NISTT has only a minor influence on the overall simulation performance. Comment: PREPRINT - accepted by 30th IFIP/IEEE International Conference on Very Large Scale Integration 2022 (VLSI-SoC 2022)