
    Semantic Visual Localization

    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.
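
    The core training idea — learning descriptors whose supervision comes from semantic scene completion — can be sketched as follows. This is a minimal, hypothetical PyTorch sketch under assumed voxel-grid inputs; the network shapes, class count, and loss are illustrative, not the paper's architecture.

```python
# Hypothetical sketch: a 3D descriptor learned with semantic scene
# completion as an auxiliary task (illustrative, not the paper's model).
import torch
import torch.nn as nn

class DescriptorNet(nn.Module):
    def __init__(self, num_classes=12, desc_dim=128):
        super().__init__()
        # Encoder: partial voxel observation -> compact descriptor.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, desc_dim),
        )
        # Decoder: descriptor -> dense semantic completion (auxiliary task),
        # forcing the descriptor to encode geometry and semantics.
        self.decoder = nn.Sequential(
            nn.Linear(desc_dim, 32 * 4 * 4 * 4), nn.Unflatten(1, (32, 4, 4, 4)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, partial_voxels):
        desc = self.encoder(partial_voxels)
        return desc, self.decoder(desc)  # descriptor + per-voxel class logits

net = DescriptorNet()
partial = torch.rand(2, 1, 16, 16, 16)             # incomplete observations
full_sem = torch.randint(0, 12, (2, 16, 16, 16))   # completed semantic labels
desc, completion = net(partial)
aux_loss = nn.CrossEntropyLoss()(completion, full_sem)
```

    Because the completion target is the full scene, descriptors computed from partial observations of the same place are pushed toward the same representation, which is what makes them robust to missing data.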

    ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction

    We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the input RGB-D video stream, we jointly estimate geometry and semantic labels per frame in 3D. A key focus of our approach is to reason about semantic entities both in the 2D input and the local 3D domain, leveraging differences in spatial context and network architectures. Our method predicts 2D features using an off-the-shelf segmentation network; a lightweight 3D network then refines these features to enable reasoning about the local 3D structure. Second, to efficiently deal with an infinite stream of input RGB-D frames, a subsequent network serves as a temporal expert, predicting incremental scene updates by leveraging 2D, 3D, and past information in a learned manner. These updates are then integrated into a global scene representation. Together, these contributions let our method operate under real-time constraints and scale to arbitrary scene sizes by processing and updating the scene only in the local region defined by the new measurement. Our experiments demonstrate improved results compared to existing online methods that operate purely in local regions, and show that complementary sources of information boost performance. We provide a thorough ablation study of different architectural and algorithmic design decisions. Our method yields competitive results on the popular ScanNet benchmark and the SceneNN dataset.
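
    The per-frame pipeline described above (2D features, local 3D refinement, learned temporal fusion into a global map) might look roughly like the loop below. All module shapes, the chunking scheme, and the back-projection stub are assumptions for illustration, not ALSTER's code.

```python
# Illustrative online-reconstruction loop: 2D features -> local 3D
# refinement -> temporal expert -> local update of a global map.
import torch
import torch.nn as nn

feat2d_net = nn.Conv2d(3, 8, 3, padding=1)    # stand-in for an off-the-shelf 2D segmenter
refine3d_net = nn.Conv3d(8, 8, 3, padding=1)  # lightweight local 3D refinement
temporal_expert = nn.Conv3d(16, 8, 1)         # fuses new evidence with past state

global_map = {}  # chunk key -> persistent local feature volume

def local_region(depth):
    # Chunk key of the region touched by this frame; a real system would
    # derive it from camera pose (a single dummy chunk in this sketch).
    return 0

def lift_to_3d(feat2d, depth):
    # Back-project 2D features into a local voxel grid (dummy stand-in).
    b, c, h, w = feat2d.shape
    return feat2d.view(b, c, 1, h, w).expand(b, c, 8, h, w).contiguous()

stream = [(torch.rand(1, 3, 32, 32), torch.rand(1, 1, 32, 32))] * 3
for rgb, depth in stream:                       # potentially infinite stream
    feat2d = feat2d_net(rgb)                    # per-frame 2D semantics
    local3d = refine3d_net(lift_to_3d(feat2d, depth))  # local 3D reasoning
    key = local_region(depth)
    past = global_map.get(key, torch.zeros_like(local3d))
    # The temporal expert predicts the incremental scene update from
    # 2D-derived, 3D-refined, and past information.
    update = temporal_expert(torch.cat([local3d, past], dim=1))
    global_map[key] = past + update             # integrate only the local region
```

    Because each frame touches only one chunk of `global_map`, memory and compute per update stay bounded regardless of total scene size, which is what allows scaling to arbitrary scenes.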

    RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

    We study an important, yet largely unexplored, problem: large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering a 143 km^2 area) of RGB and aerial LIDAR depth images. We propose a novel joint-embedding-based method that effectively combines appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 in matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advancement over prior works in both performance and scale. We conclude with qualitative results that highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization. (Comment: ACM Multimedia 2020)
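
    A joint embedding for this kind of cross-modal retrieval is typically two modality-specific encoders trained into a shared metric space. The sketch below uses a triplet loss and tiny CNN branches as stand-ins; the actual branch architectures, loss, and use of semantic cues in the paper may differ.

```python
# Illustrative cross-modal joint embedding: ground RGB vs. rendered
# aerial LIDAR depth, trained with a triplet loss (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

def branch(in_ch, dim=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
    )

rgb_branch = branch(3)    # embeds ground RGB images
depth_branch = branch(1)  # embeds rendered aerial LIDAR depth images

rgb = torch.rand(4, 3, 64, 64)
depth_pos = torch.rand(4, 1, 64, 64)  # depth rendered at the true location
depth_neg = torch.rand(4, 1, 64, 64)  # depth from a different location

a = F.normalize(rgb_branch(rgb), dim=1)
p = F.normalize(depth_branch(depth_pos), dim=1)
n = F.normalize(depth_branch(depth_neg), dim=1)

# Pull matching cross-modal pairs together, push mismatches apart.
loss = F.triplet_margin_loss(a, p, n, margin=0.2)
```

    At query time, a ground image is embedded once and ranked against pre-computed embeddings of all geo-referenced depth renderings; the reported median rank of 5 means the true location is typically among the top few candidates.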

    Clinical and virological characteristics of hospitalised COVID-19 patients in a German tertiary care centre during the first wave of the SARS-CoV-2 pandemic: a prospective observational study

    Purpose: Adequate patient allocation is pivotal for optimal resource management in strained healthcare systems, and requires detailed knowledge of clinical and virological disease trajectories. The purpose of this work was to identify risk factors associated with the need for invasive mechanical ventilation (IMV), to analyse viral kinetics in patients with and without IMV, and to provide a comprehensive description of the clinical course. Methods: A cohort of 168 hospitalised adult COVID-19 patients enrolled in a prospective observational study at a large European tertiary care centre was analysed. Results: Forty-four per cent (71/161) of patients required IMV. Shorter duration of symptoms before admission (aOR 1.22 per day less, 95% CI 1.10-1.37, p < 0.01) and a history of hypertension (aOR 5.55, 95% CI 2.00-16.82, p < 0.01) were associated with the need for IMV. Patients on IMV had higher maximal SARS-CoV-2 concentrations, slower decline rates, and longer viral shedding than non-IMV patients (33 days, IQR 26-46.75, vs 18 days, IQR 16-46.75, respectively; p < 0.01). Median duration of hospitalisation was 9 days (IQR 6-15.5) for non-IMV and 49.5 days (IQR 36.8-82.5) for IMV patients. Conclusions: Our results indicate a short duration of symptoms before admission as a risk factor for severe disease that merits further investigation, and different viral load kinetics in severely affected patients. Median duration of hospitalisation of IMV patients was longer than described for acute respiratory distress syndrome unrelated to COVID-19.
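
    For readers unfamiliar with adjusted odds ratios: an aOR is the exponentiated coefficient of a multivariable logistic regression, and its 95% CI the exponentiated confidence bounds. A minimal sketch on synthetic data (the variable names and effect sizes are invented, not the study's data):

```python
# Illustrative aOR computation via multivariable logistic regression
# on synthetic data (not the study's data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 168
symptom_days = rng.integers(1, 15, n)   # days of symptoms before admission
hypertension = rng.integers(0, 2, n)    # history of hypertension (0/1)
logit = -1.0 - 0.2 * symptom_days + 1.7 * hypertension
imv = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # outcome: IMV

X = sm.add_constant(np.column_stack([symptom_days, hypertension]))
fit = sm.Logit(imv, X).fit(disp=0)

aor = np.exp(fit.params[1:])       # adjusted odds ratios per covariate
ci = np.exp(fit.conf_int()[1:])    # 95% CIs on the odds-ratio scale
print(aor, ci)
```

    An aOR below 1 per additional symptom day is equivalent to the reported "aOR 1.22 per day less": exponentiating the negated coefficient re-expresses the effect per day fewer of symptoms.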