    Capturing and viewing gigapixel images

    Get PDF
    We present a system to capture and view "Gigapixel images": very high resolution, high dynamic range, and wide angle imagery consisting of several billion pixels each. A specialized camera mount, in combination with an automated pipeline for alignment, exposure compensation, and stitching, provides the means to acquire Gigapixel images with a standard camera and lens. More importantly, our novel viewer enables exploration of such images at interactive rates over a network, while dynamically and smoothly interpolating the projection between perspective and curved projections, and simultaneously modifying the tone-mapping to ensure an optimal view of the portion of the scene being viewed.
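
    The projection interpolation mentioned above can be made concrete with a small sketch: a one-dimensional blend between a perspective (rectilinear) and a cylindrical mapping. This is an illustration under assumed conventions, not the paper's viewer code; the function name, the linear blend, and the horizontal-only treatment are all assumptions.

        import numpy as np

        def blended_screen_x(theta, f, t):
            # Map a horizontal view angle theta (radians) to a screen x-coordinate.
            # t = 0.0: pure perspective (rectilinear); straight lines stay straight.
            # t = 1.0: pure cylindrical; constant angular resolution across the view.
            x_persp = f * np.tan(theta)
            x_cyl = f * theta
            return (1.0 - t) * x_persp + t * x_cyl

        # Example: a half-way blend over a wide horizontal field of view.
        angles = np.linspace(-1.2, 1.2, 5)
        print(blended_screen_x(angles, f=1.0, t=0.5))

    Sweeping t smoothly as the user zooms out is one plausible way to get the "dynamic and smooth" transition the abstract describes: perspective when zoomed in, curved when the field of view gets wide.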

    AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials

    Full text link
    Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate that machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time, while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations.
    Comment: 26 pages, 7 figures. Submitted to npj Computational Materials.
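
    The search strategy the abstract describes (relax many candidate configurations cheaply with an ML potential, then verify only the most promising with DFT) can be sketched as follows. Every callable and the choice of k are hypothetical placeholders, not the authors' code.

        def lowest_energy_config(initial_configs, ml_relax, dft_energy, k=5):
            # ml_relax: hypothetical ML-potential relaxation; returns a
            # (relaxed_config, ml_energy) pair for one starting configuration.
            # dft_energy: hypothetical expensive DFT evaluation, spent only
            # on the k best ML-ranked candidates.
            relaxed = sorted((ml_relax(c) for c in initial_configs),
                             key=lambda pair: pair[1])
            verified = [(cfg, dft_energy(cfg)) for cfg, _ in relaxed[:k]]
            return min(verified, key=lambda pair: pair[1])

    Spending DFT only on the shortlist is what produces the accuracy/efficiency spectrum the abstract quantifies: larger k buys accuracy, smaller k buys speed.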

    Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

    Full text link
    Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average of a 2.4-5.6% non-parametric kNN classification improvement across eight remote sensing datasets compared to current state-of-the-art and obtains a 0.9 mIoU to 1.7 mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
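
    The scale-aware positional encoding can be illustrated with a sinusoidal encoding whose positions are scaled by ground sample distance (GSD) rather than pixel index alone. This is a minimal sketch of the idea; the reference GSD, the names, and the 1-D layout are assumptions, not taken from the paper.

        import numpy as np

        def gsd_scaled_encoding(num_patches, dim, gsd, reference_gsd=1.0):
            # Scale patch positions by GSD relative to a reference, so the
            # encoding reflects the ground area covered by the image rather
            # than its pixel resolution.
            pos = np.arange(num_patches) * (gsd / reference_gsd)
            i = np.arange(dim // 2)
            freq = 1.0 / (10000.0 ** (2.0 * i / dim))
            angles = pos[:, None] * freq[None, :]
            return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

        # Identical patch grids at different GSDs receive different encodings,
        # which is what lets the network tell scales apart.
        coarse = gsd_scaled_encoding(196, 64, gsd=10.0)   # 10 m/pixel
        fine = gsd_scaled_encoding(196, 64, gsd=0.3)      # 0.3 m/pixel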

    The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture

    Full text link
    New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed CO2 and/or H2O. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
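
    Screening sorbents in a dataset like ODAC23 ultimately revolves around adsorption energies. The abstract does not spell out the definition; the sketch below uses the standard one, with made-up energies purely for illustration.

        def adsorption_energy(e_mof_plus_adsorbate, e_mof, e_adsorbate):
            # Standard definition:
            #   E_ads = E(MOF + adsorbate) - E(MOF) - E(adsorbate)
            # A negative value indicates favorable binding of CO2 or H2O
            # to the framework.
            return e_mof_plus_adsorbate - e_mof - e_adsorbate

        # Example with illustrative (not real) DFT total energies in eV:
        print(adsorption_energy(-1052.7, -1030.1, -22.3))   # -> -0.3 eV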

    A Practical Stereo Depth System for Smart Glasses

    Full text link
    We present the design of a productionized end-to-end stereo depth sensing system that performs pre-processing, online stereo rectification, and stereo depth estimation, with a fallback to monocular depth estimation when rectification is unreliable. The output of our depth sensing system is then used in a novel view generation pipeline to create 3D computational photography effects using point-of-view images captured by smart glasses. All these steps are executed on-device within the stringent compute budget of a mobile phone, and because we expect users to have a wide range of smartphones, our design must be general and cannot depend on particular hardware or an ML accelerator such as a smartphone GPU. Although each of these steps is well studied, a description of a practical system is still lacking. For such a system, all these steps need to work in tandem with one another and fall back gracefully on failures within the system or less-than-ideal input data. We show how we handle unforeseen changes to calibration, e.g., due to heat, robustly support depth estimation in the wild, and still abide by the memory and latency constraints required for a smooth user experience. We show that our trained models are fast, and run in less than 1s on a six-year-old Samsung Galaxy S8 phone's CPU. Our models generalize well to unseen data and achieve good results on Middlebury and on in-the-wild images captured from the smart glasses.
    Comment: Accepted at CVPR 2023.
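
    The fallback behavior the abstract describes can be sketched as a simple policy: prefer stereo on well-rectified pairs, and fall back to monocular depth when the rectification residual is too large. Every callable here is a hypothetical placeholder, not the production system's API.

        def estimate_depth(left, right, rectify, is_reliable, stereo_net, mono_net):
            # rectify: hypothetical online rectification; returns row-aligned
            # images plus a residual alignment error (e.g., vertical disparity).
            rect_left, rect_right, residual = rectify(left, right)
            if is_reliable(residual):
                return stereo_net(rect_left, rect_right)   # preferred path
            return mono_net(left)                          # graceful fallback

    Gating on a measured residual rather than assuming a fixed calibration is what lets such a system tolerate the in-the-wild changes (e.g., heat-induced deformation) the abstract mentions.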
