2 research outputs found
TensorBank: Tensor Lakehouse for Foundation Model Training
Storing and streaming high dimensional data for foundation model training
has become a critical requirement with the rise of foundation models beyond natural
language. In this paper we introduce TensorBank, a petabyte-scale tensor
lakehouse capable of streaming tensors from Cloud Object Store (COS) to GPU
memory at wire speed based on complex relational queries. We use Hierarchical
Statistical Indices (HSI) for query acceleration. Our architecture allows tensors
to be addressed directly at block level using HTTP range reads. Once in GPU
memory, data can be transformed using PyTorch transforms. We provide a generic
PyTorch dataset type with a corresponding dataset factory that translates
relational queries and requested transformations into a dataset instance. By making use
of the HSI, irrelevant blocks can be skipped without reading them as those
indices contain statistics on their content at different hierarchical
resolution levels. This is an opinionated architecture powered by open
standards and making heavy use of open-source technology. Although hardened
for production use with geospatial-temporal data, this architecture
generalizes to other use cases such as computer vision, computational neuroscience,
biological sequence analysis, and more.
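
The block-skipping idea in the abstract above can be illustrated with a minimal sketch. It is not TensorBank's implementation: the abstract does not say which statistics the HSI stores or how blocks are laid out, so the sketch assumes per-block (min, max) summaries, a fixed fan-out between resolution levels, and fixed-size blocks stored contiguously in one object. The names `build_index`, `candidate_blocks`, and `range_header` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Stat = Tuple[float, float]  # assumed statistic: (min, max) of a block's values


def build_index(block_stats: Sequence[Stat], fanout: int = 2) -> List[List[Stat]]:
    """Build summary levels, coarsest first; the last level is per block."""
    levels = [list(block_stats)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each coarser entry summarises up to `fanout` entries below it.
        levels.append([
            (min(lo for lo, _ in prev[i:i + fanout]),
             max(hi for _, hi in prev[i:i + fanout]))
            for i in range(0, len(prev), fanout)
        ])
    return levels[::-1]


def candidate_blocks(levels: List[List[Stat]], lo: float, hi: float,
                     fanout: int = 2) -> List[int]:
    """Return indices of blocks whose [min, max] overlaps the query [lo, hi]."""
    if not levels[0]:
        return []
    frontier = list(range(len(levels[0])))
    for depth, level in enumerate(levels):
        # Keep only nodes whose value range can contain a match.
        hits = [i for i in frontier if level[i][0] <= hi and level[i][1] >= lo]
        if depth == len(levels) - 1:
            return hits
        n_next = len(levels[depth + 1])
        # Descend only into children of surviving nodes; the rest is skipped.
        frontier = [c for i in hits
                    for c in range(i * fanout, min((i + 1) * fanout, n_next))]
    return []


def range_header(block_index: int, block_bytes: int) -> Dict[str, str]:
    """HTTP Range header for one fixed-size block (byte ranges are inclusive)."""
    start = block_index * block_bytes
    return {"Range": f"bytes={start}-{start + block_bytes - 1}"}


stats = [(0, 5), (6, 9), (20, 30), (31, 40), (41, 50)]
index = build_index(stats)
blocks = candidate_blocks(index, lo=25, hi=35)
headers = [range_header(b, block_bytes=1024) for b in blocks]
```

Under these assumptions, a query for values between 25 and 35 descends only into the subtree covering blocks 2 and 3, and each surviving block maps to a single byte range that an HTTP client could request from the object store.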
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
Machine learning and deep learning methods have been widely explored in
understanding the chaotic behavior of the atmosphere and furthering weather
forecasting. There has been increasing interest from technology companies,
government institutions, and meteorological agencies in building digital twins
of the Earth. Recent approaches using transformers, physics-informed machine
learning, and graph neural networks have demonstrated state-of-the-art
performance on relatively narrow spatiotemporal scales and specific tasks. With
the recent success of generative artificial intelligence (AI) using pre-trained
transformers for language modeling and vision with prompt engineering and
fine-tuning, we are now moving towards generalizable AI. In particular, we are
witnessing the rise of AI foundation models that can perform competitively on
multiple domain-specific downstream tasks. Despite this progress, we are still
in the nascent stages of a generalizable AI model for global Earth system
models, regional climate models, and mesoscale weather models. Here, we review
current state-of-the-art AI approaches, primarily from transformer and operator
learning literature in the context of meteorology. We provide our perspective
on criteria for success towards a family of foundation models for nowcasting
and forecasting weather and climate predictions. We also discuss how such
models can perform competitively on downstream tasks such as downscaling
(super-resolution), identifying conditions conducive to the occurrence of
wildfires, and predicting consequential meteorological phenomena across various
spatiotemporal scales such as hurricanes and atmospheric rivers. In particular,
we examine current AI methodologies and contend they have matured enough to
design and implement a weather foundation model.
Comment: 44 pages, 1 figure, updated Fig.