Exploring Generalisability of Self-Distillation with No Labels for SAR-Based Vegetation Prediction
In this work we pre-train a DINO-ViT based model using two Synthetic Aperture
Radar datasets (S1GRD or GSSIC) across three regions (China, Conus, Europe). We
fine-tune the models on smaller labeled datasets to predict vegetation
percentage, and empirically study the connection between the embedding space of
the models and their ability to generalize across diverse geographic regions
and to unseen data. For S1GRD, the embedding spaces of different regions are
clearly separated, while those for GSSIC overlap. Positional patterns remain during
fine-tuning, and greater distances in embeddings often result in higher errors
for unfamiliar regions. With this, our work increases our understanding of
generalizability for self-supervised models applied to remote sensing.
Comment: 10 pages, 9 figures
Fewshot learning on global multimodal embeddings for earth observation tasks
In this work we pretrain a CLIP/ViT based model using three different
modalities of satellite imagery across five AOIs covering roughly 10% of Earth's
total landmass, namely Sentinel 2 RGB optical imagery, Sentinel 1 SAR radar
amplitude and interferometric coherence. This model uses M
parameters. Then, we use the embeddings produced for each modality with a
classical machine learning method to attempt different downstream tasks for
earth observation related to vegetation, built up surface, croplands and
permanent water. We consistently show how we reduce the need for labeled data
by 99%, so that with ~200-500 randomly selected labeled examples (around
4K-10K km²) we reach performance levels analogous to those achieved with the
full labeled datasets (about 150K image chips or 3M km² in each area of
interest - AOI) on all modalities, AOIs and downstream tasks. This leads us to
think that the model has captured significant earth features useful in a wide
variety of scenarios. To enhance our model's usability in practice, its
architecture allows inference in contexts with missing modalities and even
missing channels within each modality. Additionally, we visually show that this
embedding space, obtained with no labels, is sensitive to the different earth
features represented by the labelled datasets we selected.
Comment: 9 pages, 6 figures, presented at the NeurIPS workshop on Robustness of
Few-shot and Zero-shot Learning in Foundation Models
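As an illustration of the few-shot setup described in this abstract (frozen embeddings fed to a classical method trained on a handful of labels), here is a minimal sketch. The abstract does not say which classical method was used; the nearest-centroid classifier, the 2-D synthetic "embeddings", and the class names below are all assumptions made for the example.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fit_nearest_centroid(embeddings, labels):
    """Return one centroid per class from a small labeled set."""
    by_class = {}
    for emb, lab in zip(embeddings, labels):
        by_class.setdefault(lab, []).append(emb)
    return {lab: centroid(vs) for lab, vs in by_class.items()}


def predict(centroids, embedding):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], embedding))


# Synthetic stand-ins for pretrained embeddings (hypothetical data).
rng = random.Random(0)


def sample(center, n):
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]


# Only five labeled examples per class are used for fitting.
train_x = sample([0.0, 0.0], 5) + sample([3.0, 3.0], 5)
train_y = ["water"] * 5 + ["cropland"] * 5
model = fit_nearest_centroid(train_x, train_y)

test_x = sample([0.0, 0.0], 50) + sample([3.0, 3.0], 50)
test_y = ["water"] * 50 + ["cropland"] * 50
accuracy = sum(predict(model, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
```

The point of the sketch is only that, when the embedding space already separates the classes, very few labels are needed to place a decision boundary.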
Large Scale Masked Autoencoding for Reducing Label Requirements on SAR Data
Satellite-based remote sensing is instrumental in the monitoring and
mitigation of the effects of anthropogenic climate change. Large scale, high
resolution data derived from these sensors can be used to inform intervention
and policy decision making, but the timeliness and accuracy of these
interventions is limited by use of optical data, which cannot operate at night
and is affected by adverse weather conditions. Synthetic Aperture Radar (SAR)
offers a robust alternative to optical data, but its associated complexities
limit the scope of labelled data generation for traditional deep learning. In
this work, we apply a self-supervised pretraining scheme, masked autoencoding,
to SAR amplitude data covering 8.7% of the Earth's land surface area, and tune
the pretrained weights on two downstream tasks crucial to monitoring climate
change - vegetation cover prediction and land cover classification. We show
that the use of this pretraining scheme reduces labelling requirements for the
downstream tasks by more than an order of magnitude, and that this pretraining
generalises geographically, with the performance gain increasing when tuned
downstream on regions outside the pretraining set. Our findings significantly
advance climate change mitigation by facilitating the development of task and
region-specific SAR models, allowing local communities and organizations to
deploy tailored solutions for rapid, accurate monitoring of climate change
effects.
Comment: 12 pages, 6 figures
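The core idea of masked autoencoding mentioned in this abstract (hide a random subset of image patches, reconstruct them, and score the reconstruction only on the hidden patches) can be sketched as follows. This is a toy illustration, not the authors' implementation: the 75% mask ratio, the scalar "patches", and the function names are assumptions.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices into visible and masked sets at random."""
    num_masked = int(num_patches * mask_ratio)
    shuffled = list(range(num_patches))
    rng.shuffle(shuffled)
    masked = sorted(shuffled[:num_masked])
    visible = sorted(shuffled[num_masked:])
    return visible, masked


def masked_mse(target, prediction, masked):
    """Mean squared error computed over the masked patches only."""
    return sum((target[i] - prediction[i]) ** 2 for i in masked) / len(masked)


# Toy example: 16 patches, each summarised by a single value.
rng = random.Random(0)
visible, masked = random_patch_mask(16, 0.75, rng)

target = [1.0] * 16      # stand-in for the true patch content
prediction = [0.0] * 16  # stand-in for a decoder output
loss = masked_mse(target, prediction, masked)
```

In a real model the encoder would see only the `visible` patches and a decoder would produce `prediction`; no labels are involved at any point, which is what makes the scheme self-supervised.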
Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery
Self-supervised learning (SSL) models have recently demonstrated remarkable
performance across various tasks, including image segmentation. This study
delves into the emergent characteristics of the Self-Distillation with No
Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR)
imagery. We pre-train a vision transformer (ViT)-based DINO model using
unlabeled SAR data, and later fine-tune the model to predict high-resolution
land cover maps. We rigorously evaluate the utility of attention maps generated
by the ViT backbone and compare them with the model's token embedding space. We
observe a small improvement in model performance with pre-training compared to
training from scratch and discuss the limitations and opportunities of SSL for
remote sensing and land cover segmentation. Beyond small performance increases,
we show that ViT attention maps hold great intrinsic value for remote sensing,
and could provide useful inputs to other algorithms. With this, our work lays
the groundwork for bigger and better SSL models for Earth Observation.
Comment: 9 pages, 5 figures
Probabilistic Super-Resolution of Solar Magnetograms: Generating Many Explanations and Measuring Uncertainties
Machine learning techniques have been successfully applied to super-resolution tasks on natural images, where visually pleasing results are sufficient. However, in many scientific domains this is not adequate, and estimates of errors and uncertainties are crucial. To address this issue, we propose a Bayesian framework that decomposes uncertainties into epistemic and aleatoric components. We test the validity of our approach by super-resolving images of the Sun's magnetic field and by generating maps measuring the range of possible high-resolution explanations compatible with a given low-resolution magnetogram.
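One common way to split predictive uncertainty into aleatoric and epistemic parts is the law of total variance applied to several stochastic predictions of the same pixel. The abstract does not state which decomposition the authors use, so the sketch below is an assumption: each sample is taken to provide a predicted mean and a predicted noise variance.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split total predictive variance for one pixel.

    means:     predicted mean from each sampled model
    variances: predicted noise variance from each sampled model
    """
    aleatoric = fmean(variances)  # average data noise: E[sigma^2]
    epistemic = pvariance(means)  # disagreement between samples: Var[mu]
    return aleatoric, epistemic, aleatoric + epistemic


# Three hypothetical samples for a single high-resolution pixel.
aleatoric, epistemic, total = decompose_uncertainty(
    means=[1.0, 2.0, 3.0],
    variances=[0.5, 0.5, 0.5],
)
```

Applying this per pixel yields two maps: one showing where the data are inherently noisy, and one showing where the sampled models disagree about the high-resolution explanation.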
Single-Frame Super-Resolution of Solar Magnetograms: Investigating Physics-Based Metrics & Losses
Breakthroughs in our understanding of physical phenomena have traditionally followed improvements in instrumentation. Studies of the magnetic field of the Sun, and its influence on the solar dynamo and space weather events, have benefited from improvements in resolution and measurement frequency of new instruments. However, in order to fully understand the solar cycle, high-quality data across time-scales longer than the typical lifespan of a solar instrument are required. At the moment, discrepancies between measurement surveys prevent the combined use of all available data. In this work, we show that machine learning can help bridge the gap between measurement surveys by learning to super-resolve low-resolution magnetic field images and translate between characteristics of contemporary instruments in orbit. We also introduce the notion of physics-based metrics and losses for super-resolution to preserve underlying physics and constrain the solution space of possible super-resolution outputs.
A Recurrent Mutation in KCNA2 as a Novel Cause of Hereditary Spastic Paraplegia and Ataxia
The hereditary spastic paraplegias (HSPs) are heterogeneous neurodegenerative disorders with over 50 known causative genes. We identified a recurrent mutation in KCNA2 (c.881G>A, p.R294H), encoding the voltage-gated K+ channel KV1.2, in two unrelated families with HSP, intellectual disability (ID), and ataxia. Follow-up analysis of >2,000 patients with various neurological phenotypes identified a de novo p.R294H mutation in a proband with ataxia and ID. Two-electrode voltage-clamp recordings of Xenopus laevis oocytes expressing mutant KV1.2 channels showed loss of function with a dominant-negative effect. Our findings highlight the phenotypic spectrum of a recurrent KCNA2 mutation, implicating ion channel dysfunction as a novel HSP disease mechanism.
Peer reviewed
SLCO5A1 and synaptic assembly genes contribute to impulsivity in juvenile myoclonic epilepsy
Elevated impulsivity is a key component of attention-deficit hyperactivity disorder (ADHD), bipolar disorder and juvenile myoclonic epilepsy (JME). We performed a genome-wide association, colocalization, polygenic risk score, and pathway analysis of impulsivity in JME (n = 381). Results were followed up with functional characterisation using a Drosophila model. We identified genome-wide associated SNPs at 8q13.3 (P = 7.5 × 10^−9) and 10p11.21 (P = 3.6 × 10^−8). The 8q13.3 locus colocalizes with SLCO5A1 expression quantitative trait loci in cerebral cortex (P = 9.5 × 10^−3). SLCO5A1 codes for an organic anion transporter and upregulates synapse assembly/organisation genes. Pathway analysis demonstrates 12.7-fold enrichment for presynaptic membrane assembly genes (P = 0.0005) and 14.3-fold enrichment for presynaptic organisation genes (P = 0.0005) including NLGN1 and PTPRD. RNAi knockdown of Oatp30B, the Drosophila polypeptide with the highest homology to SLCO5A1, causes over-reactive startling behaviour (P = 8.7 × 10^−3) and increased seizure-like events (P = 6.8 × 10^−7). Polygenic risk score for ADHD genetically correlates with impulsivity scores in JME (P = 1.60 × 10^−3). SLCO5A1 loss-of-function represents an impulsivity and seizure mechanism. Synaptic assembly genes may inform the aetiology of impulsivity in health and disease.