Data to science with AI and human-in-the-loop

Author: Perez Sarabia Gustavo
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/02/2024

AI has the potential to accelerate scientific discovery by enabling scientists to analyze vast datasets more efficiently than traditional methods. For example, this thesis considers the detection of star clusters in high-resolution images of galaxies taken by space telescopes, as well as the study of bird migration from radar images. In these applications, the goal is to make measurements that answer scientific questions, such as how the star formation rate is affected by mass, or how the phenology of bird migration is influenced by climate change. However, current computer vision systems are far from perfect at making these measurements directly: they may perform poorly when training data is limited, they may introduce bias, and they do not offer the statistical guarantees that scientists desire. This thesis addresses these challenges in three ways. First, we consider transfer learning to hyperspectral domains. Because hyperspectral data has more than three channels, networks pre-trained on color images cannot be used on it directly. We design and investigate lightweight adapters that can be plugged into a pre-trained network to make it compatible with hyperspectral inputs. In various image classification tasks, these adapters improve generalization when training data is limited. Second, we explore how unlabeled data in a domain can be used to bootstrap a pre-trained network, investigating the role of self-supervised learning in training networks for star cluster classification in astronomical images. Third, we address the scenario in which a model is available but unreliable, either because the task is difficult or because the model is deployed on out-of-domain data where its performance cannot be guaranteed. We develop human-in-the-loop techniques that incorporate human vetting of model outputs to produce estimates with statistical guarantees.
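
One simple way to make a color-pretrained network accept hyperspectral input, sketched here as an assumption rather than as the thesis's actual adapter design, is a learned per-pixel linear projection (equivalent to a 1x1 convolution) that maps C input bands to the three channels the network expects. The class name `ChannelAdapter` and the band-averaging initialization are hypothetical; in practice the weights and bias would be trained, and the output would be fed to the pre-trained network.

```python
from typing import List

Pixel = List[float]
Image = List[List[Pixel]]  # height x width x channels


class ChannelAdapter:
    """Hypothetical 1x1 linear adapter mapping C input bands to 3 output channels."""

    def __init__(self, in_channels: int, out_channels: int = 3) -> None:
        if in_channels < out_channels:
            raise ValueError("expects at least as many input bands as output channels")
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Initialization heuristic (an assumption): output channel k averages a
        # contiguous group of input bands, giving a coarse false-color image
        # before any training updates the weights.
        self.weights = [[0.0] * in_channels for _ in range(out_channels)]
        bounds = [k * in_channels // out_channels for k in range(out_channels + 1)]
        for k in range(out_channels):
            start, end = bounds[k], bounds[k + 1]
            for c in range(start, end):
                self.weights[k][c] = 1.0 / (end - start)
        self.bias = [0.0] * out_channels

    def __call__(self, image: Image) -> Image:
        # Apply the same linear map to every pixel, i.e. a 1x1 convolution.
        return [
            [
                [sum(w * x for w, x in zip(row, pixel)) + b
                 for row, b in zip(self.weights, self.bias)]
                for pixel in line
            ]
            for line in image
        ]
```

Because the projection acts on each pixel independently, it changes only the channel dimension and leaves the spatial layout that the pre-trained network relies on untouched.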
We ground these approaches in applications in astronomy, ecology, and climate, where data is heterogeneous and measurement needs differ. Manual measurement is challenging because it requires domain expertise and because of the scale of the data being analyzed. We apply ideas from this thesis to develop StarcNet, a deep learning model capable of classifying star clusters in Hubble images. It achieves a level of agreement with human classifications comparable to that of existing catalogs and produces scientific conclusions similar to those obtained from existing catalogs, such as age/mass and frequency/mass distributions in galaxies. In collaboration with others, we use the model to automatically analyze sources from the M101 galaxy and to conduct preliminary studies on the near-infrared bands of the NGC4449 galaxy. In ecology, we study the behavior of roosting birds using weather radars. Weather radars around the globe continuously scan the airspace and are sensitive enough to detect flying animals. However, the sheer volume of data makes manual analysis impractical. We have designed an AI-assisted system capable of extracting research-grade roost annotations from radar data. This system combines ideas from adapter design, used to build an accurate spatio-temporal roost detector, with a human-in-the-loop vetting system that produces estimates with statistical guarantees. In collaboration with others, we use this framework to quantify long-term phenological patterns in the roosts of aerial insectivores such as swallows and martins. These analyses are among the most comprehensive long-term, broad-scale examinations of how avian aerial insectivore species respond to environmental change. Lastly, we consider estimating the number of damaged buildings from satellite imagery of regions struck by a natural disaster. During disaster response, aid organizations aim to quickly count damaged buildings in satellite images to plan relief missions, but pre-trained building and damage detectors often perform poorly due to domain shift.
In such cases, there is a need for human-in-the-loop approaches that count accurately with minimal human effort. We propose techniques for counting over multiple spatial or temporal regions using only a small amount of human screening. We conclude by discussing how AI and humans can collaborate on various measurement tasks and by outlining the future challenges associated with deploying AI in scientific research.
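
To illustrate how screening a few regions can still yield a total count with a statistical guarantee, here is a sketch of one standard construction, importance sampling, assumed here for illustration and not necessarily the estimator used in the thesis. Regions are sampled in proportion to a detector's predicted counts, a person counts only the sampled regions, and reweighting those counts gives an unbiased estimate of the true total. The function `estimate_total` and the `human_count` callback are hypothetical, and the interval is a normal approximation, so its coverage is only asymptotic.

```python
import random
from statistics import NormalDist, mean, stdev
from typing import Callable, Sequence, Tuple


def estimate_total(
    predicted: Sequence[float],
    human_count: Callable[[int], float],
    n_samples: int,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Estimate the true total count over all regions by screening n_samples of them.

    predicted[i]: model-predicted count for region i. It must be > 0, because
                  a region with zero proposal probability is never screened,
                  which would bias the estimate if it contains objects.
    human_count:  hypothetical callback returning the human-verified count for region i.
    Returns (estimate, lower, upper) using a normal-approximation interval.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples for an interval")
    if any(g <= 0 for g in predicted):
        raise ValueError("predicted counts must be positive")
    total_pred = sum(predicted)
    # Proposal distribution: sample regions in proportion to the model's counts.
    q = [g / total_pred for g in predicted]
    rng = random.Random(seed)
    picks = rng.choices(range(len(predicted)), weights=q, k=n_samples)
    # Each importance weight f_i / q_i is an unbiased estimate of the true total.
    weighted = [human_count(i) / q[i] for i in picks]
    est = mean(weighted)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(weighted) / n_samples ** 0.5
    return est, est - half_width, est + half_width
```

The better the detector's counts track the true counts, the smaller the variance of the weighted terms; when the two are exactly proportional, every weight equals the true total and the interval collapses to a point. This is why a good but imperfect model reduces how many regions a person must screen.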