Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse
LSST and Euclid must address the daunting challenge of analyzing the
unprecedented volumes of imaging and spectroscopic data that these
next-generation instruments will generate. A promising approach to overcoming
this challenge involves rapid, automatic image processing using appropriately
trained Deep Learning (DL) algorithms. However, reliable application of DL
requires large, accurately labeled samples of training data. Galaxy Zoo Express
(GZX) is a recent experiment that simulated using Bayesian inference to
dynamically aggregate binary responses provided by citizen scientists via the
Zooniverse crowd-sourcing platform in real time. The GZX approach enables
collaboration between human and machine classifiers and provides rapidly
generated, reliably labeled datasets, thereby enabling online training of
accurate machine classifiers. We present selected results from GZX and show how
the Bayesian aggregation engine it uses can be extended to efficiently provide
object-localization and bounding-box annotations of two-dimensional data with
quantified reliability. DL algorithms that are trained using these annotations
will facilitate numerous panchromatic data modeling tasks including
morphological classification and substructure detection in direct imaging, as
well as decontamination and emission line identification for slitless
spectroscopy. Effectively combining the speed of modern computational analyses
with the human capacity to extrapolate from few examples will be critical if
the potential of forthcoming large-scale surveys is to be realized.
Comment: 5 pages, 1 figure. To appear in Proceedings of the International
Astronomical Unio
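The Bayesian aggregation of binary volunteer responses described in this abstract can be illustrated with a minimal sketch. It is not the GZX implementation: the function names, the per-volunteer accuracy parameters, the 0.5 prior, and the 0.99/0.01 retirement thresholds are all assumptions chosen for illustration. Each vote updates the probability that a subject belongs to the positive class, and the subject is retired once that probability crosses a threshold.

```python
def update_posterior(prior, said_yes, p_yes_given_true, p_yes_given_false):
    """One Bayes update of P(subject is positive) after a single binary vote.

    p_yes_given_true / p_yes_given_false are the (assumed known) rates at
    which this volunteer answers "yes" for positive and negative subjects.
    """
    if said_yes:
        like_true, like_false = p_yes_given_true, p_yes_given_false
    else:
        like_true, like_false = 1.0 - p_yes_given_true, 1.0 - p_yes_given_false
    evidence = like_true * prior + like_false * (1.0 - prior)
    return like_true * prior / evidence


def aggregate(votes, prior=0.5, accept=0.99, reject=0.01):
    """Fold votes into a posterior, stopping early once a threshold is crossed.

    votes: iterable of (said_yes, p_yes_given_true, p_yes_given_false).
    The thresholds are hypothetical retirement criteria.
    """
    p = prior
    for said_yes, p_true, p_false in votes:
        p = update_posterior(p, said_yes, p_true, p_false)
        if p >= accept or p <= reject:
            break  # subject is retired; remaining votes are not needed
    return p


if __name__ == "__main__":
    # Three "yes" votes from volunteers who are right 80% of the time.
    votes = [(True, 0.8, 0.2)] * 3
    print(round(aggregate(votes), 4))
```

Because the posterior after each vote becomes the prior for the next, labels can be updated as responses arrive, and a machine classifier's prediction could in principle be folded in as one more vote with its own accuracy estimates.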
From fat droplets to floating forests: cross-domain transfer learning using a PatchGAN-based segmentation model
Many scientific domains gather sufficient labels to train machine algorithms
through human-in-the-loop techniques provided by the Zooniverse.org citizen
science platform. As the range of projects, task types and data rates increase,
acceleration of model training is of paramount concern to focus volunteer
effort where most needed. The application of Transfer Learning (TL) between
Zooniverse projects holds promise as a solution. However, understanding the
effectiveness of TL approaches that pretrain on large-scale generic image sets
vs. images with similar characteristics possibly from similar tasks is an open
challenge. We apply a generative segmentation model to two Zooniverse
project-based data sets: (1) identifying fat droplets in liver cells
(FatChecker; FC) and (2) identifying kelp beds in satellite images
(Floating Forests; FF) through transfer learning from the first project. We
compare and contrast its performance with a TL model based on the COCO image
set, and subsequently with baseline counterparts. We find that both the FC and
COCO TL models perform better than the baseline cases when using >75% of the
original training sample size. The COCO-based TL model generally performs
better than the FC-based one, likely due to its generalized features. Our
investigations provide important insights into usage of TL approaches on
multi-domain data hosted across different Zooniverse projects, enabling future
projects to accelerate task completion.
Comment: 5 pages, 4 figures, accepted for publication at the Proceedings of
the ACM/CIKM 2022 (Human-in-the-loop Data Curation Workshop
Machine Learning for the Zwicky Transient Facility
The Zwicky Transient Facility is a large optical survey in multiple filters producing hundreds of thousands of transient alerts per night. We describe here various machine learning (ML) implementations and plans to make maximal use of the large data set by taking advantage of the temporal nature of the data, and further combining it with other data sets. We start with the initial steps of separating bogus candidates from real ones and separating stars from galaxies, and go on to the classification of real objects into various classes. Besides the usual methods (e.g., based on features extracted from light curves), we also describe early plans for alternate methods, including the use of domain adaptation and deep learning. In a similar fashion we describe efforts to detect fast-moving asteroids. We also describe the use of the Zooniverse platform for helping with classifications through the creation of training samples and active learning. Finally, we mention the synergistic aspects of ZTF and LSST from the ML perspective.
Characterizing Novelty as a Motivator in Online Citizen Science
Citizen science projects rely on the voluntary contribution of nonscientists to take part in scientific research projects. Projects taking place exclusively over the Internet face significant challenges, chief among them attracting and keeping the critical mass of volunteers needed to conduct the work outlined by the science team. The extent to which platforms can design experiences that positively influence volunteers’ motivation can help address the contribution challenges. Consequently, project organizers need to develop strategies to attract new participants and keep existing ones. One strategy to encourage participation is implementing features that reinforce motives known to positively change people’s attitudes towards contributing. The literature in psychology notes that novelty is an attribute of objects and environments that occasions curiosity in humans, leading to exploratory behaviors, e.g., prolonged engagement with the object or environment. This dissertation describes the design, implementation, and evaluation of an experiment conducted in three online citizen science projects. Volunteers received novelty cues when they classified data objects that no other volunteer had previously seen. The hypothesis was that exposure to novelty cues while classifying data positively influences motivational attitudes, leading to increased engagement in the classification task and increased retention. The experiments produced mixed results. In some projects, novelty cues were universally salient, and in other projects, novelty cues had no significant impact on volunteers’ contribution behaviors. The results, while mixed, are promising since differences in the observed behaviors arise because of individual personality differences and the unique attributes found in each project setting.
This research contributes to empirically grounded studies on motivation in citizen science with analyses that produce new insights and questions into the functioning of novelty and its impact on volunteers’ behaviors.
Perspectives in machine learning for wildlife conservation
Data acquisition in animal ecology is rapidly accelerating due to inexpensive
and accessible sensors such as smartphones, drones, satellites, audio recorders
and bio-logging devices. These new technologies and the data they generate hold
great potential for large-scale environmental monitoring and understanding, but
are limited by current data processing approaches which are inefficient in how
they ingest, digest, and distill data into relevant information. We argue that
machine learning, and especially deep learning approaches, can meet this
analytic challenge to enhance our understanding, monitoring capacity, and
conservation of wildlife species. Incorporating machine learning into
ecological workflows could improve inputs for population and behavior models
and eventually lead to integrated hybrid modeling tools, with ecological models
acting as constraints for machine learning models and the latter providing
data-supported insights. In essence, by combining new machine learning
approaches with ecological domain knowledge, animal ecologists can capitalize
on the abundance of data generated by modern sensor technologies in order to
reliably estimate population abundances, study animal behavior and mitigate
human/wildlife conflicts. To succeed, this approach will require close
collaboration and cross-disciplinary education between the computer science and
animal ecology communities in order to ensure the quality of machine learning
approaches and train a new generation of data scientists in ecology and
conservation.
JUICINESS IN CITIZEN SCIENCE COMPUTER GAMES: ANALYSIS OF A PROTOTYPICAL GAME
Incorporating the collective problem-solving skills of non-experts could accelerate the advancement of scientific research. Citizen science games leverage puzzles to present computationally difficult problems to players. Such games typically map the scientific problem to game mechanics, and visual feedback helps players improve their solutions. Like games for entertainment, citizen science games intend to capture and retain player attention. “Juicy” game design refers to augmented visual feedback systems that give a game personality without modifying fundamental game mechanics. A “juicy” game feels alive and polished. This thesis explores the use of “juicy” game design applied to the citizen science genre. We present the results of a user study of its effect on player motivation with a prototypical citizen science game inspired by clustering-based E. coli bacterial strain analysis.