49 research outputs found

    Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse

    LSST and Euclid must address the daunting challenge of analyzing the unprecedented volumes of imaging and spectroscopic data that these next-generation instruments will generate. A promising approach to overcoming this challenge involves rapid, automatic image processing using appropriately trained Deep Learning (DL) algorithms. However, reliable application of DL requires large, accurately labeled samples of training data. Galaxy Zoo Express (GZX) is a recent experiment that simulated using Bayesian inference to dynamically aggregate binary responses provided by citizen scientists via the Zooniverse crowd-sourcing platform in real time. The GZX approach enables collaboration between human and machine classifiers and provides rapidly generated, reliably labeled datasets, thereby enabling online training of accurate machine classifiers. We present selected results from GZX and show how the Bayesian aggregation engine it uses can be extended to efficiently provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability. DL algorithms that are trained using these annotations will facilitate numerous panchromatic data modeling tasks including morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification for slitless spectroscopy. Effectively combining the speed of modern computational analyses with the human capacity to extrapolate from few examples will be critical if the potential of forthcoming large-scale surveys is to be realized. Comment: 5 pages, 1 figure. To appear in Proceedings of the International Astronomical Union

    From fat droplets to floating forests: cross-domain transfer learning using a PatchGAN-based segmentation model

    Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increases, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics, possibly from similar tasks, is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) to identify fat droplets in liver cells (FatChecker; FC) and (2) to identify kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion. Comment: 5 pages, 4 figures, accepted for publication at the Proceedings of the ACM/CIKM 2022 (Human-in-the-loop Data Curation Workshop)

    Machine learning for the Zwicky transient facility

    The Zwicky Transient Facility is a large optical survey in multiple filters producing hundreds of thousands of transient alerts per night. We describe here various machine learning (ML) implementations and plans to make the maximal use of the large data set by taking advantage of the temporal nature of the data, and further combining it with other data sets. We start with the initial steps of separating bogus candidates from real ones, separating stars and galaxies, and go on to the classification of real objects into various classes. Besides the usual methods (e.g., based on features extracted from light curves) we also describe early plans for alternative methods, including the use of domain adaptation and deep learning. In a similar fashion we describe efforts to detect fast-moving asteroids. We also describe the use of the Zooniverse platform for helping with classifications through the creation of training samples, and active learning. Finally we mention the synergistic aspects of ZTF and LSST from the ML perspective.

    Characterizing Novelty as a Motivator in Online Citizen Science

    Citizen science projects rely on the voluntary contribution of nonscientists to take part in scientific research projects. Projects taking place exclusively over the Internet face significant challenges, chief among them attracting and keeping the critical mass of volunteers needed to conduct the work outlined by the science team. The extent to which platforms can design experiences that positively influence volunteers’ motivation can help address the contribution challenges. Consequently, project organizers need to develop strategies to attract new participants and keep existing ones. One strategy to encourage participation is implementing features that reinforce motives known to positively change people’s attitudes towards contributing. The literature in psychology noted that novelty is an attribute of objects and environments that occasions curiosity in humans, leading to exploratory behaviors, e.g., prolonged engagement with the object or environment. This dissertation described the design, implementation, and evaluation of an experiment conducted in three online citizen science projects. Volunteers received novelty cues when they classified data objects that no other volunteer had previously seen. The hypothesis was that exposure to novelty cues while classifying data positively influences motivational attitudes, leading to increased engagement in the classification task and increased retention. The experiments produced mixed results. In some projects, novelty cues were universally salient, and in other projects, novelty cues had no significant impact on volunteers’ contribution behaviors. The results, while mixed, are promising since differences in the observed behaviors arise because of individual personality differences and the unique attributes found in each project setting. This research contributes to empirically grounded studies on motivation in citizen science with analyses that produce new insights and questions into the functioning of novelty and its impact on volunteers’ behaviors.

    Perspectives in machine learning for wildlife conservation

    Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how they ingest, digest, and distill data into relevant information. We argue that machine learning, and especially deep learning approaches, can meet this analytic challenge to enhance our understanding, monitoring capacity, and conservation of wildlife species. Incorporating machine learning into ecological workflows could improve inputs for population and behavior models and eventually lead to integrated hybrid modeling tools, with ecological models acting as constraints for machine learning models and the latter providing data-supported insights. In essence, by combining new machine learning approaches with ecological domain knowledge, animal ecologists can capitalize on the abundance of data generated by modern sensor technologies in order to reliably estimate population abundances, study animal behavior and mitigate human/wildlife conflicts. To succeed, this approach will require close collaboration and cross-disciplinary education between the computer science and animal ecology communities in order to ensure the quality of machine learning approaches and train a new generation of data scientists in ecology and conservation

    JUICINESS IN CITIZEN SCIENCE COMPUTER GAMES: ANALYSIS OF A PROTOTYPICAL GAME

    Incorporating the collective problem-solving skills of non-experts could accelerate the advancement of scientific research. Citizen science games leverage puzzles to present computationally difficult problems to players. Such games typically map the scientific problem to game mechanics, and visual feedback helps players improve their solutions. Like games for entertainment, citizen science games intend to capture and retain player attention. “Juicy” game design refers to augmented visual feedback systems that give a game personality without modifying fundamental game mechanics. A “juicy” game feels alive and polished. This thesis explores the use of “juicy” game design applied to the citizen science genre. We present the results of a user study of its effect on player motivation with a prototypical citizen science game inspired by clustering-based E. coli bacterial strain analysis.