Data-Centric Foundation Models in Computational Healthcare: A Survey
The advent of foundation models (FMs) as an emerging suite of AI techniques
has sparked a wave of opportunities in computational healthcare. The interactive
nature of these models, guided by pre-training data and human instructions, has
ignited a data-centric AI paradigm that emphasizes better data
characterization, quality, and scale. In healthcare AI, obtaining and
processing high-quality clinical data records has been a longstanding
challenge, spanning data quantity, annotation, patient privacy, and ethics.
In this survey, we investigate a wide range of data-centric approaches in the
FM era (from model pre-training to inference) towards improving the healthcare
workflow. We discuss key perspectives in AI security, assessment, and alignment
with human values. Finally, we offer an outlook on FM-based analytics for
improving patient outcomes and clinical workflows in the evolving landscape
of healthcare and medicine. We provide an up-to-date list of
healthcare-related foundation models and datasets at
https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare
AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated Resources
We present an end-to-end automated workflow that uses large-scale remote
compute resources and an embedded GPU platform at the edge to enable
AI/ML-accelerated real-time analysis of data collected for x-ray ptychography.
Ptychography is a lensless method that images samples through simultaneous
numerical inversion of a large number of diffraction patterns collected at
adjacent, overlapping scan positions. This acquisition method can enable
nanoscale imaging with x-rays and electrons, but it often requires very large
experimental datasets and commensurately long turnaround times, which can limit
capabilities such as real-time experiment steering and low-latency monitoring.
In this work, we introduce a software system that can
automate ptychography data analysis tasks. We accelerate the data analysis
pipeline by using a modified version of PtychoNN -- an ML-based approach to
phase retrieval that achieves a two-orders-of-magnitude speedup over
traditional iterative methods. Further, our system coordinates and
overlaps different data analysis tasks to minimize synchronization overhead
between different stages of the workflow. We evaluate our workflow system with
real-world experimental workloads from the 26-ID beamline at the Advanced
Photon Source and the ThetaGPU cluster at the Argonne Leadership Computing
Facility.
Comment: 7 pages, 1 figure, to be published in the High Performance Computing
for Imaging conference, Electronic Imaging (HPCI 2023)
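The coordination strategy described in the abstract -- overlapping acquisition, ML inference, and result publishing so that no stage idles while another runs -- can be sketched with bounded queues between pipeline stages. This is a minimal illustrative sketch, not the authors' implementation: the stage functions (`infer`, `publish`) and the `run_pipeline` helper are hypothetical stand-ins for the workflow's actual components.

```python
import queue
import threading

def run_pipeline(frames, infer, publish, depth=4):
    """Overlap three pipeline stages (produce, infer, publish) using
    bounded FIFO queues, so each stage blocks only when its neighbor
    falls behind -- minimizing synchronization overhead between stages.
    All names here are illustrative, not from the paper's software."""
    q_in = queue.Queue(maxsize=depth)   # acquisition -> inference
    q_out = queue.Queue(maxsize=depth)  # inference -> publishing
    SENTINEL = object()                 # end-of-stream marker

    def producer():
        # Stage 1: feed raw frames (e.g., diffraction patterns) downstream.
        for f in frames:
            q_in.put(f)
        q_in.put(SENTINEL)

    def worker():
        # Stage 2: run the (stand-in) ML inference on each frame.
        while True:
            f = q_in.get()
            if f is SENTINEL:
                q_out.put(SENTINEL)
                break
            q_out.put(infer(f))

    def consumer(results):
        # Stage 3: publish/collect results as they become available.
        while True:
            r = q_out.get()
            if r is SENTINEL:
                break
            results.append(publish(r))

    results = []
    threads = [threading.Thread(target=producer),
               threading.Thread(target=worker),
               threading.Thread(target=consumer, args=(results,))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Example with trivial stand-in stages:
out = run_pipeline(range(5), infer=lambda f: f + 1, publish=lambda r: r)
```

The bounded `maxsize` on each queue is the key design choice: it lets a fast upstream stage run ahead by a few items (overlapping work) without unbounded buffering, while single-worker FIFO queues preserve frame order.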
Using visual analytics to develop situation awareness in astrophysics
We present a novel collaborative visual analytics application for cognitively overloaded users in the astrophysics domain. The system was developed for scientists who need to analyze heterogeneous, complex data under time pressure, and to make predictions and time-critical decisions rapidly and correctly under a constant influx of changing data. The Sunfall Data Taking system utilizes several novel visualization and analysis techniques to enable a team of geographically distributed domain specialists to effectively and remotely maneuver a custom-built instrument under challenging operational conditions. Sunfall Data Taking has been in production use for two years by a major international astrophysics collaboration (the largest-data-volume supernova search currently in operation), and has substantially improved the operational efficiency of its users. We describe the system design process by an interdisciplinary team, the system architecture, and the results of an informal usability evaluation of the production system by domain experts in the context of Endsley's three levels of situation awareness.
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report