3 research outputs found
The tidyomics ecosystem: Enhancing omic data analyses
The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
Competing Interest Statement: R.G. has received consulting income from Takeda and Sanofi, and declares ownership in Ozette Technologies. M.K. is an employee of and declares ownership in Achilles Therapeutics. The remaining authors declare no competing interests.
Computation Foundations of Spatial Transcriptomics
Single-cell and spatial transcriptomics have come of age in the past few years; datasets and data analysis software packages have proliferated. As datasets grow, new data collection technologies proliferate, and high-throughput technologies become mainstream, the software can be improved in speed and memory efficiency, in offering a standardized and consistent user interface across multiple technologies, and in documentation to onboard new users. First, I collected a database of spatial transcriptomics literature and analyzed it for trends and sociology in this field. Based on the database and data analyses, I wrote a comprehensive book documenting, both qualitatively and quantitatively, the history of the field since the 1960s and reviewing more recent developments, which informed the software and methods I later developed. Then, to address the challenges of pre-processing large datasets, we developed kallisto bustools for fast and modular pseudoalignment of sequencing reads to the transcriptome in single-cell RNA-seq (scRNA-seq), giving results consistent with the established and much more computationally demanding alignment method Cell Ranger. Briefly summarized are my attempts to map dissociated cells in scRNA-seq to a spatial gene expression reference and to build an image processing pipeline for image-based spatial transcriptomics data analysis. Finally, to address the challenges in downstream analyses of spatial -omics data, I first wrote the new SpatialFeatureExperiment (SFE) data structure to represent and operate on geometries in spatial transcriptomics data and to organize results from spatial analyses. Building on SFE, I wrote Voyager, which brings decades of research in geospatial data analysis to spatial transcriptomics so that spatial information can be better used to gain novel biological insights.
To reduce the learning curve for users, Voyager conforms to SingleCellExperiment (SCE) styles and conventions and has a comprehensive documentation website and a consistent user interface to many geospatial methods.