
24 research outputs found

Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq

Author: da Veiga Beltrame Eduardo
Pachter Lior
Svensson Valentine
Publication date: 09/09/2019

The allocation of a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/
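The tradeoff described in this abstract can be made concrete with a little arithmetic: for a fixed number of reads, every increase in depth per cell reduces the number of cells that can be sequenced. The sketch below is only an illustration and is not the paper's variational autoencoder analysis. The function name `allocations`, the example budget, and the candidate depths are assumptions; only the figure of about 15,000 reads per cell comes from the abstract.

```python
# Illustrative sketch only: split a fixed read budget between cells and depth.
# The budget and candidate depths are assumed values, not taken from the paper.

DEPTH_THRESHOLD = 15_000  # reads per cell; the "about 15,000" quoted in the abstract


def allocations(total_reads, depths):
    """Return (reads_per_cell, n_cells, at_or_above_threshold) per candidate depth."""
    rows = []
    for depth in depths:
        n_cells = total_reads // depth  # whole cells affordable at this depth
        rows.append((depth, n_cells, depth >= DEPTH_THRESHOLD))
    return rows


if __name__ == "__main__":
    budget = 300_000_000  # assumed total reads, chosen only for illustration
    for depth, n_cells, above in allocations(budget, [1_000, 5_000, 15_000, 50_000]):
        note = "at or above ~15k reads/cell" if above else "below ~15k reads/cell"
        print(f"{depth:>6} reads/cell -> {n_cells:>7} cells ({note})")
```

Under the abstract's finding, rows flagged as at or above the threshold are the ones where spending further reads on depth rather than on additional cells would be expected to bring only minor benefit.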

A curated database reveals trends in single cell transcriptomics

Author: da Veiga Beltrame Eduardo
Pachter Lior
Svensson Valentine
Publication venue: Oxford University Press (OUP)
Publication date: 28/11/2020

The more than 1000 single-cell transcriptomics studies that have been published to date constitute a valuable and vast resource for biological discovery. While various ‘atlas’ projects have collated some of the associated datasets, most questions related to specific tissue types, species or other attributes of studies require identifying papers through a challenging manual literature search. To facilitate discovery with published single-cell transcriptomics data, we have assembled a near-exhaustive, manually curated database of single-cell transcriptomics studies with key information: descriptions of the type of data and technologies used, along with descriptors of the biological systems studied. Additionally, the database contains summarized information about analysis in the papers, allowing for analysis of trends in the field. As an example, we show that the number of cell types identified in scRNA-seq studies is proportional to the number of cells analysed.

Principles of open source bioinstrumentation applied to the poseidon syringe pump system

Author: Bannon Dylan
Booeshaghi A. Sina
da Veiga Beltrame Eduardo
Gehring Jase
Pachter Lior
Publication venue: Nature Publishing Group
Publication date: 27/08/2019

The poseidon syringe pump and microscope system is an open source alternative to commercial systems. It costs less than $400 and can be assembled in under an hour using the instructions and source files available at https://pachterlab.github.io/poseidon. We describe the poseidon system and use it to illustrate design principles that can facilitate the adoption and development of open source bioinstruments. The principles are functionality, robustness, safety, simplicity, modularity, benchmarking, and documentation.

Caltech Authors

Modular and efficient pre-processing of single-cell RNA-seq

Author: Booeshaghi A. Sina
da Veiga Beltrame Eduardo
Gao Fan
Gehring Jase
Hjorleifsson Kristján Eldjárn
Lu Lambda
Melsted Páll
Pachter Lior
Publication date: 17/06/2019

Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.

Stories in Single Cell RNA Sequencing

Author: da Veiga Beltrame Eduardo
Publication date: 01/01/2022

This thesis describes the projects I have worked on since starting the Caltech bioengineering program in fall 2017. The general theme of my projects is that they are all about single cell RNA sequencing (scRNA-seq), spanning the experimental and computational realms. Chapter 1 is an introduction explaining the essential concepts and is meant to be readable by a wide audience. Each of the other chapters describes a separate project in a succinct manner, including links to the related preprint, published paper or code repositories at the start of the chapter. Chapter 2 describes the scVI generative model for scRNA-seq data and the scvi-tools framework, which forms the basis of many of my computational projects. Chapter 3 describes an open source 3D printable syringe pump system that was developed to facilitate many kinds of experiments, in particular droplet-based scRNA-seq. Chapter 4 describes a new way of fabricating hydrogel beads with unique DNA barcodes that are used for scRNA-seq experiments. Chapter 5 describes a database that I helped create, which lists most published scRNA-seq studies and provides a useful overview of the state of the field. Chapter 6 describes the kallisto bus workflow, which is used for pre-processing scRNA-seq data, going from FASTQ file to gene count matrix in a very efficient manner. Chapter 7 describes a new way of using scVI to quantify the trade-off in the quality of scRNA-seq of a given dataset when surveying more cells or sequencing more reads per cell. Chapter 8 describes tools developed for WormBase users to leverage scRNA-seq data on C. elegans, and which can be deployed with any other scRNA-seq dataset. Chapter 9 describes a remarkably successful offshoot of the development of these tools: a simple scVI-based analysis and visualization strategy for finding candidate marker genes using C. elegans scRNA-seq data, which was experimentally validated by members of the Sternberg lab.

Caltech Theses and Dissertations

Ben-David 2021 wormcells-viz data

Author: da Veiga Beltrame Eduardo
Publication venue: CaltechDATA
Publication date: 17/06/2021

Processed data for the WormBase wormcells-viz app: https://github.com/WormBase/wormcells-viz
Data from https://data.caltech.edu/records/1972
Original study: Whole-organism eQTL mapping at cellular resolution with single-cell sequencing. Eyal Ben-David, James Boocock, Longhua Guo, Stefan Zdraljevic, Joshua S Bloom, Leonid Kruglyak. eLife 2021;10:e65857, DOI: https://doi.org/10.7554/eLife.6585

CaltechDATA (California Institute of Technology Research Data Repository)

CeNGEN scRNAseq dataset wrangled into standard WormBase anndata - 100k C. elegans cells FACS sorted for neurons in L4 larvae

Author: da Veiga Beltrame Eduardo
Publication venue: CaltechDATA
Publication date: 26/05/2021

CeNGEN (Taylor et al 2021) scRNAseq dataset wrangled into standard WormBase anndata. 100k C. elegans cells FACS sorted to enrich for neurons: 70k cells are neurons, 30k are other types.
For more information about the data see the CeNGEN project website: https://cengen.org
Preprint describing data and analyses: https://doi.org/10.1101/2020.12.15.422897
Molecular topography of an entire nervous system, bioRxiv 2020. Seth R Taylor, Gabriel Santpere, Alexis Weinreb, Alec Barrett, Molly B. Reilly, Chuan Xu, Erdem Varol, Panos Oikonomou, Lori Glenwinkel, Rebecca McWhirter, Abigail Poff, Manasa Basavaraju, Ibnul Rafi, Eviatar Yemini, Steven J Cook, Alexander Abrams, Berta Vidal, Cyril Cros, Saeed Tavazoie, Nenad Sestan, Marc Hammarlund, Oliver Hobert, David M. Miller III
The file `cengen_author_provided_matrices.zip` contains the original mtx files provided by the authors as output by CellRanger, both filtered and unfiltered. The file `taylor2020.h5ad` contains those matrices and the metadata from the 2020 CeNGEN data release wrangled into the WormBase standard format, which is described at: https://github.com/WormBase/anndata-wrangling

CaltechDATA (California Institute of Technology Research Data Repository)

Packer et al 2019 scRNAseq dataset wrangled into standard WormBase anndata - 89k cells profiled with 10xv2 across multiple timepoints of development

Author: da Veiga Beltrame Eduardo
Publication venue: CaltechDATA
Publication date: 06/04/2021

This is part of a curated collection of all C. elegans single cell RNA sequencing high throughput data wrangled into the anndata format in .h5ad files with standard fields, plus any number of optional fields that vary depending on the metadata the authors provide. Where possible, we keep the field names lowercase, short, descriptive, and valid as Python variable names, so they may be accessed via the syntax adata.var.field_name
For the convention used to wrangle the h5ad files see https://github.com/WormBase/single-cell/blob/main/data_wrangling_convention.md
Notebook used to wrangle the data: https://github.com/WormBase/wormcells-notebooks/blob/main/wormcells_wrangle_packer2019_h5ad.ipynb
Original study: A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Packer, Jonathan S. and Zhu, Qin and Huynh, Chau and Sivaramakrishnan, Priya and Preston, Elicia and Dueck, Hannah and Stefanik, Derek and Tan, Kai and Trapnell, Cole and Kim, Junhyong and Waterston, Robert H. and Murray, John I. Science 20 Sep 2019: Vol. 365, Issue 6459, eaax1971. DOI: 10.1126/science.aax1971
https://science.sciencemag.org/content/365/6459/eaax1971.editor-summary
Data description: 89,701 cells profiled with 10xv2 across multiple timepoints of development
Data available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126954
Fields of the anndata object as printed from Python:
```
print(adata)
AnnData object with n_obs × n_vars = 20222 × 89701
    obs: 'gene_id', 'gene_name'
    var: 'study', 'batch', 'sample', 'sample_description', 'barcode', 'cell_type', 'n_umi', 'time_point', 'size_factor', 'cell_subtype', 'plot_cell_type', 'raw_embryo_time', 'embryo_time', 'embryo_time_bin', 'raw_embryo_time_bin', 'lineage', 'passed_qc'

print(adata.var.head(1).T)
AnnData object with n_obs × n_vars = 20222 × 89701
    obs: 'gene_id', 'gene_name'
    var: 'study', 'batch', 'sample', 'sample_description', 'barcode', 'cell_type', 'n_umi', 'time_point', 'size_factor', 'cell_subtype', 'plot_cell_type', 'raw_embryo_time', 'embryo_time', 'embryo_time_bin', 'raw_embryo_time_bin', 'lineage', 'passed_qc'

print(adata.obs.head(1).T)
           WBGene00010957
gene_id    WBGene00010957
gene_name  nduo-6
```

CaltechDATA (California Institute of Technology Research Data Repository)
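
The Packer et al 2019 record above says field names are kept lowercase, short, and valid as Python variable names so that they can be read with attribute syntax such as adata.var.field_name. One way such a rule could be applied to raw metadata column names is sketched below. The helper `to_field_name` and the `f_` prefix are hypothetical, introduced only for illustration; they are not taken from the WormBase wrangling convention or notebooks.

```python
import keyword
import re


def to_field_name(raw):
    """Turn a raw metadata column name into a lowercase, valid Python identifier.

    Hypothetical helper: not part of the WormBase tooling.
    """
    # Lowercase, then collapse every run of characters outside a-z and 0-9
    # into one underscore and trim underscores from both ends.
    name = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    if not name:
        raise ValueError(f"cannot derive a field name from {raw!r}")
    # Identifiers may not start with a digit or collide with a Python keyword;
    # the "f_" prefix is an assumed convention for this sketch.
    if name[0].isdigit() or keyword.iskeyword(name):
        name = "f_" + name
    return name


if __name__ == "__main__":
    for raw in ["Cell Type", "Embryo.Time (bin)", "10x batch", "class"]:
        print(f"{raw!r} -> {to_field_name(raw)}")
```

Names produced this way pass `str.isidentifier()`, which is the property attribute-style access relies on.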