19 research outputs found

    A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification.

    BACKGROUND: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification. RESULTS: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis and packaged it into an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose among a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. CONCLUSIONS: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more, either the Seurat or the SIMLR algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to reach a fold change of only up to 2, the choice of single cell algorithm depends on the number of single cells isolated and the rarity of the cell types to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in the design of single cell experiments
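The sample-size question raised in this abstract (how many cells to profile for a cell type of a given rarity) can be illustrated with a simple binomial calculation. This is only a minimal sketch and not SCEED's empirical, simulation-based procedure: it assumes cells are sampled independently, and the function names `prob_at_least` and `min_cells_needed` are hypothetical.

```python
from math import comb


def prob_at_least(n_cells, proportion, min_count):
    """Probability that at least `min_count` of `n_cells` profiled cells
    belong to a cell type present at the given proportion (binomial model)."""
    below = sum(
        comb(n_cells, j) * proportion**j * (1 - proportion) ** (n_cells - j)
        for j in range(min_count)
    )
    return 1.0 - below


def min_cells_needed(proportion, min_count, confidence=0.95, max_cells=100_000):
    """Smallest number of profiled cells that captures at least `min_count`
    cells of the rare type with the requested confidence.
    Returns None if no value up to `max_cells` is sufficient."""
    for n in range(min_count, max_cells + 1):
        if prob_at_least(n, proportion, min_count) >= confidence:
            return n
    return None


# Illustrative question: cells needed to see at least 10 cells of a 1% population
cells_for_rare_type = min_cells_needed(proportion=0.01, min_count=10)
```

Capturing enough cells of a rare type is a necessary condition only. Whether a given algorithm then separates that type also depends on marker fold change and the other parameters the abstract lists, which is what an empirical approach evaluates.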

    Transfer learning-trained convolutional neural networks identify novel MRI biomarkers of Alzheimer's disease progression.

    Introduction: Genome-wide association studies (GWAS) for late onset Alzheimer's disease (AD) may miss genetic variants relevant for delineating disease stages when using clinically defined case/control as a phenotype due to its loose definition and heterogeneity. Methods: We use a transfer learning technique to train three-dimensional convolutional neural network (CNN) models based on structural magnetic resonance imaging (MRI) from the screening stage in the Alzheimer's Disease Neuroimaging Initiative consortium to derive image features that reflect AD progression. Results: CNN-derived image phenotypes are significantly associated with fasting metabolites related to early lipid metabolic changes as well as insulin resistance and with genetic variants mapped to candidate genes enriched for amyloid beta degradation, tau phosphorylation, calcium ion binding-dependent synaptic loss, Discussion: This is the first attempt to show that non-invasive MRI biomarkers are linked to AD progression characteristics, reinforcing their use in early AD diagnosis and monitoring

    Genomic data analysis workflows for tumors from patient-derived xenografts (PDXs): challenges and guidelines.

    BACKGROUND: Patient-derived xenograft (PDX) models are in vivo models of human cancer that have been used for translational cancer research and therapy selection for individual patients. The Jackson Laboratory (JAX) PDX resource comprises 455 models originating from 34 different primary sites (as of 05/08/2019). The models undergo rigorous quality control and are genomically characterized to identify somatic mutations, copy number alterations, and transcriptional profiles. Bioinformatics workflows for analyzing genomic data obtained from human tumors engrafted in a mouse host (i.e., Patient-Derived Xenografts; PDXs) must address challenges such as discriminating between mouse and human sequence reads and accurately identifying somatic mutations and copy number alterations when paired non-tumor DNA from the patient is not available for comparison. RESULTS: We report here data analysis workflows and guidelines that address these challenges and achieve reliable identification of somatic mutations, copy number alterations, and transcriptomic profiles of tumors from PDX models that lack genomic data from paired non-tumor tissue for comparison. Our workflows incorporate commonly used software and public databases but are tailored to address the specific challenges of PDX genomics data analysis through parameter tuning and customized data filters and result in improved accuracy for the detection of somatic alterations in PDX models. We also report a gene expression-based classifier that can identify EBV-transformed tumors. We validated our analytical approaches using data simulations and demonstrated the overall concordance of the genomic properties of xenograft tumors with data from primary human tumors in The Cancer Genome Atlas (TCGA). 
CONCLUSIONS: The analysis workflows that we have developed to accurately predict somatic profiles of tumors from PDX models that lack normal tissue for comparison enable the identification of the key oncogenic genomic and expression signatures to support model selection and/or biomarker development in therapeutic studies. A reference implementation of our analysis recommendations is available at https://github.com/TheJacksonLaboratory/PDX-Analysis-Workflows

    Transcriptome Analysis of Zebrafish Embryogenesis Using Microarrays

    Zebrafish (Danio rerio) is a well-recognized model for the study of vertebrate developmental genetics, yet at the same time little is known about the transcriptional events that underlie zebrafish embryogenesis. Here we have employed microarray analysis to study the temporal activity of developmentally regulated genes during zebrafish embryogenesis. Transcriptome analysis at 12 different embryonic time points covering five different developmental stages (maternal, blastula, gastrula, segmentation, and pharyngula) revealed a highly dynamic transcriptional profile. Hierarchical clustering, stage-specific clustering, and algorithms to detect onset and peak of gene expression revealed clearly demarcated transcript clusters with maximum gene activity at distinct developmental stages as well as co-regulated expression of gene groups involved in dedicated functions such as organogenesis. Our study also revealed a previously unidentified cohort of genes that are transcribed prior to the mid-blastula transition, a time point earlier than when the zygotic genome was traditionally thought to become active. Here we provide, for the first time to our knowledge, a comprehensive list of developmentally regulated zebrafish genes and their expression profiles during embryogenesis, including novel information on the temporal expression of several thousand previously uncharacterized genes. The expression data generated from this study are accessible to all interested scientists from our institute resource database (http://giscompute.gis.a-star.edu.sg/~govind/zebrafish/data_download.html)

    High-resolution deconstruction of evolution induced by chemotherapy treatments in breast cancer xenografts.

    The processes by which tumors evolve are essential to the efficacy of treatment, but quantitative understanding of intratumoral dynamics has been limited. Although intratumoral heterogeneity is common, quantification of evolution is difficult from clinical samples because treatment replicates cannot be performed and because matched serial samples are infrequently available. To circumvent these problems we derived and assayed large sets of human triple-negative breast cancer xenografts and cell cultures from two patients, including 86 xenografts from cyclophosphamide, doxorubicin, cisplatin, docetaxel, or vehicle treatment cohorts as well as 45 related cell cultures. We assayed these samples via exome-seq and/or high-resolution droplet digital PCR, allowing us to distinguish complex therapy-induced selection and drift processes among endogenous cancer subclones with cellularity uncertaint

    Adaptive Sentinel Testing in Workplace for COVID-19 Pandemic.

    For many organizations, testing and isolating infectious employees is one of the critical strategies for keeping the workplace safe during the pandemic. Adaptive testing frequency reduces cost while keeping the pandemic under control at the workplace. However, most models aimed at estimating test frequencies were structured for municipalities or large organizations such as university campuses of highly mobile individuals. By contrast, the workplace exhibits distinct characteristics: the employee positivity rate may differ from that of the local community because of rigorous protective measures at the workplace, or because of self-selection of co-workers with common behavioral tendencies for adherence to pandemic mitigation guidelines. Moreover, dual exposure to COVID-19, at work and at home, complicates transmission modeling, as does transmission tracing at the workplace. Hence, we developed a bi-modal SEIR (Susceptible, Exposed, Infectious, and Removed) model and an R-shiny tool that account for these differentiating factors to adaptively estimate the testing frequency for a workplace. Our tool uses easily measurable parameters: community incidence rate, risks of acquiring infection from community and workplace, workforce size, and sensitivity of testing. Our model is best suited for moderate-sized organizations with low internal transmission rates, no outward-facing employees whose position demands frequent in-person interactions with the public, and low to medium population positivity rates. Simulations revealed that employee behavior in adherence to protective measures at work and in their community, and the onsite workforce size, have large effects on testing frequency. Reducing the workplace transmission rate through workplace mitigation protocols and deploying a more sensitive test also affect testing frequency, although to a lesser extent.
Furthermore, our simulations showed that sentinel testing leads to only a marginal increase in the number of infections even for high community incidence rates, suggesting that this may be a cost-effective approach in future pandemics. We used our model to accurately guide the testing regimen for three campuses of the Jackson Laboratory
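To make the idea of dual exposure concrete, here is a minimal discrete-time SEIR sketch in which the daily infection hazard combines a workplace term and a community term. It is not the authors' bi-modal model or their R-shiny tool: the function name `simulate_seir`, the parameter names, and all default values are assumptions chosen for illustration, and testing and isolation are not modeled.

```python
def simulate_seir(n_workers, days, beta_work, community_rate,
                  sigma=1 / 3, gamma=1 / 7, initial_infectious=1):
    """Deterministic daily-step SEIR with two exposure sources.

    beta_work      : workplace transmission rate per infectious co-worker (assumed)
    community_rate : daily probability of infection outside work (assumed)
    sigma, gamma   : rates of leaving the Exposed and Infectious states (assumed)
    Returns a list of (S, E, I, R) tuples, one per day including day 0.
    """
    s = float(n_workers - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        # Combined hazard: workplace contacts plus community exposure
        force = beta_work * i / n_workers + community_rate
        new_exposed = min(s, force * s)
        new_infectious = sigma * e
        new_removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative comparison of two community incidence levels
low = simulate_seir(n_workers=200, days=60, beta_work=0.05, community_rate=0.0005)
high = simulate_seir(n_workers=200, days=60, beta_work=0.05, community_rate=0.001)
```

In a sketch like this, a testing schedule would enter as an extra removal term on the infectious compartment, scaled by test sensitivity and testing frequency.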

    Bioinformatics Core Survey Highlights the Challenges Facing Data Analysis Facilities.

    Over the last decade, the cost of -omics data creation has decreased 10-fold, whereas the need for analytical support for those data has increased exponentially. Consequently, bioinformaticians face a second wave of challenges: novel applications of existing approaches (e.g., single-cell RNA sequencing), integration of -omics data sets of differing size and scale (e.g., spatial transcriptomics), as well as novel computational and statistical methods, all of which require more sophisticated pipelines and data management. Nonetheless, bioinformatics cores are often asked to operate under primarily a cost-recovery model, with limited institutional support. Seeing the need to assess bioinformatics core operations, the Association of Biomolecular Resource Facilities Genomics Bioinformatics Research Group conducted a survey to answer questions about staffing, services, financial models, and challenges to better understand the challenges bioinformatics core facilities are currently faced with and will need to address going forward. Of the respondent groups, we chose to focus on the survey data from smaller cores, which made up the majority. Although all cores indicated similar challenges in terms of changing technologies and analysis needs, small cores tended to have the added challenge of funding their operations largely through cost-recovery models with heavy administrative burdens

    Development and validation of the JAX Cancer Treatment Profile™ for detection of clinically actionable mutations in solid tumors.

    BACKGROUND: The continued development of targeted therapeutics for cancer treatment has required the concomitant development of more expansive methods for the molecular profiling of the patient's tumor. We describe the validation of the JAX Cancer Treatment Profile™ (JAX-CTP™), a next generation sequencing (NGS)-based molecular diagnostic assay that detects actionable mutations in solid tumors to inform the selection of targeted therapeutics for cancer treatment. METHODS: NGS libraries are generated from DNA extracted from formalin fixed paraffin embedded tumors. Using hybrid capture, the genes of interest are enriched and sequenced on the Illumina HiSeq 2500 or MiSeq sequencers followed by variant detection and functional and clinical annotation for the generation of a clinical report. RESULTS: The JAX-CTP™ detects actionable variants, in the form of single nucleotide variations and small insertions and deletions (≤50 bp) in 190 genes in specimens with a neoplastic cell content of ≥10%. The JAX-CTP™ is also validated for the detection of clinically actionable gene amplifications. CONCLUSIONS: There is a lack of consensus in the molecular diagnostics field on the best method for the validation of NGS-based assays in oncology, thus the importance of communicating methods, as contained in this report. The growing number of targeted therapeutics and the complexity of the tumor genome necessitate continued development and refinement of advanced assays for tumor profiling to enable precision cancer treatment. Exp Mol Pathol 2015 Feb; 98(1):106-112

    CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.

    BACKGROUND: Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients. METHODS: We developed an RNA-based classifier called CUP-AI-Dx that utilizes a 1D Inception convolutional neural network (1D-Inception) model to infer a tumor's primary tissue of origin. CUP-AI-Dx was trained using the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas project (TCGA) and International Cancer Genome Consortium (ICGC). Gene expression data was ordered by gene chromosomal coordinates as input to the 1D-CNN model, and the model utilizes multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimized through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that can classify the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue of origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets. FINDINGS: CUP-AI-Dx identifies the primary site with an overall top-1-accuracy of 98.54% in cross-validation and 96.70% on a test dataset.
When applied to two independent clinical-grade RNA-seq datasets generated from two different institutes from the US and Australia, our model predicted the primary site with a top-1-accuracy of 86.96% and 72.46% respectively. INTERPRETATION: The CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and therefore can be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform. FUNDING: NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia
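Two mechanics mentioned in this abstract can be sketched briefly: ordering gene expression values by chromosomal coordinates before feeding them to a 1D convolutional model, and scoring predictions by top-1 accuracy. The sketch below is not the CUP-AI-Dx code. The helper names, the chromosome ranking rule, and the gene names and coordinates are all made up for illustration.

```python
def chrom_rank(chrom):
    """Sort key for a chromosome label such as 'chr1', '2' or 'chrX' (assumed convention)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return int(name)
    return {"X": 23, "Y": 24, "M": 25, "MT": 25}.get(name, 26)


def order_by_position(expression, coords):
    """Return gene names and expression values sorted by (chromosome, start)."""
    genes = sorted(expression, key=lambda g: (chrom_rank(coords[g][0]), coords[g][1]))
    return genes, [expression[g] for g in genes]


def top1_accuracy(class_scores, true_labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    hits = sum(
        max(scores, key=scores.get) == label
        for scores, label in zip(class_scores, true_labels)
    )
    return hits / len(true_labels)


# Toy, made-up genes and coordinates (not real annotations)
coords = {"geneA": ("chr2", 500), "geneB": ("chr1", 900),
          "geneC": ("chr1", 100), "geneD": ("chrX", 50)}
expression = {"geneA": 3.2, "geneB": 0.0, "geneC": 7.5, "geneD": 1.1}
genes, vector = order_by_position(expression, coords)

# Toy classifier scores for two samples whose true label is "lung"
scores = [{"lung": 0.7, "breast": 0.3}, {"lung": 0.4, "breast": 0.6}]
accuracy = top1_accuracy(scores, ["lung", "lung"])
```

Positional ordering gives neighbouring inputs a genomic meaning, so a 1D convolution kernel slides over genes that are adjacent on a chromosome rather than over an arbitrary ordering.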