Using microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing
Breast cancer is a highly heterogeneous disease that can be classified into multiple subtypes based on the tumor transcriptome. Most of the subtyping schemes used in clinics today are derived from analyses of microarray data from thousands of different tumors together with clinical data for the patients from which the tumors were isolated. However, RNA sequencing (RNA-Seq) is gradually replacing microarrays as the preferred transcriptomics platform, and although transcript abundances measured by the two different technologies are largely compatible, subtyping methods developed for probe-based microarray data are incompatible with RNA-Seq as input data. Here, we present an RNA-Seq data processing pipeline, which relies on the mapping of sequencing reads to the probe set target sequences instead of the human reference genome, thereby enabling probe-based subtyping of breast cancer tumor tissue using sequencing-based transcriptomics. By analyzing 66 breast cancer tumors for which gene expression was measured using both microarrays and RNA-Seq, we show that RNA-Seq data can be directly compared to microarray data using our pipeline. Additionally, we demonstrate that the established subtyping method CITBCMST (Guedj et al., ), which relies on a 375 probe set-signature to classify samples into the six subtypes basL, lumA, lumB, lumC, mApo, and normL, can be applied without further modifications. This pipeline enables a seamless transition to sequencing-based transcriptomics for future clinical purposes.
Building flexible and robust analysis frameworks for molecular subtyping of cancers
Molecular subtyping is essential to infer tumor aggressiveness and predict prognosis. In practice, tumor profiling requires in-depth knowledge of bioinformatics tools involved in the processing and analysis of the generated data. Additionally, data incompatibility (e.g., microarray versus RNA sequencing data) and technical and uncharacterized biological variance between training and test data can pose challenges in classifying individual samples. In this article, we provide a roadmap for implementing bioinformatics frameworks for molecular profiling of human cancers in a clinical diagnostic setting. We describe a framework for integrating several methods for quality control, normalization, batch correction, classification and reporting, and develop a use case of the framework in breast cancer.
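The staged structure named in this abstract (quality control, normalization, batch correction, classification, reporting) can be illustrated with a minimal sketch. This is not the authors' framework: every function name, the QC threshold, the log2 transform, the per-batch mean-centering, and the nearest-centroid classifier are assumptions chosen only to show how such steps could be chained.

```python
import math
from statistics import mean

# Hypothetical sample layout: {"id": str, "batch": str, "expr": [float, ...]}

def quality_control(samples, min_total=1.0):
    """Drop samples whose summed expression is below an assumed threshold."""
    return [s for s in samples if sum(s["expr"]) >= min_total]

def normalize(samples):
    """Apply a log2(x + 1) transform to every expression value."""
    return [{**s, "expr": [math.log2(x + 1) for x in s["expr"]]} for s in samples]

def batch_correct(samples):
    """Center each gene within each batch (simple mean-centering, an assumption)."""
    by_batch = {}
    for s in samples:
        by_batch.setdefault(s["batch"], []).append(s["expr"])
    centers = {b: [mean(col) for col in zip(*rows)] for b, rows in by_batch.items()}
    return [
        {**s, "expr": [x - c for x, c in zip(s["expr"], centers[s["batch"]])]}
        for s in samples
    ]

def classify(samples, centroids):
    """Assign each sample to the nearest centroid by Euclidean distance."""
    return [
        {**s, "subtype": min(centroids, key=lambda k: math.dist(s["expr"], centroids[k]))}
        for s in samples
    ]

def report(samples):
    """Summarize the result as a mapping from sample id to assigned subtype."""
    return {s["id"]: s["subtype"] for s in samples}

def run_pipeline(samples, centroids):
    """Chain the preprocessing steps in order, then classify and report."""
    data = samples
    for step in (quality_control, normalize, batch_correct):
        data = step(data)
    return report(classify(data, centroids))

# Toy input with made-up values and made-up subtype labels.
samples = [
    {"id": "s1", "batch": "A", "expr": [3.0, 0.0]},
    {"id": "s2", "batch": "A", "expr": [0.0, 3.0]},
    {"id": "s3", "batch": "B", "expr": [0.0, 0.0]},  # removed by quality control
]
centroids = {"high_gene1": [1.0, -1.0], "high_gene2": [-1.0, 1.0]}
result = run_pipeline(samples, centroids)
```

Keeping each stage as a function that takes and returns the same sample list makes it possible to swap one method for another without touching the rest of the chain. Mean-centering per batch is only a placeholder here, since it can also remove biological signal when batch and subtype are confounded.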
cyCombine allows for robust integration of single-cell cytometry datasets within and across technologies
Combining single-cell cytometry datasets increases the analytical flexibility and the statistical power of data analyses. However, in many cases the full potential of co-analyses is not reached due to technical variance between data from different experimental batches. Here, we present cyCombine, a method to robustly integrate cytometry data from different batches, experiments, or even different experimental techniques, such as CITE-seq, flow cytometry, and mass cytometry. We demonstrate that cyCombine maintains the biological variance and the structure of the data, while minimizing the technical variance between datasets. cyCombine does not require technical replicates across datasets, and computation time scales linearly with the number of cells, allowing for integration of massive datasets. Robust, accurate, and scalable integration of cytometry data enables integration of multiple datasets for primary data analyses and the validation of results using public datasets.