Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer

Author: Do Kim-Anh
Jenq Robert
Peterson Christine
Shi Yushu
Zhang Liangliang
Publication date: 02/03/2021

There is keen interest in characterizing variation in the microbiome across cancer patients, given increasing evidence of its important role in determining treatment outcomes. Here our goal is to discover subgroups of patients with similar microbiome profiles. We propose a novel unsupervised clustering approach in the Bayesian framework that improves on existing model-based clustering approaches, such as the Dirichlet multinomial mixture model, in three key respects: we incorporate feature selection, learn the appropriate number of clusters from the data, and integrate information on the tree structure relating the observed features. We compare the performance of our proposed method with existing methods on simulated data designed to mimic real microbiome data. We then illustrate results obtained for our motivating data set, a clinical study aimed at characterizing the tumor microbiome of pancreatic cancer patients.

arXiv.org e-Print Archive

DigitalCommons@The Texas Medical Center
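The abstract above positions the proposed method against the Dirichlet multinomial mixture model. As a minimal sketch of that baseline only (not the authors' sparse tree-based method, which additionally selects features, learns the number of clusters and uses the feature tree), the code below evaluates the Dirichlet-multinomial log-likelihood of one sample's taxon counts and the posterior cluster probabilities an EM-style fit would compute. The function names, mixture weights and alpha values are hypothetical.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of one sample's taxon counts under a Dirichlet-multinomial(alpha)."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Terms from integrating the multinomial against a Dirichlet(alpha) prior
    log_p += math.lgamma(a0) - math.lgamma(n + a0)
    log_p += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_p


def cluster_responsibilities(counts, weights, alphas):
    """Posterior probability that a sample belongs to each mixture component."""
    logs = [math.log(w) + dm_log_pmf(counts, a) for w, a in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum before exponentiating, for numerical stability
    unnormalized = [math.exp(v - top) for v in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical two-cluster mixture over three taxa
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
weights = [0.5, 0.5]
print(cluster_responsibilities([40, 3, 2], weights, alphas))
```

A sample dominated by the first taxon is assigned mostly to the component whose alpha favors that taxon; note that this baseline fixes the number of components in advance, which is one of the limitations the abstract addresses.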

A Machine Learning-Based Method for Automatically Identifying Novel Cells in Annotating Single-Cell RNA-Seq Data

Author: Colla Simona
Do Kim-Anh
Ganan-Gomez Irene
Li Ziyi
Wang Yizhuo
Publication venue: DigitalCommons@TMC
Publication date: 31/10/2022

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has been widely used to decompose complex tissues into functionally distinct cell types. The first, and usually the most important, step of scRNA-seq data analysis is to accurately annotate the cell labels. In recent years, many supervised annotation methods have been developed and shown to be more convenient and accurate than unsupervised cell clustering. One challenge faced by all supervised annotation methods is the identification of novel cell types, defined as cell types that are absent from the training data and present only in the testing data. Existing methods usually label cells simply based on correlation coefficients or confidence scores, which sometimes results in an excessive number of unlabeled cells. RESULTS: We developed a straightforward yet effective method that combines an autoencoder with iterative feature selection to automatically identify novel cells from scRNA-seq data. Our method trains an autoencoder on the labeled training data and applies it to the testing data to obtain reconstruction errors. By iteratively selecting features that show a bimodal pattern and reclustering the cells using the selected features, our method can accurately identify novel cells that are not present in the training data. We further combined this approach with a support vector machine to provide a complete solution for annotating the full range of cell types. Extensive numerical experiments using five real scRNA-seq datasets demonstrated favorable performance of the proposed method over existing methods serving similar purposes. AVAILABILITY AND IMPLEMENTATION: Our R software package CAMLU is publicly available through the Zenodo repository (https://doi.org/10.5281/zenodo.7054422) or GitHub repository (https://github.com/ziyili20/CAMLU).

DigitalCommons@The Texas Medical Center
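The CAMLU abstract above describes selecting features whose reconstruction errors show a bimodal pattern and then reclustering cells on those features. The sketch below illustrates that loop under stated assumptions; it is not CAMLU's implementation. It assumes a matrix of per-cell, per-feature reconstruction errors has already been produced by an autoencoder trained on the labeled data, and that poorly reconstructed cells are the novel candidates. The bimodality test (an Otsu-style two-group split scored by the share of variance it explains), the 0.8 threshold and all function names are hypothetical choices made for illustration.

```python
def _between_share(values, in_group1):
    """Share of the variance of `values` explained by a two-group labeling."""
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values) / n
    g1 = [v for v, flag in zip(values, in_group1) if flag]
    g0 = [v for v, flag in zip(values, in_group1) if not flag]
    if total_var == 0 or not g0 or not g1:
        return 0.0
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    between = (len(g0) / n) * (len(g1) / n) * (m0 - m1) ** 2
    return between / total_var


def best_two_group_split(values):
    """Otsu-style split of 1-D values: returns (cut, share of variance explained)."""
    xs = sorted(values)
    best_cut, best_share = xs[0], 0.0
    for k in range(1, len(xs)):
        if xs[k - 1] == xs[k]:
            continue  # never separate tied values
        share = _between_share(xs, [i >= k for i in range(len(xs))])
        if share > best_share:
            best_cut, best_share = (xs[k - 1] + xs[k]) / 2, share
    return best_cut, best_share


def flag_novel_cells(errors, min_share=0.8, max_iter=5):
    """errors[i][j]: reconstruction error of test cell i on feature j (hypothetical input).

    Returns one boolean per cell; True marks a candidate novel cell.
    """
    n_cells, n_features = len(errors), len(errors[0])
    labels = None
    for _ in range(max_iter):
        selected = []
        for j in range(n_features):
            column = [row[j] for row in errors]
            # First round: is the feature bimodal on its own?
            # Later rounds: does it separate the current two groups of cells?
            share = (best_two_group_split(column)[1] if labels is None
                     else _between_share(column, labels))
            if share >= min_share:
                selected.append(j)
        if not selected:
            break
        # Recluster cells on their mean error over the selected features
        scores = [sum(row[j] for j in selected) / len(selected) for row in errors]
        cut, _ = best_two_group_split(scores)
        new_labels = [s > cut for s in scores]  # high error = assumed novel
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels if labels is not None else [False] * n_cells


# Hypothetical errors: the last two cells are poorly reconstructed on features 0 and 1
errors = [
    [0.1, 0.2, 0.5],
    [0.2, 0.1, 0.4],
    [0.1, 0.1, 0.6],
    [2.0, 1.9, 0.45],
    [2.1, 2.2, 0.55],
]
print(flag_novel_cells(errors))
```

Feature 2 varies across cells without splitting them into two clear groups, so it is dropped; the remaining two features drive the final grouping.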

survivalContour: Visualizing predicted survival via colored contour plots

Author: Do Kim-Anh
Jenq Robert R.
Peterson Christine B.
Shi Yushu
Zhang Liangliang
Publication date: 12/01/2024

Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for graphically illustrating the influence of continuous covariates on predicted survival outcomes. We propose using a colored contour plot to depict the predicted survival probabilities over time, and provide a Shiny app and an R package as implementations of this tool. Our approach supports conventional models, including the Cox and Fine-Gray models. Its full capability, however, emerges when it is coupled with cutting-edge machine learning models such as random survival forests and deep neural networks.

arXiv.org e-Print Archive
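The survivalContour abstracts describe coloring predicted survival probabilities over a grid of time and a continuous covariate. As a minimal sketch of the quantity being plotted (not the survivalContour R package or its API), the code below fills that grid for a one-covariate Cox model, where S(t | x) = S0(t)^exp(beta * x). The exponential baseline survival S0(t) = exp(-rate * t), the coefficient, the covariate (age) and the time points are all hypothetical.

```python
import math


def cox_survival_grid(times, covariate_values, beta, baseline_rate):
    """grid[i][j] = predicted S(times[j] | x = covariate_values[i]).

    Assumes a Cox model with one covariate and a hypothetical exponential
    baseline survival S0(t) = exp(-baseline_rate * t), so that
    S(t | x) = S0(t) ** exp(beta * x).
    """
    grid = []
    for x in covariate_values:
        relative_risk = math.exp(beta * x)
        row = [math.exp(-baseline_rate * t) ** relative_risk for t in times]
        grid.append(row)
    return grid


times = [0, 6, 12, 24, 36]   # months (illustrative)
ages = [40, 50, 60, 70]      # hypothetical continuous covariate
grid = cox_survival_grid(times, ages, beta=0.03, baseline_rate=0.002)
for age, row in zip(ages, grid):
    print(age, [round(s, 3) for s in row])
```

Each row corresponds to one covariate value and each column to one time point, which is the Z layout a filled-contour routine such as matplotlib's `contourf(X, Y, Z)` expects with X = times and Y = covariate values. Swapping the closed-form Cox prediction for predictions from another fitted model (for example a random survival forest) would change only how each grid cell is computed.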

survivalContour: Visualizing Predicted Survival via Colored Contour Plots

Author: Do Kim-Anh
Jenq Robert R
Peterson Christine B
Shi Yushu
Zhang Liangliang
Publication venue: DigitalCommons@TMC
Publication date: 01/01/2024

Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for illustrating the influence of continuous covariates on predicted survival outcomes. We propose using a colored contour plot to depict the predicted survival probabilities over time. Our approach supports conventional models, including the Cox and Fine–Gray models. Its full capability, however, emerges when it is coupled with cutting-edge machine learning models such as random survival forests and deep neural networks. Availability and implementation: We provide a Shiny app at https://biostatistics.mdanderson.org/shinyapps/survivalContour/ and an R package at https://github.com/YushuShi/survivalContour as implementations of this tool.

DigitalCommons@The Texas Medical Center

A Unified Mediation Analysis Framework for Integrative Cancer Proteogenomics with Clinical Outcomes

Author: Do Kim-Anh
Doecke James D
Ha Min Jin
Huang Licai
Irajizad Ehsan
Long James P
Publication venue: DigitalCommons@TMC
Publication date: 01/01/2023

MOTIVATION: Multilevel molecular profiling of tumors and its integrative analysis with clinical outcomes have enabled a deeper characterization of cancer treatment. Mediation analysis has emerged as a promising statistical tool to identify and quantify the intermediate mechanisms by which a gene affects an outcome. However, existing methods lack a unified approach to handle various types of outcome variables, making them unsuitable for high-throughput molecular profiling data with highly interconnected variables. RESULTS: We develop a general mediation analysis framework for proteogenomic data that accommodates multiple exposures and multivariate mediators, with effects on scales appropriate for continuous, binary and survival outcomes. Our estimation method avoids imposing constraints on model parameters, such as the rare disease assumption, while accommodating multiple exposures and high-dimensional mediators. We compare our approach with other methods in extensive simulation studies across a range of sample sizes, disease prevalences and numbers of false mediators. Using kidney renal clear cell carcinoma proteogenomic data, we identify genes whose effects are mediated by proteins, together with the underlying mechanisms, across survival outcomes that capture short- and long-term disease-specific clinical characteristics.

PubMed Central

DigitalCommons@The Texas Medical Center
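The mediation abstract above concerns quantifying how much of an exposure's effect on an outcome passes through a mediator. The framework it describes handles multiple exposures, high-dimensional mediators and binary or survival outcomes; the sketch below covers only the textbook single-exposure, single-mediator, continuous-outcome case, fitted by ordinary least squares with the product-of-coefficients estimator. It is an illustration of the basic idea, not the authors' method, and the function name and data are hypothetical.

```python
def _centered(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def single_mediator_effects(x, m, y):
    """Product-of-coefficients mediation: one exposure x, one mediator m, continuous y.

    Fits M = a0 + a*X and Y = b0 + c_direct*X + b*M by ordinary least squares
    (centering the variables absorbs the intercepts).
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx                             # exposure -> mediator
    det = sxx * smm - sxm ** 2                # nonzero unless m is collinear with x
    c_direct = (smm * sxy - sxm * smy) / det  # exposure -> outcome, holding m fixed
    b = (sxx * smy - sxm * sxy) / det         # mediator -> outcome, holding x fixed
    return {"a": a, "b": b, "direct": c_direct,
            "indirect": a * b, "total": sxy / sxx}


# Hypothetical data: the mediator tracks the exposure, and the outcome depends on both
x = list(range(10))
noise = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, -0.1, -0.1]
m = [2 * xi + e for xi, e in zip(x, noise)]
y = [0.5 * xi + 1.5 * mi for xi, mi in zip(x, m)]
effects = single_mediator_effects(x, m, y)
print(effects)
```

For linear models fitted by least squares, the total effect splits exactly into direct plus indirect (a * b). That decomposition does not carry over directly to binary or survival outcomes, which is one reason a unified framework such as the one in the abstract is needed.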

Attempts to Understand Oral Mucositis in Head and Neck Cancer Patients through Omics Studies: A Narrative Review

Author: Do Kim-Anh
Reyes-Gibby Cielito C
San Valentin Erin Marie D
Yeung Sai-Ching J
Publication venue: DigitalCommons@TMC
Publication date: 30/11/2023

Oral mucositis (OM) is a common and clinically impactful side effect of cytotoxic cancer treatment, particularly in patients with head and neck squamous cell carcinoma (HNSCC) who undergo radiotherapy with or without concomitant chemotherapy. The etiology and pathogenic mechanisms of OM are complex and multifaceted, and they elicit both direct and indirect damage to the mucosa. In this narrative review, we describe studies that use various omics methodologies (genomics, transcriptomics, microbiomics and metabolomics) in attempts to elucidate the biological pathways associated with the development or severity of OM. Integrating different omics into multi-omics approaches carries the potential to discover links among host factors (genomics), host responses (transcriptomics, metabolomics), and the local environment (microbiomics).

DigitalCommons@The Texas Medical Center