This paper investigates the application of Geneformer, a transformer-based model, to identifying the genes that drive transitions between irradiated cell states in data-sparse settings. Traditional differential gene expression (DGE) methods are often limited when little data is available. High-throughput single-cell RNA sequencing data were preprocessed to support accurate analysis of the genes responsible for these transitions. For the DGE baseline, statistical techniques including t-tests, the Wilcoxon rank-sum test, and logistic regression were used to rank genes by expression across four radiation exposures (0, 10, 100, and 1000 mGy). Geneformer was fine-tuned on the tokenized data with hyperparameter optimization, which yielded significant improvements in classification accuracy, as validated by two-dimensional embedding representations and in-silico perturbation experiments. When both approaches were tested on data subsets of 1024, 256, and 128 cells, the fine-tuned Geneformer model consistently outperformed the traditional DGE method. Overall, the findings demonstrate that Geneformer detects subtle shifts in gene expression with high precision and reliably identifies key genetic drivers of radiation response, offering a viable alternative to conventional DGE approaches in low-data environments.
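As an illustration of the statistical ranking step named above, the following minimal sketch ranks genes by a Welch t-statistic between two dose groups. It is not the authors' pipeline: the function names (`welch_t`, `rank_genes`), the gene names, and the expression values are all hypothetical, and only one of the three tests mentioned (the t-test) is shown, without p-values or multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, labels, group, reference):
    """Rank genes by |t| between cells in `group` and cells in `reference`.

    expr:   dict mapping gene name -> list of expression values, one per cell
    labels: list of dose labels (here in mGy), one per cell, aligned with expr
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, d in zip(values, labels) if d == group]
        b = [v for v, d in zip(values, labels) if d == reference]
        scores[gene] = welch_t(a, b)
    # Largest absolute statistic first
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data (made-up values): three cells at 0 mGy, three at 1000 mGy
labels = [0, 0, 0, 1000, 1000, 1000]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # shifts strongly with dose
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # essentially unchanged
}
ranking = rank_genes(expr, labels, group=1000, reference=0)
for gene, t in ranking:
    print(f"{gene}\tt = {t:.2f}")
```

In this toy example the dose-responsive gene is ranked first. With very few cells per group the variance estimates behind such statistics become unstable, which is the low-data limitation of conventional DGE that the abstract refers to.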