PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model

Authors: Feng Zhichao, Liu Dazheng, Liu Wenjuan, Pan Liangrui, Peng Shaoliang
Publication date: 20/08/2023

Because cancer is highly heterogeneous, multi-omics data and clinical characteristics differ significantly among cancer subtypes. Accurate classification of cancer subtypes can therefore help doctors choose the most appropriate treatment options, improve treatment outcomes, and predict patient survival more accurately. In this study, we propose a supervised multi-head attention mechanism model (SMA) to classify cancer subtypes. First, the attention mechanism and feature-sharing module of the SMA model learn the global and local feature information of multi-omics data. Second, the fusion module enriches the parameters of the model by deeply fusing Siamese multi-head attention encoders. In extensive experiments on simulated, single-cell, and cancer multi-omics datasets, the SMA model achieves the highest accuracy, macro F1, and weighted F1 in cancer subtype classification compared to AE-, CNN-, and GNN-based models. We thereby contribute an attention-based approach to future research on multi-omics data. Comment: Submitted to BIBM202
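The abstract does not give the SMA architecture in detail, so the following is only a generic sketch of multi-head self-attention, not the authors' model. It assumes identity projections in place of the learned query, key, value, and output matrices, and the three input "tokens" are hypothetical stand-ins for per-omics embeddings.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head; returns (outputs, weights)."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output row is a weighted average of the value rows.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights

def multi_head_self_attention(tokens, num_heads):
    """Split each token into num_heads slices, attend per slice, concatenate.

    Assumption: identity projections stand in for learned W_Q, W_K, W_V, W_O.
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    merged = [[] for _ in tokens]
    all_weights = []
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        out, w = attention_head(part, part, part)
        all_weights.append(w)
        for row, piece in zip(merged, out):
            row.extend(piece)
    return merged, all_weights

# Hypothetical input: three tokens (e.g. one per omics type), four features each.
tokens = [[1.0, 0.0, 0.5, 0.5],
          [0.0, 1.0, 0.5, -0.5],
          [1.0, 1.0, 0.0, 0.0]]
output, weights = multi_head_self_attention(tokens, num_heads=2)
```

Each head attends over a different slice of the feature vector, which is one simple way to let separate heads capture different relationships among the inputs.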

arXiv.org e-Print Archive

RefSelect: a reference sequence selection algorithm for planted (l, d) motif search

Authors: Feng Dazheng, Huan Jun, Huo Hongwei, Vitter Jeffrey Scott, Yu Qiang, Zhao Ruixing
Publication venue: Springer Science and Business Media LLC
Publication date: 19/07/2016

Background: The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without any elaborate selection process, and thus their running time may fluctuate sharply, especially for large alphabets.

Results: In this paper, we formulate the reference sequence selection problem and propose a method named RefSelect that solves it quickly by evaluating the number of candidate motifs for the reference sequences. RefSelect brings a practical running-time improvement to state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily and efficiently, (2) in particular, gives them a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets that contain hundreds of sequences or more.

Conclusions: The proposed algorithm RefSelect addresses the execution-time instability that many pattern-driven PMS algorithms exhibit. RefSelect requires a small amount of storage space and selects reference sequences efficiently and effectively. A parallel version of RefSelect is also provided for handling large data sets.
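To make the problem statement concrete, here is a brute-force sketch of pattern-driven (l, d) motif search with a single reference sequence (k = 1). It is not RefSelect and not any of the algorithms tested in the paper; the function names and the toy DNA sequences are assumptions made for illustration. A string counts as an (l, d) motif if every input sequence contains an l-mer within Hamming distance d of it.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(candidate, sequence, d):
    """True if some l-mer of sequence is within Hamming distance d of candidate."""
    l = len(candidate)
    return any(hamming(candidate, sequence[i:i + l]) <= d
               for i in range(len(sequence) - l + 1))

def neighbors(lmer, d, alphabet):
    """All strings within Hamming distance d of lmer."""
    result = {lmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(alphabet, repeat=k):
                chars = list(lmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result

def planted_motifs(sequences, l, d, alphabet="ACGT", reference=0):
    """Return (motifs, number of candidates) using one reference sequence."""
    ref = sequences[reference]
    candidates = set()
    for i in range(len(ref) - l + 1):
        candidates |= neighbors(ref[i:i + l], d, alphabet)
    others = [s for idx, s in enumerate(sequences) if idx != reference]
    motifs = {c for c in candidates
              if all(occurs_within(c, s, d) for s in others)}
    return motifs, len(candidates)

# Hypothetical toy input: "ACGTA" planted with at most one mismatch per sequence.
sequences = ["TTACGTATT", "GGACCTAGG", "CCAGGTACC"]
motifs, n_candidates = planted_motifs(sequences, l=5, d=1, reference=0)
motifs_alt, n_candidates_alt = planted_motifs(sequences, l=5, d=1, reference=1)
```

The set of motifs found does not depend on which sequence serves as the reference, but the number of candidates that must be verified can. That candidate count is the quantity the abstract says RefSelect evaluates when choosing reference sequences.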

KU ScholarWorks

PubMed Central

Image registration algorithm using Mexican hat function-based operator and grouped feature matching strategy.

Authors: Dazheng Feng, Feng Jin
Publication venue: Public Library of Science (PLoS)
Publication date: 21/04/2014

Feature detection and matching are crucial for robust and reliable image registration. Although many methods have been developed, they commonly focus on only one class of image features. Methods that combine two or more classes of features are still novel and significant. In this work, methods for feature detection and matching are proposed. A Mexican hat function-based operator is used for image feature detection, including local area detection and feature point detection. For local area detection, we filter the image with the Mexican hat operator, then extract the zero-crossing points and merge them into area borders. For feature point detection, the Mexican hat operator is applied in scale space to obtain the key points. After feature detection, image registration is achieved using the two classes of image features. The feature points are grouped according to a standardized region that corresponds to the local area, and precise registration is eventually achieved with the grouped points. An image transformation matrix is estimated from the feature points in each region, and the best one is then chosen through competition among the set of transformation matrices. This strategy is named the Grouped Sample Consensus (GCS). The GCS can also remove outliers effectively. The experimental results show that the proposed algorithm has high registration accuracy and a small computational cost.
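The filtering and zero-crossing step described above can be illustrated in one dimension. This is a simplified sketch, not the paper's two-dimensional operator: the kernel truncation at 3σ, the mean subtraction that makes the kernel sum to zero, the edge-replicating border handling, and the tolerance `tol` are all assumptions chosen for the example.

```python
import math

def mexican_hat_kernel(sigma):
    """Sampled Mexican hat, (1 - x^2/sigma^2) * exp(-x^2 / (2 sigma^2)).

    Truncated at 3*sigma and shifted to zero sum so flat regions respond with ~0.
    """
    radius = int(math.ceil(3 * sigma))
    raw = [(1 - (x / sigma) ** 2) * math.exp(-x * x / (2 * sigma * sigma))
           for x in range(-radius, radius + 1)]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]

def filter_signal(signal, kernel):
    """Correlate signal with a symmetric kernel, replicating the border samples."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out

def zero_crossings(response, tol=1e-9):
    """Indices i where the response changes sign between samples i and i + 1.

    Both neighbours must exceed tol in magnitude, which ignores round-off noise.
    """
    return [i for i in range(len(response) - 1)
            if abs(response[i]) > tol and abs(response[i + 1]) > tol
            and (response[i] < 0) != (response[i + 1] < 0)]

# Hypothetical test signal: a single step edge between samples 19 and 20.
signal = [0.0] * 20 + [1.0] * 20
response = filter_signal(signal, mexican_hat_kernel(sigma=2.0))
edges = zero_crossings(response)
```

For this step signal the filtered response is negative just before the edge and positive just after it, so the only sign change reported is at the step itself. In an image, the analogous zero crossings trace the borders of local areas.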

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central