Search CORE

27 research outputs found

Applying Penalized Binary Logistic Regression with Correlation Based Elastic Net for Variables Selection

Author: Algamal Zakariya Yahya
Lee Muhammad Hisyam
Publication venue: DigitalCommons@WayneState
Publication date: 01/05/2015

Reducing the dimensionality of high-dimensional classification problems with penalized logistic regression is one of the challenges in applying binary logistic regression. The applied penalized method, the correlation based elastic penalty (CBEP), was used to overcome the limitations of LASSO and elastic net in variable selection when there is perfect correlation among explanatory variables. The performance of the CBEP was demonstrated by applying it to two well-known high-dimensional binary classification data sets. The CBEP provided superior classification performance and variable selection compared with other existing penalized methods. It is a reliable penalized method in binary logistic regression.
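For orientation, the standard elastic net objective for binary logistic regression, which CBEP is presented as improving on, can be written as below. This is a textbook form under assumed notation (responses y_i in {0, 1}, predictor vectors x_i, coefficients β, tuning parameters λ_1 and λ_2), not the paper's own formula; the exact CBEP penalty is not given in the abstract and is not reproduced here.

```latex
\hat{\beta} = \arg\min_{\beta}
\left\{
  -\sum_{i=1}^{n} \left[ y_i \, x_i^{\top}\beta - \log\!\left(1 + e^{x_i^{\top}\beta}\right) \right]
  + \lambda_1 \lVert \beta \rVert_1
  + \lambda_2 \lVert \beta \rVert_2^2
\right\}
```

Setting λ_2 = 0 gives the LASSO, and setting λ_1 = 0 gives ridge-penalized logistic regression.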

Crossref

Digital Commons@Wayne State University

An efficient gene selection method for high-dimensional microarray data based on sparse logistic regression

Author: Algamal Zakariya
Publication venue: Coordinamento SIBA - Università del Salento
Publication date: 26/04/2017

Gene selection in high-dimensional microarray data has become increasingly important in cancer classification. The high dimensionality of microarray data makes the application of many expert classifier systems difficult. To simultaneously perform gene selection and estimate the gene coefficients in the model, sparse logistic regression using the L1-norm has been successfully applied to high-dimensional microarray data. However, when there is high correlation among genes, the L1-norm cannot perform effectively. To address this issue, an efficient sparse logistic regression (ESLR) is proposed. Extensive applications using high-dimensional gene expression data show that our proposed method can successfully select the highly correlated genes. Furthermore, ESLR is compared with three other methods and exhibits competitive performance in both classification accuracy and Youden's index. Thus, we can conclude that ESLR has a significant impact on sparse logistic regression methods and could be used for cancer classification with high-dimensional microarray data.
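Youden's index, one of the two evaluation measures named in this abstract, is sensitivity plus specificity minus one. The following is a minimal sketch of that definition; the function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 for a binary classifier."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts, for illustration only.
j = youden_index(tp=40, fn=10, tn=45, fp=5)
```

A value of 1 indicates perfect separation of the two classes, and a value of 0 indicates performance no better than chance.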

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing

Multinomial Regression with Elastic Net Penalty and Its Grouping Effect in Gene Selection

Author: Jie Yang
Juntao Li
Liuyuan Chen
Xiaoyu Wang
Publication venue: Hindawi Limited
Publication date: 01/01/2014

For the multiclass classification problem of microarray data, a new optimization model named multinomial regression with the elastic net penalty was proposed in this paper. The optimization model was constructed by combining the multinomial likelihood loss and the multiclass elastic net penalty, and it was proved to encourage a grouping effect in gene selection for multiclass classification.

Crossref

Directory of Open Access Journals

Classifying clinically actionable genetic mutations using KNN and SVM

Author: Chivukula Rohit
Pavani Satti Thanuja
Tangirala Jaya Lakshmi
Uday Sanku Satya
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2021

Cancer is one of the major causes of death in humans. Early diagnosis of the genetic mutations that cause tumor growth enables personalized treatment of the disease and can save the lives of the majority of patients. With this aim, Kaggle conducted a competition to classify clinically actionable gene mutations based on clinical evidence and some other features related to gene mutations. The dataset contains 3321 training data points that can be classified into 9 classes. In this work, an attempt is made to classify these data points using K-nearest neighbors (KNN) and linear support vector machines (SVM) in a multiclass setting. As the features are categorical, one-hot encoding as well as response coding are applied to make them suitable for the classifiers. The prediction performance is evaluated using log loss, and KNN performed better, with a log loss of 1.10 compared with 1.24 for SVM.
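To make the encoding and the metric in this abstract concrete, here is a minimal sketch of one-hot encoding for a categorical feature and of multiclass log loss. The function names and gene names are hypothetical illustrations; the authors' actual preprocessing and evaluation code is not described in the abstract.

```python
import math


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a classifier is overconfident and wrong.
        clipped = min(max(p[label], eps), 1.0 - eps)
        total -= math.log(clipped)
    return total / len(y_true)


# Hypothetical gene names, used only to illustrate the encoding.
genes = ["BRCA1", "TP53", "EGFR"]
encoded = one_hot("TP53", genes)

# Baseline: a classifier that treats all 9 classes as equally likely.
n_classes = 9
uniform = [1.0 / n_classes] * n_classes
baseline = multiclass_log_loss([0, 3, 8], [uniform, uniform, uniform])
```

The uniform baseline evaluates to ln 9, about 2.197, so the reported log loss values of 1.10 and 1.24 both improve on guessing all nine classes with equal probability.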

Sheffield Hallam University Research Archive

Polytomy identification in microbial phylogenetic reconstruction

Author: Lin Guan Ning
Xu Dong
Zhang Chao
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect more accurate evolutionary relationships. To represent the true evolutionary relationships, it is important to systematically identify the polytomies in a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which classifies a set of bifurcating branches of a phylogenetic tree into branches with dichotomies and branches with polytomies by considering genome distances among genomes and tree topological properties. Results: PolyPhy employs a machine learning technique, a Bayesian logistic regression (BLR) classifier, to identify possible bifurcating subtrees as polytomies in the trees resulting from ComPhy. In addition to considering genome-scale distances between all pairs of species, PolyPhy also takes into account properties of tree topology that differ between dichotomies and polytomies, such as long-branch retraction and short-branch contraction, and quantifies these properties as comparable rates among different sub-branches. We extract three tree topological features, 'LR' (leaf rate), 'IntraR' (intra-subset branch rate) and 'InterR' (inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We achieved an F-measure (a balanced measure of precision and recall) of 81%, with an area under the ROC curve (AUC) of about 0.9. Conclusions: PolyPhy is a fast and robust method to identify polytomies in phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
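The F-measure mentioned in this abstract combines precision and recall. A minimal sketch of the general F-beta form follows, where beta = 1 gives the balanced harmonic mean; the function name and the input values are hypothetical and are not the PolyPhy results.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 weights precision and recall equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when both terms are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision and recall, for illustration only.
f1 = f_measure(precision=0.8, recall=0.5)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two inputs, so a classifier cannot reach a high F-measure by maximizing precision or recall alone.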

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification

Author: A Ben-Dor
A Nagai
AJ Yang
C Ding
C Moroz
CC Gavin
CH Zhang
Cheng Liu
DA Notterman
G Monari
H Zou
Hai Zhang
HD Li
I Guyon
I Rivals
I Sohn
J Fan
J Fiedman
J Wiese AH
JH Dai
JW Lee
K Shailubhai
K Yang
Kwong-Sak Leung
MA Shipp
R Maglietta
R Tibshirani
S Dudoit
SK Shevade
SL Wang
T Golub
T Li
Tak-Ming Chan
U Alon
Xin-Ze Luan
Yong Liang
ZB Xu
Zong-Ben Xu
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

An Elastic-net Logistic Regression Approach to Generate Classifiers and Gene Signatures for Types of Immune Cells and T Helper Cell Subsets

Author: Gupta Paraag
Klinke II David J.
Torang Arezo
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2019

Background: Host immune response is coordinated by a variety of different specialized cell types that vary in time and location. While host immune response can be studied using conventional low-dimensional approaches, advances in transcriptomics analysis may provide a less biased view. Yet, leveraging transcriptomics data to identify immune cell subtypes presents challenges for extracting informative gene signatures hidden within a high dimensional transcriptomics space characterized by low sample numbers with noisy and missing values. To address these challenges, we explore using machine learning methods to select gene subsets and estimate gene coefficients simultaneously. Results: Elastic-net logistic regression, a type of machine learning, was used to construct separate classifiers for ten different types of immune cell and for five T helper cell subsets. The resulting classifiers were then used to develop gene signatures that best discriminate among immune cell types and T helper cell subsets using RNA-seq datasets. We validated the approach using single-cell RNA-seq (scRNA-seq) datasets, which gave consistent results. In addition, we classified cell types that were previously unannotated. Finally, we benchmarked the proposed gene signatures against other existing gene signatures. Conclusions: Developed classifiers can be used as priors in predicting the extent and functional orientation of the host immune response in diseases, such as cancer, where transcriptomic profiling of bulk tissue samples and single cells are routinely employed. Information that can provide insight into the mechanistic basis of disease and therapeutic response. The so
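As a rough illustration of how elastic-net logistic regression can select features and estimate coefficients at the same time, here is a minimal pure-Python sketch using proximal gradient descent. It is not the authors' pipeline: the function names, the penalty parameterization (lam and alpha, with alpha weighting the L1 part), the step size, and the toy data are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t * |x|; shrinks x toward zero and can zero it out."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.1, iters=2000):
    """Minimize mean log loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Residuals (predicted probability minus label) at the current iterate.
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        for j in range(p):
            # Gradient of the smooth part: log loss plus the ridge term.
            grad = sum(ri * row[j] for ri, row in zip(r, X)) / n
            grad += lam * (1.0 - alpha) * w[j]
            # Gradient step followed by the L1 proximal step.
            w[j] = soft_threshold(w[j] - step * grad, step * lam * alpha)
        b -= step * sum(r) / n  # the intercept is not penalized
    return w, b


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is weak noise.
X = [[-2.0, 0.1], [-1.0, -0.2], [-1.5, 0.3], [1.0, -0.1], [2.0, 0.2], [1.5, -0.3]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_elastic_net_logistic(X, y)
```

On this toy data the L1 part of the penalty drives the weight of the uninformative second feature to exactly zero while keeping a positive weight on the first, which is the selection behavior the abstract relies on for building gene signatures.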

The Research Repository @ WVU (West Virginia University)

Error margin analysis for feature gene extraction

Author: Chow Chi Kin
Kuo Winston P
Lacy Jessica
Zhu Hai Long
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Feature gene extraction is a fundamental issue in microarray-based biomarker discovery. It is normally treated as an optimization problem of finding the best predictive feature genes that can effectively and stably discriminate distinct types of disease conditions, e.g. tumors and normals. Since gene microarray data normally involve thousands of genes but only tens or hundreds of samples, the gene extraction process may fall into local optima if the gene set is optimized by maximizing the classification accuracy of the classifier built from it. Results: In this paper, we propose a novel gene extraction method based on error margin analysis to optimize the feature genes. The proposed algorithm has been tested on one synthetic dataset and two real microarray datasets, and it has been compared with five existing gene extraction algorithms on each dataset. On the synthetic dataset, the results show that the feature set extracted by our algorithm is the closest to the actual gene set. For the two real datasets, our algorithm is superior to the other algorithms in balancing the size and the validation accuracy of the resulting gene set. Conclusion: Because of its distinct features, the error margin analysis method can stably extract the relevant feature genes from microarray data for high-performance classification.

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PolyU Institutional Repository

PubMed Central

HKU Scholars Hub