
    Minimum Enclosing Spheres Formulations for Support Vector Ordinal Regression

    We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. Both approaches guarantee that the radii of the spheres are properly ordered at the optimal solution. The size of the optimization problem is linear in the number of training samples. The popular SMO algorithm is adapted to solve the resulting optimization problem. Numerical experiments on some real-world data sets verify the usefulness of our approaches for data mining.

    An incremental dual nu-support vector regression algorithm

    © 2018, Springer International Publishing AG, part of Springer Nature. Support vector regression (SVR) has been a hot research topic for several years, as it is an effective regression learning algorithm. Early studies on SVR mostly focused on solving large-scale problems. Nowadays, an increasing number of researchers are focusing on incremental SVR algorithms. However, existing incremental SVR algorithms cannot handle uncertain data, which are very common in real life, because they require the training data to be precise. Therefore, to handle the incremental regression problem with uncertain data, an incremental dual nu-support vector regression algorithm (dual-v-SVR) is proposed. In the algorithm, a dual-v-SVR formulation is first designed to handle the uncertain data; we then design two special adjustments to enable the dual-v-SVR model to learn incrementally: incremental adjustment and decremental adjustment. Finally, the experimental results demonstrate that the incremental dual-v-SVR algorithm is an efficient incremental algorithm: it is not only capable of solving the incremental regression problem with uncertain data, but is also faster than batch or other incremental SVR algorithms.
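The abstract's incremental dual-v-SVR is not available in standard libraries, but the nu-SVR formulation it builds on is. A minimal sketch using scikit-learn's batch `NuSVR` (synthetic data, not the paper's algorithm) illustrates the nu parameter that bounds the fraction of support vectors:

```python
# Batch nu-SVR sketch with scikit-learn; the paper's incremental
# dual-v-SVR extends this formulation to uncertain, streaming data.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # noisy sine target

# nu upper-bounds the fraction of margin errors and lower-bounds
# the fraction of support vectors.
model = NuSVR(nu=0.5, C=1.0, kernel="rbf")
model.fit(X, y)
pred = model.predict(X)
print("training R^2:", round(model.score(X, y), 3))
```

An incremental variant, as described above, would update the fitted model when samples are added (incremental adjustment) or removed (decremental adjustment) instead of refitting from scratch.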

    A novel application of quantile regression for identification of biomarkers exemplified by equine cartilage microarray data

    Background: Identification of biomarkers among thousands of genes arrayed for disease classification has been the subject of considerable research in recent years. These studies have focused on disease classification, comparing experimental groups of affected to normal patients. Related experiments can be done to identify tissue-restricted biomarkers, genes with a high level of expression in one tissue compared to other tissue types in the body. Results: In this study, cartilage was compared with ten other body tissues using a two-color array experimental design. Thirty-seven probe sets were identified as cartilage biomarkers. Of these, 13 (35%) have existing annotation associated with cartilage, including several well-established cartilage biomarkers. These genes comprise a useful database from which novel targets for cartilage biology research can be selected. We determined cartilage-specific Z-scores based on the observed M to classify genes with Z-scores ≥ 1.96 in all ten cartilage/tissue comparisons as cartilage-specific genes. Conclusion: Quantile regression is a promising method for the analysis of two-color array experiments that compare multiple samples in the absence of biological replicates, thereby limiting quantifiable error. We used a nonparametric approach to reveal the relationship between percentiles of M and A, where M is log2(R/G) and A is 0.5 log2(RG), with R representing the gene expression level in cartilage and G representing the gene expression level in one of the other ten tissues. Then we performed linear quantile regression to identify genes with a cartilage-restricted pattern of expression.
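The M-A transform and the Z ≥ 1.96 cutoff described above can be sketched directly; the intensities below are simulated stand-ins for the two-channel (R = cartilage, G = other tissue) measurements:

```python
# M-A transform for a two-color array: M = log2(R/G), A = 0.5*log2(R*G),
# with a z-score cutoff of 1.96 flagging tissue-restricted genes.
import numpy as np

rng = np.random.default_rng(1)
R = rng.lognormal(mean=8, sigma=1, size=1000)  # cartilage channel (simulated)
G = rng.lognormal(mean=8, sigma=1, size=1000)  # comparison tissue channel

M = np.log2(R / G)          # per-gene log ratio
A = 0.5 * np.log2(R * G)    # per-gene average log intensity

# Z-score of M; genes exceeding 1.96 in every tissue comparison would be
# called cartilage-specific (only one comparison is simulated here).
z = (M - M.mean()) / M.std()
specific = z >= 1.96
print(f"{specific.sum()} of {len(M)} genes flagged at z >= 1.96")
```

The study's quantile-regression step then models percentiles of M as a function of A, rather than assuming a constant spread across intensities.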

    Graphical modeling of binary data using the LASSO: a simulation study

    Background: Graphical models were identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph describing the conditional dependencies between the variables. Until now, the main focus of research was on building Gaussian graphical models for continuous multivariate data following a multivariate normal distribution. Satisfactory solutions for binary data were missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso for the development of graphical models for high-dimensional binary data. We hypothesized that the performance of the Bolasso is superior to competing LASSO methods for identifying graphical models. Methods: We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. Main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to real-life data on the functioning of patients with head and neck cancer. Results: Bootstrap aggregating as incorporated in the Bolasso algorithm greatly improved performance at higher sample sizes. The number of bootstraps had minimal impact on performance. The Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for the Bolasso led to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms. Conclusions: Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling in large sample sizes.
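A minimal sketch of the Bolasso idea for one node's neighborhood, under the assumptions that an L1-penalized logistic regression is refit on bootstrap resamples and a predictor is kept only if its coefficient is nonzero in at least 90% of runs (the 0.90 cutpoint mentioned above); the data and the penalty value are illustrative, not the paper's simulation design:

```python
# Bolasso-style neighborhood selection for binary data: bootstrap
# the sample, fit L1-logistic regression, keep stable predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 300, 10
X = rng.binomial(1, 0.5, size=(n, p)).astype(float)
# The target node depends only on variables 0 and 1.
logit = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

n_boot, cutpoint = 50, 0.90
counts = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X[idx], y[idx])
    counts += (clf.coef_.ravel() != 0)

# Edges of the graph at this node: variables selected in >= 90% of runs.
selected = np.where(counts / n_boot >= cutpoint)[0]
print("selected neighbors:", selected)
```

Repeating this per node and symmetrizing the selected neighborhoods yields the estimated graph, in the spirit of Meinshausen-Bühlmann neighborhood selection.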

    Predicting risk for Alcohol Use Disorder using longitudinal data with multimodal biomarkers and family history: a machine learning study.

    Predictive models have succeeded in distinguishing between individuals with Alcohol Use Disorder (AUD) and controls. However, predictive models identifying who is prone to develop AUD, and the biomarkers indicating a predisposition to AUD, are still unclear. Our sample (n = 656) included offspring and non-offspring of European American (EA) and African American (AA) ancestry from the Collaborative Study of the Genetics of Alcoholism (COGA) who were recruited as early as age 12 and were unaffected at first assessment and reassessed years later as AUD (DSM-5) (n = 328) or unaffected (n = 328). Machine learning analysis was performed on 220 EEG measures, 149 alcohol-related single nucleotide polymorphisms (SNPs) from a recent large Genome-wide Association Study (GWAS) of alcohol use/misuse, and two family history features (mother DSM-5 AUD and father DSM-5 AUD), using a supervised linear Support Vector Machine (SVM) classifier to test which features assessed before developing AUD predict those who go on to develop AUD. Age-, gender-, and ancestry-stratified analyses were performed. Results indicate significantly higher accuracy rates for the AA compared with the EA prediction models, and a trend toward higher model accuracy among females compared with males for both ancestries. A combined EEG and SNP feature model outperformed models based on only EEG features or only SNP features for both EA and AA samples. This multidimensional superiority was confirmed in a follow-up analysis in the AA age groups (12-15, 16-19, 20-30) and the EA age group (16-19). In both ancestry samples, the youngest age group achieved a higher accuracy score than the two older age groups. Maternal AUD increased the model's accuracy in both ancestries' samples. Several discriminative EEG measures and SNP features were identified, including lower posterior gamma, higher slow-wave connectivity (delta, theta, alpha), higher frontal gamma ratio, higher beta correlation in the parietal area, and 5 SNPs: rs4780836, rs2605140, rs11690265, rs692854, and rs13380649. Results highlight the significance of sampling uniformity followed by stratified (e.g., ancestry, gender, developmental period) analysis, and wider selection of features, to generate better prediction scores allowing a more accurate estimation of AUD development.
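The classification step described above is a standard supervised linear SVM over multimodal features. A hedged sketch with random stand-in features (the real study used 220 EEG measures, 149 SNPs and 2 family-history indicators; the feature counts below mirror that, but the data, C value and informative-feature structure are made up):

```python
# Linear SVM with standardization and cross-validation, the generic
# shape of the prediction pipeline described in the abstract.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
n, p = 656, 371                 # sample size and total feature count from the abstract
X = rng.normal(size=(n, p))
w = np.zeros(p)
w[:20] = 1.0                    # pretend a small subset of features is informative
y = (X @ w + rng.normal(size=n) > 0).astype(int)

clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01, max_iter=5000))
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy: %.3f" % scores.mean())
```

The study's stratified analyses would correspond to running this pipeline separately per ancestry, gender and age group, which is why sampling uniformity across strata matters.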

    Individualized markers optimize class prediction of microarray data

    BACKGROUND: Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most marker-selection and classification methods do not consider this variability. In general, feature selection methods aim at identifying a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype, and we propose an alternative method that takes into account the individuality of each patient sample. RESULTS: Instead of using the same features for the classification of all samples, the proposed technique starts by creating a pool of informative gene-features. For each sample, the method selects a subset of these features whose expression profiles are most likely to accurately predict the sample's category. Different subsets are utilized for different samples and the outcomes are combined in a hierarchical framework for the classification of all samples. Moreover, this approach can innately identify subgroups of samples within a given class which share common feature sets, thus highlighting the effect of individuality on gene expression. CONCLUSION: In addition to high classification accuracy, the proposed method offers a more individualized approach for the identification of biological markers, which may help in better understanding the molecular background of a disease and emphasizes the need for more flexible medical interventions.

    Scalable Rough Support Vector Clustering

    In this paper a novel scalable soft support vector clustering algorithm is proposed. Here softness is imparted to the Support Vector Clustering paradigm by employing rough set theory, and scalability is achieved using the Multi-Sphere Support Vector Clustering method. Empirical results show that the proposed method gives meaningful cluster abstractions.

    Predictive Approaches for Sparse Model Learning

    In this paper we investigate cross-validation and Geisser's sample reuse approaches for designing linear regression models. These approaches generate sparse models by optimizing multiple smoothing parameters. Within certain approximations, we establish equivalence relationships among these approaches. The computational complexity, sparseness and performance on some benchmark data sets are compared with those obtained using the relevance vector machine.
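The comparison described above can be sketched in miniature: leave-one-out cross-validation (the simplest form of Geisser's predictive sample reuse) to choose a single ridge smoothing parameter, next to an automatic-relevance-determination model, scikit-learn's closest relative of the relevance vector machine. The data, alpha grid and sparsity pattern are assumptions for illustration:

```python
# Leave-one-out CV for a smoothing parameter vs. an ARD sparse model.
import numpy as np
from sklearn.linear_model import ARDRegression, RidgeCV

rng = np.random.default_rng(4)
n, p = 100, 15
X = rng.normal(size=(n, p))
coef = np.zeros(p)
coef[:3] = [2.0, -1.5, 1.0]          # truly sparse underlying signal
y = X @ coef + rng.normal(0, 0.5, size=n)

# RidgeCV with the default cv=None uses efficient leave-one-out CV
# over the alpha grid -- one smoothing parameter chosen predictively.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
# ARD learns one precision per coefficient, pruning irrelevant ones.
ard = ARDRegression().fit(X, y)

print("LOO-chosen alpha:", ridge.alpha_)
print("ARD coefficients above 0.1:", int(np.sum(np.abs(ard.coef_) > 0.1)))
```

The paper's setting differs in that each feature gets its own smoothing parameter optimized by sample reuse, which is what makes the resulting linear models sparse.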

    Scalable non-linear Support Vector Machine using hierarchical clustering

    This paper discusses a method for scaling SVM with a Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel-induced feature space. These are then used for selective sampling of the training data for the SVM, imparting scalability to the training process. Empirical studies on real-world data sets show that the proposed strategy performs well on large data sets.
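The selective-sampling strategy can be sketched as: cluster the training data, keep one representative per cluster, and train an RBF SVM on the reduced set. The sketch below uses plain agglomerative clustering in input space on a toy two-moons problem; the paper's scalable indexing structure in the kernel-induced feature space is different, so this only illustrates the overall shape of the idea:

```python
# Selective sampling via clustering, then RBF-SVM training on the
# per-cluster representatives instead of the full training set.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=600, noise=0.15, random_state=5)

reps, rep_labels = [], []
for label in (0, 1):                      # cluster within each class
    Xc = X[y == label]
    k = 30
    clu = AgglomerativeClustering(n_clusters=k).fit(Xc)
    for c in range(k):
        members = Xc[clu.labels_ == c]
        center = members.mean(axis=0)     # representative: member nearest
        nearest = np.argmin(((members - center) ** 2).sum(axis=1))
        reps.append(members[nearest])     # the cluster mean
        rep_labels.append(label)

Xs, ys = np.array(reps), np.array(rep_labels)
svm = SVC(kernel="rbf", gamma="scale").fit(Xs, ys)  # train on 60, not 600
print("accuracy on full set:", round(svm.score(X, y), 3))
```

Training cost for a kernel SVM grows superlinearly in the number of samples, so shrinking 600 points to 60 well-spread representatives is where the scalability comes from.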