39 research outputs found

The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

Author: Bi Yingtao
Jeske Daniel R.
Publication venue: Elsevier Inc.
Publication date: 31/08/2010

In many real-world classification problems, class-conditional classification noise (CCC-Noise) frequently deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations having a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise than NDA. Under CCC-Noise contexts, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerant of CCC-Noise than NDA.

Elsevier - Publisher Connector

Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles

Author: Bi Yingtao
Davuluri Ramana V.
Gupta Ravi
Kim Hyunsoo
Publication venue: Public Library of Science
Publication date: 01/01/2011

Most of the position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, usually producing high false positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between TF and its binding site. The whole tree-structured PWM could be considered as a mixture of different conditional-PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from the ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach on several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method starting from existing PWM has consistently better performance than existing tools in detecting the TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of true positive rate. The findings could lead to better understanding of the mechanisms of TF-DNA interactions.

CiteSeerX

Public Library of Science (PLOS)

PubMed Central

Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping

Author: Ramana V Davuluri
Segun Jung
Yingtao Bi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Crossref

Identifying the substrate proteins of U-box E3s E4B and CHIP by orthogonal ubiquitin transfer

Author: Bhuripanyo Karan
Bi Yingtao
Chazin Walter J.
Chen Geng
Duong Duc
Kiyokawa Hiroaki
Liu Ruochuan
Liu Xianpeng
Seyfried Nicholas T.
Wang Yiyang
Yin Jun
Zhao Bo
Zhou Han
Zhou Li
Publication date: 12/02/2024

E3 ubiquitin (UB) ligases E4B and carboxyl terminus of Hsc70-interacting protein (CHIP) use a common U-box motif to transfer UB from E1 and E2 enzymes to their substrate proteins and regulate diverse cellular processes. To profile their ubiquitination targets in the cell, we used phage display to engineer E2-E4B and E2-CHIP pairs that were free of cross-reactivity with the native UB transfer cascades. We then used the engineered E2-E3 pairs to construct “orthogonal UB transfer (OUT)” cascades so that a mutant UB (xUB) could be exclusively used by the engineered E4B or CHIP to label their substrate proteins. Purification of xUB-conjugated proteins followed by proteomics analysis enabled the identification of hundreds of potential substrates of E4B and CHIP in human embryonic kidney 293 cells. Kinase MAPK3 (mitogen-activated protein kinase 3), methyltransferase PRMT1 (protein arginine N-methyltransferase 1), and phosphatase PPP3CA (protein phosphatase 3 catalytic subunit alpha) were identified as the shared substrates of the two E3s. Phosphatase PGAM5 (phosphoglycerate mutase 5) and deubiquitinase OTUB1 (ovarian tumor domain containing ubiquitin aldehyde binding protein 1) were confirmed as E4B substrates, and β-catenin and CDK4 (cyclin-dependent kinase 4) were confirmed as CHIP substrates. On the basis of the CHIP-CDK4 circuit identified by OUT, we revealed that CHIP signals CDK4 degradation in response to endoplasmic reticulum stress.

Knowledge UChicago

Distinct mechanisms control genome recognition by p53 at its target genes linked to different cell fates.

Author: Bi Yingtao
Davuluri Ramana V
Debler Erik W.
Farkas Marina
Hashimoto Hideharu
Manfredi James J.
McMahon Steven B.
Resnick-Silverman Lois
Publication venue: Jefferson Digital Commons
Publication date: 20/01/2021

The tumor suppressor p53 integrates stress response pathways by selectively engaging one of several potential transcriptomes, thereby triggering cell fate decisions (e.g., cell cycle arrest, apoptosis). Foundational to this process is the binding of tetrameric p53 to 20-bp response elements (REs) in the genome (RRRCWWGYYY N(0-13) RRRCWWGYYY). In general, REs at cell cycle arrest targets (e.g., p21) are of higher affinity than those at apoptosis targets (e.g., BAX). However, the RE sequence code underlying selectivity remains undeciphered. Here, we identify molecular mechanisms mediating p53 binding to high- and low-affinity REs by showing that key determinants of the code are embedded in the DNA shape. We further demonstrate that differences in minor/major groove widths, encoded by G/C or A/T bp content at positions 3, 8, 13, and 18 in the RE, determine distinct p53 DNA-binding modes by inducing different Arg248 and Lys120 conformations and interactions. The predictive capacity of this code was confirmed in vivo using genome editing at the BAX RE to interconvert the DNA-binding modes, transcription pattern, and cell fate outcome.

Jefferson Digital Commons

NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data

Author: A Mortazavi
A Oshlack
AN Brooks
C Trapnell
Cancer Genome Atlas N
CX Mao
D Risso
EP Consortium
H Jiang
H Kim
HK Ji
J Feng
J Li
JC Marioni
JH Bullard
JPZ Wang
K Kadota
KD Hansen
L Shi
LM McIntyre
M Evans
MA Dillies
MA Van De Wiel
MD Robinson
N Leng
P Glaus
PJ Balwierz
PS Hammerman
Ramana V Davuluri
RC Gentleman
RD Canales
S Anders
S Durinck
S Pal
S Tarazona
S Zheng
SB Montgomery
TD Schmittgen
TJ Hardcastle
Yingtao Bi
Publication venue: Springer Science and Business Media LLC

Crossref

The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise

Author: Bi Yingtao
Jeske Daniel R.

In many real-world classification problems, class-conditional classification noise (CCC-Noise) frequently deteriorates the performance of a classifier that is naively built by ignoring it. In this paper, we investigate the impact of CCC-Noise on the quality of a popular generative classifier, normal discriminant analysis (NDA), and its corresponding discriminative classifier, logistic regression (LR). We consider the problem of two multivariate normal populations having a common covariance matrix. We compare the asymptotic distribution of the misclassification error rate of these two classifiers under CCC-Noise. We show that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected. We also show that LR is less deteriorated by CCC-Noise than NDA. Under CCC-Noise contexts, the Mahalanobis distance between the populations plays a vital role in determining the relative performance of these two procedures. In particular, when this distance is small, LR tends to be more tolerant of CCC-Noise than NDA.

Keywords: Class noise; Misclassification rate; Misspecified model; Asymptotic distribution

Research Papers in Economics

Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms

Author: Curia
Manoj Kandpal
Matthew Dapas
Pal
Ramana V. Davuluri
Shulzhenko
Yingtao Bi
Publication venue: Oxford University Press (OUP)

Crossref