Search CORE

22 research outputs found

Automatic structure classification of small proteins using random forest

Author: A Andreeva
AG Murzin
AV Levitin
C Hadley
CHQ Ding
E Ie
G Zhanga
H Shen
HM Berman
I Chung
I Melvin
IH Witten
J Cheng
J Wu
JE Gewehr
JF Gibrat
Jonathan D Hirst
JR Quinlan
K Chen
KC Chou
L Breiman
L Holm
L Kurgan
M Gerstein
MB Swindells
MTA Shamim
O Çamoğlu
P Baldi
P Han
P Jain
P Klein
Pooja Jain
S Kim
S Mile
S Vinga
SE Brenner
SE Hamby
SF Altschul
SP Kanaan
SS Krishna
U Hobohm
V Sam
W Kabsch
X Chen
XM Zhao
Y Cai
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification of a target structure from the similarity of its structural descriptors to those of a template structure with the same number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. Its applicability to larger domains is demonstrated on domains consisting of four, five and six SSEs.

Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent, non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Superfamily or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to the different structural levels decreases.

Conclusions: The utility of random forest in classifying domains from the placeholder classes of SCOP to their true Class, Fold, Superfamily or Family levels is demonstrated. Random forest can also address issues such as the introduction of a new structural level in SCOP and the merger of singleton levels. A real-world scenario is mimicked by predicting the classification of protein structures from the PDB that have yet to be assigned to the SCOP classification hierarchy.
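
The abstract reports model quality as the Matthews correlation coefficient (MCC). As a reference point, the sketch below computes the standard binary MCC from confusion-matrix counts, plus a one-vs-rest helper for a single label. The abstract does not say how MCC was computed over the multi-class SCOP levels, so the one-vs-rest treatment, the function names and the toy labels are assumptions for illustration, not the authors' evaluation code.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention since
    the coefficient is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


def mcc_for_label(y_true, y_pred, label) -> float:
    """One-vs-rest MCC for a single class label (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    tn = sum(1 for t, p in pairs if t != label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return matthews_corrcoef(tp, tn, fp, fn)


# Made-up fold labels, for illustration only (not data from the paper).
truth = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]
print(round(mcc_for_label(truth, predicted, "a"), 3))  # → 0.25
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over raw accuracy when class sizes are unbalanced.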

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The genomic basis of vomeronasal-mediated behaviour

Author: A Berghard
A Berghard
A Chess
A Dewan
A Keller
A Petrulis
A Sbarbati
AL Sherborne
B Bufe
B Schaal
BG Leypold
BP Menco
C Boschat
C Dulac
C Meslin
C Mucignat-Caretta
C Yang
CJ Bult
CJ Wysocki
Darren W. Logan
DH Robertson
DM Ferrero
DM Webb
DW Logan
EH Wynn
EK Roberts
EM Blass
EM Norlin
ER Liman
F Nodari
F Papes
G Herrada
G Wang
H Kimoto
H Matsunami
H Yang
H Zhao
HM Bruce
HM Schellinck
I Rodriguez
J Brechbuhl
J Loconto
J Zhang
JL Hurst
JM Mudge
JM Young
K Kobayakawa
K Del Punta
KB Doving
KR Kelliher
L Silvotti
L Stowers
L van der Weyden
L Yu
LM Huckins
M Ma
Maria O. Levitin
MD Thom
ND Hastie
NJ Ryba
NS Hasen
P Chamero
P Zhang
PE Pedersen
PM Clissold
RC Karn
RD Emes
RL Doty
S Dey
S Gelstein
S Haga
S Kim
S Martini
S Riviere
S Yoshinaga
SA Cheetham
SA Roberts
SD Liberles
T Ishii
T Kimchi
T Leinders-Zufall
TD Wyatt
U Rudolph
WC Skarnes
WE Grus
X Zhang
Ximena Ibarra-Soria
Y Ben-Shaul
Y Isogai
Publication venue: Springer Science and Business Media LLC

Crossref

De novo gene signature identification from single-cell RNA-seq with hierarchical Poisson factorization

Author: Blei DM
Bruce JN
Bush EC
Canoll P
Cheng YL
Iavarone A
Lasorella A
Levitin HM
Ruiz FJR
Sims PA
Yuan J
Publication date: 01/02/2019

Common approaches to gene signature discovery in single-cell RNA-sequencing (scRNA-seq) depend on predefined structures such as clusters or pseudo-temporal order, require prior normalization, or do not account for the sparsity of single-cell data. We present single-cell hierarchical Poisson factorization (scHPF), a Bayesian factorization method that adapts hierarchical Poisson factorization (Gopalan et al., Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, 326) for de novo discovery of both continuous and discrete expression patterns from scRNA-seq. scHPF does not require prior normalization and, on benchmark datasets, captures the statistical properties of single-cell data better than other methods. Applied to scRNA-seq of the core and margin of a high-grade glioma, scHPF uncovers marked differences in the abundance of glioma subpopulations across tumor regions, as well as regionally associated expression biases within glioma subpopulations. scHPF revealed an expression signature that was spatially biased toward the glioma-infiltrated margins and associated with inferior survival in glioblastoma.
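
The abstract describes scHPF only at a high level. To make the underlying model concrete, the sketch below simulates a count matrix from the generative process of standard hierarchical Poisson factorization (Gopalan et al.), assuming cells play the role of users and genes the role of items. It only samples from the priors; it does not fit the model and does not reproduce scHPF's own modifications, priors or inference procedure. The function names and default hyperparameter values are illustrative assumptions.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Large rates are split into chunks of at most 500 (a sum of independent
    Poisson variables is Poisson), so exp(-chunk) never underflows.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k = 0
        p = rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulate_hpf(n_cells, n_genes, n_factors, a=0.3, a_prime=2.0, b_prime=1.0,
                 c=0.3, c_prime=2.0, d_prime=1.0, seed=0):
    """Sample (theta, beta, counts) from the HPF generative process.

    Priors are written as Gamma(shape, rate); random.gammavariate takes
    (shape, scale), so every rate r is passed as scale 1 / r.
    """
    rng = random.Random(seed)
    # Per-cell activity xi ~ Gamma(a', a'/b'); cell loadings theta ~ Gamma(a, xi).
    xi = [rng.gammavariate(a_prime, b_prime / a_prime) for _ in range(n_cells)]
    theta = [[rng.gammavariate(a, 1.0 / x) for _ in range(n_factors)] for x in xi]
    # Per-gene popularity eta ~ Gamma(c', c'/d'); gene weights beta ~ Gamma(c, eta).
    eta = [rng.gammavariate(c_prime, d_prime / c_prime) for _ in range(n_genes)]
    beta = [[rng.gammavariate(c, 1.0 / e) for _ in range(n_factors)] for e in eta]
    # Observed count for cell u and gene g ~ Poisson(theta_u . beta_g).
    counts = [
        [sample_poisson(sum(t * b for t, b in zip(theta[u], beta[g])), rng)
         for g in range(n_genes)]
        for u in range(n_cells)
    ]
    return theta, beta, counts


theta, beta, counts = simulate_hpf(n_cells=4, n_genes=6, n_factors=3, seed=1)
for row in counts:
    print(row)
```

Because the model works on raw counts through a Poisson likelihood, no normalization step appears anywhere in the simulation; small Gamma shape parameters (0.3 here) make the loadings heavy-tailed, so the simulated matrix tends to be sparse.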

CUED - Cambridge University Engineering Department