
Sparsest factor analysis for clustering variables: a matrix decomposition approach

Author: A Stegeman
AJ Izenman
BS Everitt
C Spearman
CC Aggarwal
D Knowles
DM Zou
G Gan
GAF Seber
HH Harman
IT Jolliffe
J de Leeuw
JMF ten Berge
JMF ten Berge
K Adachi
K Adachi
K Adachi
K Hirose
K Hirose
Kohei Adachi
L Eldén
LR Goldberg
M Rattray
M Vichi
MJ Zaki
Nickolay T. Trendafilov
Nickolay T. Trendafilov
NT Trendafilov
NT Trendafilov
PT Costa
R Mazumder
R Reyment
S Unkel
SA Mulaik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/04/2017

We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading on the same common factor can be classified into one cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using the QR decomposition in order to estimate factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
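Two algebraic facts behind this abstract can be checked in a few lines. The sketch below is illustrative only and is not the SSFA algorithm. It assumes a loading matrix stored with variables in rows and factors in columns, and factor scores F (n x m) that are column-centered and scaled to unit variance, so that F'F/n is the factor correlation matrix. The helper names `clusters_from_loadings` and `factor_correlations_via_qr` are hypothetical. With F = QR and Q'Q = I, F'F/n = R'Q'QR/n = R'R/n, so the correlations can be read off the small m x m factor R.

```python
import numpy as np

def clusters_from_loadings(loadings):
    """Map each variable (row) to the factor (column) holding its single nonzero loading.

    Hypothetical helper: assumes a sparsest loading matrix, i.e. exactly one
    nonzero entry per row, as described in the abstract.
    """
    loadings = np.asarray(loadings)
    if not np.all(np.count_nonzero(loadings, axis=1) == 1):
        raise ValueError("each row must contain exactly one nonzero loading")
    # argmax over the boolean mask returns the column of the nonzero entry
    return np.argmax(loadings != 0, axis=1)

def factor_correlations_via_qr(scores):
    """Factor correlations of standardized scores F (n x m), computed as R'R/n from F = QR."""
    n = scores.shape[0]
    _, r = np.linalg.qr(scores)  # reduced QR: Q is n x m with Q'Q = I, R is m x m
    return r.T @ r / n           # equals F'F/n because Q'Q = I

if __name__ == "__main__":
    lam = [[0.8, 0.0, 0.0],
           [0.7, 0.0, 0.0],
           [0.0, 0.6, 0.0],
           [0.0, 0.0, -0.5]]
    print(clusters_from_loadings(lam))  # prints [0 0 1 2]

    rng = np.random.default_rng(0)
    f = rng.standard_normal((200, 3))
    f = (f - f.mean(axis=0)) / f.std(axis=0)  # center and scale to unit variance (ddof=0)
    print(np.round(factor_correlations_via_qr(f), 3))
```

How SSFA actually updates the QR factors inside its least squares iterations is not described in the abstract; the code only shows why working with R is convenient.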

Open Research Online (The Open University)

Piecewise polynomial approximation of probability density functions with application to uncertainty quantification for stochastic PDEs

Author: A Criminisi
AJ Izenman
BW Silverman
CF Li
F Nobile
F Nobile
G Capodaglio
I Babuška
I Babuška
I Babuška
J Fan
M Hegland
MD Gunzburger
MS Gerber
N-B Heidenreich
NG Andronova
P Ciarlet
P Frauenfelder
P Hall
S Brenner
U Lopez-Novoa
X Xu
Z Xie
Z Zivkovic
ZI Botev
ZI Botev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/12/2019

The probability density function (PDF) associated with a given set of samples is approximated by a piecewise-linear polynomial constructed with respect to a binning of the sample space. The kernel functions are a compactly supported basis for the space of such polynomials, i.e., finite element hat functions, centered at the bin nodes rather than at the samples, as is the case for the standard kernel density estimation approach. This feature naturally provides an approximation that is scalable with respect to the sample size. Moreover, unlike other strategies that use a finite element approach, the proposed approximation does not require the solution of a linear system. In addition, a simple rule that relates the bin size to the sample size eliminates the need for bandwidth selection procedures. The proposed density estimator integrates to one, does not require a constraint to enforce positivity, and is consistent. The approach is validated through numerical examples in which samples are drawn from known PDFs. It is also used to determine approximations of (unknown) PDFs associated with outputs of interest that depend on the solution of a stochastic partial differential equation.
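A minimal one-dimensional sketch of a density estimate built from hat functions centered at bin nodes follows. It assumes uniform bins on a known interval [a, b] and sets the coefficient of node j to c_j = sum_i phi_j(y_i) / (N * integral of phi_j). The abstract does not give the paper's exact coefficients or its bin-size rule, so `n_bins` is left as a free parameter and the function name `hat_density` is hypothetical. Because the hat functions form a partition of unity, sum_j c_j * integral(phi_j) = 1, and every c_j is nonnegative, so the estimate integrates to one without a positivity constraint, and no linear system is solved.

```python
def hat_density(samples, a, b, n_bins):
    """Piecewise-linear density estimate on [a, b] using hat functions at bin nodes.

    Returns (density, coeffs, h): a callable f(x), the nodal coefficients c_j,
    and the bin width h. The coefficient formula is an assumption (see above).
    """
    if n_bins < 1 or not samples or b <= a:
        raise ValueError("need n_bins >= 1, at least one sample, and b > a")
    h = (b - a) / n_bins
    weights = [0.0] * (n_bins + 1)  # sum_i phi_j(y_i) for each node j

    def locate(x):
        # bin index k and local coordinate t in [0, 1] within that bin
        k = min(int((x - a) / h), n_bins - 1)
        return k, (x - (a + k * h)) / h

    for y in samples:
        if not a <= y <= b:
            raise ValueError(f"sample {y} lies outside [{a}, {b}]")
        k, t = locate(y)
        weights[k] += 1.0 - t  # value of the left hat function at y
        weights[k + 1] += t    # value of the right hat function at y

    n = len(samples)
    coeffs = []
    for j, w in enumerate(weights):
        hat_integral = h / 2 if j in (0, n_bins) else h  # boundary hats are halved
        coeffs.append(w / (n * hat_integral))

    def density(x):
        if x < a or x > b:
            return 0.0
        k, t = locate(x)
        return (1.0 - t) * coeffs[k] + t * coeffs[k + 1]

    return density, coeffs, h
```

Each sample touches only the two hat functions of its bin, so building the estimate costs O(N) and evaluating it costs O(1) per point, which fits the scalability claim above.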

arXiv.org e-Print Archive

Current measures of metabolic heterogeneity within cervical cancer do not predict disease outcome

Author: A Paulino
A Pugachev
AJ Izenman
B Jähne
BR Thomadsen
CR Schmidtlein
DC Lay
DL Bailey
EA Kidd
F O'Sullivan
Frank J Brooks
GB Arfken
GH Heppner
GJR Cook
JW Keyes Jr
L Révész
M Picchio
MG Vander Heiden
NA Mayr
P Gerlee
Perry W Grigsby
RA Gatenby
RA Weinberg
RK Jain
RL Wahl
S Zhao
TR Miller
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A previous study evaluated the intra-tumoral heterogeneity observed in the uptake of F-18 fluorodeoxyglucose (FDG) in pre-treatment positron emission tomography (PET) scans of cancers of the uterine cervix as an indicator of disease outcome. This was done via a novel statistic that ostensibly measured the spatial variations in intra-tumoral metabolic activity. In this work, we argue that this statistic is intrinsically non-spatial, and that the apparent separation between unsuccessfully and successfully treated patient groups by that statistic is spurious. Methods: We first offer a straightforward mathematical demonstration of our argument. Next, we summarize a thorough re-analysis of the originally published data, which were derived from FDG-PET imagery. Finally, we present the results of a principal component analysis of FDG-PET images similar to those previously analyzed. Results: We find that the previously published measure of intra-tumoral heterogeneity is intrinsically non-spatial and is actually only a surrogate for tumor volume. We also find that an optimized linear combination of more canonical heterogeneity quantifiers does not predict disease outcome. Conclusions: Current measures of intra-tumoral metabolic activity are not predictive of disease outcome, contrary to previous claims. The implications of this finding are that clinical categorization of patients based upon these statistics is invalid, and that more sophisticated, and perhaps innately geometric, quantifications of metabolic activity are required for predicting disease outcome.
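The abstract does not define the original statistic, so the following is only a generic illustration of the "non-spatial" argument, not the authors' demonstration. Any statistic computed solely from the distribution of voxel values (here the coefficient of variation) is unchanged when voxel positions are shuffled, whereas a genuinely spatial statistic (here Moran's I with 4-neighbour adjacency) is not. The toy 2-D image and all function names are hypothetical.

```python
import random
import statistics

def coefficient_of_variation(img):
    """Non-spatial: depends only on the multiset of voxel values."""
    vals = [v for row in img for v in row]
    return statistics.pstdev(vals) / statistics.fmean(vals)

def morans_i(img):
    """Spatial autocorrelation (Moran's I) with rook adjacency and unit weights."""
    rows, cols = len(img), len(img[0])
    vals = [v for row in img for v in row]
    mean = statistics.fmean(vals)
    num, w_sum = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (img[r][c] - mean) * (img[rr][cc] - mean)
                    w_sum += 1
    den = sum((v - mean) ** 2 for v in vals)
    return (len(vals) / w_sum) * num / den

def shuffle_voxels(img, seed=0):
    """Randomly permute voxel positions while keeping the same set of values."""
    vals = [v for row in img for v in row]
    random.Random(seed).shuffle(vals)
    cols = len(img[0])
    return [vals[i:i + cols] for i in range(0, len(vals), cols)]

if __name__ == "__main__":
    gradient = [[1.0 + r + c for c in range(8)] for r in range(8)]  # smooth "uptake" pattern
    shuffled = shuffle_voxels(gradient, seed=1)
    print(coefficient_of_variation(gradient), coefficient_of_variation(shuffled))  # identical
    print(morans_i(gradient), morans_i(shuffled))  # high vs. near zero
```

Two images with the same intensity histogram but very different spatial organization cannot be told apart by the first statistic, which is the sense in which such a measure carries no spatial information.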

Springer - Publisher Connector

Public Library of Science (PLOS)

Digital Commons@Becker

Manifold Learning for Human Population Structure Studies

Author: A Chakravarti
AB Lee
AB Lee
AJ Izenman
AL Price
B Li
BM Henn
C Deng
DR Bentley
E Kosman
F Collins
GV Kryukov
Hoicheong Siu
J Friedman
J Novembre
J Shendure
J Tenenbaum
J Zhang
J Zhang
J Zhang
JC Venter
JE Pool
L Cavalli-Sforza
Li Jin
LK Saul
M Belkin
ML Metzker
Momiao Xiong
N Kambhatla
N Patterson
P Menozzi
P Paschou
PE Smouse
R Drmanac
R Nielsen
R Wang
S Biswas
S Roweis
S Yan
SY Kim
T Tibshirani
W Guan
W Zhang
Yun Li
Publication venue: Public Library of Science
Publication date: 01/01/2012

The dimension of the population genetics data produced by next-generation sequencing platforms is extremely high. However, the “intrinsic dimensionality” of sequence data, which determines the structure of populations, is much lower. This motivates us to use locally linear embedding (LLE), which projects high-dimensional genomic data into a low-dimensional, neighborhood-preserving embedding, as a general framework for population structure and historical inference. To facilitate application of LLE to population genetic analysis, we systematically investigate several important properties of LLE and reveal the connection between LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions that could be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying LLE-correlated or PCA-correlated structure-informative markers, we have developed a new statistic that integrates the genomic information content of a genomic region for collectively studying its association with population structure, and we use the LASSO algorithm to search for such regions across the genome. We applied the developed methodologies to a low-coverage pilot dataset from the 1000 Genomes Project and a Phase III Mexico dataset from HapMap. We observed that 25.1%, 44.9% and 21.4% of the common variants and 89.2%, 92.4% and 75.1% of the rare variants were LLE-correlated markers in CEU, YRI and ASI, respectively. This showed that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants. The preliminary results demonstrated that next-generation sequencing offers rich resources and that LLE provides a powerful tool for population structure analysis.
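For readers unfamiliar with LLE, the sketch below implements the generic two-step algorithm of Roweis and Saul: reconstruct each point from its k nearest neighbours with weights that sum to one, then find low-dimensional coordinates preserved by those weights. It is not the authors' code. It assumes individuals are rows of X and markers (for example, genotype dosages) are columns; the function name `lle` and the trace-scaled regularization `reg` are conventional choices, not details taken from the paper.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Locally linear embedding of the rows of X (generic Roweis-Saul sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1: reconstruction weights from each point's k nearest neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbours centred on x_i
        C = Z @ Z.T                           # local Gram matrix (k x k)
        tr = np.trace(C)
        C += np.eye(n_neighbors) * (reg * tr if tr > 0 else reg)  # regularise
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()           # weights sum to one

    # Step 2: embedding from the bottom eigenvectors of (I - W)'(I - W).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    # Drop the bottom (approximately constant) eigenvector, keep the next ones.
    return vecs[:, 1:n_components + 1], W
```

Because (I - W) annihilates the constant vector when each row of W sums to one, the first eigenvector carries no information and is discarded; the next `n_components` eigenvectors give the embedding coordinates.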

CiteSeerX

Universidade do Minho: RepositoriUM

FigShare

Evaluation of extra-virgin olive oils shelf life using an electronic tongue-chemometric approach

Author: A Cimato
ACA Veloso
AI Mendez
AJ Izenman
AM Peres
Ana C. A. Veloso
Anonymous
António M. Peres
C Apetrei
C Apetrei
C Fadda
C Samaniego-Sánchez
D Bertsimas
DL García-González
E Stefanoudaki
F Sinesio
G Pristouri
H Jabeur
IA Afaneh
IM Apetrei
J Abbadi
J Cadima
J Miller
JM Gutiérrez
José A. Pereira
K Ben-Hassine
LG Dias
LG Dias
LG Dias
Luís G. Dias
M Esti
M Kuhn
ME Escuderos
ME Escuderos
MH Alu’datt
ML Rodríguez-Méndez
MM Torres
MS Cosio
MS Cosio
N Rodrigues
Nuno Rodrigues
P Oliveri
R Garrido-Delgado
RW Kennard
S Dabbou
S Gómez-Alonso
S Kirkpatrick
SA Vekiari
TM Rababah
V Vacca
WN Venables
Y Kobayashi
Z Ayyad
Z Haddi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Physicochemical quality parameters and olfactory and gustatory-retronasal positive sensations of extra-virgin olive oils vary during storage, leading to a decrease in overall quality. This quality decline may compromise compliance with labeling and significantly reduce shelf life, resulting in important economic losses and negatively affecting consumer confidence. The feasibility of applying an electronic tongue to assess the usual commercial light storage conditions and storage time of olive oils was evaluated and compared with the discrimination potential of physicochemical or positive olfactory/gustatory sensorial parameters. Linear discriminant models, based on subsets of 58 electronic tongue sensor signals selected by the meta-heuristic simulated annealing variable selection algorithm, allowed the correct classification of olive oils according to light exposure conditions and/or storage time (sensitivities and specificities for leave-one-out cross-validation: 82-96 %). The predictive performance of the E-tongue approach was further evaluated using an external independent dataset selected using the Kennard-Stone algorithm and, in general, better classification rates (sensitivities and specificities for the external dataset: 67-100 %) were obtained compared with those achieved using physicochemical or sensorial data.
The work is thus a proof of principle that the proposed electrochemical device could be a practical and versatile tool for successfully discriminating, in a single fast electrochemical assay, olive oils with different storage times and/or exposed to different light conditions.
The authors acknowledge financial support from the strategic funding of the UID/BIO/04469/2013 unit and from Project POCI-01-0145-FEDER-006984 (Associate Laboratory LSRE-LCM), funded by FEDER funds through COMPETE2020 (Programa Operacional Competitividade e Internacionalização, POCI) and by national funds through FCT (Fundação para a Ciência e a Tecnologia). Nuno Rodrigues thanks FCT, POPH-QREN and FSE for the Ph.D. Grant (SFRH/BD/104038/2014).
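The Kennard-Stone algorithm mentioned above picks a subset of samples that evenly covers the measurement space: it starts with the two most distant samples, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest. A minimal pure-Python sketch with Euclidean distance follows; the function name is hypothetical, and how the selected and remaining samples were assigned to training and external sets in the paper is not stated in the abstract.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Start with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: dist(*pair))
    selected = [i0, j0]
    # Distance from every remaining sample to its nearest selected sample.
    min_d = {k: min(dist(k, i0), dist(k, j0)) for k in range(n) if k not in (i0, j0)}
    while len(selected) < n_select:
        k = max(min_d, key=min_d.get)  # farthest from the current selection
        selected.append(k)
        del min_d[k]
        for r in min_d:                # update nearest-selected distances
            min_d[r] = min(min_d[r], dist(r, k))
    return selected

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (5, 5)]
    print(kennard_stone(pts, 3))  # prints [0, 3, 4]
```

The max-min rule favours samples at the edges and in sparsely covered regions, which is why it is often used to split spectroscopic or sensor data into representative subsets.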

Biblioteca Digital do IPB

Gender, Obesity and Repeated Elevation of C-Reactive Protein: Data from the CARDIA Cohort

Author: A Cartier
A Khera
A Lukanova
A Nazmi
AJ Izenman
Arun S. Karlamangla
AS Karlamangla
AV Chobanian
C Gabay
C Siemes
CE Ruhl
David R. Jacobs
DE Alley
DM Lloyd-Jones
DS Goldstein
FE Harrell
GD Friedman
GL Myers
GR Cutter
Hyong Jin Cho
JR Greenfield
JR Rodgers
K Heikkila
KK Ray
KW Muir
LA Bazzano
M Cushman
M Groblewska
M Hamer
M Visser
Marcos Bote
ME Falagas
MF O’Connor
Michael R. Irwin
O Chenillot
P Bjorntorp
P Calabro
P Hu
PH Black
PM Ridker
PM Ridker
S Macintyre
S Sidney
SG Lakoski
Shinya Ishii
Stefan Kiechl
T Hastie
TA Pearson
Teresa E. Seeman
Publication venue: Public Library of Science
Publication date: 01/01/2012

C-reactive protein (CRP) measurements above 10 mg/L have conventionally been treated as indicating acute inflammation and excluded from epidemiologic studies of chronic inflammation. However, recent evidence suggests that such CRP elevations can also occur with chronic inflammation. The authors assessed 3,300 participants in the Coronary Artery Risk Development in Young Adults (CARDIA) study who had two or more CRP measurements between 1992/3 and 2005/6, in order to a) investigate characteristics associated with repeated CRP elevation above 10 mg/L; b) identify subgroups at high risk of repeated elevation; and c) investigate the effect of different CRP thresholds on the probability of an elevation being one-time rather than repeated. Of these, 225 participants (6.8%) had a one-time and 103 (3.1%) had a repeated CRP elevation above 10 mg/L. Repeated elevation was associated with obesity, female gender, low income, and sex hormone use. The probability of an elevation above 10 mg/L being one-time rather than repeated was lowest (51%) in women with a body mass index above 31 kg/m², compared with 82% in others. These findings suggest that CRP elevations above 10 mg/L in obese women are likely to reflect chronic rather than acute inflammation, and that CRP thresholds above 10 mg/L may be warranted to distinguish acute from chronic inflammation in obese women.
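The reported percentages can be reproduced from the counts in the abstract. The pooled share below assumes that "the probability of an elevation being one-time rather than repeated" means the fraction of participants with any elevation above 10 mg/L whose elevation was one-time; the abstract does not state the definition, and the variable names are ours.

```python
# Counts reported in the abstract.
n_participants = 3300
n_one_time = 225
n_repeated = 103

pct_one_time = 100 * n_one_time / n_participants   # share with a one-time elevation
pct_repeated = 100 * n_repeated / n_participants   # share with a repeated elevation
# Assumed definition: among participants with any elevation above 10 mg/L,
# the share whose elevation was one-time rather than repeated.
pooled_one_time_share = 100 * n_one_time / (n_one_time + n_repeated)

print(f"{pct_one_time:.1f}% {pct_repeated:.1f}% {pooled_one_time_share:.1f}%")  # prints 6.8% 3.1% 68.6%
```

Under that assumption, and if "others" means all remaining participants with an elevation, the pooled value of about 68.6% is a weighted average of the two subgroup values and must lie between 51% and 82%, which it does.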

Public Library of Science (PLOS)

CiteSeerX

eScholarship - University of California

Limitations of Gene Duplication Models: Evolution of Modules in Protein Interaction Networks

Author: A del Sol
A Lancichinetti
A Mowshowitz
A Mowshowitz
A Mushegian
A Vazquez
A Wagner
A Wagner
A Wagner
AJ Izenman
AL Barabási
AL Barabási
B Aranda
BJ Breitkreutz
C Shannon
D Futuyma
D Hoaglin
D Watts
DB Stouffer
DC Plachetzki
DG Higgins
E Cerami
E Fernandez
E Koonin
E Lehman
E Levy
E Mayr
E Ziv
EV Koonin
F Chung
F Emmert-Streib
F Emmert-Streib
F Emmert-Streib
F Emmert-Streib
Frank Emmert-Streib
G Csardi
G Palla
G Wagner
H Yu
HW Ma
I Ispolatov
I Ispolatov
J Berg
J Felsenstein
J Hallinan
J Yoon
JP Onnela
K Evlampiev
K Maridia
K Popper
L Danon
L Holm
LL McQuitty
Lukasz Kurgan
M Dehmer
M Dehmer
M Lynch
M Lynch
M Middendorf
M Newman
M Newman
M Rosvall
MEJ Newman
MEJ Newman
MM Babu
N Przulj
O Ratmann
P Dehal
P Erdös
R Durrett
R Solomonoff
R Todeschini
RV Sole
S Ciliberti
S Fortunato
S Gould
S Lozano
S Shen-Orr
SF Altschul
SP Otto
T Dobzhansky
T Thorne
WK Kim
X Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

It is generally acknowledged that the module structure of protein interaction networks plays a crucial role in the functional understanding of these networks. In this paper, we study evolutionary aspects of the module structure of protein interaction networks, which forms a mesoscopic level of description of the architectural principles of networks. The purpose of this paper is to investigate the limitations of well-known gene duplication models by showing that these models lack crucial structural features present in protein interaction networks on a mesoscopic scale. This observation reveals our incomplete understanding of the structural evolution of protein networks at the module level.
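The abstract does not say which gene duplication models were examined. As a generic illustration of the model family, the sketch below grows a network by duplication-divergence: a randomly chosen protein is duplicated, the copy keeps each of the parent's interactions independently with probability `retain_prob`, and the copy links to its parent with probability `link_parent_prob`. The function and parameter names are hypothetical, and this variant is not claimed to match any specific published model. Module structure in the resulting graphs could then be compared with real interaction networks.

```python
import random

def duplication_divergence(n_nodes, retain_prob, link_parent_prob, seed=None):
    """Grow an undirected graph (dict of neighbour sets) by duplication-divergence."""
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed graph: a single interaction
    while len(adj) < n_nodes:
        parent = rng.choice(list(adj))
        child = len(adj)
        adj[child] = set()
        # Duplication with divergence: keep each inherited edge with prob retain_prob.
        for nbr in adj[parent]:
            if rng.random() < retain_prob:
                adj[child].add(nbr)
        for nbr in adj[child]:
            adj[nbr].add(child)  # keep the graph undirected
        # Optional interaction between the duplicate and its parent.
        if rng.random() < link_parent_prob:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj

if __name__ == "__main__":
    g = duplication_divergence(500, retain_prob=0.4, link_parent_prob=0.3, seed=1)
    n_edges = sum(len(nbrs) for nbrs in g.values()) // 2
    print(len(g), n_edges)
```

Duplicates that inherit no edges and do not link to their parent remain isolated; real analyses usually discard such nodes before computing modules.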

Queen's University Belfast Research Portal

CiteSeerX

Public Library of Science (PLOS)

arXiv.org e-Print Archive

Ensemble preconditioning for Markov chain Monte Carlo simulation

Author: A Jasra
AB Duncan
AJ Izenman
B Leimkuhler
Benedict Leimkuhler
C-R Hwang
Charles Matthews
D Foreman-Mackey
G Milstein
GA Pavliotis
GO Roberts
H Haario
J Andrés Christen
J Goodman
J Goodman
J Liu
J Martin
JM Hammersley
Jonathan Weare
L Rey-Bellet
M Girolami
M Hairer
MN Rosenbluth
N Bou-Rabee
N Bou-Rabee
N Chopin
O Cappé
OF Christensen
PJ Rossky
RM Neal
S Duane
WR Gilks
Y Iba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/07/2016

We describe parallel Markov chain Monte Carlo methods that propagate a collective ensemble of paths, with local covariance information calculated from neighboring replicas. The use of collective dynamics eliminates multiplicative noise and stabilizes the dynamics, thus providing a practical approach to difficult anisotropic sampling problems in high dimensions. Numerical experiments with model problems demonstrate that dramatic potential speedups, compared with various alternative schemes, are attainable.
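The core idea, preconditioning each replica with covariance estimated from other replicas, can be illustrated with a much simpler, swapped-in relative of the paper's methods: ensemble-preconditioned random-walk Metropolis. Each walker's proposal covariance is the empirical covariance of all the other walkers (not just neighbouring ones). Because that covariance does not depend on the walker being moved, the proposal is symmetric conditional on the rest of the ensemble, so each update is a valid Metropolis-within-Gibbs step for the product target. This is not the authors' Langevin-type collective dynamics; the function names, the 2.38/sqrt(d) step scaling, and the anisotropic test target are assumptions.

```python
import numpy as np

def log_target(x):
    """Anisotropic Gaussian: standard deviation 1 along x[0], 100 along x[1]."""
    return -0.5 * (x[0] ** 2 + (x[1] / 100.0) ** 2)

def ensemble_rwm(log_p, walkers, n_steps, rng, scale=None, jitter=1e-9):
    """Random-walk Metropolis where walker i is preconditioned by the other walkers."""
    walkers = np.array(walkers, dtype=float)
    n_walkers, dim = walkers.shape
    if scale is None:
        scale = 2.38 / np.sqrt(dim)  # common random-walk scaling heuristic
    logp = np.array([log_p(w) for w in walkers])
    chain = np.empty((n_steps, n_walkers, dim))
    accepted = 0
    for t in range(n_steps):
        for i in range(n_walkers):
            # Covariance from the complementary ensemble (all walkers except i).
            others = np.delete(walkers, i, axis=0)
            cov = np.cov(others, rowvar=False) + jitter * np.eye(dim)
            L = np.linalg.cholesky(cov)
            proposal = walkers[i] + scale * L @ rng.standard_normal(dim)
            logp_new = log_p(proposal)
            # Symmetric proposal given the others, so the plain Metropolis ratio applies.
            if np.log(rng.random()) < logp_new - logp[i]:
                walkers[i] = proposal
                logp[i] = logp_new
                accepted += 1
        chain[t] = walkers
    return chain, accepted / (n_steps * n_walkers)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chain, acc = ensemble_rwm(log_target, rng.standard_normal((16, 2)), 400, rng)
    tail = chain[200:].reshape(-1, 2)
    print(acc, tail.var(axis=0))  # the ensemble spreads far more along x[1]
```

Starting from an isotropic cloud, the ensemble covariance quickly stretches along the wide direction, so proposals adapt to the anisotropy without hand tuning.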