
    Linear algebra and multivariate analysis in statistics: development and interconnections in the twentieth century

    The most obvious points of contact between linear and matrix algebra and statistics are in the area of multivariate analysis. We review the way that, as both developed during the last century, the two influenced each other, by examining a number of key areas. We begin with matrix and linear algebra, its emergence in the nineteenth century, and its eventual penetration into the undergraduate curriculum in the twentieth century. We continue with a similar account for multivariate analysis in statistics. We pick out the year 1936 for three key developments, and the early post-war period for three more. We then turn to some special results in linear algebra that we need. We briefly discuss four of the main contributors, and close with thirteen ‘case studies’, showing in a range of specific cases how these general algebraic methods have been put to good use and changed the face of statistics.

    VPN: Learning Video-Pose Embedding for Activities of Daily Living

    In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL). ADL have two specific properties: (i) subtle spatio-temporal patterns and (ii) similar visual patterns varying with time. ADL may therefore look very similar, and distinguishing them often requires attending to their fine-grained details. Because recent spatio-temporal 3D ConvNets are too rigid to capture the subtle visual patterns across an action, we propose a novel Video-Pose Network: VPN. The two key components of VPN are a spatial embedding and an attention network. The spatial embedding projects the 3D poses and RGB cues into a common semantic space. This enables the action recognition framework to learn better spatio-temporal features exploiting both modalities. In order to discriminate similar actions, the attention network provides two functionalities: (i) an end-to-end learnable pose backbone exploiting the topology of the human body, and (ii) a coupler that provides joint spatio-temporal attention weights across a video. Experiments show that VPN outperforms the state-of-the-art results for action classification on a large-scale human activity dataset (NTU-RGB+D 120), its subset NTU-RGB+D 60, a challenging real-world human activity dataset (Toyota Smarthome), and a small-scale human-object interaction dataset (Northwestern-UCLA). Accepted at ECCV 2020.
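    As a rough illustration of the common-semantic-space idea, the sketch below projects pose and RGB features into a shared embedding and pools them with temporal attention weights. All dimensions, layer choices, and names are illustrative assumptions, not the authors' architecture.

        import torch
        import torch.nn as nn

        class JointEmbedding(nn.Module):
            """Toy sketch of embedding pose and RGB cues in one semantic space.

            Dimensions and layers are assumptions for illustration, not VPN itself.
            """
            def __init__(self, rgb_dim=512, pose_dim=75, embed_dim=256):
                super().__init__()
                self.rgb_proj = nn.Linear(rgb_dim, embed_dim)    # project RGB cues
                self.pose_proj = nn.Linear(pose_dim, embed_dim)  # project 3D poses
                self.attn = nn.Linear(embed_dim, 1)              # one attention score per time step

            def forward(self, rgb_feats, pose_feats):
                # rgb_feats: (batch, time, rgb_dim); pose_feats: (batch, time, pose_dim)
                z = torch.tanh(self.rgb_proj(rgb_feats) + self.pose_proj(pose_feats))
                w = torch.softmax(self.attn(z).squeeze(-1), dim=1)  # temporal attention
                return (w.unsqueeze(-1) * z).sum(dim=1)             # attention-pooled clip feature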

    Recipes for sparse LDA of horizontal data

    Many important modern applications require analyzing data with more variables than observations, called horizontal data for short. In such situations the classical Fisher's linear discriminant analysis (LDA) has no solution because the within-group scatter matrix is singular. Moreover, the number of variables is usually huge, and solutions of the classical type (discriminant functions) are difficult to interpret because they involve all available variables. The aim nowadays is to develop fast and reliable algorithms for sparse LDA of horizontal data. The resulting discriminant functions depend on very few of the original variables, which facilitates their interpretation. The main theoretical and numerical challenge is how to cope with the singularity of the within-group scatter matrix. This work classifies the existing approaches according to the way they tackle this singularity issue, and suggests new ones.
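    One common recipe of this kind, sketched below under stated assumptions, ridge-regularizes the singular within-group scatter and then soft-thresholds the resulting discriminant direction for sparsity; the specific recipes surveyed in the paper may differ.

        import numpy as np

        def sparse_lda_direction(X, y, ridge=1e-2, threshold=0.1):
            """Two-class sparse discriminant direction for p >> n (horizontal) data.

            A hedged sketch of one common recipe: regularize, solve, soft-threshold.
            """
            X0, X1 = X[y == 0], X[y == 1]
            # pooled within-group scatter (singular when p exceeds n)
            Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
                  + np.cov(X1, rowvar=False) * (len(X1) - 1))
            Sw_reg = Sw + ridge * np.eye(X.shape[1])       # ridge cures the singularity
            w = np.linalg.solve(Sw_reg, X1.mean(axis=0) - X0.mean(axis=0))
            # soft-threshold small coefficients to zero for interpretability
            w = np.sign(w) * np.maximum(np.abs(w) - threshold * np.abs(w).max(), 0.0)
            return w / (np.linalg.norm(w) + 1e-12)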

    Modelling the neonatal system: A joint analysis of length of stay and patient pathways

    © 2019 John Wiley & Sons, Ltd. This is the peer-reviewed version of the article, published in final form on 27/11/2019 at https://doi.org/10.1002/hpm.2928. In the United Kingdom, one in seven babies requires specialist neonatal care after birth, and demand is rising noticeably. Coupled with budget constraints and a lack of investment, this means that neonatal units are struggling, which inevitably has an impact on babies' length of stay (LoS) and on the performance of the service. Models have previously been developed to capture individual babies' pathways and so investigate the longitudinal cycle of care. However, no models have been developed to examine LoS and babies' pathways jointly. LoS at each stage of care is a critical driver of both the clinical outcomes and the economic performance of the neonatal system. Using a generalized linear mixed modelling approach, extended to accommodate multiple outcomes, the association between a neonate's pathway to discharge and LoS is examined. Using data on 1002 neonates, we found a high positive association between a baby's pathway and total LoS, suggesting that discharge policies need to be looked at more carefully. A novel statistical approach is developed that examines the association of key outcomes and how it evolves over time. Its applicability can be extended to other types of long-term care or disease, such as heart failure and stroke.
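    As a starting point for this kind of analysis, the sketch below fits a single-outcome mixed model for length of stay with a random intercept per neonatal unit, using statsmodels; all column names and the data file are hypothetical, and the paper's joint multiple-outcome extension is not reproduced here.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: one row per neonate, with log-transformed LoS,
        # two illustrative covariates, a pathway label, and the care unit.
        df = pd.read_csv("neonates.csv")
        model = smf.mixedlm(
            "log_los ~ gestational_age + birth_weight + C(pathway)",
            data=df,
            groups=df["unit_id"],   # random intercept for each neonatal unit
        )
        result = model.fit()
        print(result.summary())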

    NMJ-morph reveals principal components of synaptic morphology influencing structure–function relationships at the neuromuscular junction

    The ability to form synapses is one of the fundamental properties required by the mammalian nervous system to generate network connectivity. Structural and functional diversity among synaptic populations is a key hallmark of network diversity, and yet we know comparatively little about the morphological principles that govern variability in the size, shape and strength of synapses. Using the mouse neuromuscular junction (NMJ) as an experimentally accessible model synapse, we report on the development of a robust, standardized methodology to facilitate comparative morphometric analysis of synapses (‘NMJ-morph’). We used NMJ-morph to generate baseline morphological reference data for 21 separate pre- and post-synaptic variables from 2160 individual NMJs belonging to nine anatomically distinct populations of synapses, revealing systematic differences in NMJ morphology between defined synaptic populations. Principal components analysis revealed that overall NMJ size and the degree of synaptic fragmentation, alongside pre-synaptic axon diameter, were the most critical parameters in defining synaptic morphology. ‘Average’ synaptic morphology was remarkably conserved between comparable synapses from the left and right sides of the body. Systematic differences in synaptic morphology predicted corresponding differences in synaptic function that were supported by physiological recordings, confirming the robust relationship between synaptic size and strength.
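    The analysis pattern, PCA over standardized morphometric variables to identify the parameters that dominate overall morphology, can be sketched as follows; the array shape matches the abstract (2160 NMJs by 21 variables), but the data here are random placeholders, not the NMJ-morph measurements.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2160, 21))            # placeholder: 2160 NMJs x 21 variables
        X_std = StandardScaler().fit_transform(X)  # variables on different scales need standardizing
        pca = PCA(n_components=3).fit(X_std)
        print(pca.explained_variance_ratio_)       # variance share per component
        # indices of the three variables loading most heavily on PC1
        print(np.abs(pca.components_[0]).argsort()[::-1][:3])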

    ROC632: An overview

    The present paper aims to analyze and explore the ROC632 package, specifying its main characteristics and functions. More specifically, the goal of this study is to evaluate the effectiveness of the package and its strengths and weaknesses. The package was created to overcome the lack of methods for handling incomplete time-to-event data, adapting the 0.632+ bootstrap estimator to the evaluation of time-dependent ROC curves. By applying the package to a specific dataset (DLBCL patients), it becomes possible to assess tangible data and to determine whether the package can analyze complete and incomplete data efficiently and without bias.
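    For reference, the 0.632+ weighting that the package adapts to censored data follows Efron and Tibshirani (1997); a minimal sketch of the uncensored form is given below, with the censoring-aware adaptation left to the package itself.

        def err_632_plus(err_train, err_boot, gamma):
            """Efron-Tibshirani 0.632+ prediction-error estimate.

            err_train: apparent (resubstitution) error
            err_boot:  leave-one-out bootstrap error
            gamma:     no-information error rate
            """
            err_boot = min(err_boot, gamma)          # cap as in the original paper
            denom = gamma - err_train
            R = (err_boot - err_train) / denom if denom > 0 else 0.0  # relative overfitting
            w = 0.632 / (1.0 - 0.368 * R)            # weight grows with overfitting
            return (1.0 - w) * err_train + w * err_boot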

    Objective surface evaluation of fiber reinforced polymer composites

    The mechanical properties of advanced composites are essential for their structural performance, but the surface finish on exterior composite panels is of critical importance for customer satisfaction. This paper describes the application of wavelet texture analysis (WTA) to the task of automatically classifying the surface finish properties of two fiber reinforced polymer (FRP) composite construction types (clear resin and gel-coat) into three quality grades. Samples were imaged and wavelet multi-scale decomposition was used to create a visual texture representation of the sample, capturing image features at different scales and orientations. Principal components analysis was used to reduce the dimensionality of the texture feature vector, permitting successful classification of the samples using only the first principal component. This work extends and further validates the feasibility of this approach as the basis for automated non-contact classification of composite surface finish using image analysis.
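    A minimal sketch of the WTA pipeline appears below: sub-band energies from a 2-D wavelet decomposition form the texture vector, and PCA keeps only the first component. The wavelet family, depth, and feature definition are assumptions, not the paper's exact settings.

        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def texture_features(image, wavelet="db4", levels=3):
            """Texture vector: mean energy of each wavelet detail sub-band."""
            coeffs = pywt.wavedec2(image, wavelet, level=levels)
            feats = []
            for detail in coeffs[1:]:  # (horizontal, vertical, diagonal) per level
                feats.extend(float(np.mean(np.square(d))) for d in detail)
            return np.array(feats)

        # images = [...]  # list of 2-D grayscale arrays of the panel samples
        # F = np.vstack([texture_features(im) for im in images])
        # pc1 = PCA(n_components=1).fit_transform(F)  # classify on PC1 alone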

    Fast, automated measurement of nematode swimming (thrashing) without morphometry

    Background: The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious, subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames from which the interval between statistically-similar frames is estimated. Results: We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm used also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug, levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which undergoes complex contorted shapes whilst swimming, without alterations in the code or of any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens. Conclusion: We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming, manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
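    The core of the covariance method can be sketched in a few lines: flatten the frames, remove the static component, build a frame-by-frame similarity matrix, and read the swimming period off the similarity-versus-lag curve. This simplified version uses mean subtraction in place of the paper's PCA background-removal step, and all parameters are illustrative.

        import numpy as np

        def thrash_frequency(frames, fps, min_lag=3):
            """Estimate thrashing frequency (Hz) from a grayscale video.

            frames: array of shape (T, H, W); fps: frames per second.
            Simplified sketch: mean subtraction stands in for PCA background removal.
            """
            T = frames.shape[0]
            X = frames.reshape(T, -1).astype(float)
            X -= X.mean(axis=0)                # remove the static background
            C = np.corrcoef(X)                 # T x T frame-similarity matrix
            # mean similarity at each temporal lag
            lagsim = np.array([np.diag(C, k).mean() for k in range(1, T // 2)])
            period = int(np.argmax(lagsim[min_lag - 1:]) + min_lag)  # first strong recurrence
            return fps / period                # thrashing cycles per second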

    A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery – Part II: an illustrative example

    Background: Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part, modelling techniques and the intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part, the performances of the same models are evaluated in an illustrative example. Methods: Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems, and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets of 545 cases each were used. The optimal set of predictors was chosen from a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and the Hosmer-Lemeshow goodness-of-fit test, respectively. Results: Scoring systems and the logistic regression model required the largest set of predictors, while the Bayesian and k-nearest neighbour models were much more parsimonious. On testing data, all models showed acceptable discrimination capacity; however, the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, calibration was poor for the scoring systems, the k-nearest neighbour model and the artificial neural networks, while the Bayes (after recalibration) and logistic regression models gave adequate results. Conclusion: Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others because they also had good generalization and calibration. The Bayes quadratic model seemed a convincing alternative to the much more usual Bayes linear and logistic regression models, showing its capacity to identify a minimum core of predictors generally recognized as essential for pragmatically evaluating the risk of developing morbidity after heart surgery.
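    The two evaluation criteria named above can be sketched directly; the Hosmer-Lemeshow implementation below uses the standard decile-of-risk grouping, and the commented usage assumes hypothetical train/test arrays.

        import numpy as np
        from scipy.stats import chi2
        from sklearn.metrics import roc_auc_score

        def hosmer_lemeshow(y_true, p_hat, bins=10):
            """Hosmer-Lemeshow goodness-of-fit statistic over risk deciles."""
            order = np.argsort(p_hat)
            stat = 0.0
            for g in np.array_split(order, bins):
                n, obs, exp = len(g), y_true[g].sum(), p_hat[g].sum()
                stat += (obs - exp) ** 2 / (exp * (1 - exp / n) + 1e-12)
            return stat, chi2.sf(stat, bins - 2)   # statistic and p-value (bins - 2 df)

        # Hypothetical usage with a fitted probabilistic classifier:
        # p = model.predict_proba(X_test)[:, 1]
        # print("AUC:", roc_auc_score(y_test, p))              # discrimination
        # print("HL p-value:", hosmer_lemeshow(y_test, p)[1])  # calibration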

    The projection score - an evaluation criterion for variable subset selection in PCA visualization

    Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by including only the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively on the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, which can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
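    One plausible formalization of such a score, written purely for illustration, compares the variance captured by the top-k principal components of the subset with the same quantity after independently permuting each column; the paper's exact definition should be taken from the original article.

        import numpy as np

        def projection_score(X, k=2, n_perm=20, seed=0):
            """Hedged sketch of a projection-score-style criterion.

            Higher values mean the subset carries more low-dimensional
            structure than expected under columnwise permutation.
            """
            rng = np.random.default_rng(seed)

            def topk_var_share(M):
                M = M - M.mean(axis=0)
                s = np.linalg.svd(M, compute_uv=False)
                return (s[:k] ** 2).sum() / (s ** 2).sum()

            observed = topk_var_share(X)
            null = np.mean([
                topk_var_share(np.column_stack([rng.permutation(c) for c in X.T]))
                for _ in range(n_perm)
            ])
            return observed - null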