
    Impact of managed clinical networks on neonatal care in England : a population-based study

    Objective: To assess the impact of reorganisation of neonatal specialist care services in England after a UK Department of Health report in 2003. Design: A population-wide observational comparison of outcomes over two epochs, before and after the establishment of managed clinical neonatal networks. Setting: Epoch one: 294 maternity and neonatal units in England, Wales, and Northern Ireland, 1 September 1998 to 31 August 2000, as reported by the Confidential Enquiry into Stillbirths and Sudden Deaths in Infancy Project 27/28. Epoch two: 146 neonatal units in England contributing data to the National Neonatal Research Database at the Neonatal Data Analysis Unit, 1 January 2009 to 31 December 2010. Participants: Babies born at a gestational age of 27+0-28+6 (weeks+days): 3522 live births in epoch one; 2919 babies admitted to a neonatal unit within 28 days of birth in epoch two. Intervention: The national reorganisation of neonatal services into managed clinical networks. Main outcome measures: The proportion of babies born at hospitals providing the highest volume of neonatal specialist care (≥2000 neonatal intensive care days annually), having an acute transfer (within the first 24 hours after birth) and/or a late transfer (between 24 hours and 28 days after birth) to another hospital, assessed by change in distribution of transfer category (“none,” “acute,” “late”), and babies from multiple births separated by transfer. For acute transfers in epoch two, the level of specialist neonatal care provided at the destination hospital (British Association of Perinatal Medicine criteria). 
Results: After reorganisation, there were increases in the proportions of babies born at 27-28 weeks’ gestation in hospitals providing the highest volume of neonatal specialist care (18% (631/3495) v 49% (1325/2724); odds ratio 4.30, 95% confidence interval 3.83 to 4.82; P<0.001) and in acute and late postnatal transfers (7% (235) v 12% (360) and 18% (579) v 22% (640), respectively; P<0.001). There was no significant change in the proportion of babies from multiple births separated by transfer (33% (39) v 29% (38); 0.86, 0.50 to 1.46; P=0.57). In epoch two, 32% of acute transfers were to a neonatal unit providing either an equivalent (n=87) or lower (n=26) level of specialist care. Conclusions: There is evidence of some improvement in the delivery of neonatal specialist care after reorganisation. The increase in acute transfers in epoch two, in conjunction with the high proportion transferred to a neonatal unit providing an equivalent or lower level of specialist care, and the continued separation of babies from multiple births, are indicative of poor coordination between maternity and neonatal services to facilitate in utero transfer before delivery, and continuing inadequacies in capacity of intensive care cots. Historical data representing epoch one are available only in aggregate form, preventing examination of temporal trends or confounding factors. This limits the extent to which differences between epochs can be attributed to reorganisation and highlights the importance of routine, prospective data collection for evaluation of future health service reorganisations
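    The headline odds ratio can be reproduced directly from the reported counts. A minimal sketch using the standard Wald interval on the log odds ratio (an illustration, not the authors' analysis code):

```python
import math

def odds_ratio_ci(a, n1, b, n2, z=1.96):
    """Odds ratio for a/n1 events vs b/n2 events, with a Wald 95% CI."""
    c, d = n1 - a, n2 - b                     # non-events in each epoch
    or_ = (a / c) / (b / d)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)     # SE of the log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Births at highest-volume hospitals: epoch two 1325/2724 vs epoch one 631/3495
or_, lo, hi = odds_ratio_ci(1325, 2724, 631, 3495)
print(f"{or_:.2f} ({lo:.2f} to {hi:.2f})")    # 4.30 (3.83 to 4.82)
```

The output matches the reported odds ratio of 4.30 (95% CI 3.83 to 4.82).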

    Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

    The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them.

    Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures.

    Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution
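    The recommendation that preprocessing must not bias biomarker selection is commonly enforced by refitting every data-dependent step inside each cross-validation training fold. A hedged sketch with scikit-learn on synthetic pure-noise data (all sizes, names, and the choice of classifier are illustrative, not from the paper):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))     # 60 samples, 500 "genes", no true signal
y = rng.integers(0, 2, size=60)

# Scaling and gene selection live inside the pipeline, so they are refit on
# each training fold only; selecting genes on the full dataset first would
# leak test-set information and inflate the accuracy estimate.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
acc = cross_val_score(pipe, X, y, cv=5).mean()
print(round(acc, 2))   # expected to sit near chance on pure noise
```

Running the selection outside the pipeline on the same noise data typically yields deceptively high cross-validated accuracy, which is exactly the bias being warned against.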

    Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles

    The lack of well-structured annotations in a growing amount of RNA expression data complicates data interoperability and reusability. Commonly used text mining methods extract annotations from existing unstructured data descriptions and often provide inaccurate output that requires manual curation. Automatic data-based augmentation (generation of annotations on the basis of expression data) can considerably improve annotation quality and has not been well studied. We formulate automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy is 98% for tissue groups (DL), 96.5% for tissues (DL), and 77% for sex (DL). The "one dataset out" average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL provides better results than RF and considerably improves classification performance for 'unseen' datasets
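    The "one dataset out" evaluation can be expressed with scikit-learn's LeaveOneGroupOut splitter, where each group is a source dataset. A sketch on synthetic data (the sizes, label counts, and RF settings below are illustrative assumptions, not the authors' setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 50))          # toy sRNA expression profiles
y = rng.integers(0, 3, size=120)        # three tissue-group labels
groups = np.repeat(np.arange(6), 20)    # six source datasets, 20 samples each

# Each fold holds out one whole dataset, mimicking annotation of a newly
# deposited, 'unseen' study rather than a random within-dataset split.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()
print(round(acc, 2))
```

Because batch effects cluster within datasets, this split is usually harder than random cross-validation, which is consistent with the accuracy drop reported above.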

    The United Kingdom National Neonatal Research Database: A validation study.

    BACKGROUND: The National Neonatal Research Database (NNRD) is a rich repository of pre-defined clinical data extracted at regular intervals from point-of-care, clinician-entered electronic patient records on all admissions to National Health Service neonatal units in England, Wales, and Scotland. We describe population coverage for England and assess data completeness and accuracy. METHODS: We determined population coverage of the NNRD in 2008-2014 through comparison with data on live births in England from the Office for National Statistics. We determined the completeness of seven data items on the NNRD. We assessed the accuracy of 44 data items (16 patient characteristics, 17 processes, 11 clinical outcomes) for infants enrolled in the multi-centre randomised controlled trial, the Probiotics in Preterm Study (PiPs). We compared NNRD data to PiPs data, the gold standard, and calculated discordancy rates using predefined criteria, as well as the sensitivity, specificity, and positive predictive value (PPV) of binary outcomes. RESULTS: The NNRD holds complete population data for England for infants born alive from 25+0 to 31+6 (completed weeks) of gestation, and 70% and 90% coverage for those born at 23 and 24 weeks respectively. Completeness of patient characteristics was over 90%. Data were linked for 2257 episodes of care received by 1258 of the 1310 babies recruited to PiPs. Discordancy rates were 85% for all outcomes; sensitivity ranged from 50% to 100%; PPV ranged from 58.8% (95% CI 40.8% to 75.4%) for porencephalic cyst to 99.7% (95% CI 99.2% to 99.9%) for survival to discharge. CONCLUSIONS: The completeness and quality of data held in the NNRD are high, providing assurance in relation to its use for multiple purposes, including national audit, health service evaluations, quality improvement, and research
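    The agreement statistics used in the validation follow directly from a 2x2 table of database values against the gold standard. A minimal sketch with hypothetical counts (the numbers below are illustrative, not PiPs data):

```python
def agreement(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value of a
    database-recorded binary outcome against a gold standard."""
    sensitivity = tp / (tp + fn)   # gold-standard positives correctly recorded
    specificity = tn / (tn + fp)   # gold-standard negatives correctly recorded
    ppv = tp / (tp + fp)           # recorded positives that are true positives
    return sensitivity, specificity, ppv

# Hypothetical 2x2 counts for one binary outcome
sens, spec, ppv = agreement(tp=90, fp=5, fn=10, tn=895)
print(round(sens, 3), round(spec, 3), round(ppv, 3))  # 0.9 0.994 0.947
```

For rare outcomes such as porencephalic cyst, even a handful of false positives drags PPV down sharply while specificity stays high, which explains the wide PPV range reported.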

    Birth weight and longitudinal growth in infants born below 32 weeks' gestation: a UK population study

    Objective: To describe birth weight and postnatal weight gain in a contemporaneous population of babies born <32 weeks' gestation, using routinely captured electronic clinical data. Design: Anonymised longitudinal weight data from 2006 to 2011. Setting: National Health Service neonatal units in England. Methods: Birth weight centiles were constructed using the LMS method, and longitudinal weight gain was summarised as mean growth curves for each week of gestation until discharge, using SITAR (Superimposition by Translation and Rotation) growth curve analysis. Results: Data on 103 194 weights of 5009 babies born at 22–31 weeks' gestation were received from 40 neonatal units. At birth, girls weighed 6.6% (SE 0.4%) less than boys (p<0.0001). For babies born at 31 weeks' gestation, weight fell after birth by an average of 258 g, with the nadir on the 8th postnatal day. The rate of weight gain then increased to a maximum of 28.4 g/d, or 16.0 g/kg/d, after 3 weeks. Conversely, for babies born at 22–28 weeks' gestation, there was on average no weight loss after birth. At all gestations, babies tended to cross weight centiles downwards for at least 2 weeks. Conclusions: In very preterm infants, mean weight crosses centiles downwards by at least two centile channel widths. Postnatal weight loss is generally absent in those born before 29 weeks, but marked in those born later. Assigning an infant's target centile at birth is potentially harmful, as it requires rapid weight gain, and should only be done once weight gain has stabilised. The use of electronic data reflects contemporary medical management
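    The two growth-rate units quoted (g/d and g/kg/d) are linked by the infant's current weight. A quick consistency check (the 1775 g weight is back-calculated here for illustration; it is not stated in the paper):

```python
def velocity_g_per_kg_day(gain_g_per_day, weight_g):
    """Convert absolute daily weight gain into weight-specific velocity."""
    return gain_g_per_day / (weight_g / 1000.0)

# 28.4 g/d corresponds to 16.0 g/kg/d at a current weight of about 1775 g
v = velocity_g_per_kg_day(28.4, 1775)
print(round(v, 1))  # 16.0
```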

    The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

    Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, the cost and reliability of tests currently present a barrier to their adoption in routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which type of machine learning model is optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of the dataset) for which cancer types were withheld, to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% accuracy in the classification of 14 out of 17 types of cancer. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, as opposed to the clustering and classification models.

    Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference 2019, published at https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6
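    The feature-reduction step, shrinking thousands of genes to a shortlist of 20 before refitting, can be sketched with a random forest's impurity importances. Everything below (data shapes, the planted 10-gene signature, forest sizes) is a synthetic illustration, not the paper's pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 1000))   # toy RNA-seq expression matrix
y = rng.integers(0, 5, size=400)   # five "cancer types"
X[y == 0, :10] += 3.0              # plant a 10-gene signature for one type

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Keep only the 20 most important genes and refit on the reduced matrix
top20 = np.argsort(rf.feature_importances_)[::-1][:20]
rf20 = RandomForestClassifier(n_estimators=300, random_state=0)
rf20.fit(X_tr[:, top20], y_tr)
print(round(rf20.score(X_te[:, top20], y_te), 2))
```

With a genuine signature present, the importance ranking recovers the planted genes, so the reduced model keeps most of the full model's accuracy on the class carrying the signal.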

    An experimental study of the intrinsic stability of random forest variable importance measures

    BACKGROUND: The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Despite the extensive attention to traditional stability under data perturbations or parameter variations, few studies include influences coming from the intrinsic randomness in generating VIMs, i.e. bagging, randomization, and permutation. To address these influences, in this paper we introduce a new concept of intrinsic stability of VIMs, defined as the self-consistency among feature rankings in repeated runs of VIMs without data perturbations or parameter variations. Two widely used VIMs, Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), are comprehensively investigated. The motivation of this study is two-fold. First, we empirically verify the prevalence of intrinsic stability of VIMs over many real-world datasets, to highlight that the instability of VIMs does not originate exclusively from data perturbations or parameter variations but also stems from the intrinsic randomness of VIMs. Second, through Spearman and Pearson tests we comprehensively investigate how different factors influence the intrinsic stability. RESULTS: The experiments were carried out on 19 benchmark datasets with diverse characteristics, including 10 high-dimensional, small-sample gene expression datasets. The experimental results demonstrate the prevalence of intrinsic stability of VIMs. Spearman and Pearson tests on the correlations between intrinsic stability and different factors show that #feature (number of features) and #sample (sample size) have a coupling effect on intrinsic stability. The synthetic indicator #feature/#sample shows both a negative monotonic correlation and a negative linear correlation with intrinsic stability, while OOB accuracy has monotonic correlations with intrinsic stability. This indicates that high-dimensional, small-sample, high-complexity datasets may suffer more from intrinsic instability of VIMs. Furthermore, with respect to the parameter settings of random forest, a large number of trees is preferred. No significant correlations can be seen between intrinsic stability and other factors. Finally, the magnitude of intrinsic stability is always smaller than that of traditional stability. CONCLUSION: First, the prevalence of intrinsic stability of VIMs demonstrates that the instability of VIMs not only comes from data perturbations or parameter variations but also stems from the intrinsic randomness of VIMs. This finding gives a better understanding of VIM stability and may help reduce the instability of VIMs. Second, by investigating the potential factors of intrinsic stability, users will be more aware of the risks and hence more careful when using VIMs, especially on high-dimensional, small-sample, high-complexity datasets
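    Intrinsic stability can be probed by refitting the forest on identical data with identical parameters, varying only the random seed, and rank-correlating the resulting MDG (feature_importances_) vectors. A synthetic sketch (the plain argsort-based Spearman below ignores ties and is not the paper's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def spearman(a, b):
    """Spearman rank correlation via Pearson on ranks (ties ignored)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(rank(a), rank(b))[0, 1]

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 200))
X[:50, :5] += 1.0                   # weak 5-gene signal
y = np.array([0] * 50 + [1] * 50)

# Same data, same parameters: only the forest's internal randomness
# (bagging and feature sub-sampling) differs between the two runs.
imp = [RandomForestClassifier(n_estimators=100, random_state=s)
       .fit(X, y).feature_importances_ for s in (0, 1)]
rho = spearman(imp[0], imp[1])
print(round(rho, 2))   # typically well below 1.0: rankings differ across runs
```

Increasing n_estimators pushes the correlation toward 1.0, consistent with the paper's recommendation to use a large number of trees.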

    Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

    Background: The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on the relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally, the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.

    Results: We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. Compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher) or recursive feature elimination embedded in SVM (RFE), TSP becomes increasingly more effective as the informative genes become progressively more correlated, which is demonstrated both in terms of classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms the k-TSP classifier in all datasets, and achieves either comparable or superior performance to SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.

    Conclusions: The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with the k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that, as a feature selector, it is better tuned to certain data characteristics, i.e. correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis
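    The pair score at the heart of k-TSP compares, within each class, how often one gene's expression falls below the other's. A minimal sketch of the scoring rule (the toy matrix is illustrative; the full algorithm also ranks all pairs and selects the top k):

```python
import numpy as np

def tsp_scores(X, y, pairs):
    """Score each gene pair (i, j) as
    |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|.
    Rank-based, hence invariant to monotone per-sample normalisation."""
    scores = []
    for i, j in pairs:
        p0 = np.mean(X[y == 0, i] < X[y == 0, j])
        p1 = np.mean(X[y == 1, i] < X[y == 1, j])
        scores.append(abs(p0 - p1))
    return np.array(scores)

# Gene 0 is below gene 1 in every class-0 sample and above it in class 1,
# so the pair attains the maximum score of 1.0
X = np.array([[1., 2.], [0., 3.], [5., 1.], [4., 2.]])
y = np.array([0, 0, 1, 1])
s = tsp_scores(X, y, [(0, 1)])
print(s)  # [1.]
```

The genes appearing in the top-scoring pairs can then be passed, as a reduced feature subspace, to any downstream classifier such as an SVM, which is the hybrid scheme evaluated above.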