Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks
Deep neural networks (DNNs) have demonstrated success on many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity may yield poor generalization error and make them hard to deploy on edge devices. Quantization is an effective approach to compress DNNs in order to meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards L1 and L2 penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.
Comment: Accepted at the 16th International Conference on Image Analysis and Recognition (ICIAR 2019).
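A minimal sketch of how a smooth quasiconvex penalty of this kind can be added to a training loss is given below. The specific form alpha * w * tanh(beta * w), the weight 1e-4, and the toy network are illustrative assumptions, not the paper's exact definition of the foothill function.

```python
# Sketch: adding a smooth quasiconvex weight penalty to a training loss.
# The form alpha * w * tanh(beta * w) is an assumption for illustration;
# it behaves like an L2 penalty near 0 and like an L1 penalty for large |w|.
import torch
import torch.nn as nn

def quasiconvex_penalty(w: torch.Tensor, alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    # Even, infinitely differentiable, and increasing in |w|, hence quasiconvex.
    return (alpha * w * torch.tanh(beta * w)).sum()

model = nn.Linear(128, 10)                      # stand-in network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
for _ in range(10):
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss = loss + 1e-4 * sum(quasiconvex_penalty(p) for p in model.parameters())
    loss.backward()
    opt.step()
```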
Generative discriminative models for multivariate inference and statistical mapping in medical imaging
This paper presents a general framework for obtaining interpretable
multivariate discriminative models that allow efficient statistical inference
for neuroimage analysis. The framework, termed generative discriminative
machine (GDM), augments discriminative models with a generative regularization
term. We demonstrate that the proposed formulation can be optimized in closed
form and in dual space, allowing efficient computation for high dimensional
neuroimaging datasets. Furthermore, we provide an analytic estimation of the
null distribution of the model parameters, which enables efficient statistical
inference and p-value computation without the need for permutation testing. We
compared the proposed method with both purely generative and discriminative
learning methods in two large structural magnetic resonance imaging (sMRI)
datasets of Alzheimer's disease (AD) (n=415) and Schizophrenia (n=853). Using
the AD dataset, we demonstrated the ability of GDM to robustly handle
confounding variations. Using the Schizophrenia dataset, we demonstrated the ability of GDM to handle multi-site studies. Taken together, the results underline the potential of the proposed approach for neuroimaging analyses.
Comment: To appear in the MICCAI 2018 proceedings.
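As a schematic illustration (not the exact GDM objective), the sketch below augments a ridge-style discriminative fit with a generative regularization term that pulls the weight vector toward the class-mean difference direction; the solution stays in closed form. All data and hyperparameters are placeholders.

```python
# Schematic stand-in for a generative-discriminative objective:
# ||Xw - y||^2 + lam_disc * ||w||^2 + lam_gen * ||w - w_gen||^2,
# where w_gen is a generative (class-mean difference) target direction.
import numpy as np

def generative_discriminative_fit(X, y, lam_disc=1.0, lam_gen=1.0):
    mu_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    w_gen = mu_diff / np.linalg.norm(mu_diff)
    d = X.shape[1]
    # Closed-form minimizer of the penalized least-squares objective above.
    A = X.T @ X + (lam_disc + lam_gen) * np.eye(d)
    b = X.T @ y + lam_gen * w_gen
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)
w = generative_discriminative_fit(X, y)
```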
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Satellite imagery and remote sensing provide explanatory variables at
relatively high resolutions for modeling geospatial phenomena, yet regional
summaries are often desirable for analysis and actionable insight. In this
paper, we propose a novel method of inducing spatial aggregations as a
component of the machine learning process, yielding regional model features
whose construction is driven by model prediction performance rather than prior
assumptions. Our results demonstrate that Genetic Programming is particularly
well suited to this type of feature construction because it can automatically
synthesize appropriate aggregations, as well as better incorporate them into
predictive models compared to other regression methods we tested. In our
experiments we consider a specific problem instance and a real-world dataset relevant to predicting snow properties in high-mountain Asia.
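The sketch below is a much-simplified stand-in for the paper's Genetic Programming search: candidate spatial aggregations of a gridded covariate are scored by the prediction error of a downstream regression model, so the chosen aggregation is driven by predictive performance. The raster, target, and candidate set are synthetic assumptions.

```python
# Simplified illustration of performance-driven spatial aggregation:
# each candidate aggregation turns per-region pixels into one regional
# feature, which is scored by out-of-sample regression error.
import numpy as np

rng = np.random.default_rng(1)
n_regions, pixels_per_region = 50, 200
raster = rng.normal(size=(n_regions, pixels_per_region))           # hypothetical gridded covariate
target = raster.mean(axis=1) + 0.1 * rng.normal(size=n_regions)    # regional response

aggregations = {"mean": np.mean, "max": np.max, "std": np.std,
                "p90": lambda a, axis: np.percentile(a, 90, axis=axis)}

def holdout_error(feature, y):
    # Fit simple least squares on half the regions, score on the other half.
    half = len(y) // 2
    A = np.vstack([feature[:half], np.ones(half)]).T
    coef, _, _, _ = np.linalg.lstsq(A, y[:half], rcond=None)
    pred = coef[0] * feature[half:] + coef[1]
    return np.mean((pred - y[half:]) ** 2)

scores = {name: holdout_error(f(raster, axis=1), target) for name, f in aggregations.items()}
best = min(scores, key=scores.get)   # aggregation chosen by prediction performance
```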
Classification tools for carotenoid content estimation in Manihot esculenta via metabolomics and machine learning
Cassava genotypes (Manihot esculenta Crantz) with high pro-vitamin A activity have been identified as a strategy to reduce the prevalence of deficiency of this vitamin. The color variability of cassava roots, which can range from white to red, is related to the presence of several carotenoid pigments. The present study has shown how CIELAB color measurement on cassava root tissue can be used as a non-destructive and very fast technique to quantify the levels of carotenoids in cassava root samples, avoiding the use of more expensive analytical techniques for compound quantification, such as UV-visible spectrophotometry and HPLC. For this, we used machine learning techniques, associating the colorimetric data (CIELAB) with the data obtained by UV-vis and HPLC, to obtain models for predicting carotenoid content in this type of biomass. The best values of R2 (above 90%) were observed for the predictive variable TCC (total carotenoid content) determined by UV-vis spectrophotometry. When we tested the machine learning models using the CIELAB values as inputs, for the total carotenoid contents quantified by HPLC, the Partial Least Squares (PLS), Support Vector Machine, and Elastic Net models presented the best values of R2 (above 40%) and root-mean-square error (RMSE). For the carotenoid quantification by UV-vis spectrophotometry, the R2 (around 60%) and RMSE values (around 6.5) were more satisfactory, with Ridge regression and Elastic Net showing the best results. It can be concluded that the colorimetric technique (CIELAB), associated with UV-vis/HPLC reference measurements and machine learning prediction models, can predict the total carotenoid content of these samples with good precision and accuracy.
Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (407323/2013-9).
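The sketch below shows the kind of model comparison described above: predicting carotenoid content from CIELAB (L*, a*, b*) inputs with PLS, Ridge, and Elastic Net. The data are synthetic placeholders; the real study used laboratory UV-vis/HPLC reference values.

```python
# Comparing regression models that map CIELAB color measurements to a
# carotenoid-content target (synthetic data for illustration only).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
lab = rng.uniform([40, -5, 10], [90, 25, 60], size=(120, 3))              # L*, a*, b*
carotenoids = 0.8 * lab[:, 2] - 0.3 * lab[:, 0] + rng.normal(0, 2, 120)   # toy target

models = {"PLS": PLSRegression(n_components=2),
          "Ridge": Ridge(alpha=1.0),
          "ElasticNet": ElasticNet(alpha=0.1)}
for name, model in models.items():
    r2 = cross_val_score(model, lab, carotenoids, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R2 = {r2:.2f}")
```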
Selection of tuning parameters in bridge regression models via Bayesian information criterion
We consider bridge linear regression modeling, which can produce a sparse or non-sparse model. A crucial point in the model-building process is the selection of the adjustment parameters, namely a regularization parameter and a tuning parameter, in bridge regression models. The choice of these parameters can be viewed as a model selection and evaluation problem. We propose a model selection criterion for evaluating bridge regression models based on a Bayesian approach. This selection criterion enables us to select the adjustment parameters objectively. We investigate the effectiveness of our proposed modeling strategy through numerical examples.
Comment: 20 pages, 5 figures.
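A rough numerical sketch of the selection problem is given below: the bridge penalty lam * sum(|b_j|^q) is fit over a grid of (lam, q) and scored with a BIC-style criterion. The paper derives a proper Bayesian criterion; the degrees-of-freedom proxy and optimizer here are placeholders for illustration.

```python
# Grid search over bridge-regression parameters (lam, q), scored by a
# BIC-style criterion. The effective-dimension proxy is a crude placeholder.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

def bridge_fit(lam, q):
    obj = lambda b: 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b) ** q)
    return minimize(obj, np.zeros(p), method="Powell").x

def bic_score(beta):
    rss = np.sum((y - X @ beta) ** 2)
    df = np.sum(np.abs(beta) > 1e-3)            # crude effective-dimension proxy
    return n * np.log(rss / n) + df * np.log(n)

grid = [(lam, q) for lam in (0.1, 1.0, 10.0) for q in (0.5, 1.0, 1.5, 2.0)]
best_lam, best_q = min(grid, key=lambda lq: bic_score(bridge_fit(*lq)))
```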
Differential expression analysis with global network adjustment
Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, whereas the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.
Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions on an independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting estimation bias is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient "over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, which results in a substantial increase in the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible to detect with standard differential expression methods.
Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
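The sketch below is one interpretation of the two-stage "over-shrinkage" idea described above: a heavily shrunk ridge prediction of one gene from all others, trained on a large reference compendium, is re-calibrated by a second, simple regression on the small experiment. It is an illustration, not the authors' exact procedure, and all data are synthetic.

```python
# Two-stage "over-shrinkage" sketch: ridge prediction from a large reference
# database, followed by a second regression that de-biases the shrunk fit.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 200))     # large expression compendium (samples x genes)
experiment = rng.normal(size=(10, 200))     # small experiment to be adjusted

g = 0                                       # index of the gene being modeled
others = np.delete(np.arange(200), g)

# Stage 1: ridge model trained on the reference database, alpha chosen by CV.
ridge = RidgeCV(alphas=np.logspace(0, 4, 20)).fit(reference[:, others], reference[:, g])
pred = ridge.predict(experiment[:, others])

# Stage 2: a second regression mitigates the bias of the heavy shrinkage;
# the residual is the network-adjusted expression used for inference.
calib = LinearRegression().fit(pred.reshape(-1, 1), experiment[:, g])
adjusted_residual = experiment[:, g] - calib.predict(pred.reshape(-1, 1))
```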
A Regularized Graph Layout Framework for Dynamic Network Visualization
Many real-world networks, including social and information networks, are
dynamic structures that evolve over time. Such dynamic networks are typically
visualized using a sequence of static graph layouts. In addition to providing a
visual representation of the network structure at each time step, the sequence
should preserve the mental map between layouts of consecutive time steps to
allow a human to interpret the temporal evolution of the network. In this
paper, we propose a framework for dynamic network visualization in the on-line
setting where only present and past graph snapshots are available to create the
present layout. The proposed framework creates regularized graph layouts by
augmenting the cost function of a static graph layout algorithm with a grouping
penalty, which discourages nodes from deviating too far from other nodes
belonging to the same group, and a temporal penalty, which discourages large
node movements between consecutive time steps. The penalties increase the
stability of the layout sequence, thus preserving the mental map. We introduce
two dynamic layout algorithms within the proposed framework, namely dynamic
multidimensional scaling (DMDS) and dynamic graph Laplacian layout (DGLL). We
apply these algorithms on several data sets to illustrate the importance of
both grouping and temporal regularization for producing interpretable
visualizations of dynamic networks.
Comment: To appear in Data Mining and Knowledge Discovery; supporting material (animations and MATLAB toolbox) available at http://tbayes.eecs.umich.edu/xukevin/visualization_dmkd_201
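A schematic version of the regularized layout cost described above is sketched below: an MDS stress term plus a grouping penalty (nodes pulled toward their group centroid) and a temporal penalty (nodes pulled toward their previous positions). The penalty weights, optimizer, and synthetic distances are placeholders, not the DMDS/DGLL algorithms themselves.

```python
# Regularized graph layout sketch: stress + grouping penalty + temporal penalty.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def layout_cost(flat_pos, target_dist, groups, prev_pos, w_group=0.5, w_temp=0.5):
    pos = flat_pos.reshape(-1, 2)
    stress = np.sum((squareform(pdist(pos)) - target_dist) ** 2) / 2
    centroids = np.array([pos[groups == k].mean(axis=0) for k in np.unique(groups)])
    grouping = np.sum((pos - centroids[groups]) ** 2)     # stay near the group centroid
    temporal = np.sum((pos - prev_pos) ** 2)              # stay near last time step's layout
    return stress + w_group * grouping + w_temp * temporal

rng = np.random.default_rng(0)
n = 20
target_dist = squareform(pdist(rng.normal(size=(n, 2))))  # graph-theoretic distances in practice
groups = np.arange(n) % 3
prev_pos = rng.normal(size=(n, 2))                        # layout from the previous snapshot

res = minimize(layout_cost, prev_pos.ravel(), args=(target_dist, groups, prev_pos))
new_layout = res.x.reshape(-1, 2)
```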
The geography of recent genetic ancestry across Europe
The recent genealogical history of human populations is a complex mosaic
formed by individual migration, large-scale population movements, and other
demographic events. Population genomics datasets can provide a window into this
recent history, as rare traces of recent shared genetic ancestry are detectable
due to long segments of shared genomic material. We make use of genomic data
for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of
recent genealogical ancestry over the past three thousand years at a
continental scale. We detected 1.9 million shared genomic segments, and used
the lengths of these to infer the distribution of shared ancestors across time
and geography. We find that a pair of modern Europeans living in neighboring
populations share around 10-50 genetic common ancestors from the last 1500
years, and upwards of 500 genetic ancestors from the previous 1000 years. These
numbers drop off exponentially with geographic distance, but since genetic
ancestry is rare, individuals from opposite ends of Europe are still expected
to share millions of common genealogical ancestors over the last 1000 years.
There is substantial regional variation in the number of shared genetic
ancestors: especially high numbers of common ancestors between many eastern
populations likely date to the Slavic and/or Hunnic expansions, while much
lower levels of common ancestry in the Italian and Iberian peninsulas may
indicate weaker demographic effects of Germanic expansions into these areas
and/or more stably structured populations. Recent shared ancestry in modern
Europeans is ubiquitous, and clearly shows the impact of both small-scale
migration and large historical events. Population genomic datasets have
considerable power to uncover recent demographic history, and will allow a much
fuller picture of the close genealogical kinship of individuals across the
world.
Comment: Full-size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; an HTML version is at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
Assessing the impact of a health intervention via user-generated Internet content
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
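The sketch below illustrates the counterfactual step described above: rate estimates for control locations (which did not receive the campaign) are used to project what the target area's rates would have been without the intervention, and the impact is the gap between observed and projected rates. All series are synthetic placeholders standing in for the model-derived prevalence estimates.

```python
# Counterfactual projection from control locations (synthetic illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
weeks_pre, weeks_post = 52, 20
controls_pre = rng.normal(size=(weeks_pre, 5))              # estimated rates in 5 control areas
target_pre = controls_pre.mean(axis=1) + rng.normal(0, 0.1, weeks_pre)

controls_post = rng.normal(size=(weeks_post, 5))
target_post_observed = controls_post.mean(axis=1) - 0.5     # apparent drop after the campaign

model = LinearRegression().fit(controls_pre, target_pre)    # historical link to control areas
counterfactual = model.predict(controls_post)               # projected rates without a campaign
estimated_impact = (target_post_observed - counterfactual).mean()
```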
