Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks
Deep neural networks (DNNs) have demonstrated success on many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity may yield poor generalization error and make them hard to deploy on edge devices. Quantization is an effective approach to compress DNNs in order to meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards L1 and L2 penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.
Comment: Accepted at the 16th International Conference on Image Analysis and Recognition (ICIAR 2019).
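A minimal sketch of how a smooth quasiconvex penalty of this kind can be added to a training loss is given below. The specific form alpha * w * tanh(beta * w), the weight 1e-4, and the toy network are illustrative assumptions, not the paper's exact definition of the foothill function.

```python
# Sketch: adding a smooth quasiconvex weight penalty to a training loss.
# The form alpha * w * tanh(beta * w) is an assumption for illustration;
# it behaves like an L2 penalty near 0 and like an L1 penalty for large |w|.
import torch
import torch.nn as nn

def quasiconvex_penalty(w: torch.Tensor, alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    # Even, infinitely differentiable, and increasing in |w|, hence quasiconvex.
    return (alpha * w * torch.tanh(beta * w)).sum()

model = nn.Linear(128, 10)                      # stand-in network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
for _ in range(10):
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss = loss + 1e-4 * sum(quasiconvex_penalty(p) for p in model.parameters())
    loss.backward()
    opt.step()
```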
Generative discriminative models for multivariate inference and statistical mapping in medical imaging
This paper presents a general framework for obtaining interpretable
multivariate discriminative models that allow efficient statistical inference
for neuroimage analysis. The framework, termed generative discriminative
machine (GDM), augments discriminative models with a generative regularization
term. We demonstrate that the proposed formulation can be optimized in closed
form and in dual space, allowing efficient computation for high dimensional
neuroimaging datasets. Furthermore, we provide an analytic estimation of the
null distribution of the model parameters, which enables efficient statistical
inference and p-value computation without the need for permutation testing. We
compared the proposed method with both purely generative and discriminative
learning methods in two large structural magnetic resonance imaging (sMRI)
datasets of Alzheimer's disease (AD) (n=415) and Schizophrenia (n=853). Using
the AD dataset, we demonstrated the ability of GDM to robustly handle
confounding variations. Using the Schizophrenia dataset, we demonstrated the ability of GDM to handle multi-site studies. Taken together, the results underline the potential of the proposed approach for neuroimaging analyses.
Comment: To appear in the MICCAI 2018 proceedings.
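As a schematic illustration (not the exact GDM objective), the sketch below augments a ridge-style discriminative fit with a generative regularization term that pulls the weight vector toward the class-mean difference direction; the solution stays in closed form. All data and hyperparameters are placeholders.

```python
# Schematic stand-in for a generative-discriminative objective:
# ||Xw - y||^2 + lam_disc * ||w||^2 + lam_gen * ||w - w_gen||^2,
# where w_gen is a generative (class-mean difference) target direction.
import numpy as np

def generative_discriminative_fit(X, y, lam_disc=1.0, lam_gen=1.0):
    mu_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    w_gen = mu_diff / np.linalg.norm(mu_diff)
    d = X.shape[1]
    # Closed-form minimizer of the penalized least-squares objective above.
    A = X.T @ X + (lam_disc + lam_gen) * np.eye(d)
    b = X.T @ y + lam_gen * w_gen
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)
w = generative_discriminative_fit(X, y)
```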
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Satellite imagery and remote sensing provide explanatory variables at
relatively high resolutions for modeling geospatial phenomena, yet regional
summaries are often desirable for analysis and actionable insight. In this
paper, we propose a novel method of inducing spatial aggregations as a
component of the machine learning process, yielding regional model features
whose construction is driven by model prediction performance rather than prior
assumptions. Our results demonstrate that Genetic Programming is particularly
well suited to this type of feature construction because it can automatically
synthesize appropriate aggregations, as well as better incorporate them into
predictive models compared to other regression methods we tested. In our
experiments we consider a specific problem instance and a real-world dataset relevant to predicting snow properties in high-mountain Asia.
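The sketch below is a much-simplified stand-in for the paper's Genetic Programming search: candidate spatial aggregations of a gridded covariate are scored by the prediction error of a downstream regression model, so the chosen aggregation is driven by predictive performance. The raster, target, and candidate set are synthetic assumptions.

```python
# Simplified illustration of performance-driven spatial aggregation:
# each candidate aggregation turns per-region pixels into one regional
# feature, which is scored by out-of-sample regression error.
import numpy as np

rng = np.random.default_rng(1)
n_regions, pixels_per_region = 50, 200
raster = rng.normal(size=(n_regions, pixels_per_region))           # hypothetical gridded covariate
target = raster.mean(axis=1) + 0.1 * rng.normal(size=n_regions)    # regional response

aggregations = {"mean": np.mean, "max": np.max, "std": np.std,
                "p90": lambda a, axis: np.percentile(a, 90, axis=axis)}

def holdout_error(feature, y):
    # Fit simple least squares on half the regions, score on the other half.
    half = len(y) // 2
    A = np.vstack([feature[:half], np.ones(half)]).T
    coef, _, _, _ = np.linalg.lstsq(A, y[:half], rcond=None)
    pred = coef[0] * feature[half:] + coef[1]
    return np.mean((pred - y[half:]) ** 2)

scores = {name: holdout_error(f(raster, axis=1), target) for name, f in aggregations.items()}
best = min(scores, key=scores.get)   # aggregation chosen by prediction performance
```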
Classification tools for carotenoid content estimation in Manihot esculenta via metabolomics and machine learning
Cassava genotypes (Manihot esculenta Crantz) with high pro-vitamin A activity have been identified as a strategy to reduce the prevalence of deficiency of this vitamin. The color variability of cassava roots, which can range from white to red, is related to the presence of several carotenoid pigments. The present study has shown how CIELAB color measurement on cassava root tissue can be used as a non-destructive and very fast technique to quantify the levels of carotenoids in cassava root samples, avoiding the use of more expensive analytical techniques for compound quantification, such as UV-visible spectrophotometry and HPLC. For this, we used machine learning techniques, associating the colorimetric data (CIELAB) with the data obtained by UV-vis and HPLC, to obtain models for predicting carotenoid content in this type of biomass. The best values of R2 (above 90%) were observed for the predictive variable TCC (total carotenoid content) determined by UV-vis spectrophotometry. When we tested the machine learning models using the CIELAB values as inputs, for the total carotenoid contents quantified by HPLC, the Partial Least Squares (PLS), Support Vector Machine, and Elastic Net models presented the best values of R2 (above 40%) and root-mean-square error (RMSE). For the carotenoid quantification by UV-vis spectrophotometry, the R2 (around 60%) and RMSE values (around 6.5) were more satisfactory, with Ridge regression and Elastic Net showing the best results. It can be concluded that the colorimetric technique (CIELAB), associated with UV-vis/HPLC reference measurements and machine learning prediction models, can predict the total carotenoid content of these samples with good precision and accuracy.
Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (407323/2013-9).
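The sketch below shows the kind of model comparison described above: predicting carotenoid content from CIELAB (L*, a*, b*) inputs with PLS, Ridge, and Elastic Net. The data are synthetic placeholders; the real study used laboratory UV-vis/HPLC reference values.

```python
# Comparing regression models that map CIELAB color measurements to a
# carotenoid-content target (synthetic data for illustration only).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
lab = rng.uniform([40, -5, 10], [90, 25, 60], size=(120, 3))              # L*, a*, b*
carotenoids = 0.8 * lab[:, 2] - 0.3 * lab[:, 0] + rng.normal(0, 2, 120)   # toy target

models = {"PLS": PLSRegression(n_components=2),
          "Ridge": Ridge(alpha=1.0),
          "ElasticNet": ElasticNet(alpha=0.1)}
for name, model in models.items():
    r2 = cross_val_score(model, lab, carotenoids, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R2 = {r2:.2f}")
```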
Selection of tuning parameters in bridge regression models via Bayesian information criterion
We consider bridge linear regression modeling, which can produce a sparse or non-sparse model. A crucial point in the model-building process is the selection of the adjustment parameters, namely a regularization parameter and a tuning parameter, in bridge regression models. The choice of these parameters can be viewed as a model selection and evaluation problem. We propose a model selection criterion for evaluating bridge regression models based on a Bayesian approach. This selection criterion enables us to select the adjustment parameters objectively. We investigate the effectiveness of our proposed modeling strategy through numerical examples.
Comment: 20 pages, 5 figures.
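A rough numerical sketch of the selection problem is given below: the bridge penalty lam * sum(|b_j|^q) is fit over a grid of (lam, q) and scored with a BIC-style criterion. The paper derives a proper Bayesian criterion; the degrees-of-freedom proxy and optimizer here are placeholders for illustration.

```python
# Grid search over bridge-regression parameters (lam, q), scored by a
# BIC-style criterion. The effective-dimension proxy is a crude placeholder.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

def bridge_fit(lam, q):
    obj = lambda b: 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b) ** q)
    return minimize(obj, np.zeros(p), method="Powell").x

def bic_score(beta):
    rss = np.sum((y - X @ beta) ** 2)
    df = np.sum(np.abs(beta) > 1e-3)            # crude effective-dimension proxy
    return n * np.log(rss / n) + df * np.log(n)

grid = [(lam, q) for lam in (0.1, 1.0, 10.0) for q in (0.5, 1.0, 1.5, 2.0)]
best_lam, best_q = min(grid, key=lambda lq: bic_score(bridge_fit(*lq)))
```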
Differential expression analysis with global network adjustment
Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, whereas the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.
Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions on an independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting estimation bias is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient "over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, which results in a substantial increase in the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible to detect with standard differential expression methods.
Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
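The sketch below is one interpretation of the two-stage "over-shrinkage" idea described above: a heavily shrunk ridge prediction of one gene from all others, trained on a large reference compendium, is re-calibrated by a second, simple regression on the small experiment. It is an illustration, not the authors' exact procedure, and all data are synthetic.

```python
# Two-stage "over-shrinkage" sketch: ridge prediction from a large reference
# database, followed by a second regression that de-biases the shrunk fit.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 200))     # large expression compendium (samples x genes)
experiment = rng.normal(size=(10, 200))     # small experiment to be adjusted

g = 0                                       # index of the gene being modeled
others = np.delete(np.arange(200), g)

# Stage 1: ridge model trained on the reference database, alpha chosen by CV.
ridge = RidgeCV(alphas=np.logspace(0, 4, 20)).fit(reference[:, others], reference[:, g])
pred = ridge.predict(experiment[:, others])

# Stage 2: a second regression mitigates the bias of the heavy shrinkage;
# the residual is the network-adjusted expression used for inference.
calib = LinearRegression().fit(pred.reshape(-1, 1), experiment[:, g])
adjusted_residual = experiment[:, g] - calib.predict(pred.reshape(-1, 1))
```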
A Regularized Graph Layout Framework for Dynamic Network Visualization
Many real-world networks, including social and information networks, are
dynamic structures that evolve over time. Such dynamic networks are typically
visualized using a sequence of static graph layouts. In addition to providing a
visual representation of the network structure at each time step, the sequence
should preserve the mental map between layouts of consecutive time steps to
allow a human to interpret the temporal evolution of the network. In this
paper, we propose a framework for dynamic network visualization in the on-line
setting where only present and past graph snapshots are available to create the
present layout. The proposed framework creates regularized graph layouts by
augmenting the cost function of a static graph layout algorithm with a grouping
penalty, which discourages nodes from deviating too far from other nodes
belonging to the same group, and a temporal penalty, which discourages large
node movements between consecutive time steps. The penalties increase the
stability of the layout sequence, thus preserving the mental map. We introduce
two dynamic layout algorithms within the proposed framework, namely dynamic
multidimensional scaling (DMDS) and dynamic graph Laplacian layout (DGLL). We
apply these algorithms on several data sets to illustrate the importance of
both grouping and temporal regularization for producing interpretable
visualizations of dynamic networks.
Comment: To appear in Data Mining and Knowledge Discovery; supporting material (animations and MATLAB toolbox) available at http://tbayes.eecs.umich.edu/xukevin/visualization_dmkd_201
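A schematic version of the regularized layout cost described above is sketched below: an MDS stress term plus a grouping penalty (nodes pulled toward their group centroid) and a temporal penalty (nodes pulled toward their previous positions). The penalty weights, optimizer, and synthetic distances are placeholders, not the DMDS/DGLL algorithms themselves.

```python
# Regularized graph layout sketch: stress + grouping penalty + temporal penalty.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def layout_cost(flat_pos, target_dist, groups, prev_pos, w_group=0.5, w_temp=0.5):
    pos = flat_pos.reshape(-1, 2)
    stress = np.sum((squareform(pdist(pos)) - target_dist) ** 2) / 2
    centroids = np.array([pos[groups == k].mean(axis=0) for k in np.unique(groups)])
    grouping = np.sum((pos - centroids[groups]) ** 2)     # stay near the group centroid
    temporal = np.sum((pos - prev_pos) ** 2)              # stay near last time step's layout
    return stress + w_group * grouping + w_temp * temporal

rng = np.random.default_rng(0)
n = 20
target_dist = squareform(pdist(rng.normal(size=(n, 2))))  # graph-theoretic distances in practice
groups = np.arange(n) % 3
prev_pos = rng.normal(size=(n, 2))                        # layout from the previous snapshot

res = minimize(layout_cost, prev_pos.ravel(), args=(target_dist, groups, prev_pos))
new_layout = res.x.reshape(-1, 2)
```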
The geography of recent genetic ancestry across Europe
The recent genealogical history of human populations is a complex mosaic
formed by individual migration, large-scale population movements, and other
demographic events. Population genomics datasets can provide a window into this
recent history, as rare traces of recent shared genetic ancestry are detectable
due to long segments of shared genomic material. We make use of genomic data
for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of
recent genealogical ancestry over the past three thousand years at a
continental scale. We detected 1.9 million shared genomic segments, and used
the lengths of these to infer the distribution of shared ancestors across time
and geography. We find that a pair of modern Europeans living in neighboring
populations share around 10-50 genetic common ancestors from the last 1500
years, and upwards of 500 genetic ancestors from the previous 1000 years. These
numbers drop off exponentially with geographic distance, but since genetic
ancestry is rare, individuals from opposite ends of Europe are still expected
to share millions of common genealogical ancestors over the last 1000 years.
There is substantial regional variation in the number of shared genetic
ancestors: especially high numbers of common ancestors between many eastern
populations likely date to the Slavic and/or Hunnic expansions, while much
lower levels of common ancestry in the Italian and Iberian peninsulas may
indicate weaker demographic effects of Germanic expansions into these areas
and/or more stably structured populations. Recent shared ancestry in modern
Europeans is ubiquitous, and clearly shows the impact of both small-scale
migration and large historical events. Population genomic datasets have
considerable power to uncover recent demographic history, and will allow a much
fuller picture of the close genealogical kinship of individuals across the
world.
Comment: Full-size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; an HTML version is at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
Assessing the impact of a health intervention via user-generated Internet content
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
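The sketch below illustrates the counterfactual step described above: rate estimates for control locations (which did not receive the campaign) are used to project what the target area's rates would have been without the intervention, and the impact is the gap between observed and projected rates. All series are synthetic placeholders standing in for the model-derived prevalence estimates.

```python
# Counterfactual projection from control locations (synthetic illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
weeks_pre, weeks_post = 52, 20
controls_pre = rng.normal(size=(weeks_pre, 5))              # estimated rates in 5 control areas
target_pre = controls_pre.mean(axis=1) + rng.normal(0, 0.1, weeks_pre)

controls_post = rng.normal(size=(weeks_post, 5))
target_post_observed = controls_post.mean(axis=1) - 0.5     # apparent drop after the campaign

model = LinearRegression().fit(controls_pre, target_pre)    # historical link to control areas
counterfactual = model.predict(controls_post)               # projected rates without a campaign
estimated_impact = (target_post_observed - counterfactual).mean()
```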
