Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
A challenging problem in estimating high-dimensional graphical models is to
choose the regularization parameter in a data-dependent way. The standard
techniques include K-fold cross-validation (K-CV), Akaike information
criterion (AIC), and Bayesian information criterion (BIC). Though these methods
work well for low-dimensional problems, they are not suitable in high
dimensional settings. In this paper, we present StARS: a new stability-based
method for choosing the regularization parameter in high dimensional inference
for undirected graphs. The method has a clear interpretation: we use the least
amount of regularization that simultaneously makes a graph sparse and
replicable under random sampling. This interpretation requires essentially no
conditions. Under mild conditions, we show that StARS is partially sparsistent
in terms of graph estimation: i.e. with high probability, all the true edges
will be included in the selected model even when the graph size diverges with
the sample size. Empirically, the performance of StARS is compared with the
state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on
both synthetic data and a real microarray dataset. StARS outperforms all these
competing procedures.
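The selection rule lends itself to a short sketch. The snippet below is a minimal illustration, not the authors' implementation: the graph estimator is a stand-in that thresholds absolute sample correlations (StARS is estimator-agnostic and is typically paired with estimators such as the graphical lasso), while the subsample size b = 10*sqrt(n) and the instability cutoff beta = 0.05 follow the paper's suggested defaults.

```python
import numpy as np

def estimate_graph(X, lam):
    """Stand-in estimator: an edge wherever |sample correlation| > lam.
    (StARS is estimator-agnostic; the paper pairs it with e.g. glasso.)"""
    C = np.corrcoef(X, rowvar=False)
    G = np.abs(C) > lam
    np.fill_diagonal(G, False)
    return G

def stars(X, lambdas, n_subsamples=20, beta=0.05, seed=None):
    """Choose the least regularization whose graph is stable under subsampling."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    b = min(n, int(10 * np.sqrt(n)))          # subsample size suggested in the paper
    chosen = max(lambdas)
    for lam in sorted(lambdas, reverse=True):  # most regularized first
        freq = np.zeros((d, d))
        for _ in range(n_subsamples):
            idx = rng.choice(n, size=b, replace=False)
            freq += estimate_graph(X[idx], lam)
        theta = freq / n_subsamples            # per-edge selection frequency
        xi = 2 * theta * (1 - theta)           # per-edge instability
        if xi[np.triu_indices(d, 1)].mean() > beta:
            break                              # instability exceeded the cutoff
        chosen = lam                           # still stable: keep relaxing
    return chosen
```

StARS walks the path from most to least regularization and keeps the last value whose average edge instability under random subsampling stays below the cutoff.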
High Dimensional Semiparametric Gaussian Copula Graphical Models
In this paper, we propose a semiparametric approach, named nonparanormal
skeptic, for efficiently and robustly estimating high dimensional undirected
graphical models. To achieve modeling flexibility, we consider Gaussian Copula
graphical models (or the nonparanormal) as proposed by Liu et al. (2009). To
achieve estimation robustness, we exploit nonparametric rank-based correlation
coefficient estimators, including Spearman's rho and Kendall's tau. In high
dimensional settings, we prove that the nonparanormal skeptic achieves the
optimal parametric rate of convergence in both graph and parameter estimation.
This encouraging result suggests that Gaussian copula graphical models can
be used as a safe replacement of the popular Gaussian graphical models, even
when the data are truly Gaussian. Besides theoretical analysis, we also conduct
thorough numerical simulations to compare different estimators for their graph
recovery performance under both ideal and noisy settings. The proposed methods
are then applied on a large-scale genomic dataset to illustrate their empirical
usefulness. The R language software package huge implementing the proposed
methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/. Comment: 34 pages, 10 figures; the Annals of Statistics, 201
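The estimation idea admits a compact sketch. Replacing the Pearson correlation matrix with Sigma_jk = sin(pi/2 * tau_jk), where tau is Kendall's tau, is the transform used in the paper; the O(n^2) tau computation and the function names below are illustrative stand-ins.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau via all pairs (O(n^2); fine for a sketch, no ties assumed)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def skeptic_correlation(X):
    """Rank-based correlation estimate: Sigma_jk = sin(pi/2 * tau_jk)."""
    d = X.shape[1]
    S = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            S[j, k] = S[k, j] = np.sin(np.pi / 2 * kendall_tau(X[:, j], X[:, k]))
    return S
```

The resulting matrix can be plugged into any Gaussian graph estimator (e.g. the graphical lasso); because it depends on the data only through ranks, it is invariant to monotone marginal transformations, which is the source of the robustness.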
Sparse Additive Models
We present a new class of methods for high-dimensional nonparametric
regression and classification called sparse additive models (SpAM). Our methods
combine ideas from sparse linear modeling and additive nonparametric
regression. We derive an algorithm for fitting the models that is practical and
effective even when the number of covariates is larger than the sample size.
SpAM is closely related to the COSSO model of Lin and Zhang (2006), but
decouples smoothing and sparsity, enabling the use of arbitrary nonparametric
smoothers. An analysis of the theoretical properties of SpAM is given. We also
study a greedy estimator that is a nonparametric version of forward stepwise
regression. Empirical results on synthetic and real data are presented, showing
that SpAM can be effective in fitting sparse nonparametric models in high
dimensional data.
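The decoupling of smoothing and sparsity can be sketched with a minimal backfitting loop, using a Nadaraya-Watson smoother purely for illustration (any smoother can be substituted); the bandwidth and function names are assumptions, not from the paper.

```python
import numpy as np

def nw_smooth(x, r, h=0.3):
    """Nadaraya-Watson smoother with a Gaussian kernel (any smoother works here)."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    W /= W.sum(axis=1, keepdims=True)
    return W @ r

def spam_backfit(X, y, lam, n_iter=30, h=0.3):
    """Sketch of a SpAM-style backfitting loop: smooth each partial residual,
    soft-threshold the whole component function, then center it."""
    n, d = X.shape
    f = np.zeros((n, d))
    for _ in range(n_iter):
        for j in range(d):
            r = y - f.sum(axis=1) + f[:, j]    # partial residual for feature j
            p = nw_smooth(X[:, j], r, h)       # nonparametric fit
            norm = np.sqrt(np.mean(p ** 2))    # empirical L2 norm of the component
            scale = max(0.0, 1 - lam / norm) if norm > 0 else 0.0
            f[:, j] = scale * p                # soft-threshold the whole function
            f[:, j] -= f[:, j].mean()          # center for identifiability
    return f
```

Each component is fit by the smoother and then soft-thresholded as a whole, so an irrelevant covariate's component is shrunk exactly to zero while the smoother itself remains arbitrary.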
Learning Fashion Compatibility with Bidirectional LSTMs
The ubiquity of online fashion shopping demands effective recommendation
services for customers. In this paper, we study two types of fashion
recommendation: (i) suggesting an item that matches existing components in a
set to form a stylish outfit (a collection of fashion items), and (ii)
generating an outfit with multimodal (images/text) specifications from a user.
To this end, we propose to jointly learn a visual-semantic embedding and the
compatibility relationships among fashion items in an end-to-end fashion. More
specifically, we consider a fashion outfit to be a sequence (usually from top
to bottom and then accessories) and each item in the outfit as a time step.
Given the fashion items in an outfit, we train a bidirectional LSTM (Bi-LSTM)
model to sequentially predict the next item conditioned on previous ones to
learn their compatibility relationships. Further, we learn a visual-semantic
space by regressing image features to their semantic representations aiming to
inject attribute and category information as a regularization for training the
LSTM. The trained network can not only perform the aforementioned
recommendations effectively but also predict the compatibility of a given
outfit. We conduct extensive experiments on our newly collected Polyvore
dataset, and the results provide strong qualitative and quantitative evidence
that our framework outperforms alternative methods. Comment: ACM MM 1
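A toy sketch of the next-item idea, assuming precomputed item embeddings: a forward LSTM summarizes the outfit seen so far, and candidate items are scored by a softmax over inner products in the embedding space. This numpy version is only illustrative; the authors' model is bidirectional and trained end-to-end together with the visual-semantic embedding.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step (gates stacked as [input, forget, output, cell])."""
    i, f, o, g = np.split(W @ x + U @ h + b, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def next_item_probs(outfit, candidates, params):
    """Run a forward LSTM over the outfit's item embeddings (one item per
    time step) and softmax-score which candidate item should come next."""
    W, U, b, V = params                 # V maps hidden state back to embedding space
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x in outfit:
        h, c = lstm_step(x, h, c, W, U, b)
    scores = candidates @ (V @ h)       # inner product with each candidate
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Toy usage with random weights and 4-dim item embeddings.
rng = np.random.default_rng(0)
D, H = 4, 8
params = (rng.standard_normal((4 * H, D)) * 0.1,
          rng.standard_normal((4 * H, H)) * 0.1,
          np.zeros(4 * H),
          rng.standard_normal((D, H)) * 0.1)
outfit = [rng.standard_normal(D) for _ in range(3)]
candidates = rng.standard_normal((5, D))
p = next_item_probs(outfit, candidates, params)
```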
VITON: An Image-based Virtual Try-on Network
We present an image-based Virtual Try-On Network (VITON) without using 3D
information in any form, which seamlessly transfers a desired clothing item
onto the corresponding region of a person using a coarse-to-fine strategy.
Conditioned upon a new clothing-agnostic yet descriptive person representation,
our framework first generates a coarse synthesized image with the target
clothing item overlaid on that same person in the same pose. We further enhance
the initial blurry clothing area with a refinement network. The network is
trained to learn how much detail to utilize from the target clothing item, and
where to apply it to the person in order to synthesize a photo-realistic image in
which the target item deforms naturally with clear visual patterns. Experiments
on our newly collected Zalando dataset demonstrate its promise in the
image-based virtual try-on task over state-of-the-art generative models.
Using Expert Knowledge in Database-Oriented Problem Solving
Database-oriented problem solving often involves the processing of deduction rules, which may be recursive, in relational database systems. In this kind of problem solving, expert knowledge plays an important role in guiding correct and efficient processing. This paper presents a modularized relational planner, RELPLAN, which develops a knowledge-directed inference and planning mechanism for the efficient processing of deduction rules in relational DB systems.
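The kind of recursive deduction rule at issue can be illustrated by naive fixpoint evaluation of the classic ancestor rule; this is a generic textbook example, not RELPLAN's own mechanism.

```python
# Deduction rules (Datalog style):
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
def transitive_closure(parent):
    """Naive fixpoint: reapply the recursive rule until no new facts appear."""
    ancestor = set(parent)
    while True:
        derived = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived

facts = {("ann", "bob"), ("bob", "cid")}
# transitive_closure(facts) also derives ("ann", "cid")
```

Expert knowledge enters precisely here: a planner like RELPLAN chooses how to order and restrict such rule applications so that the fixpoint is reached efficiently.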
Direct solar-pumped iodine laser amplifier
A XeCl laser which was developed earlier for an iodine laser oscillator was modified in order to increase its output pulse energy so that the iodine laser output energy could be increased. The electrical circuit of the XeCl laser was changed from a simple capacitor discharge circuit to a Marx system. Because of this improvement, the output energy from the XeCl laser was increased from 60 mJ to 80 mJ. Subsequently, the iodine laser output energy was increased from 100 mJ to 3 mJ. On the other hand, the energy storage capability and amplification characteristics of the Vortek solar-simulator-pumped amplifier were calculated: the calculated amplification factor is about 2 and the energy extraction efficiency is 26 percent, due to the very low input energy density to the amplifier. As a result of an improved kinetic modeling for the iodine solar-simulator-pumped power amplifier, it is found that the I2 along the axis of the tube seriously affects the gain profile. For the gas i-C3F7I at higher pressures, the gain will decrease due to the I2 as the pumping intensity increases, and at these higher pressures an increase in flow velocity will increase the gain.
Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models. Comment: Extended version of earlier paper titled "Tree density estimation"
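The graph-estimation step can be sketched with Kruskal's algorithm run on pairwise weights (for density estimation these would be estimated mutual informations). The tree-size cap below is a greedy heuristic, consistent with the paper's result that the exact size-restricted problem is NP-hard; the weight matrix in the test is a toy stand-in.

```python
import numpy as np
from itertools import combinations

def max_weight_forest(W, max_tree_size=None):
    """Kruskal's algorithm on a symmetric weight matrix W. With max_tree_size
    set, merges that would grow a component past the cap are greedily refused
    (a heuristic: the exact size-restricted problem is NP-hard)."""
    d = W.shape[0]
    parent = list(range(d))
    size = [1] * d

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    edges = sorted(combinations(range(d), 2), key=lambda e: -W[e])
    forest = []
    for u, v in edges:                     # heaviest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # would create a cycle
        if max_tree_size and size[ru] + size[rv] > max_tree_size:
            continue                       # would exceed the tree-size cap
        parent[ru] = rv
        size[rv] += size[ru]
        forest.append((u, v))
    return forest
```

With no cap this yields the maximum weight spanning forest; with a cap, the same greedy pass simply skips merges that would grow a component past the limit.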