High-dimensional Sparse Inverse Covariance Estimation using Greedy Methods
In this paper we consider the task of estimating the non-zero pattern of the
sparse inverse covariance matrix of a zero-mean Gaussian random vector from a
set of iid samples. Note that this is also equivalent to recovering the
underlying graph structure of a sparse Gaussian Markov Random Field (GMRF). We
present two novel greedy approaches to solving this problem. The first
estimates the non-zero covariates of the overall inverse covariance matrix
using a series of global forward and backward greedy steps. The second
estimates the neighborhood of each node in the graph separately, again using
greedy forward and backward steps, and combines the intermediate neighborhoods
to form an overall estimate. The principal contribution of this paper is a
rigorous analysis of the sparsistency, or consistency in recovering the
sparsity pattern of the inverse covariance matrix. Surprisingly, we show that
both the local and global greedy methods learn the full structure of the model
with high probability given just $O(d \log p)$ samples, which is a
\emph{significant} improvement over the state-of-the-art $\ell_1$-regularized
Gaussian MLE (Graphical Lasso), which requires $O(d^2 \log p)$ samples. Moreover,
the restricted eigenvalue and smoothness conditions imposed by our greedy
methods are much weaker than the strong irrepresentable conditions required by
the $\ell_1$-regularization based methods. We corroborate our results with
extensive simulations and examples, comparing our local and global greedy
methods to the $\ell_1$-regularized Gaussian MLE, as well as the Neighborhood
Greedy method to nodewise $\ell_1$-regularized linear regression
(Neighborhood Lasso).
Comment: Accepted to AI STAT 2012 for Oral Presentation
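The abstract above says the second method estimates each node's neighborhood separately and then combines them, but it does not state the combination rule. The sketch below shows two common choices as an assumption: an "or" rule (keep an edge if either endpoint selects the other) and an "and" rule (keep it only if both do). The function name `combine_neighborhoods` and the `rule` parameter are hypothetical and not taken from the paper.

```python
def combine_neighborhoods(neighborhoods, rule="or"):
    """Merge per-node neighborhood estimates into one undirected edge set.

    neighborhoods: dict mapping node -> set of estimated neighbors.
    rule: "or" keeps an edge selected by either endpoint,
          "and" keeps it only when both endpoints select each other.
    Both rules are illustrative assumptions, not the paper's stated method.
    """
    if rule not in ("or", "and"):
        raise ValueError("rule must be 'or' or 'and'")
    edges = set()
    for i, nbrs in neighborhoods.items():
        for j in nbrs:
            if i == j:
                continue  # ignore self-loops
            mutual = i in neighborhoods.get(j, set())
            if rule == "or" or mutual:
                # store each undirected edge once, smaller endpoint first
                edges.add((min(i, j), max(i, j)))
    return edges
```

For example, with estimates `{0: {1, 2}, 1: {0}, 2: set()}`, the "or" rule keeps both edges (0, 1) and (0, 2), while the "and" rule keeps only (0, 1).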
A new stochastic differential equation approach for waves in a random medium
We present a mathematical approach that simplifies the theoretical treatment
of electromagnetic localization in random media and leads to closed form
analytical solutions. Starting with the assumption that the dielectric
permittivity of the medium has delta-correlated spatial fluctuations, and using
the Ito lemma, we derive a linear stochastic differential equation for a one
dimensional random medium. The equation leads to localized wave solutions. The
localized wave solutions have a localization length that scales inversely with
the square of the frequency of the wave in the low frequency regime, whereas in
the high frequency regime, this length varies inversely with the frequency to
the power of two thirds.
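The two scaling regimes stated in the abstract above can be written compactly. The symbols are assumed notation, with $\xi$ for the localization length and $\omega$ for the wave frequency; the abstract gives no prefactors, so only proportionality is shown.

```latex
\xi(\omega) \propto \omega^{-2} \quad \text{(low-frequency regime)},
\qquad
\xi(\omega) \propto \omega^{-2/3} \quad \text{(high-frequency regime)}.
```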
The Child is Father of the Man: Foresee the Success at the Early Stage
Understanding the dynamic mechanisms that drive the high-impact scientific
work (e.g., research papers, patents) is a long-debated research topic and has
many important implications, ranging from personal career development and
recruitment search, to the jurisdiction of research resources. Recent advances
in characterizing and modeling scientific success have made it possible to
forecast the long-term impact of scientific work, where data mining techniques,
supervised learning in particular, play an essential role. Despite much
progress, several key algorithmic challenges in relation to predicting
long-term scientific impact have largely remained open. In this paper, we
propose a joint predictive model to forecast the long-term scientific impact at
the early stage, which simultaneously addresses a number of these open
challenges, including the scholarly feature design, the non-linearity, the
domain-heterogeneity and dynamics. In particular, we formulate it as a
regularized optimization problem and propose effective and scalable algorithms
to solve it. We perform extensive empirical evaluations on large, real
scholarly data sets to validate the effectiveness and the efficiency of our
method.
Comment: Correct some typos in our KDD paper
Simultaneously Structured Models with Application to Sparse and Low-rank Matrices
The topic of recovery of a structured model given a small number of linear
observations has been well-studied in recent years. Examples include recovering
sparse or group-sparse vectors, low-rank matrices, and the sum of sparse and
low-rank matrices, among others. In various applications in signal processing
and machine learning, the model of interest is known to be structured in
several ways at the same time, for example, a matrix that is simultaneously
sparse and low-rank.
Often norms that promote each individual structure are known, and allow for
recovery using an order-wise optimal number of measurements (e.g., the $\ell_1$
norm for sparsity, nuclear norm for matrix rank). Hence, it is reasonable to
minimize a combination of such norms. We show that, surprisingly, if we use
multi-objective optimization with these norms, then we can do no better,
order-wise, than an algorithm that exploits only one of the present structures.
This result suggests that to fully exploit the multiple structures, we need an
entirely new convex relaxation, i.e. not one that is a function of the convex
relaxations used for each structure. We then specialize our results to the case
of sparse and low-rank matrices. We show that a nonconvex formulation of the
problem can recover the model from very few measurements, which is on the order
of the degrees of freedom of the matrix, whereas the convex problem obtained
from a combination of the $\ell_1$ and nuclear norms requires many more
measurements. This proves an order-wise gap between the performance of the
convex and nonconvex recovery problems in this case. Our framework applies to
arbitrary structure-inducing norms as well as to a wide range of measurement
ensembles. This allows us to give performance bounds for problems such as
sparse phase retrieval and low-rank tensor completion.
Comment: 38 pages, 9 figures
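As an illustration of the "combination of norms" discussed in the abstract above, one common form is a weighted sum of the $\ell_1$ and nuclear norms under linear measurement constraints. The notation is assumed rather than taken from the paper: $\mathcal{A}$ is the linear measurement operator, $y$ the observations, and $\lambda \ge 0$ a trade-off weight.

```latex
\min_{X} \; \|X\|_1 + \lambda \, \|X\|_*
\quad \text{subject to} \quad \mathcal{A}(X) = y .
```

According to the abstract, a convex program of this kind needs, order-wise, no fewer measurements than one that uses only a single one of the two norms.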
Amplified Dispersive Fourier-Transform Imaging for Ultrafast Displacement Sensing and Barcode Reading
Dispersive Fourier transformation is a powerful technique in which the
spectrum of an optical pulse is mapped into a time-domain waveform using
chromatic dispersion. It replaces a diffraction grating and detector array with
a dispersive fiber and single photodetector. This simplifies the system and,
more importantly, enables fast real-time measurements. Here we describe a novel
ultrafast barcode reader and displacement sensor that employs
internally-amplified dispersive Fourier transformation. This technique
amplifies and simultaneously maps the spectrally encoded barcode into a
temporal waveform. It achieves a record acquisition speed of 25 MHz -- four
orders of magnitude faster than the current state-of-the-art.
Comment: Submitted to a journal
MOFSocialNet: Exploiting Metal-Organic Framework Relationships via Social Network Analysis
The number of metal-organic frameworks (MOFs), as well as the number of applications of these materials, is growing rapidly. With the number of characterized compounds exceeding 100,000, manual sorting becomes impossible. At the same time, increasing computing power and the established use of automated machine learning approaches make data science tools available that provide an overview of the MOF chemical space and support the selection of suitable MOFs for a desired application. Among the different data science tools, graph theory approaches, in which data generated from numerous real-world applications is represented as a graph (network) of interconnected objects, have been widely used in a variety of scientific fields such as social sciences, health informatics, biological sciences, agricultural sciences and economics. We describe the application of a particular graph theory approach known as social network analysis to MOF materials and highlight the importance of community (group) detection and graph node centrality. In this first application of the social network analysis approach to MOF chemical space, we created MOFSocialNet. This social network is based on the geometrical descriptors of MOFs available in the CoRE-MOFs database. MOFSocialNet can discover communities of MOFs with similar structures and identify the most representative MOFs within a given community. In addition, analysis of MOFSocialNet using social network analysis methods can predict MOF properties more accurately than conventional ML tools. The latter advantage is demonstrated for the prediction of gas storage properties, the most important property of these porous reticular networks.
Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
Manycore processors are becoming established in the HPC community as a way of
improving performance while maintaining power efficiency. Knights Landing is the
recently released second generation of the Intel Xeon Phi architecture. While
optimizing applications on CPUs, GPUs and first-generation Xeon Phi devices has
been widely studied in recent years, the new features in Knights Landing
processors require the revision
of programming and optimization techniques for these devices. In this work, we
selected the Floyd-Warshall algorithm as a representative case study of graph
and memory-bound applications. Starting from the default serial version, we
show how data, thread and compiler level optimizations help the parallel
implementation to reach 338 GFLOPS.
Comment: Computer Science - CACIC 2017. Springer Communications in Computer
and Information Science, vol 79
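For readers unfamiliar with the algorithm named in the abstract above, here is a minimal Python sketch of the serial Floyd-Warshall algorithm next to a blocked (tiled) variant. The abstract does not describe the paper's blocking scheme, so the three-phase tiling used here (diagonal tile, then its row and column tiles, then the remaining tiles) is an assumption based on the commonly used formulation. The function names and the tile size `bs` are hypothetical. The sketch assumes a zero diagonal and no negative cycles, and it shows the loop structure only, not the data, thread or compiler optimizations the paper studies.

```python
import math

INF = math.inf  # marks "no edge"


def floyd_warshall(dist):
    """Serial reference: classic triple loop over k, i, j."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d


def _update_tile(d, i0, j0, k0, bs, n):
    """Relax tile (i0, j0) using intermediate vertices k0 .. k0+bs-1."""
    for k in range(k0, min(k0 + bs, n)):
        for i in range(i0, min(i0 + bs, n)):
            dik = d[i][k]
            for j in range(j0, min(j0 + bs, n)):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]


def blocked_floyd_warshall(dist, bs=2):
    """Blocked variant using the assumed three-phase tiling."""
    n = len(dist)
    d = [row[:] for row in dist]
    starts = range(0, n, bs)
    for kb in starts:
        # Phase 1: the diagonal tile depends only on itself.
        _update_tile(d, kb, kb, kb, bs, n)
        # Phase 2: tiles sharing a row or column with the diagonal tile.
        for b in starts:
            if b != kb:
                _update_tile(d, kb, b, kb, bs, n)  # same tile row
                _update_tile(d, b, kb, kb, bs, n)  # same tile column
        # Phase 3: all remaining tiles.
        for ib in starts:
            for jb in starts:
                if ib != kb and jb != kb:
                    _update_tile(d, ib, jb, kb, bs, n)
    return d
```

Tiling keeps the working set of each phase small, which is the usual motivation for blocking a memory-bound kernel like this one. Within phases 2 and 3 the tiles are independent of each other, which is what makes the scheme amenable to thread-level parallelism.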