Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis
We analyse a linear regression problem with nonconvex regularization called
smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis
for Gaussian random data. We propose an approximate message passing (AMP)
algorithm for this nonconvex regularization, namely SCAD-AMP, and
analytically show that the stability condition corresponds to the de
Almeida--Thouless condition in spin glass literature. Through asymptotic
analysis, we show the correspondence between the density evolution of SCAD-AMP
and the replica symmetric solution. Numerical experiments confirm that for a
sufficiently large system size, SCAD-AMP achieves the optimal performance
predicted by the replica method. Through replica analysis, a phase transition
between the replica symmetric (RS) and replica symmetry breaking (RSB) regions is
found in the parameter space of SCAD. The appearance of the RS region for a
nonconvex penalty is a significant advantage that indicates the region of
smooth landscape of the optimization problem. Furthermore, we analytically show
that the statistical representation performance of the SCAD penalty is better
than that of L1-based methods, and that the minimum representation error under the
RS assumption is attained at the RS/RSB phase boundary. A correspondence between
the convergence of the existing coordinate descent algorithm and the RS/RSB
transition is also indicated.
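To make the regularizer and the algorithmic setting concrete, the sketch below implements the standard SCAD thresholding rule (Fan and Li, 2001) and drops it into a generic AMP iteration for y = Ax + noise. This is a minimal illustration under standard conventions, not the paper's SCAD-AMP derivation or its stability analysis; the numerical derivative used for the Onsager term, the default a = 3.7, and the absence of convergence checks are assumptions made here for brevity.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule (Fan & Li, 2001), applied element-wise.

    Serves as the denoiser eta(z) inside AMP. Requires a > 2;
    a = 3.7 is the conventional default.
    """
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    return np.where(
        absz <= 2 * lam,
        np.sign(z) * np.maximum(absz - lam, 0.0),            # soft-threshold region
        np.where(
            absz <= a * lam,
            ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),   # transition region
            z,                                                # no shrinkage for large |z|
        ),
    )

def amp_scad(A, y, lam, a=3.7, n_iter=50):
    """Generic AMP iteration for y = A x + noise with a SCAD denoiser.

    Sketch only: the Onsager term uses a numerical derivative of the
    denoiser and no stability/convergence checks are performed.
    """
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    eps = 1e-6
    for _ in range(n_iter):
        r = x + A.T @ z                       # effective scalar-channel observation
        # average derivative of the denoiser, needed for the Onsager correction
        eta_prime = (scad_threshold(r + eps, lam, a)
                     - scad_threshold(r - eps, lam, a)) / (2 * eps)
        x = scad_threshold(r, lam, a)
        z = y - A @ x + (p / n) * z * np.mean(eta_prime)
    return x
```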
Statistical Mechanics of High-Dimensional Inference
To model modern large-scale datasets, we need efficient algorithms to infer a
set of unknown model parameters from noisy measurements. What are
fundamental limits on the accuracy of parameter inference, given finite
signal-to-noise ratios, limited measurements, prior information, and
computational tractability requirements? How can we combine prior information
with measurements to achieve these limits? Classical statistics gives incisive
answers to these questions in the limit where the measurement density, i.e. the
ratio of the number of measurements to the number of unknown parameters, tends to
infinity. However, these classical results are not relevant to modern
high-dimensional inference problems, which instead occur at finite measurement
density. We formulate and analyze high-dimensional inference as a
problem in the statistical physics of quenched disorder. Our analysis uncovers
fundamental limits on the accuracy of inference in high dimensions, and reveals
that widely cherished inference algorithms like maximum likelihood (ML) and
maximum a posteriori (MAP) inference cannot achieve these limits. We further
find optimal, computationally tractable algorithms that can achieve these
limits. Intriguingly, in high dimensions, these optimal algorithms become
computationally simpler than MAP and ML, while still outperforming them. For
example, such optimal algorithms can lead to as much as a 20% reduction in the
amount of data to achieve the same performance relative to MAP. Moreover, our
analysis reveals simple relations between optimal high dimensional inference
and low dimensional scalar Bayesian inference, insights into the nature of
generalization and predictive power in high dimensions, information theoretic
limits on compressed sensing, phase transitions in quadratic inference, and
connections to central mathematical objects in convex optimization theory and
random matrix theory. Comment: See
http://ganguli-gang.stanford.edu/pdf/HighDimInf.Supp.pdf for supplementary
material.
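The claimed gap between MAP and Bayes-optimal inference can already be illustrated at the level of a single scalar channel y = x + sigma*noise, which is the low-dimensional Bayesian problem the abstract alludes to. The sketch below compares the MAP rule (soft-thresholding under a Laplace prior) with the posterior-mean estimator computed by numerical integration; the Laplace prior, unit noise scale, and grid-based integration are illustrative choices, not taken from the paper.

```python
import numpy as np

def map_laplace(y, sigma, b):
    """MAP estimate for y = x + sigma*noise with a Laplace(0, b) prior:
    soft-thresholding at sigma**2 / b."""
    thr = sigma**2 / b
    return np.sign(y) * np.maximum(np.abs(y) - thr, 0.0)

def mmse_laplace(y, sigma, b, grid=np.linspace(-20, 20, 4001)):
    """Bayes-optimal (posterior-mean) scalar estimator, computed by
    numerical integration over a grid of candidate x values."""
    # unnormalised log-posterior: -(y-x)^2 / (2 sigma^2) - |x| / b
    log_post = -(y - grid) ** 2 / (2 * sigma**2) - np.abs(grid) / b
    w = np.exp(log_post - log_post.max())
    return np.sum(grid * w) / np.sum(w)

# The posterior mean shrinks smoothly instead of hard-thresholding,
# which is one way the MAP and Bayes-optimal rules differ.
for y in [0.5, 1.0, 2.0, 4.0]:
    print(y, map_laplace(y, sigma=1.0, b=1.0), mmse_laplace(y, sigma=1.0, b=1.0))
```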
Replica Creation Algorithm for Data Grids
A data grid system is a data management infrastructure that facilitates reliable access to and sharing of large amounts of data, storage resources, and data transfer services, and that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves the performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on the number of accesses when deciding which files to replicate and where to place them, ignoring resources' capabilities. DRCM differs by considering both user and resource perspectives, strategically placing replicas at locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: Replica Creation and Deletion Strategy (RCDS), Replica Placement Strategy (RPS), and Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) with selected performance metrics (mean job execution time, effective network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance and is capable of deciding to create or delete more than one file in a single decision. Furthermore, a dependency-level-between-files criterion was utilized and integrated with an exponential growth/decay model to give an accurate file evaluation.
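As a rough illustration of two ideas named in the abstract, lowest-transfer-cost placement and exponential growth/decay file evaluation, the sketch below scores a file's access history with an exponentially decaying weight and picks the candidate site with the smallest estimated transfer time. The site model, cost formula, decay factor, and storage check are assumptions made for illustration only; they are not DRCM's actual RCDS/RPS/RRS strategies.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    bandwidth_mbps: float    # available bandwidth to the requesting site
    free_storage_gb: float

def file_value(access_counts, decay=0.5):
    """Illustrative exponential growth/decay score: recent accesses weigh
    more than old ones. access_counts is ordered oldest -> newest."""
    value = 0.0
    for count in access_counts:
        value = decay * value + count
    return value

def best_replica_site(file_size_gb, sites):
    """Pick the candidate site with the lowest estimated transfer cost
    (transfer time), among sites with enough free storage."""
    candidates = [s for s in sites if s.free_storage_gb >= file_size_gb]
    if not candidates:
        return None
    # transfer time in seconds: size (bits) / bandwidth (bits per second)
    return min(candidates, key=lambda s: file_size_gb * 8e9 / (s.bandwidth_mbps * 1e6))

sites = [Site("A", 100, 500), Site("B", 1000, 50), Site("C", 400, 800)]
print(best_replica_site(file_size_gb=120, sites=sites).name)  # site with the lowest transfer time
```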