133 research outputs found
Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning
Many problems in machine learning and other fields can be (re)formulated as
linearly constrained separable convex programs. In most of the cases, there are
multiple blocks of variables. However, the traditional alternating direction
method (ADM) and its linearized version (LADM, obtained by linearizing the
quadratic penalty term) are for the two-block case and cannot be naively
generalized to solve the multi-block case. So there is great demand for
extending ADM-based methods to the multi-block case. In this paper, we
propose LADM with parallel splitting and adaptive penalty (LADMPSAP) to solve
multi-block separable convex programs efficiently. When all the component
objective functions have bounded subgradients, we obtain convergence results
that are stronger than those of ADM and LADM, e.g., allowing the penalty
parameter to be unbounded and proving the sufficient and necessary conditions
for global convergence. We further propose a simple optimality measure and
reveal the convergence rate of LADMPSAP in an ergodic sense. For programs with
extra convex set constraints, with refined parameter estimation we devise a
practical version of LADMPSAP for faster convergence. Finally, we generalize
LADMPSAP to handle programs with more difficult objective functions by
linearizing part of the objective function as well. LADMPSAP is particularly
suitable for sparse representation and low-rank recovery problems because its
subproblems have closed form solutions and the sparsity and low-rankness of the
iterates can be preserved during the iteration. It is also highly
parallelizable and hence well suited to parallel or distributed computing. Numerical
experiments testify to the advantages of LADMPSAP in speed and numerical
accuracy.
Comment: Preliminary version published at the Asian Conference on Machine Learning
201
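As an illustration of the closed-form subproblems mentioned in the abstract above, the sketch below shows the two standard proximal operators that commonly arise in sparse representation and low-rank recovery: soft-thresholding for the l1 norm and singular value thresholding for the nuclear norm. This is not the LADMPSAP algorithm itself; the function names and the threshold parameter `tau` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||X||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * soft_threshold(s, tau)) @ Vt
```

Entries (or singular values) no larger than `tau` in magnitude are set to zero, which is one way to see why iterates built from such steps can stay sparse or low-rank.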
Study of Rail Transit and Urban Spatial Structure Based on Urban Economics
Spatial changes in the intensity of urban land use are determined by two substitution relations: the substitution between transportation costs and rent, and factor substitution (for producers) or consumption substitution (for residents). Land use intensity directly affects urban spatial form. This paper studies the relation between rail transit construction and urban spatial form from the perspectives of urban economics, urban traffic conditions and the evolution of spatial structure. It takes the metropolitan areas of Tokyo and Singapore as sample cases to analyse the influence of rail transit on urban development
AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training
Monocular 3D object detection plays a pivotal role in the field of autonomous
driving and numerous deep learning-based methods have made significant
breakthroughs in this area. Despite the advancements in detection accuracy and
efficiency, these models tend to fail when faced with adversarial attacks, rendering
them ineffective. Therefore, bolstering the adversarial robustness of 3D
detection models has become a crucial issue that demands immediate attention
and innovative solutions. To mitigate this issue, we propose a depth-aware
robust adversarial training method for monocular 3D object detection, dubbed
DART3D. Specifically, we first design an adversarial attack that iteratively
degrades the 2D and 3D perception capabilities of 3D object detection
models (IDP), which serves as the foundation for our subsequent defense mechanism. In
response to this attack, we propose an uncertainty-based residual learning
method for adversarial training. Our adversarial training approach capitalizes
on the inherent uncertainty, enabling the model to significantly improve its
robustness against adversarial attacks. We conducted extensive experiments on
the KITTI 3D datasets, demonstrating that DART3D surpasses direct adversarial
training (the most popular approach) under attacks in 3D object detection
of car category for the Easy, Moderate, and Hard settings, with
improvements of 4.415%, 4.112%, and 3.195%, respectively
Dependency of the Finite-Impulse-Response-Based Head-Related Impulse Response Model on Filter Order
Various approaches to HRIR modeling have been reported to reduce the high computation cost of 3-D audio systems without sacrificing the quality of the rendered sounds. The performance of these HRIR models has usually been evaluated in terms of the objective estimation errors between the original measured HRIRs and the modeled HRIRs. However, it is still unclear how well these objective evaluation results match psychoacoustic evaluations. In this research, an efficient finite-impulse-response (FIR) model, essentially based on the minimum-phase modeling technique, is studied as a case study. The dependency of this model's accuracy on the order of the FIR filter is examined with objective estimation errors and psychoacoustic tests. In the psychoacoustic tests, the MIT HRIR database is used, and sound source localization difference and sound quality difference are evaluated by comparing stimuli synthesized with the measured HRIRs against those synthesized with FIR models of different orders. Results indicated that the measured hundred-sample-length HRIRs can be sufficiently modeled by the low-order FIR model from the perceptual point of view, and provided the relationship between perceptual sound localization/quality difference and the objective estimation results, which should be useful for evaluating other HRIR modeling approaches
Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction
Photographs taken with less-than-ideal exposure settings often display poor
visual quality. Since the correction procedures vary significantly, it is
difficult for a single neural network to handle all exposure problems.
Moreover, the inherent limitations of convolutions hinder the model's ability
to restore faithful color or details in extremely over-/under-exposed regions.
To overcome these limitations, we propose a Macro-Micro-Hierarchical
transformer, which consists of a macro attention to capture long-range
dependencies, a micro attention to extract local features, and a hierarchical
structure for coarse-to-fine correction. Specifically, the complementary
macro-micro attention designs enhance locality while allowing global
interactions. The hierarchical structure enables the network to correct
exposure errors of different scales layer by layer. Furthermore, we propose a
contrast constraint and couple it seamlessly into the loss function, where the
corrected image is pulled towards the positive sample and pushed away from the
dynamically generated negative samples. Thus the remaining color distortion and
loss of detail can be removed. We also extend our method as an image enhancer
for low-light face recognition and low-light semantic segmentation. Experiments
demonstrate that our approach obtains more attractive results than
state-of-the-art methods quantitatively and qualitatively.
Comment: Accepted by ACM MM 202
Multiple Methods to Partition Evapotranspiration in a Maize Field
Partitioning evapotranspiration (ET) into soil evaporation E and plant transpiration T is important, but it is still a theoretical and technical challenge. The isotopic technique is considered to be an effective method, but it is difficult to quantify the isotopic composition of transpiration δT and evaporation δE directly and continuously; few previous studies determined δT successfully under a non-steady state (NSS). Here, multiple methods were used to partition ET in a maize field and a new flow-through chamber system was refined to provide direct and continuous measurement of δT and δE. An eddy covariance and lysimeter (EC-L)-based method and two isotope-based methods [isotope combined with the Craig–Gordon model (Iso-CG) and isotope using chamber measurement (Iso-M)] were applied to partition ET. Results showed the transpiration fraction FT in Iso-CG was consistent with EC-L at both diurnal and growing season time scales, but FT calculated by Iso-M was less than Iso-CG and EC-L. The chamber system method presented here to determine δT under NSS and isotope steady state (ISS) was robust, but there could be some deviation in measuring δE. The FT varied from 52% to 91%, with a mean of 78% during the entire growing season, and it was well described by a function of LAI, with a nonlinear relationship of FT = 0.71 LAI^0.14. The results demonstrated the feasibility of the isotope-based chamber system to partition ET. This technique and its further development may enable field ET partitioning accurately and continuously and improve understanding of water cycling through the soil–plant–atmosphere continuum
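The fitted relationship between transpiration fraction and leaf area index reported in the abstract above can be evaluated with a short sketch. It assumes the trailing 0.14 is an exponent (a power law, FT = 0.71 × LAI^0.14); the function name is hypothetical and the fit is only meaningful over the LAI range observed in the study, which the abstract does not state.

```python
def transpiration_fraction(lai):
    """Evaluate the reported power-law fit FT = 0.71 * LAI**0.14.

    Assumes the exponent reading of the abstract's formula; `lai` is the
    leaf area index and must be positive.
    """
    if lai <= 0:
        raise ValueError("LAI must be positive")
    return 0.71 * lai ** 0.14
```

Because the exponent is small, FT rises quickly at low LAI and then flattens, so additional canopy growth changes the partitioning only slightly.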
Online low-rank representation learning for joint multi-subspace recovery and clustering
Benefiting from global rank constraints, the low-rank
representation (LRR) method has been shown to be an
effective solution to subspace learning. However, the global
mechanism also means that the LRR model is not suitable for
handling large-scale data or dynamic data. For large-scale data,
the LRR method suffers from high time complexity, and for
dynamic data, it has to recompute a complex rank minimization
for the entire data set whenever new samples are dynamically
added, making it prohibitively expensive. Existing attempts at
online LRR either take a stochastic approach or build the
representation purely based on a small sample set and treat
new input as out-of-sample data. The former often requires
multiple runs for good performance and thus takes longer time
to run, and the latter formulates online LRR as an out-of-sample
classification problem and is less robust to noise. In
this paper, a novel online low-rank representation subspace
learning method is proposed for both large-scale and dynamic
data. The proposed algorithm is composed of two stages: static
learning and dynamic updating. In the first stage, the subspace
structure is learned from a small number of data samples. In
the second stage, the intrinsic principal components of the entire
data set are computed incrementally by utilizing the learned
subspace structure, and the low-rank representation matrix can
also be incrementally solved by an efficient online singular value
decomposition (SVD) algorithm. The time complexity is reduced
dramatically for large-scale data, and repeated computation is
avoided for dynamic problems. We further perform theoretical
analysis comparing the proposed online algorithm with the batch
LRR method. Finally, experimental results on typical tasks
of subspace recovery and subspace clustering show that the
proposed algorithm performs comparably to or better than batch
methods including the batch LRR, and significantly outperforms
state-of-the-art online methods
ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning
In recent times, learning-based methods for video deraining have demonstrated
commendable results. However, there are two critical challenges that these
methods are yet to address: exploiting temporal correlations among adjacent
frames and ensuring adaptability to unknown real-world scenarios. To overcome
these challenges, we explore video deraining from a paradigm design perspective
to learning strategy construction. Specifically, we propose a new computational
paradigm, Alignment-Shift-Fusion Network (ASF-Net), which incorporates a
temporal shift module. This module is novel to this field and provides deeper
exploration of temporal information by facilitating the exchange of
channel-level information within the feature space. To fully exploit the
model's characterization capability, we further construct a LArge-scale RAiny
video dataset (LARA) which also supports the development of this community. On
the basis of the newly-constructed dataset, we explore the parameter learning
process by developing an innovative re-degraded learning strategy. This
strategy bridges the gap between synthetic and real-world scenes, resulting in
stronger scene adaptability. Our proposed approach exhibits superior
performance on three benchmarks and compelling visual quality in real-world
scenarios, underscoring its efficacy. The code is available at
https://github.com/vis-opt-group/ASF-Net
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion
With the rapid progression of deep learning technologies, multi-modality
image fusion has become increasingly prevalent in object detection tasks.
Despite its popularity, the inherent disparities in how different sources
depict scene content make fusion a challenging problem. Current fusion
methodologies identify shared characteristics between the two modalities and
integrate them within this shared domain using either iterative optimization or
deep learning architectures, which often neglect the intricate semantic
relationships between modalities, resulting in a superficial understanding of
inter-modal connections and, consequently, suboptimal fusion outcomes. To
address this, we introduce a text-guided multi-modality image fusion method
that leverages the high-level semantics from textual descriptions to integrate
semantics from infrared and visible images. This method capitalizes on the
complementary characteristics of diverse modalities, bolstering both the
accuracy and robustness of object detection. The codebook is utilized to
enhance a streamlined and concise depiction of the fused intra- and
inter-domain dynamics, fine-tuned for optimal performance in detection tasks.
We present a bilevel optimization strategy that establishes a nexus between the
joint problem of fusion and detection, optimizing both processes concurrently.
Furthermore, we introduce the first dataset of paired infrared and visible
images accompanied by text prompts, paving the way for future research.
Extensive experiments on several datasets demonstrate that our method not only
produces visually superior fusion results but also achieves a higher detection
mAP than existing methods, achieving state-of-the-art results.
Comment: 10 pages, 12 figures, 3 tables, conferenc