285 research outputs found
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
We present Generalized LoRA (GLoRA), an advanced approach for universal
parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA),
GLoRA employs a generalized prompt module to optimize pre-trained model weights
and adjust intermediate activations, providing more flexibility and capability
across diverse tasks and datasets. Moreover, GLoRA facilitates efficient
parameter adaptation by employing a scalable, modular, layer-wise structure
search that learns individual adapter of each layer. Originating from a unified
mathematical formulation, GLoRA exhibits strong transfer learning, few-shot
learning and domain generalization abilities, as it adapts to new tasks through
not only weights but also additional dimensions like activations. Comprehensive
experiments demonstrate that GLoRA outperforms all previous methods in natural,
specialized, and structured vision benchmarks, achieving superior accuracy with
fewer parameters and computations. The proposed method on LLaMA-1 and LLaMA-2
also show considerable enhancements compared to the original LoRA in the
language domain. Furthermore, our structural re-parameterization design ensures
that GLoRA incurs no extra inference cost, rendering it a practical solution
for resource-limited applications. Code and models are available at:
https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.Comment: Technical report. v2: Add LLaMA-1&2 results. Code and models at
https://github.com/Arnav0400/ViT-Slim/tree/master/GLoR
Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing
Hyperspectral imaging, also known as image spectrometry, is a landmark
technique in geoscience and remote sensing (RS). In the past decade, enormous
efforts have been made to process and analyze these hyperspectral (HS) products
mainly by means of seasoned experts. However, with the ever-growing volume of
data, the bulk of costs in manpower and material resources poses new challenges
on reducing the burden of manual labor and improving efficiency. For this
reason, it is, therefore, urgent to develop more intelligent and automatic
approaches for various HS RS applications. Machine learning (ML) tools with
convex optimization have successfully undertaken the tasks of numerous
artificial intelligence (AI)-related applications. However, their ability in
handling complex practical problems remains limited, particularly for HS data,
due to the effects of various spectral variabilities in the process of HS
imaging and the complexity and redundancy of higher dimensional HS signals.
Compared to the convex models, non-convex modeling, which is capable of
characterizing more complex real scenes and providing the model
interpretability technically and theoretically, has been proven to be a
feasible solution to reduce the gap between challenging HS vision tasks and
currently advanced intelligent data processing models
A multi-biometric iris recognition system based on a deep learning approach
YesMultimodal biometric systems have been widely
applied in many real-world applications due to its ability to
deal with a number of significant limitations of unimodal
biometric systems, including sensitivity to noise, population
coverage, intra-class variability, non-universality, and
vulnerability to spoofing. In this paper, an efficient and
real-time multimodal biometric system is proposed based
on building deep learning representations for images of
both the right and left irises of a person, and fusing the
results obtained using a ranking-level fusion method. The
trained deep learning system proposed is called IrisConvNet
whose architecture is based on a combination of Convolutional
Neural Network (CNN) and Softmax classifier to
extract discriminative features from the input image without
any domain knowledge where the input image represents
the localized iris region and then classify it into one of N
classes. In this work, a discriminative CNN training scheme
based on a combination of back-propagation algorithm and
mini-batch AdaGrad optimization method is proposed for
weights updating and learning rate adaptation, respectively.
In addition, other training strategies (e.g., dropout method,
data augmentation) are also proposed in order to evaluate
different CNN architectures. The performance of the proposed
system is tested on three public datasets collected
under different conditions: SDUMLA-HMT, CASIA-Iris-
V3 Interval and IITD iris databases. The results obtained
from the proposed system outperform other state-of-the-art
of approaches (e.g., Wavelet transform, Scattering transform,
Local Binary Pattern and PCA) by achieving a Rank-1 identification rate of 100% on all the employed databases
and a recognition time less than one second per person
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction
The correction of exposure-related issues is a pivotal component in enhancing
the quality of images, offering substantial implications for various computer
vision tasks. Historically, most methodologies have predominantly utilized
spatial domain recovery, offering limited consideration to the potentialities
of the frequency domain. Additionally, there has been a lack of a unified
perspective towards low-light enhancement, exposure correction, and
multi-exposure fusion, complicating and impeding the optimization of image
processing. In response to these challenges, this paper proposes a novel
methodology that leverages the frequency domain to improve and unify the
handling of exposure correction tasks. Our method introduces Holistic Frequency
Attention and Dynamic Frequency Feed-Forward Network, which replace
conventional correlation computation in the spatial-domain. They form a
foundational building block that facilitates a U-shaped Holistic Dynamic
Frequency Transformer as a filter to extract global information and dynamically
select important frequency bands for image restoration. Complementing this, we
employ a Laplacian pyramid to decompose images into distinct frequency bands,
followed by multiple restorers, each tuned to recover specific frequency-band
information. The pyramid fusion allows a more detailed and nuanced image
restoration process. Ultimately, our structure unifies the three tasks of
low-light enhancement, exposure correction, and multi-exposure fusion, enabling
comprehensive treatment of all classical exposure errors. Benchmarking on
mainstream datasets for these tasks, our proposed method achieves
state-of-the-art results, paving the way for more sophisticated and unified
solutions in exposure correction
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
In this work, we propose an ID-preserving talking head generation framework,
which advances previous methods in two aspects. First, as opposed to
interpolating from sparse flow, we claim that dense landmarks are crucial to
achieving accurate geometry-aware flow fields. Second, inspired by
face-swapping methods, we adaptively fuse the source identity during synthesis,
so that the network better preserves the key characteristics of the image
portrait. Although the proposed model surpasses prior generation fidelity on
established benchmarks, to further make the talking head generation qualified
for real usage, personalized fine-tuning is usually needed. However, this
process is rather computationally demanding that is unaffordable to standard
users. To solve this, we propose a fast adaptation model using a meta-learning
approach. The learned model can be adapted to a high-quality personalized model
as fast as 30 seconds. Last but not the least, a spatial-temporal enhancement
module is proposed to improve the fine details while ensuring temporal
coherency. Extensive experiments prove the significant superiority of our
approach over the state of the arts in both one-shot and personalized settings.Comment: CVPR 2023, project page: https://meta-portrait.github.i
Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques
Image analysis starts with the purpose of configuring vision machines that can perceive like human to intelligently infer general principles and sense the surrounding situations from imagery. This dissertation studies the face centered image analysis as the core problem in high level computer vision research and addresses the problem by tackling three challenging subjects: Are there anything interesting in the image? If there is, what is/are that/they? If there is a person presenting, who is he/she? What kind of expression he/she is performing? Can we know his/her age? Answering these problems results in the saliency-based object detection, deep learning structured objects categorization and recognition, human facial landmark detection and multitask biometrics.
To implement object detection, a three-level saliency detection based on the self-similarity technique (SMAP) is firstly proposed in the work. The first level of SMAP accommodates statistical methods to generate proto-background patches, followed by the second level that implements local contrast computation based on image self-similarity characteristics. At last, the spatial color distribution constraint is considered to realize the saliency detection. The outcome of the algorithm is a full resolution image with highlighted saliency objects and well-defined edges.
In object recognition, the Adaptive Deconvolution Network (ADN) is implemented to categorize the objects extracted from saliency detection. To improve the system performance, L1/2 norm regularized ADN has been proposed and tested in different applications. The results demonstrate the efficiency and significance of the new structure.
To fully understand the facial biometrics related activity contained in the image, the low rank matrix decomposition is introduced to help locate the landmark points on the face images. The natural extension of this work is beneficial in human facial expression recognition and facial feature parsing research.
To facilitate the understanding of the detected facial image, the automatic facial image analysis becomes essential. We present a novel deeply learnt tree-structured face representation to uniformly model the human face with different semantic meanings. We show that the proposed feature yields unified representation in multi-task facial biometrics and the multi-task learning framework is applicable to many other computer vision tasks
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data-the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited-availability and low-quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.Comment: 42 pages, 9 figure
- …