PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Applying a pre-trained large model to downstream tasks is prohibitively expensive under
resource-constrained conditions. Recent dominant approaches for addressing
efficiency issues involve adding a few learnable parameters to the fixed
backbone model. This strategy, however, leads to more challenges in loading
large models for downstream fine-tuning with limited resources. In this paper,
we propose a novel method for increasing the parameter efficiency of
pre-trained models by introducing an intermediate pre-training stage. To this
end, we first employ low-rank approximation to compress the original large
model and then devise a feature distillation module and a weight perturbation
regularization module. These modules are specifically designed to enhance the
low-rank model. In particular, we update only the low-rank model while freezing
the backbone parameters during pre-training. This allows for direct and
efficient utilization of the low-rank model for downstream fine-tuning tasks.
The proposed method achieves efficiency in both required parameters and
computation time while maintaining comparable results, with minimal
modifications to the backbone architecture. Specifically, when applied to
three vision-only Transformer models and one vision-language Transformer
model, our approach often shows a performance decrease of merely 0.6 points
while reducing the original parameter size by 1/3 to 2/3.
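The low-rank compression step can be pictured as a truncated SVD of each weight matrix; below is a minimal sketch in PyTorch. The rank, layer shape, and function name are illustrative assumptions, and the paper's feature-distillation and weight-perturbation modules are not shown.

```python
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate a (out, in) weight matrix as B @ A, with B (out, r) and A (r, in)."""
    # Truncated SVD keeps only the top-`rank` singular triplets.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (out, rank), singular values folded in
    A = Vh[:rank, :]             # (rank, in)
    return B, A

# Replacing one dense layer with two thinner ones shrinks the parameter
# count from out*in to rank*(out+in).
W = torch.randn(768, 3072)               # hypothetical layer shape
B, A = low_rank_factorize(W, rank=128)
rel_err = (torch.linalg.norm(W - B @ A) / torch.linalg.norm(W)).item()
print(f"relative approximation error: {rel_err:.3f}")
```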
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Human-Object Interaction Detection is a crucial aspect of human-centric scene
understanding, with important applications in various domains. Despite recent
progress in this field, recognizing subtle and detailed interactions remains
challenging. Existing methods try to use human-related clues to alleviate the
difficulty, but rely heavily on external annotations or knowledge, limiting
their practical applicability in real-world scenarios. In this work, we propose
a novel Part Semantic Network (PSN) to solve this problem. The core of PSN is a
Conditional Part Attention (CPA) mechanism, where human features are taken as
keys and values, and the object feature is used as query for the computation in
a cross-attention mechanism. In this way, our model learns to automatically
focus on the most informative human parts conditioned on the involved object,
generating more semantically meaningful features for interaction recognition.
Additionally, we propose an Occluded Part Extrapolation (OPE) strategy to
facilitate interaction recognition under occluded scenarios, which teaches the
model to extrapolate detailed features from partially occluded ones. Our method
consistently outperforms prior approaches on the V-COCO and HICO-DET datasets,
without external data or extra annotations. Additional ablation studies
validate the effectiveness of each component of our proposed method.
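To make the CPA computation concrete, here is a minimal single-head cross-attention sketch in PyTorch, with the object feature as the query and the human part features as keys and values. The dimensions, part count, and function name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def conditional_part_attention(object_feat, human_feats, d_k=256):
    """Cross-attention: the object feature (1, d) attends over human part
    features (n_parts, d), up-weighting the parts most relevant to it."""
    q = object_feat                       # (1, d) query
    k = v = human_feats                   # (n_parts, d) keys and values
    scores = q @ k.T / d_k ** 0.5         # (1, n_parts) scaled similarity
    weights = F.softmax(scores, dim=-1)   # attention over human parts
    return weights @ v                    # (1, d) object-conditioned feature

obj = torch.randn(1, 256)     # feature of the involved object
parts = torch.randn(6, 256)   # e.g. six coarse body-part features
fused = conditional_part_attention(obj, parts)
```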
Seasonal Variations of the Antioxidant Composition in Ground Bamboo Sasa argenteastriatus Leaves
Sasa argenteastriatus, whose leaves are rich in active compounds and show high antioxidant activity, is a new leaf-use bamboo suitable for exploitation. To utilize it more effectively and scientifically, we investigated the seasonal variations of the antioxidant composition and antioxidant activity of its leaves. Leaves of Sasa argenteastriatus were collected on the 5th day of each month in three same-sized sample plots from May 2009 to May 2011. The total flavonoids (TF), total phenolics (TP), and total triterpenoids (TT) of the bamboo leaves were extracted, and their contents were analyzed by UV spectrophotometry. All three exhibited seasonal variations, with the highest levels appearing from November to March. Antioxidant activity was measured using the DPPH and FRAP methods; it peaked in December and was lowest in May. Correlation analyses demonstrated that TP and TF were highly correlated with the antioxidant activity of the bamboo. Eight characteristic bamboo compounds (orientin, isoorientin, vitexin, homovitexin, p-coumaric acid, chlorogenic acid, caffeic acid, and ferulic acid) were determined simultaneously by RP-HPLC. We found that chlorogenic acid, isoorientin, and vitexin are the main compounds in Sasa argenteastriatus leaves, and that the contents of isovitexin and chlorogenic acid showed seasonal variations similar to those of TF, TP, and TT. Our results suggest that the optimum season for harvesting Sasa argenteastriatus leaves is between autumn and winter.
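For the correlation step, a minimal sketch with SciPy of the kind of analysis described; the monthly values below are hypothetical placeholders, not the study's measurements.

```python
from scipy.stats import pearsonr

# Hypothetical monthly values: total phenolics (mg/g) and DPPH radical
# scavenging activity (%). The study's actual data are not reproduced here.
total_phenolics = [12.1, 13.4, 15.0, 16.2, 15.8, 14.9]
dpph_activity   = [61.0, 64.5, 70.2, 73.8, 72.1, 68.9]

r, p = pearsonr(total_phenolics, dpph_activity)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```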
Efficient Distributed Solution for MPLS Fast Reroute
As service providers move more applications onto their IP/MPLS (Multiprotocol Label Switching) networks, rapid restoration upon failure becomes increasingly crucial. MPLS fast reroute has recently attracted considerable attention because it was designed to meet the needs of real-time applications such as voice over IP. It achieves rapid restoration by computing and signaling backup label switched paths (LSPs) in advance and redirecting traffic as close to the failure point as possible. To guarantee restoration upon failure, extra bandwidth has to be reserved on backup LSPs. To improve bandwidth utilization, a path-merging technique was proposed that allows bandwidth sharing on common links between a service LSP and its backup LSPs; however, the sharing is very limited. In this paper, we provide an efficient distributed solution that allows much broader bandwidth sharing among backup LSPs of different service LSPs. We also propose an efficient algorithm for backup path selection to further increase the bandwidth sharing, and we specify the associated signaling extension for distributing and collecting the additional information. To evaluate our solution, we compare its performance with the IETF MPLS fast reroute proposal via simulation. The key figure of merit for restoration capacity efficiency is restoration overbuild, i.e., the ratio of restoration capacity to service capacity. Our simulation results show that our distributed solution reduces restoration overbuild from 2.5 to 1, and our optimized backup path selection further reduces it to about 0.5.
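To illustrate the sharing idea (not the paper's actual algorithm), here is a minimal sketch under a single-failure assumption: on each link, a shared reservation needs only the maximum backup demand over failure scenarios, whereas unshared reservations add the demands up. The topology, names, and numbers are hypothetical.

```python
# backup_demand[link][failure] = bandwidth that backup LSPs need on `link`
# when that failure occurs.
backup_demand = {
    "A-B": {"fail C-D": 10.0, "fail E-F": 6.0},
    "B-C": {"fail C-D": 4.0,  "fail E-F": 9.0},
}

# Backup LSPs protecting against *different* single failures are never
# active at the same time, so they can share one reservation per link.
shared = {link: max(d.values()) for link, d in backup_demand.items()}
naive  = {link: sum(d.values()) for link, d in backup_demand.items()}

service_capacity = 25.0  # hypothetical total service bandwidth
overbuild = sum(shared.values()) / service_capacity
print(f"shared={shared} naive={naive} restoration overbuild={overbuild:.2f}")
```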
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Based on powerful Large Language Models (LLMs), recent generative Multimodal
Large Language Models (MLLMs) have gained prominence as a pivotal research
area, exhibiting remarkable capability for both comprehension and generation.
In this work, we address the evaluation of generative comprehension in MLLMs as
a preliminary step towards a comprehensive assessment of generative models, by
introducing a benchmark named SEED-Bench. SEED-Bench consists of 19K
multiple-choice questions with accurate human annotations (6× larger than
existing benchmarks), spanning 12 evaluation dimensions including the
comprehension of both the image and video modalities. We develop an advanced pipeline for
of both the image and video modality. We develop an advanced pipeline for
generating multiple-choice questions that target specific evaluation
dimensions, integrating both automatic filtering and manual verification
processes. Multiple-choice questions with groundtruth options derived from
human annotation enable an objective and efficient assessment of model
performance, eliminating the need for human or GPT intervention during
evaluation. We further evaluate the performance of 18 models across all 12
dimensions, covering both spatial and temporal understanding. By revealing
the limitations of existing MLLMs through evaluation results, we aim for
SEED-Bench to provide insights for motivating future research. We will launch
and consistently maintain a leaderboard to provide a platform for the community
to assess and investigate model capability. The project is released at:
https://github.com/AILab-CVC/SEED-Bench.
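The objective evaluation without GPT intervention that the abstract describes is commonly realized by ranking the groundtruth options with the model's own log-likelihood; a minimal sketch of that idea follows, assuming a HuggingFace-style causal model and tokenizer. Whether SEED-Bench uses exactly this scoring is an assumption here.

```python
import torch

def score_option(model, tokenizer, prompt: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option tokens,
    conditioned on the prompt. Assumes prompt+option tokenizes so the
    first n_prompt tokens match the prompt alone (a simplification)."""
    ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(input_ids=ids).logits                # (1, L, vocab)
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)    # row i predicts token i+1
    option_ids = ids[0, n_prompt:]
    return logprobs[n_prompt - 1:].gather(1, option_ids[:, None]).sum().item()

def answer(model, tokenizer, prompt, options):
    # Pick the candidate answer the model finds most likely.
    return max(options, key=lambda o: score_option(model, tokenizer, prompt, o))
```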
Efficient Task Offloading Algorithm for Digital Twin in Edge/Cloud Computing Environment
In the era of Internet of Things (IoT), Digital Twin (DT) is envisioned to
empower various areas as a bridge between physical objects and the digital
world. Through virtualization and simulation techniques, multiple functions can
be achieved by leveraging computing resources. In this process, Mobile Cloud
Computing (MCC) and Mobile Edge Computing (MEC) have become two of the key
factors to achieve real-time feedback. However, current works consider only
edge servers or cloud servers in their DT system models, and these models
ignore DTs that have more than one data source. In this paper, we propose a new
DT system model considering a heterogeneous MEC/MCC environment. Each DT in the
model is maintained in one of the servers via multiple data collection devices.
The offloading decision-making problem is also considered and a new offloading
scheme is proposed based on Distributed Deep Learning (DDL). Simulation results
demonstrate that our proposed algorithm can effectively and efficiently
decrease the system's average latency and energy consumption. Significant
improvement is achieved compared with the baselines under the dynamic
environment of DTs.
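The paper's DDL-based scheme is not reproduced here, but the selection step such schemes rely on can be sketched: each candidate offloading decision is scored by a weighted latency-plus-energy cost and the cheapest is kept (and, in a distributed-deep-learning scheme, fed back to retrain the decision networks). The cost model, field names, and numbers below are illustrative assumptions.

```python
def cost(decision, tasks, servers, w_latency=0.5, w_energy=0.5):
    """Weighted latency/energy cost; decision[i] = server index for task i."""
    total_latency = total_energy = 0.0
    for task, srv in zip(tasks, decision):
        s = servers[srv]
        compute = task["cycles"] / s["cpu_hz"]       # execution time (s)
        transfer = task["bits"] / s["uplink_bps"]    # upload time (s)
        total_latency += compute + transfer
        total_energy += task["bits"] * s["tx_joule_per_bit"]
    return w_latency * total_latency + w_energy * total_energy

def select(candidates, tasks, servers):
    # Keep the cheapest candidate decision (e.g., one per decision DNN).
    return min(candidates, key=lambda d: cost(d, tasks, servers))

tasks = [{"cycles": 2e9, "bits": 1e6}, {"cycles": 5e8, "bits": 4e6}]
servers = [
    {"cpu_hz": 3e9,  "uplink_bps": 1e8, "tx_joule_per_bit": 1e-7},  # edge
    {"cpu_hz": 2e10, "uplink_bps": 2e7, "tx_joule_per_bit": 1e-7},  # cloud
]
best = select([(0, 0), (0, 1), (1, 0), (1, 1)], tasks, servers)
```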
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Multimodal large language models (MLLMs), building upon the foundation of
powerful large language models (LLMs), have recently demonstrated exceptional
capabilities in generating not only texts but also images given interleaved
multimodal inputs (acting like a combination of GPT-4V and DALL-E 3). However,
existing MLLM benchmarks remain limited to assessing only models' comprehension
ability of single image-text inputs, failing to keep up with the strides made
in MLLMs. A comprehensive benchmark is imperative for investigating the
progress and uncovering the limitations of current MLLMs. In this work, we
categorize the capabilities of MLLMs into hierarchical levels from L0 to L4
based on the modalities they can accept and generate, and propose
SEED-Bench-2, a comprehensive benchmark that evaluates the
hierarchical capabilities of MLLMs. Specifically, SEED-Bench-2
comprises 24K multiple-choice questions with accurate human annotations,
spanning 27 dimensions, including the evaluation of both text and image
generation. Multiple-choice questions with groundtruth options derived from
human annotation enable an objective and efficient assessment of model
performance, eliminating the need for human or GPT intervention during
evaluation. We further evaluate the performance of 23 prominent open-source
MLLMs and summarize valuable observations. By revealing the limitations of
existing MLLMs through extensive evaluations, we aim for SEED-Bench-2 to
provide insights that will motivate future research towards the goal of General
Artificial Intelligence. Dataset and evaluation code are available at
https://github.com/AILab-CVC/SEED-Bench.
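Aggregating per-question results into the 27 per-dimension scores is straightforward; a minimal sketch follows, with a record schema that is an illustrative assumption rather than SEED-Bench-2's actual format.

```python
from collections import defaultdict

# One record per question: its evaluation dimension and whether the model's
# ranked choice matched the groundtruth option. Field names are hypothetical.
records = [
    {"dimension": "Scene Understanding", "correct": True},
    {"dimension": "Scene Understanding", "correct": False},
    {"dimension": "Text Generation", "correct": True},
]

hits, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["dimension"]] += 1
    hits[r["dimension"]] += r["correct"]   # bool counts as 0/1

for dim in totals:
    print(f"{dim}: {100 * hits[dim] / totals[dim]:.1f}%")
```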