Rethinking the Evaluation of Unbiased Scene Graph Generation
Owing to the severely imbalanced predicate distributions in common subject-object
relations, current Scene Graph Generation (SGG) methods tend to predict
frequent predicate categories and fail to recognize rare ones. To improve the
robustness of SGG models on different predicate categories, recent research has
focused on unbiased SGG and adopted mean Recall@K (mR@K) as the main evaluation
metric. However, we discovered two overlooked issues with this de facto
standard metric mR@K that make current unbiased SGG evaluation vulnerable
and unfair: 1) mR@K neglects the correlations among predicates and
unintentionally breaks category independence when ranking all the triplet
predictions together regardless of the predicate categories, leading to the
performance of some predicates being underestimated. 2) mR@K neglects the
compositional diversity of different predicates and assigns excessively high
weights to samples of overly simple categories with limited composable relation
triplet types. This conflicts with the goal of the SGG task, which encourages
models to detect more types of visual relationship triplets. In addition, we
investigate the under-explored correlation between objects and predicates,
which can serve as a simple but strong baseline for unbiased SGG. In this
paper, we refine mR@K and propose two complementary evaluation metrics for
unbiased SGG: Independent Mean Recall (IMR) and weighted IMR (wIMR). These two
metrics are designed by considering the category independence and diversity of
composable relation triplets, respectively. We compare the proposed metrics
with the de facto standard metrics through extensive experiments and discuss
the solutions to evaluate unbiased SGG in a more trustworthy way.
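As a rough illustration of the baseline metric discussed in this abstract, the following is a minimal sketch of mean Recall@K for a single image. It assumes triplets are (subject, predicate, object) tuples with a confidence score; the function name and data layout are hypothetical, and real evaluations accumulate per-predicate hits over a whole dataset before averaging. The abstract does not define IMR or wIMR, so they are not sketched here.

```python
from collections import defaultdict


def mean_recall_at_k(predictions, ground_truth, k):
    """Single-image mean Recall@K (simplified, illustrative only).

    predictions: list of (score, subject, predicate, object) tuples.
    ground_truth: set of (subject, predicate, object) tuples.
    """
    # Rank all triplet predictions together, regardless of predicate category.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = {(s, p, o) for _, s, p, o in top_k} & ground_truth

    total = defaultdict(int)  # ground-truth count per predicate
    found = defaultdict(int)  # recovered ground-truth count per predicate
    for _, predicate, _ in ground_truth:
        total[predicate] += 1
    for _, predicate, _ in hits:
        found[predicate] += 1

    if not total:
        return 0.0
    # Average the per-predicate recalls so rare predicates count equally.
    recalls = [found[p] / total[p] for p in total]
    return sum(recalls) / len(recalls)


example_predictions = [
    (0.9, "man", "on", "horse"),
    (0.8, "man", "riding", "horse"),
    (0.7, "dog", "on", "grass"),
    (0.1, "dog", "near", "man"),
]
example_ground_truth = {
    ("man", "riding", "horse"),
    ("dog", "on", "grass"),
    ("dog", "near", "man"),
}
```

Because all predictions share one ranking, a frequent predicate such as "on" can crowd rarer ones out of the top K, which is the kind of cross-category interaction the abstract points to.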
Boundary Proposal Network for Two-Stage Natural Language Video Localization
We aim to address the problem of Natural Language Video Localization
(NLVL): localizing the video segment corresponding to a natural language
description in a long and untrimmed video. State-of-the-art NLVL methods
almost all follow a one-stage design and can typically be grouped into two
categories: 1) anchor-based approach: it first pre-defines a series of video
segment candidates (e.g., by sliding window), and then does classification for
each candidate; 2) anchor-free approach: it directly predicts the probabilities
for each video frame as a boundary or intermediate frame inside the positive
segment. However, both kinds of one-stage approaches have inherent drawbacks:
the anchor-based approach is sensitive to its heuristic rules, which limits
its ability to handle videos of varying length, while the
anchor-free approach fails to exploit segment-level interaction and thus
achieves inferior results. In this paper, we propose a novel Boundary Proposal
Network (BPNet), a universal two-stage framework that gets rid of the issues
mentioned above. Specifically, in the first stage, BPNet utilizes an
anchor-free model to generate a group of high-quality candidate video segments
with their boundaries. In the second stage, a visual-language fusion layer is
proposed to jointly model the multi-modal interaction between the candidate and
the language query, followed by a matching score rating layer that outputs the
alignment score for each candidate. We evaluate our BPNet on three challenging
NLVL benchmarks (i.e., Charades-STA, TACoS and ActivityNet-Captions). Extensive
experiments and ablative studies on these datasets demonstrate that the BPNet
outperforms the state-of-the-art methods. Comment: AAAI 202
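The first stage described in this abstract can be pictured with a minimal sketch, assuming an anchor-free model has already produced per-frame start and end probabilities. Scoring a candidate by the product of its boundary probabilities is an assumption made for illustration, not a detail confirmed by the abstract, and both function names are hypothetical. The second stage (visual-language fusion and matching score rating) is a learned module and is not sketched; temporal IoU is included only as the usual way localized segments are compared with ground truth.

```python
def propose_segments(start_probs, end_probs, top_n):
    """Return the top_n (start, end) frame-index pairs ranked by boundary score.

    Assumption: a candidate's score is start_probs[s] * end_probs[e].
    """
    candidates = []
    for s, p_start in enumerate(start_probs):
        # Only consider end frames at or after the start frame.
        for e in range(s, len(end_probs)):
            candidates.append((p_start * end_probs[e], s, e))
    candidates.sort(reverse=True)
    return [(s, e) for _, s, e in candidates[:top_n]]


def temporal_iou(segment_a, segment_b):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(segment_a[1], segment_b[1]) - max(segment_a[0], segment_b[0]))
    # When the intervals overlap, their hull equals their union; otherwise inter is 0.
    hull = max(segment_a[1], segment_b[1]) - min(segment_a[0], segment_b[0])
    return inter / hull if hull > 0 else 0.0


example_start = [0.1, 0.8, 0.1, 0.0]
example_end = [0.0, 0.1, 0.2, 0.7]
example_candidates = propose_segments(example_start, example_end, top_n=2)
```

In a two-stage pipeline of this kind, the returned candidates would then be passed to a separate scoring model together with the language query.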
Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives
Reasoning about causal and temporal event relations in videos is an
emerging goal of Video Question Answering (VideoQA). The major stumbling block to
achieving this goal is the semantic gap between language and video, since they
are at different levels of abstraction. Existing efforts mainly focus on
designing sophisticated architectures while utilizing frame- or object-level
visual representations. In this paper, we reconsider the multi-modal alignment
problem in VideoQA from feature and sample perspectives to achieve better
performance. From the feature perspective, we break down the video into trajectories
and are the first to leverage trajectory features in VideoQA to enhance the alignment
between two modalities. Moreover, we adopt a heterogeneous graph architecture
and design a hierarchical framework to align both trajectory-level and
frame-level visual features with language features. In addition, we found that
VideoQA models are largely dependent on language priors and always neglect
visual-language interactions. Thus, two effective yet portable training
augmentation strategies are designed to strengthen the cross-modal
correspondence ability of our model from the view of sample. Extensive results
show that our method outperforms all the state-of-the-art models on the
challenging NExT-QA benchmark, which demonstrates the effectiveness of the
proposed method.
A coevolutionary algorithm with detection and supervision strategy for constrained multiobjective optimization
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link. Balancing objectives and constraints is challenging in addressing constrained multiobjective optimization problems (CMOPs). Existing methods may have limitations in handling various CMOPs due to the complex geometries of the Pareto front (PF), a complexity that arises from the constraints narrowing the feasible region. Categorizing problems based on their geometric characteristics helps address this challenge. For this purpose, this article proposes a novel constrained multiobjective optimization framework with detection and supervision phases, called COEA-DAS. The framework categorizes the problems into four types based on the overlap between the obtained approximate unconstrained PF and constrained PF to guide the coevolution of the two populations. In the detection phase, the detection population approaches the unconstrained PF, ignoring the constraints. The main population is guided by the detection population to cross infeasible barriers and approximate the constrained PF. In the supervision phase, specialized evolutionary mechanisms are designed for each possible problem type. The detection population continues to evolve to assist the main population in spreading along the constrained PF. Meanwhile, the supervision strategy is conducted to reevaluate the problem types based on the evolutionary state of the populations. This idea of balancing constraints and objectives based on the type of problem provides a novel approach for addressing CMOPs more effectively. Experimental results indicate that the proposed algorithm performs better or more competitively on 57 benchmark problems and 12 real-world CMOPs compared with eight state-of-the-art algorithms.
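To make the idea of comparing the unconstrained and constrained Pareto fronts concrete, here is a minimal sketch for a minimization problem. It assumes each solution is a pair of (objective vector, total constraint violation) and uses the feasible fraction of the unconstrained non-dominated set as a crude overlap proxy. The abstract does not state the actual overlap criterion or the four problem types of COEA-DAS, so the names and the proxy below are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def unconstrained_front(population):
    """Non-dominated solutions when constraints are ignored.

    population: list of (objectives, violation) pairs, where violation == 0
    means the solution is feasible.
    """
    return [
        (objs, viol)
        for objs, viol in population
        if not any(dominates(other, objs) for other, _ in population)
    ]


def feasible_overlap_ratio(population):
    """Fraction of the unconstrained front that is also feasible (assumed proxy)."""
    front = unconstrained_front(population)
    if not front:
        return 0.0
    return sum(1 for _, viol in front if viol == 0) / len(front)


example_population = [
    ((1.0, 4.0), 0.0),
    ((2.0, 2.0), 0.5),  # on the unconstrained front but infeasible
    ((4.0, 1.0), 0.0),
    ((3.0, 3.0), 0.0),  # feasible but dominated by (2.0, 2.0)
]
```

A ratio near 1 would suggest that the constraints barely alter the front, while a ratio near 0 would suggest that the two fronts are largely separated; how such a signal maps onto problem categories is left to the original article.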
Efficient Delivery of Curcumin by Alginate Oligosaccharide Coated Aminated Mesoporous Silica Nanoparticles and In Vitro Anticancer Activity against Colon Cancer Cells
We designed and synthesized aminated mesoporous silica (MSN-NH2), and functionally grafted alginate oligosaccharides (AOS) on its surface to obtain MSN-NH2-AOS nanoparticles as a delivery vehicle for the fat-soluble model drug curcumin (Cur). Dynamic light scattering, thermogravimetric analysis, and X-ray photoelectron spectroscopy were used to characterize the structure and performance of MSN-NH2-AOS. The nano-MSN-NH2-AOS preparation process was optimized, and the drug loading and encapsulation efficiencies of nano-MSN-NH2-AOS were investigated. The encapsulation efficiency of the MSN-NH2-Cur-AOS nanoparticles was up to 91.24 ± 1.23%. The pH-sensitive AOS coating made the total release rate of Cur only 28.9 ± 1.6% under neutral conditions and 67.5 ± 1% under acidic conditions. According to the results of in vitro anti-tumor studies conducted by MTT and cellular uptake assays, the MSN-NH2-Cur-AOS nanoparticles were more easily absorbed by colon cancer cells than free Cur, achieving a high tumor cell targeting efficiency. Moreover, when the concentration of Cur reached 50 μg/mL, MSN-NH2-Cur-AOS nanoparticles showed strong cytotoxicity against tumor cells, indicating that MSN-NH2-AOS might be a promising tool as a novel fat-soluble anticancer drug carrier.