104 research outputs found
Recommended from our members
Sampling and Learning of the And-Or Graph
The And-Or graph is a tool for knowledge representation. In this thesis we first study the sampling of the And-Or graph with or without context constraints. Without any constraint on the potential functions of the And-Or graph nodes, the positions and shapes of different components of the face images are not aligned properly. In contrast, with both unary constraints and binary constraints, the components are aligned and the samples are more representative of the And-Or graph. We further explore parameter and structure learning of the And-Or graph by implementing and applying some existing algorithms. The experimental results on 1D text data and 2D face image data are shown. While there is no apparent difference between the sampling results of the parameter learned And-Or graph and the true And-Or graph, the sampling results of the structure learned And-Or graph are not perfect and could be further improved.
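As a rough illustration of what sampling an And-Or graph without context constraints involves, the sketch below expands every child of an And-node and picks exactly one child of an Or-node according to its branch probability. The toy grammar `GRAPH`, its node names, and its probabilities are hypothetical and not taken from the thesis; the unary and binary constraints mentioned in the abstract are not modeled here.

```python
import random

# Hypothetical toy And-Or graph over 1D text (an assumption for illustration).
# An And-node lists children that are all expanded in order;
# an Or-node lists (child, probability) alternatives; anything else is a terminal.
GRAPH = {
    "S": ("and", ["GREETING", "NAME"]),
    "GREETING": ("or", [("hello", 0.7), ("hi", 0.3)]),
    "NAME": ("or", [("world", 0.5), ("there", 0.5)]),
}

def sample(node, graph, rng):
    """Return the list of terminals produced by one top-down sample."""
    if node not in graph:              # terminal symbol
        return [node]
    kind, children = graph[node]
    if kind == "and":                  # And-node: expand every child
        out = []
        for child in children:
            out.extend(sample(child, graph, rng))
        return out
    # Or-node: choose exactly one alternative by its branch probability
    options = [child for child, _ in children]
    weights = [prob for _, prob in children]
    choice = rng.choices(options, weights=weights, k=1)[0]
    return sample(choice, graph, rng)

rng = random.Random(0)
samples = [" ".join(sample("S", GRAPH, rng)) for _ in range(5)]
```

Each sample here is drawn independently per Or-node, which is the unconstrained case; adding context constraints would mean reweighting or rejecting samples based on potentials between nodes.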
From Synthetic to Real: Unveiling the Power of Synthetic Data for Video Person Re-ID
In this paper, we study a new problem of cross-domain video-based person
re-identification (Re-ID). Specifically, we take the synthetic video dataset as
the source domain for training and use the real-world videos for testing, which
significantly reduces the dependence on real training data collection and
annotation. To unveil the power of synthetic data for video person Re-ID, we
first propose a self-supervised domain invariant feature learning strategy for
both static and temporal features. Then, to further improve the person
identification ability in the target domain, we develop a mean-teacher scheme
with the self-supervised ID consistency loss. Experimental results on four real
datasets verify the rationality of cross-synthetic-real domain adaptation and the
effectiveness of our method. We are also surprised to find that the synthetic
data performs even better than the real data in the cross-domain setting.
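A mean-teacher scheme, in its usual form, keeps a teacher model whose weights are an exponential moving average (EMA) of the student's weights. The minimal sketch below shows only that update on plain dictionaries of floats; the `momentum` value and the weight names are assumptions, and the self-supervised ID consistency loss is not shown because the abstract does not define it.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight, in place."""
    for name, student_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * student_value
    return teacher

# Hypothetical scalar "weights" standing in for real model parameters.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
ema_update(teacher, student, momentum=0.9)
```

With a momentum close to 1 the teacher changes slowly, which is what makes its predictions a stable target for the student.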
MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning
Audio-visual learning helps to comprehensively understand the world by fusing
practical information from multiple modalities. However, recent studies show
that the imbalanced optimization of uni-modal encoders in a joint-learning
model is a bottleneck to enhancing the model's performance. We further find
that the up-to-date imbalance-mitigating methods fail on some audio-visual
fine-grained tasks, which have a higher demand for distinguishable feature
distribution. Fueled by the success of cosine loss that builds hyperspherical
feature spaces and achieves lower intra-class angular variability, this paper
proposes Multi-Modal Cosine loss, MMCosine. It performs a modality-wise
normalization to features and weights towards balanced and better multi-modal
fine-grained learning. We demonstrate that our method can alleviate the
imbalanced optimization from the perspective of weight norm and fully exploit
the discriminability of the cosine metric. Extensive experiments demonstrate the
effectiveness of our method and its compatibility with advanced multi-modal
fusion strategies and up-to-date imbalance-mitigating methods.
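To make the idea of modality-wise normalization concrete, the sketch below L2-normalizes both the features and the per-class weights of each modality, so that every modality contributes a cosine similarity in [-1, 1] regardless of its weight norm. Summing the two cosines and the `scale` factor are assumptions made for illustration; the abstract does not give the exact formulation.

```python
import math

def l2_normalize(v):
    """Return v scaled to unit length (unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_logits(feat_a, feat_v, w_a, w_v, scale=10.0):
    """Per-class logit = scale * (cos(audio, w_a[c]) + cos(visual, w_v[c]))."""
    fa, fv = l2_normalize(feat_a), l2_normalize(feat_v)
    logits = []
    for wa_c, wv_c in zip(w_a, w_v):
        cos_a = sum(x * y for x, y in zip(fa, l2_normalize(wa_c)))
        cos_v = sum(x * y for x, y in zip(fv, l2_normalize(wv_c)))
        logits.append(scale * (cos_a + cos_v))
    return logits

# Hypothetical two-class example: the audio weights have a 100x larger norm,
# but after normalization neither modality dominates the logits.
w_a = [[100.0, 0.0], [0.0, 100.0]]
w_v = [[1.0, 0.0], [0.0, 1.0]]
logits = cosine_logits([3.0, 0.0], [0.0, 0.5], w_a, w_v)
```

In this example the audio feature points at class 0 and the visual feature at class 1, and the two classes end up with equal logits even though the raw audio weights are far larger.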
Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer
Nearest Neighbor Machine Translation (kNN-MT) has achieved great success in
domain adaptation tasks by integrating pre-trained Neural Machine Translation
(NMT) models with domain-specific token-level retrieval. However, the reasons
underlying its success have not been thoroughly investigated. In this paper, we
comprehensively analyze kNN-MT through theoretical and empirical studies.
Initially, we provide new insights into the working mechanism of kNN-MT as an
efficient technique to implicitly execute gradient descent on the output
projection layer of NMT, indicating that it is a specific case of model
fine-tuning. Subsequently, we conduct multi-domain experiments and word-level
analysis to examine the differences in performance between kNN-MT and
entire-model fine-tuning. Our findings suggest that: (1) Incorporating kNN-MT
with adapters yields comparable translation performance to fine-tuning on
in-domain test sets, while achieving better performance on out-of-domain test
sets; (2) Fine-tuning significantly outperforms kNN-MT on the recall of
in-domain low-frequency words, but this gap could be bridged by optimizing the
context representations with additional adapter layers.
Comment: Accepted by EMNLP202
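For readers unfamiliar with token-level retrieval in nearest-neighbor machine translation, the sketch below shows the commonly used formulation: retrieved neighbors are turned into a distribution over target tokens with a softmax over negative distances, and that distribution is interpolated with the NMT model's own prediction. The neighbor list, `temperature`, and `lam` values are hypothetical; the abstract itself does not spell out these details.

```python
import math

def knn_distribution(neighbors, vocab_size, temperature=10.0):
    """Build a token distribution from (token_id, distance) pairs.

    Each neighbor gets weight exp(-distance / temperature); weights of
    neighbors sharing a token are summed, then normalized to sum to 1.
    """
    weights = [math.exp(-dist / temperature) for _, dist in neighbors]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for (token_id, _), weight in zip(neighbors, weights):
        probs[token_id] += weight / total
    return probs

def interpolate(p_nmt, p_knn, lam=0.5):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_nmt."""
    return [lam * k + (1.0 - lam) * n for n, k in zip(p_nmt, p_knn)]

# Hypothetical 3-token vocabulary: the NMT model prefers token 0,
# while the retrieved in-domain neighbors mostly point to token 2.
p_nmt = [0.6, 0.3, 0.1]
neighbors = [(2, 1.0), (2, 2.0), (1, 30.0)]
p_knn = knn_distribution(neighbors, vocab_size=3)
p_final = interpolate(p_nmt, p_knn, lam=0.5)
```

The interpolation only changes the final token probabilities, which is consistent with the abstract's view of the method as acting on the output projection layer rather than on the whole model.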
OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
In this work, we introduce OmniDrones, an efficient and flexible platform
tailored for reinforcement learning in drone control, built on Nvidia's
Omniverse Isaac Sim. It employs a bottom-up design approach that allows users
to easily design and experiment with various application scenarios on top of
GPU-parallelized simulations. It also offers a range of benchmark tasks,
presenting challenges ranging from single-drone hovering to over-actuated
system tracking. In summary, we propose an open-sourced drone simulation
platform, equipped with an extensive suite of tools for drone learning. It
includes 4 drone models, 5 sensor modalities, 4 control modes, over 10
benchmark tasks, and a selection of widely used RL baselines. To showcase the
capabilities of OmniDrones and to support future research, we also provide
preliminary results on these benchmark tasks. We hope this platform will
encourage further studies on applying RL to practical drone systems.
Comment: Submitted to IEEE RA-
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Visual storytelling aims to generate a narrative paragraph from a sequence of
images automatically. Existing approaches construct a text description
independently for each image and roughly concatenate them into a story, which
leads to semantically incoherent content. In this
paper, we propose a new way for visual storytelling by introducing a topic
description task to detect the global semantic context of an image stream. A
story is then constructed with the guidance of the topic description. In order
to combine the two generation tasks, we propose a multi-agent communication
framework that regards the topic description generator and the story generator
as two agents and learns them simultaneously via an iterative updating mechanism.
We validate our approach on the VIST dataset, where quantitative results,
ablations, and human evaluation show that our method generates
higher-quality stories than state-of-the-art methods.
Comment: Accepted to COLING 202
Eliminating Contextual Bias in Aspect-based Sentiment Analysis.
Pretrained language models (LMs) have made remarkable achievements in aspect-based sentiment analysis (ABSA). However, these models may struggle in some particular cases (e.g., detecting sentiments expressed towards targeted aspects with only implicit or adversarial expressions). Since it is hard for models to align implicit or adversarial expressions with their corresponding aspects, the sentiments of the targeted aspects are largely affected by the expressions towards other aspects in the sentence. We call this phenomenon contextual bias. To tackle the problem, we propose a flexible aspect-oriented debiasing method (Arde) to eliminate the harmful contextual bias without adjusting the underlying LMs. Intuitively, Arde calibrates the prediction towards the targeted aspect by subtracting the bias towards the context. Moreover, Arde has theoretical support from counterfactual reasoning theory. Experiments are conducted on the SemEval benchmark, and the results show that Arde empirically improves the accuracy on contextually biased aspect sentiments without degrading the accuracy on unbiased ones. Motivated by the recent success of large language models (LLMs, e.g., ChatGPT), we further find that even LLMs can fail to address certain contextual bias, which can nevertheless be effectively tackled by Arde.
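The calibration step described above, subtracting the bias towards the context from the prediction for the targeted aspect, can be sketched as a simple operation on logits. How the context-only prediction is obtained (here, assumed to come from running the same model with the aspect masked out), the `alpha` weight, and all numbers are hypothetical; this is not the authors' implementation.

```python
def debias(full_logits, context_logits, alpha=1.0):
    """Subtract the (scaled) context-only logits from the full logits."""
    return [full - alpha * ctx for full, ctx in zip(full_logits, context_logits)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

LABELS = ["negative", "neutral", "positive"]

# Hypothetical logits: the sentence as a whole sounds positive, so the
# context-only prediction leans strongly positive and skews the full one.
full_logits = [1.0, 0.8, 2.0]      # aspect + sentence
context_logits = [0.2, 0.1, 1.8]   # sentence with the aspect masked out (assumed)
calibrated = debias(full_logits, context_logits)

before = LABELS[argmax(full_logits)]
after = LABELS[argmax(calibrated)]
```

Because the subtraction happens on model outputs, the underlying LM is left untouched, which matches the abstract's claim that no adjustment of the LM is needed.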
A Benchmark of Video-Based Clothes-Changing Person Re-Identification
Person re-identification (Re-ID) is a classical computer vision task and has
achieved great progress so far. Recently, long-term Re-ID with clothes-changing
has attracted increasing attention. However, existing methods mainly focus on
the image-based setting, where richer temporal information is overlooked. In this
paper, we focus on the relatively new yet practical problem of clothes-changing
video-based person re-identification (CCVReID), which is less studied. We
systematically study this problem by simultaneously considering the challenge
of the clothes inconsistency issue and the temporal information contained in
the video sequence for the person Re-ID problem. Based on this, we develop a
two-branch confidence-aware re-ranking framework for handling the CCVReID
problem. The proposed framework integrates two branches that consider both the
classical appearance features and cloth-free gait features through a
confidence-guided re-ranking strategy. This method provides a baseline
for further studies. We also build two new benchmark datasets for the CCVReID
problem, a large-scale synthetic video dataset and a real-world one,
both containing human sequences with various clothing changes. We will release
the benchmark and code from this work to the public.
Gains and losses from collusion: an empirical study on market behaviors of China’s power enterprises
Purpose: Collusion is a common behavior of oligopolistic enterprises seeking an advantage
in market competition. The purpose of this research is to explore the positive and negative effects
of collusion among electricity generators through statistical analysis.
Specifically, these effects are examined both in the market economy at a macro-economic level and in
enterprise behavior at a micro-economic level.
Design/methodology/approach: This research designs a model as an extension of Porter's
model (Green & Porter, 1984), estimated with full information maximum likelihood (FIML). Taking the price bidding project
launched in China's power industry as an example, this paper conducts an empirical study of
the relevant price data collected from subordinate power plants of China's five power generation
groups in the pilots.
Findings: This paper finds that power generation enterprises face collusion issues
in the market. Specifically, non-cooperative competition and
collusion alternate. Under competition, the market is relatively steady and forms a lower
network price, which helps the development of the whole industry. However, once a cartel is
formed, the price rises, and conflicts of interest arise between power enterprises and transmission-distribution
companies. At the same time, a higher power price forms
in the market, making consumers suffer losses. All of these are bad for industry development.
Collusion among power enterprises not only affects the power price; the market power
created by a long-standing cartel also reduces market entry in electricity generation. Market
resources are concentrated in the hands of the cartel, leaving little effective competition in the
market, which has negative effects on users.
Implications: The empirical research also indicates that collusion undoubtedly benefits the
power enterprises involved. As a cooperation pattern, collusion can lead to synergy
between the relevant companies. However, collusion harms the interests of other market entities.
While enterprises cooperatively create common interests, collusion may
harm parties outside the industry.
Originality/value: Using an empirical research method, the paper takes China's power industry as
an example to show the gains and losses of collusion from two aspects, namely market
economy and strategic management.
Peer Reviewed