140 research outputs found
A Multiagent Evolutionary Algorithm for the Resource-Constrained Project Portfolio Selection and Scheduling Problem
A multiagent evolutionary algorithm is proposed to solve the resource-constrained project portfolio selection and scheduling problem. The proposed algorithm has a dual-level structure. In the upper level, a set of agents makes decisions to select appropriate project portfolios; each agent selects its portfolio independently. A neighborhood competition operator and a self-learning operator are designed to improve each agent's energy, that is, its portfolio profit. In the lower level, the selected projects are scheduled simultaneously and completion times are computed to estimate the expected portfolio profit; each agent uses a priority rule-based heuristic to solve the resulting multiproject scheduling problem. A set of instances was generated systematically from the widely used Patterson set. Computational experiments confirm that the proposed evolutionary algorithm is effective for the resource-constrained project portfolio selection and scheduling problem.
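The dual-level structure lends itself to a compact illustration. Below is a minimal Python sketch of that loop under assumed names and a deliberately toy scheduling model (a greedy shortest-duration-first packing stands in for a real serial schedule-generation scheme with precedence constraints); it is not the authors' implementation.

import random

# Each candidate project: (duration, resource_demand, profit) -- toy data.
PROJECTS = [(random.randint(2, 9), random.randint(1, 5), random.uniform(1, 10))
            for _ in range(20)]
CAPACITY = 12  # renewable resource capacity per period (assumed)

def schedule_profit(selection):
    # Lower level: stand-in for the priority rule-based multiproject
    # scheduler; a greedy shortest-duration-first packing for brevity.
    profit, used = 0.0, 0
    chosen = [p for p, s in zip(PROJECTS, selection) if s]
    for duration, demand, value in sorted(chosen, key=lambda p: p[0]):
        if used + demand <= CAPACITY:
            used += demand
            profit += value
    return profit

def mutate(selection, rate=0.1):
    # Self-learning operator: local search by flipping a few selection bits.
    return [s ^ (random.random() < rate) for s in selection]

def evolve(grid_size=5, generations=50):
    # Upper level: agents on a toroidal lattice; energy = portfolio profit.
    grid = [[[random.random() < 0.5 for _ in PROJECTS]
             for _ in range(grid_size)] for _ in range(grid_size)]
    for _ in range(generations):
        for i in range(grid_size):
            for j in range(grid_size):
                me = grid[i][j]
                # Neighborhood competition: a losing agent adopts a
                # perturbed copy of its best 4-neighbor's portfolio.
                neighbors = [grid[(i + di) % grid_size][(j + dj) % grid_size]
                             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
                best = max(neighbors, key=schedule_profit)
                if schedule_profit(best) > schedule_profit(me):
                    me = mutate(best)
                # Self-learning: keep the better of current and mutated.
                grid[i][j] = max((me, mutate(me)), key=schedule_profit)
    return max((a for row in grid for a in row), key=schedule_profit)

best = evolve()
print("best portfolio profit:", round(schedule_profit(best), 2))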
Turning a CLIP Model into a Scene Text Detector
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model
has shown great potential in various downstream tasks via leveraging the
pretrained vision and language knowledge. Scene text, which contains rich
textual and visual information, has an inherent connection with a model like
CLIP. Recently, pretraining approaches based on vision-language models have made effective progress in the field of text detection. In contrast to these works, this paper proposes a new method, termed TCM, focusing on Turning the CLIP Model directly for text detection without the pretraining process. We demonstrate the advantages of the proposed TCM as follows: (1) The underlying principle of our framework can be applied to improve existing scene text detectors. (2) It facilitates the few-shot training capability of existing methods; e.g., using only 10% of the labeled data, we significantly improve the performance of the baseline method by an average of 22% in F-measure on 4 benchmarks. (3) By plugging the TCM design into existing scene text detection methods, we further achieve promising domain adaptation ability. The code will be publicly released at https://github.com/wenwenyu/TCM.
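As a rough illustration of the general recipe (not the TCM architecture itself), the following sketch reuses a pretrained CLIP vision encoder from Hugging Face transformers as a detection backbone and attaches a small segmentation-style head that scores each patch as text or non-text; the checkpoint name and head design are assumptions for illustration.

import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class CLIPTextRegionHead(nn.Module):
    def __init__(self, ckpt="openai/clip-vit-base-patch32"):
        super().__init__()
        self.backbone = CLIPVisionModel.from_pretrained(ckpt)
        dim = self.backbone.config.hidden_size            # 768 for ViT-B/32
        self.grid = (self.backbone.config.image_size
                     // self.backbone.config.patch_size)  # 7x7 patch grid
        # Lightweight head scoring each patch as text / non-text (assumed).
        self.head = nn.Sequential(nn.Conv2d(dim, 256, 3, padding=1),
                                  nn.ReLU(),
                                  nn.Conv2d(256, 1, 1))

    def forward(self, pixel_values):                      # (B, 3, 224, 224)
        tokens = self.backbone(pixel_values=pixel_values).last_hidden_state
        tokens = tokens[:, 1:, :]                         # drop the [CLS] token
        b, n, d = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        return torch.sigmoid(self.head(fmap))             # coarse text-region map

model = CLIPTextRegionHead()
scores = model(torch.randn(1, 3, 224, 224))               # -> (1, 1, 7, 7)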
Turning a CLIP Model into a Scene Text Spotter
We exploit the potential of the large-scale Contrastive Language-Image
Pretraining (CLIP) model to enhance scene text detection and spotting tasks,
transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes
visual prompt learning and cross-attention in CLIP to extract image and
text-based prior knowledge. Using predefined and learnable prompts,
FastTCM-CR50 introduces an instance-language matching process to enhance the
synergy between image and text embeddings, thereby refining text regions. Our
Bimodal Similarity Matching (BSM) module facilitates dynamic language prompt
generation, enabling offline computations and improving performance.
FastTCM-CR50 offers several advantages: 1) It can enhance existing text
detectors and spotters, improving performance by an average of 1.7% and 1.5%,
respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an
average improvement of 0.2% and 0.56% in text detection and spotting tasks,
along with a 48.5% increase in inference speed. 3) It showcases robust few-shot
training capabilities. Utilizing only 10% of the supervised data, FastTCM-CR50
improves performance by an average of 26.5% and 5.5% for text detection and
spotting tasks, respectively. 4) It consistently enhances performance on
out-of-distribution text detection and spotting datasets, particularly the
NightTime-ArT subset from ICDAR2019-ArT and the DOTA dataset for oriented
object detection. The code is available at https://github.com/wenwenyu/TCM.
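A hedged sketch of what an instance-language matching step might look like in spirit: cosine similarity between per-instance image embeddings and a prompt embedding composed of a predefined text prompt plus a learnable offset. Shapes and names are illustrative assumptions, not the FastTCM-CR50 internals.

import torch
import torch.nn.functional as F

dim, n_instances = 512, 5
instance_emb = torch.randn(n_instances, dim)   # stand-in per-region image embeddings
text_emb = torch.randn(1, dim)                 # e.g. an encoded predefined "text" prompt
learnable_prompt = torch.nn.Parameter(torch.zeros(1, dim))  # tuned jointly

prompt_emb = text_emb + learnable_prompt       # predefined + learnable prompt
sim = F.cosine_similarity(instance_emb, prompt_emb, dim=-1)  # per-instance score
match_prob = torch.sigmoid(sim)                # likelihood each instance is text

Because the prompt side does not depend on the input image, its features can be computed once and cached; offline computation of this kind is plausibly where the inference-speed gain reported above comes from.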
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
Multimodal tasks in the fashion domain have significant potential for
e-commerce, but involve challenging vision-and-language learning problems -
e.g., retrieving a fashion item given a reference image plus text feedback from
a user. Prior works on multimodal fashion tasks have either been limited by the
data in individual benchmarks, or have leveraged generic vision-and-language
pre-training but have not taken advantage of the characteristics of fashion
data. Additionally, these works have mainly been restricted to multimodal
understanding tasks. To address these gaps, we make two key contributions.
First, we propose a novel fashion-specific pre-training framework based on
weakly-supervised triplets constructed from fashion image-text pairs. We show
the triplet-based tasks are an effective addition to standard multimodal
pre-training tasks. Second, we propose a flexible decoder-based model
architecture capable of both fashion retrieval and captioning tasks. Together,
our model design and pre-training approach are competitive on a diverse set of
fashion tasks, including cross-modal retrieval, image retrieval with text
feedback, image captioning, relative image captioning, and multimodal
categorization.
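To make the triplet idea concrete, here is a minimal sketch of one plausible objective shape for retrieval with text feedback: the anchor is a fused (reference image + feedback text) embedding, the positive is the target item, and in-batch negatives come from a shifted batch. How the weakly-supervised triplets are actually mined from fashion image-text pairs is the paper's contribution and is not shown; all names are assumptions.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the gap between positive and negative cosine similarity.
    pos = F.cosine_similarity(anchor, positive, dim=-1)
    neg = F.cosine_similarity(anchor, negative, dim=-1)
    return F.relu(neg - pos + margin).mean()

B, D = 32, 512
fused = torch.randn(B, D)      # fused (reference image + text feedback) query
target = torch.randn(B, D)     # embedding of the target fashion item
negatives = target.roll(1, 0)  # cheap in-batch negatives via a shifted batch
loss = triplet_loss(fused, target, negatives)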
Looking and Listening: Audio Guided Text Recognition
Text recognition in the wild is a long-standing problem in computer vision.
Driven by end-to-end deep learning, recent studies suggest vision and language
processing are effective for scene text recognition. Yet, edit errors such as character additions, deletions, and replacements remain the main challenge for existing approaches. In fact, the content of a text and its audio naturally correspond to each other, i.e., a single character error may result in a clearly different pronunciation. In this paper, we propose AudioOCR, a simple
yet effective probabilistic audio decoder for mel spectrogram sequence
prediction to guide the scene text recognition, which only participates in the
training phase and brings no extra cost during the inference stage. The
underlying principle of AudioOCR can be easily applied to the existing
approaches. Experiments using 7 previous scene text recognition methods on 12
existing regular, irregular, and occluded benchmarks demonstrate our proposed
method can bring consistent improvement. More importantly, through our
experimentation, we show that AudioOCR possesses a generalizability that
extends to more challenging scenarios, including recognizing non-English text,
out-of-vocabulary words, and text with various accents. Code will be available
at https://github.com/wenwenyu/AudioOCR.
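Since AudioOCR is described as a training-only auxiliary decoder, its mechanics can be sketched as follows: a small decoder predicts a mel-spectrogram sequence from the recognizer's visual features and contributes an auxiliary loss; at inference the branch is simply not executed, so it adds no cost. Module shapes and names here are assumptions, not the released implementation.

import torch
import torch.nn as nn

class AuxMelDecoder(nn.Module):
    # Training-only branch: predicts a mel-spectrogram sequence from the
    # recognizer's visual features (shapes assumed for illustration).
    def __init__(self, feat_dim=256, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True)
        self.proj = nn.Linear(256, n_mels)     # one mel frame per time step

    def forward(self, visual_feats):           # (B, T, feat_dim)
        out, _ = self.rnn(visual_feats)
        return self.proj(out)                  # (B, T, n_mels)

decoder = AuxMelDecoder()
feats = torch.randn(4, 25, 256)                # recognizer sequence features
mel_pred = decoder(feats)
mel_target = torch.randn(4, 25, 80)            # e.g. TTS-synthesized targets
aux_loss = nn.functional.l1_loss(mel_pred, mel_target)
# total_loss = recognition_loss + lambda_audio * aux_loss   (training only)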
Antifungal active ingredients from the twigs and leaves of Clausena lansium Lour. Skeels (Rutaceae)
Two novel amides, named clauphenamides A and B, and twelve other known compounds were isolated from the twigs and leaves of Clausena lansium Lour. Skeels (Rutaceae). Their structures were elucidated on the basis of extensive spectroscopic analysis and comparison with data reported in the literature. Clauphenamide A (1) features an N-2-(4,8-dimethoxyfuro[2,3-b]quinolin-7-yl)vinyl unit, and clauphenamide B (2) is an unprecedented N-phenethyl cinnamide dimer. The other known compounds comprise pyrrolidone amides (3 and 4), furanocoumarins (7–10), simple coumarins (11–14), a lignan (5), and a sesquiterpene (6). Compounds 5, 6, 10, and 12 were isolated from the genus Clausena for the first time, while 13 was isolated from the species C. lansium for the first time. The antifungal activities of the isolated compounds were assayed. At a concentration of 100 μg/ml, compared with the control (chlorothalonil, inhibition rate of 83.67%), compounds 1 and 2 exhibited moderate antifungal activity against B. dothidea, with inhibition rates of 68.39% and 52.05%, respectively. Compounds 11–14 also exhibited moderate activity against B. dothidea and F. oxysporum, with inhibition rates greater than 40%. In addition, compared with the control (chlorothalonil, inhibition rate of 69.02%), compounds 11–14 showed strong antifungal activity against P. oryzae, with inhibition rates greater than 55%. Among them, compound 14 had the strongest antifungal activity against P. oryzae, with an inhibition rate (65.44%) close to that of the control. The structure-activity relationships of the isolated compounds are also discussed preliminarily.
Carrier localization and electronic phase separation in a doped spin-orbit driven Mott phase in Sr3(Ir1-xRux)2O7
Interest in many strongly spin-orbit coupled 5d-transition metal oxide
insulators stems from mapping their electronic structures to a J=1/2 Mott
phase. One of the hopes is to establish their Mott parent states and explore
these systems' potential of realizing novel electronic states upon carrier
doping. However, once doped, little is understood regarding the role of their
reduced Coulomb interaction U relative to their strongly correlated 3d-electron
cousins. Here we show that, upon hole-doping a candidate J=1/2 Mott insulator,
carriers remain localized within a nanoscale phase separated ground state. A
percolative metal-insulator transition occurs with interplay between localized
and itinerant regions, stabilizing an antiferromagnetic metallic phase beyond
the critical region. Our results demonstrate a surprising parallel between
doped 5d- and 3d-electron Mott systems and suggest that, whether through the near degeneracy of nearby electronic phases or through direct carrier localization, U is essential to the carrier response of this doped spin-orbit Mott insulator.
Conditionally Immortalized Mouse Embryonic Fibroblasts Retain Proliferative Activity without Compromising Multipotent Differentiation Potential
Mesenchymal stem cells (MSCs) are multipotent cells which reside in many tissues and can give rise to multiple lineages, including bone, cartilage, and adipose. Although MSCs have attracted significant attention for basic and translational research, primary MSCs have a limited life span in culture, which hampers their broader application. Here, we investigate whether mouse mesenchymal progenitors can be conditionally immortalized with SV40 large T antigen and maintain long-term cell proliferation without compromising their multipotency. Using a system which expresses SV40 large T antigen flanked by Cre/loxP sites, we demonstrate that mouse embryonic fibroblasts (MEFs) can be efficiently immortalized by SV40 large T antigen. The conditionally immortalized MEFs (iMEFs) exhibit enhanced proliferative activity and maintain long-term cell proliferation, which can be reversed by Cre recombinase. The iMEFs express most MSC markers and retain multipotency, as they can differentiate into osteogenic, chondrogenic, and adipogenic lineages under appropriate differentiation conditions in vitro and in vivo. The removal of SV40 large T reduces the differentiation potential of iMEFs, possibly due to decreased progenitor expansion. Furthermore, the iMEFs are apparently not tumorigenic when subcutaneously injected into athymic nude mice. Thus, the conditionally immortalized iMEFs not only maintain long-term cell proliferation but also retain the ability to differentiate into multiple lineages. Our results suggest that this reversible immortalization strategy using SV40 large T antigen may be an efficient and safe approach to establishing long-term cultures of primary mesenchymal progenitors for basic and translational research, as well as for potential clinical applications.