109 research outputs found
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Unsupervised pre-training methods utilizing large and diverse datasets have
achieved tremendous success across a range of domains. Recent work has
investigated such unsupervised pre-training methods for model-based
reinforcement learning (MBRL) but is limited to domain-specific or simulated
data. In this paper, we study the problem of pre-training world models with
abundant in-the-wild videos for efficient learning of downstream visual control
tasks. However, in-the-wild videos are complicated with various contextual
factors, such as intricate backgrounds and textured appearance, which precludes
a world model from extracting shared world knowledge to generalize better. To
tackle this issue, we introduce Contextualized World Models (ContextWM) that
explicitly model both the context and dynamics to overcome the complexity and
diversity of in-the-wild videos and facilitate knowledge transfer between
distinct scenes. Specifically, a contextualized extension of the latent
dynamics model is elaborately realized by incorporating a context encoder to
retain contextual information and empower the image decoder, which allows the
latent dynamics model to concentrate on essential temporal variations. Our
experiments show that in-the-wild video pre-training equipped with ContextWM
can significantly improve the sample-efficiency of MBRL in various domains,
including robotic manipulation, locomotion, and autonomous driving
Reclamation of Isopropyl Alcohol and N-Propyl Bromide
The purpose of this design project is to recover a co-solvent mixture that is used to remove oil and water from metal machine parts. The cleansing solvents used are n-propylbromide (NPB) and isopropyl alcohol (IPA), which respectively remove oleic acid and water. The solvent mixture starts at an IPA/NPB molar ratio of 56/44 and can be used for cleaning machinery until the water reaches 7.5% by mole. This spent cleaning mixture is then delivered for reclamation of IPA and NPB so that it can be used for cleaning again. A 6,240 gallon truckload is delivered every two days, and the mixture to be cleaned has a molar composition of 47.6% IPA, 37.4% NPB, 7.5% oleic acid, and 7.5% water. The goals of the project are to completely remove the oleic acid, reduce the water molar composition to below 2.5%, maximize co-solvent recovery, and maximize profitability.
A major challenge of the project is the non-ideal behavior of the components, which includes multiple azeotropes and distillation boundaries. Another important characteristic of this design project is the unusually small scale: one 6,240 gallon quantity of used co-solvent mixture must be processed every two days. Due to this scale, batch processes were investigated as well as continuous processes.
The continuous alternative utilizes three major separation units: a 15-tray distillation column, a decanter for the distillate, and an evaporator for the bottoms. 93.8% of the original IPA and 99.2% of the original NPB is recovered. There is no oleic acid and 0.5% by mole of water in the product. Pure NPB and IPA are added at the end of the separation to compensate for the lost co-solvents, and to restore the IPA/NPB ratio to 56/44. A 2,657,300 in year 10 of production.
The batch alternative utilizes a batch distillation column with multiple receivers and recovers 95.4% of the original IPA and 99.7% of the original NPB. There is 0.8% water by mole and no oleic acid in the product. Pure NPB and IPA also must be added to restore the original ratio. Because the composition of water is higher in the batch product than in the continuous, the co-solvent mixture is sold at a lower 2,452,500 in year 10. However, the batch process has significant down time and can potentially handle up to four times the solvent demand (2 trucks of solvent/day), resulting in an IRR of 130% and an NPV of $22,494,500 at the same selling price. Because of the batch plant’s ability to handle demand growth, its flexibility in separating different co-solvent ratios, and its robust economic potential, we recommend the construction of the batch co-solvent reclamation plant
Tensor Decomposition Based Attention Module for Spiking Neural Networks
The attention mechanism has been proven to be an effective way to improve
spiking neural network (SNN). However, based on the fact that the current SNN
input data flow is split into tensors to process on GPUs, none of the previous
works consider the properties of tensors to implement an attention module. This
inspires us to rethink current SNN from the perspective of tensor-relevant
theories. Using tensor decomposition, we design the \textit{projected full
attention} (PFA) module, which demonstrates excellent results with linearly
growing parameters. Specifically, PFA is composed by the \textit{linear
projection of spike tensor} (LPST) module and \textit{attention map composing}
(AMC) module. In LPST, we start by compressing the original spike tensor into
three projected tensors using a single property-preserving strategy with
learnable parameters for each dimension. Then, in AMC, we exploit the inverse
procedure of the tensor decomposition process to combine the three tensors into
the attention map using a so-called connecting factor. To validate the
effectiveness of the proposed PFA module, we integrate it into the widely used
VGG and ResNet architectures for classification tasks. Our method achieves
state-of-the-art performance on both static and dynamic benchmark datasets,
surpassing the existing SNN models with Transformer-based and CNN-based
backbones.Comment: 11 page
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
Vision Transformers (ViTs) have demonstrated powerful representation ability
in various visual tasks thanks to their intrinsic data-hungry nature. However,
we unexpectedly find that ViTs perform vulnerably when applied to face
recognition (FR) scenarios with extremely large datasets. We investigate the
reasons for this phenomenon and discover that the existing data augmentation
approach and hard sample mining strategy are incompatible with ViTs-based FR
backbone due to the lack of tailored consideration on preserving face
structural information and leveraging each local token information. To remedy
these problems, this paper proposes a superior FR model called TransFace, which
employs a patch-level data augmentation strategy named DPAP and a hard sample
mining strategy named EHSM. Specially, DPAP randomly perturbs the amplitude
information of dominant patches to expand sample diversity, which effectively
alleviates the overfitting problem in ViTs. EHSM utilizes the information
entropy in the local tokens to dynamically adjust the importance weight of easy
and hard samples during training, leading to a more stable prediction.
Experiments on several benchmarks demonstrate the superiority of our TransFace.
Code and models are available at https://github.com/DanJun6737/TransFace.Comment: Accepted by ICCV 202
Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI)
services due to their exceptional proficiency in understanding and generating
human-like text. LLM chatbots, in particular, have seen widespread adoption,
transforming human-machine interactions. However, these LLM chatbots are
susceptible to "jailbreak" attacks, where malicious users manipulate prompts to
elicit inappropriate or sensitive responses, contravening service policies.
Despite existing attempts to mitigate such threats, our research reveals a
substantial gap in our understanding of these vulnerabilities, largely due to
the undisclosed defensive measures implemented by LLM service providers.
In this paper, we present Jailbreaker, a comprehensive framework that offers
an in-depth understanding of jailbreak attacks and countermeasures. Our work
makes a dual contribution. First, we propose an innovative methodology inspired
by time-based SQL injection techniques to reverse-engineer the defensive
strategies of prominent LLM chatbots, such as ChatGPT, Bard, and Bing Chat.
This time-sensitive approach uncovers intricate details about these services'
defenses, facilitating a proof-of-concept attack that successfully bypasses
their mechanisms. Second, we introduce an automatic generation method for
jailbreak prompts. Leveraging a fine-tuned LLM, we validate the potential of
automated jailbreak generation across various commercial LLM chatbots. Our
method achieves a promising average success rate of 21.58%, significantly
outperforming the effectiveness of existing techniques. We have responsibly
disclosed our findings to the concerned service providers, underscoring the
urgent need for more robust defenses. Jailbreaker thus marks a significant step
towards understanding and mitigating jailbreak threats in the realm of LLM
chatbots
Prompt Injection attack against LLM-integrated Applications
Large Language Models (LLMs), renowned for their superior proficiency in
language comprehension and generation, stimulate a vibrant ecosystem of
applications around them. However, their extensive assimilation into various
services introduces significant security risks. This study deconstructs the
complexities and implications of prompt injection attacks on actual
LLM-integrated applications. Initially, we conduct an exploratory analysis on
ten commercial applications, highlighting the constraints of current attack
strategies in practice. Prompted by these limitations, we subsequently
formulate HouYi, a novel black-box prompt injection attack technique, which
draws inspiration from traditional web injection attacks. HouYi is
compartmentalized into three crucial elements: a seamlessly-incorporated
pre-constructed prompt, an injection prompt inducing context partition, and a
malicious payload designed to fulfill the attack objectives. Leveraging HouYi,
we unveil previously unknown and severe attack outcomes, such as unrestricted
arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi
on 36 actual LLM-integrated applications and discern 31 applications
susceptible to prompt injection. 10 vendors have validated our discoveries,
including Notion, which has the potential to impact millions of users. Our
investigation illuminates both the possible risks of prompt injection attacks
and the possible tactics for mitigation
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training
Large-scale models pre-trained on large-scale datasets have profoundly
advanced the development of deep learning. However, the state-of-the-art models
for medical image segmentation are still small-scale, with their parameters
only in the tens of millions. Further scaling them up to higher orders of
magnitude is rarely explored. An overarching goal of exploring large-scale
models is to train them on large-scale medical segmentation datasets for better
transfer capacities. In this work, we design a series of Scalable and
Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14
million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image
segmentation model to date. Our STU-Net is based on nnU-Net framework due to
its popularity and impressive performance. We first refine the default
convolutional blocks in nnU-Net to make them scalable. Then, we empirically
evaluate different scaling combinations of network depth and width, discovering
that it is optimal to scale model depth and width together. We train our
scalable STU-Net models on a large-scale TotalSegmentator dataset and find that
increasing model size brings a stronger performance gain. This observation
reveals that a large model is promising in medical image segmentation.
Furthermore, we evaluate the transferability of our model on 14 downstream
datasets for direct inference and 3 datasets for further fine-tuning, covering
various modalities and segmentation targets. We observe good performance of our
pre-trained model in both direct inference and fine-tuning. The code and
pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net
Elovl4 haploinsufficiency does not induce early onset retinal degeneration in mice
AbstractELOVL4 was first identified as a disease-causing gene in Stargardt macular dystrophy (STGD3, MIM 600110.) To date, three ELOVL4 mutations have been identified, all of which result in truncated proteins which induce autosomal dominant juvenile macular degenerations. Based on sequence homology, ELOVL4 is thought to be another member within a family of proteins functioning in the elongation of long chain fatty acids. However, the normal function of ELOVL4 is unclear. We generated Elovl4 knockout mice to determine if Elovl4 loss affects retinal development or function. Here we show that Elovl4 knockout mice, while perinatal lethal, exhibit normal retinal development prior to death at day of birth. Further, postnatal retinal development in Elovl4 heterozygous mice appears normal. Therefore haploinsufficiency for wildtype ELOVL4 in autosomal dominant macular degeneration likely does not contribute to juvenile macular degeneration in STGD3 patients. We found, however, that Elovl4+/− mice exhibit enhanced ERG scotopic and photopic a and b waves relative to wildtype Elovl4+/+ mice suggesting that reduced Elovl4 levels may impact retinal electrophysiological responses
A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation
Although deep learning have revolutionized abdominal multi-organ
segmentation, models often struggle with generalization due to training on
small, specific datasets. With the recent emergence of large-scale datasets,
some important questions arise: \textbf{Can models trained on these datasets
generalize well on different ones? If yes/no, how to further improve their
generalizability?} To address these questions, we introduce A-Eval, a benchmark
for the cross-dataset Evaluation ('Eval') of Abdominal ('A') multi-organ
segmentation. We employ training sets from four large-scale public datasets:
FLARE22, AMOS, WORD, and TotalSegmentator, each providing extensive labels for
abdominal multi-organ segmentation. For evaluation, we incorporate the
validation sets from these datasets along with the training set from the BTCV
dataset, forming a robust benchmark comprising five distinct datasets. We
evaluate the generalizability of various models using the A-Eval benchmark,
with a focus on diverse data usage scenarios: training on individual datasets
independently, utilizing unlabeled data via pseudo-labeling, mixing different
modalities, and joint training across all available datasets. Additionally, we
explore the impact of model sizes on cross-dataset generalizability. Through
these analyses, we underline the importance of effective data usage in
enhancing models' generalization capabilities, offering valuable insights for
assembling large-scale datasets and improving training strategies. The code and
pre-trained models are available at
\href{https://github.com/uni-medical/A-Eval}{https://github.com/uni-medical/A-Eval}
- …