109 research outputs found

    Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

    Full text link
    Unsupervised pre-training methods utilizing large and diverse datasets have achieved tremendous success across a range of domains. Recent work has investigated such unsupervised pre-training methods for model-based reinforcement learning (MBRL) but is limited to domain-specific or simulated data. In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of downstream visual control tasks. However, in-the-wild videos are complicated with various contextual factors, such as intricate backgrounds and textured appearance, which precludes a world model from extracting shared world knowledge to generalize better. To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly model both the context and dynamics to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes. Specifically, a contextualized extension of the latent dynamics model is elaborately realized by incorporating a context encoder to retain contextual information and empower the image decoder, which allows the latent dynamics model to concentrate on essential temporal variations. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample-efficiency of MBRL in various domains, including robotic manipulation, locomotion, and autonomous driving

    Reclamation of Isopropyl Alcohol and N-Propyl Bromide

    Get PDF
    The purpose of this design project is to recover a co-solvent mixture that is used to remove oil and water from metal machine parts. The cleansing solvents used are n-propylbromide (NPB) and isopropyl alcohol (IPA), which respectively remove oleic acid and water. The solvent mixture starts at an IPA/NPB molar ratio of 56/44 and can be used for cleaning machinery until the water reaches 7.5% by mole. This spent cleaning mixture is then delivered for reclamation of IPA and NPB so that it can be used for cleaning again. A 6,240 gallon truckload is delivered every two days, and the mixture to be cleaned has a molar composition of 47.6% IPA, 37.4% NPB, 7.5% oleic acid, and 7.5% water. The goals of the project are to completely remove the oleic acid, reduce the water molar composition to below 2.5%, maximize co-solvent recovery, and maximize profitability. A major challenge of the project is the non-ideal behavior of the components, which includes multiple azeotropes and distillation boundaries. Another important characteristic of this design project is the unusually small scale: one 6,240 gallon quantity of used co-solvent mixture must be processed every two days. Due to this scale, batch processes were investigated as well as continuous processes. The continuous alternative utilizes three major separation units: a 15-tray distillation column, a decanter for the distillate, and an evaporator for the bottoms. 93.8% of the original IPA and 99.2% of the original NPB is recovered. There is no oleic acid and 0.5% by mole of water in the product. Pure NPB and IPA are added at the end of the separation to compensate for the lost co-solvents, and to restore the IPA/NPB ratio to 56/44. A 0.45/lbsellingpriceofreclaimedco−solventreturnsanIRRof45.60.45/lb selling price of reclaimed co-solvent returns an IRR of 45.6% and an NPV of 2,657,300 in year 10 of production. The batch alternative utilizes a batch distillation column with multiple receivers and recovers 95.4% of the original IPA and 99.7% of the original NPB. There is 0.8% water by mole and no oleic acid in the product. Pure NPB and IPA also must be added to restore the original ratio. Because the composition of water is higher in the batch product than in the continuous, the co-solvent mixture is sold at a lower 0.42/lb,resultinginanIRRof37.20.42/lb, resulting in an IRR of 37.2% and an NPV of 2,452,500 in year 10. However, the batch process has significant down time and can potentially handle up to four times the solvent demand (2 trucks of solvent/day), resulting in an IRR of 130% and an NPV of $22,494,500 at the same selling price. Because of the batch plant’s ability to handle demand growth, its flexibility in separating different co-solvent ratios, and its robust economic potential, we recommend the construction of the batch co-solvent reclamation plant

    Tensor Decomposition Based Attention Module for Spiking Neural Networks

    Full text link
    The attention mechanism has been proven to be an effective way to improve spiking neural network (SNN). However, based on the fact that the current SNN input data flow is split into tensors to process on GPUs, none of the previous works consider the properties of tensors to implement an attention module. This inspires us to rethink current SNN from the perspective of tensor-relevant theories. Using tensor decomposition, we design the \textit{projected full attention} (PFA) module, which demonstrates excellent results with linearly growing parameters. Specifically, PFA is composed by the \textit{linear projection of spike tensor} (LPST) module and \textit{attention map composing} (AMC) module. In LPST, we start by compressing the original spike tensor into three projected tensors using a single property-preserving strategy with learnable parameters for each dimension. Then, in AMC, we exploit the inverse procedure of the tensor decomposition process to combine the three tensors into the attention map using a so-called connecting factor. To validate the effectiveness of the proposed PFA module, we integrate it into the widely used VGG and ResNet architectures for classification tasks. Our method achieves state-of-the-art performance on both static and dynamic benchmark datasets, surpassing the existing SNN models with Transformer-based and CNN-based backbones.Comment: 11 page

    TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective

    Full text link
    Vision Transformers (ViTs) have demonstrated powerful representation ability in various visual tasks thanks to their intrinsic data-hungry nature. However, we unexpectedly find that ViTs perform vulnerably when applied to face recognition (FR) scenarios with extremely large datasets. We investigate the reasons for this phenomenon and discover that the existing data augmentation approach and hard sample mining strategy are incompatible with ViTs-based FR backbone due to the lack of tailored consideration on preserving face structural information and leveraging each local token information. To remedy these problems, this paper proposes a superior FR model called TransFace, which employs a patch-level data augmentation strategy named DPAP and a hard sample mining strategy named EHSM. Specially, DPAP randomly perturbs the amplitude information of dominant patches to expand sample diversity, which effectively alleviates the overfitting problem in ViTs. EHSM utilizes the information entropy in the local tokens to dynamically adjust the importance weight of easy and hard samples during training, leading to a more stable prediction. Experiments on several benchmarks demonstrate the superiority of our TransFace. Code and models are available at https://github.com/DanJun6737/TransFace.Comment: Accepted by ICCV 202

    Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots

    Full text link
    Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI) services due to their exceptional proficiency in understanding and generating human-like text. LLM chatbots, in particular, have seen widespread adoption, transforming human-machine interactions. However, these LLM chatbots are susceptible to "jailbreak" attacks, where malicious users manipulate prompts to elicit inappropriate or sensitive responses, contravening service policies. Despite existing attempts to mitigate such threats, our research reveals a substantial gap in our understanding of these vulnerabilities, largely due to the undisclosed defensive measures implemented by LLM service providers. In this paper, we present Jailbreaker, a comprehensive framework that offers an in-depth understanding of jailbreak attacks and countermeasures. Our work makes a dual contribution. First, we propose an innovative methodology inspired by time-based SQL injection techniques to reverse-engineer the defensive strategies of prominent LLM chatbots, such as ChatGPT, Bard, and Bing Chat. This time-sensitive approach uncovers intricate details about these services' defenses, facilitating a proof-of-concept attack that successfully bypasses their mechanisms. Second, we introduce an automatic generation method for jailbreak prompts. Leveraging a fine-tuned LLM, we validate the potential of automated jailbreak generation across various commercial LLM chatbots. Our method achieves a promising average success rate of 21.58%, significantly outperforming the effectiveness of existing techniques. We have responsibly disclosed our findings to the concerned service providers, underscoring the urgent need for more robust defenses. Jailbreaker thus marks a significant step towards understanding and mitigating jailbreak threats in the realm of LLM chatbots

    Prompt Injection attack against LLM-integrated Applications

    Full text link
    Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation

    STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

    Full text link
    Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions. Further scaling them up to higher orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacities. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that a large model is promising in medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net

    Elovl4 haploinsufficiency does not induce early onset retinal degeneration in mice

    Get PDF
    AbstractELOVL4 was first identified as a disease-causing gene in Stargardt macular dystrophy (STGD3, MIM 600110.) To date, three ELOVL4 mutations have been identified, all of which result in truncated proteins which induce autosomal dominant juvenile macular degenerations. Based on sequence homology, ELOVL4 is thought to be another member within a family of proteins functioning in the elongation of long chain fatty acids. However, the normal function of ELOVL4 is unclear. We generated Elovl4 knockout mice to determine if Elovl4 loss affects retinal development or function. Here we show that Elovl4 knockout mice, while perinatal lethal, exhibit normal retinal development prior to death at day of birth. Further, postnatal retinal development in Elovl4 heterozygous mice appears normal. Therefore haploinsufficiency for wildtype ELOVL4 in autosomal dominant macular degeneration likely does not contribute to juvenile macular degeneration in STGD3 patients. We found, however, that Elovl4+/− mice exhibit enhanced ERG scotopic and photopic a and b waves relative to wildtype Elovl4+/+ mice suggesting that reduced Elovl4 levels may impact retinal electrophysiological responses

    A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation

    Full text link
    Although deep learning have revolutionized abdominal multi-organ segmentation, models often struggle with generalization due to training on small, specific datasets. With the recent emergence of large-scale datasets, some important questions arise: \textbf{Can models trained on these datasets generalize well on different ones? If yes/no, how to further improve their generalizability?} To address these questions, we introduce A-Eval, a benchmark for the cross-dataset Evaluation ('Eval') of Abdominal ('A') multi-organ segmentation. We employ training sets from four large-scale public datasets: FLARE22, AMOS, WORD, and TotalSegmentator, each providing extensive labels for abdominal multi-organ segmentation. For evaluation, we incorporate the validation sets from these datasets along with the training set from the BTCV dataset, forming a robust benchmark comprising five distinct datasets. We evaluate the generalizability of various models using the A-Eval benchmark, with a focus on diverse data usage scenarios: training on individual datasets independently, utilizing unlabeled data via pseudo-labeling, mixing different modalities, and joint training across all available datasets. Additionally, we explore the impact of model sizes on cross-dataset generalizability. Through these analyses, we underline the importance of effective data usage in enhancing models' generalization capabilities, offering valuable insights for assembling large-scale datasets and improving training strategies. The code and pre-trained models are available at \href{https://github.com/uni-medical/A-Eval}{https://github.com/uni-medical/A-Eval}
    • …
    corecore