25 research outputs found

    IterInv: Iterative Inversion for Pixel-Level T2I Models

    Full text link
    Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images from an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques rely on DDIM inversion as a common practice, built on Latent Diffusion Models (LDMs). However, large pretrained T2I models that, like LDMs, operate in a latent space lose details due to the initial compression stage with an autoencoder mechanism. By contrast, another mainstream family of T2I pipelines, which work at the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. They are commonly composed of several stages, normally a text-to-image stage followed by several super-resolution stages. In this case, DDIM inversion is unable to find the initial noise that generates the original image, because the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining IterInv with a popular image editing method, we demonstrate its application prospects. The code will be released at \url{https://github.com/Tchuanm/IterInv.git}.
    Comment: Accepted paper at the NeurIPS 2023 Workshop on Diffusion Models
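The failure the abstract describes can be reproduced even in one dimension: naive DDIM inversion runs the deterministic sampling update in reverse, but because the noise prediction is evaluated at the "wrong" point, sampling from the inverted latent does not exactly reproduce the input. A toy sketch of the iterative-refinement idea (the linear stand-in denoiser, schedule, and update rule are illustrative assumptions, not the authors' code):

```python
import math

# Toy cumulative noise schedule (alpha-bar), from clean (index 0) to noisy.
ALPHA_BAR = [1.0, 0.9, 0.7, 0.4, 0.1]

def eps_model(x, t):
    # Stand-in for a trained denoiser: a fixed linear map (illustration only).
    return 0.5 * x

def ddim_step(x, ab_from, ab_to, t):
    # One deterministic DDIM update between two noise levels.
    eps = eps_model(x, t)
    x0_pred = (x - math.sqrt(1 - ab_from) * eps) / math.sqrt(ab_from)
    return math.sqrt(ab_to) * x0_pred + math.sqrt(1 - ab_to) * eps

def ddim_invert(x0):
    """Naive DDIM inversion: run the deterministic update toward higher noise."""
    x = x0
    for t in range(len(ALPHA_BAR) - 1):
        x = ddim_step(x, ALPHA_BAR[t], ALPHA_BAR[t + 1], t)
    return x

def ddim_sample(xT):
    """Deterministic DDIM sampling back toward the clean sample."""
    x = xT
    for t in range(len(ALPHA_BAR) - 1, 0, -1):
        x = ddim_step(x, ALPHA_BAR[t], ALPHA_BAR[t - 1], t)
    return x

def iterative_invert(x0, n_iters=60):
    """Refine the latent until sampling from it reconstructs x0."""
    xT = ddim_invert(x0)             # start from the naive inversion
    for _ in range(n_iters):
        xT += x0 - ddim_sample(xT)   # shrink the reconstruction residual
    return xT
```

Because the update merely shrinks the reconstruction residual each round, the refined latent reconstructs the input to numerical precision, while the naive inversion leaves a visible error.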

    Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

    Full text link
    Tracking using bio-inspired event cameras has drawn increasing attention in recent years. Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker. The first category incurs higher inference cost, and the second is easily affected by noisy events and sparse spatial resolution. In this paper, we propose a novel hierarchical knowledge distillation framework that fully utilizes multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed, low-latency visual tracking at test time using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frames and event stream simultaneously. Then, we design a new hierarchical knowledge distillation strategy, including pairwise-similarity, feature-representation, and response-map-based distillation, to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution (346 × 260), we propose the first large-scale high-resolution (1280 × 720) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on the low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of our proposed method. The dataset, evaluation toolkit, and source code are available at \url{https://github.com/Event-AHU/EventVOT_Benchmark}.
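The three distillation terms named above can be written as simple losses between teacher and student outputs; a minimal sketch with plain Python lists (function names, weighting, and loss forms are illustrative assumptions, not the paper's implementation):

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pairwise_similarity(feats):
    """Cosine-similarity matrix between all pairs of feature vectors."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv)
    return [[cos(u, v) for v in feats] for u in feats]

def distill_loss(t_feats, s_feats, t_resp, s_resp, w=(1.0, 1.0, 1.0)):
    """Weighted sum of three hypothetical distillation terms."""
    # 1) pairwise-similarity distillation: match relational structure
    t_sim, s_sim = pairwise_similarity(t_feats), pairwise_similarity(s_feats)
    l_pair = sum(mse(tr, sr) for tr, sr in zip(t_sim, s_sim)) / len(t_sim)
    # 2) feature-representation distillation: match features directly
    l_feat = sum(mse(tf, sf) for tf, sf in zip(t_feats, s_feats)) / len(t_feats)
    # 3) response-map distillation: match the trackers' response maps
    l_resp = mse(t_resp, s_resp)
    return w[0] * l_pair + w[1] * l_feat + w[2] * l_resp
```

A student whose features and response map match the teacher's incurs zero loss; any mismatch in relational structure, features, or responses contributes a positive penalty.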

    Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

    Full text link
    Current event-/frame-event-based trackers are evaluated on short-term tracking datasets; however, real-world scenarios involve long-term tracking, and the performance of existing tracking algorithms in such settings remains unclear. In this paper, we first propose a new long-term, large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date. We re-train and evaluate 15 baseline trackers on our dataset for future works to compare against. More importantly, we find that the RGB frames and event streams are naturally incomplete due to challenging factors and the spatially sparse event flow. In response, we propose a novel associative-memory Transformer network as a unified backbone, introducing modern Hopfield layers into multi-head self-attention blocks to fuse RGB and event data. Extensive experiments on RGB-Event (FELT), RGB-Thermal (RGBT234, LasHeR), and RGB-Depth (DepthTrack) datasets fully validate the effectiveness of our model. The dataset and source code can be found at \url{https://github.com/Event-AHU/FELT_SOT_Benchmark}.
    Comment: In peer review
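A modern Hopfield layer performs essentially the same computation as softmax attention over a set of stored patterns, which is why it can be dropped into multi-head self-attention blocks; a minimal retrieval step (the function signature and parameters are illustrative, not the paper's code):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def hopfield_retrieve(patterns, query, beta=4.0, steps=1):
    """Modern-Hopfield update: softmax attention over stored patterns."""
    xi = query
    for _ in range(steps):
        # similarity of the current state to each stored pattern
        scores = [beta * sum(p_i * q_i for p_i, q_i in zip(p, xi)) for p in patterns]
        attn = softmax(scores)
        # retrieved state: attention-weighted average of the stored patterns
        xi = [sum(a * p[d] for a, p in zip(attn, patterns)) for d in range(len(xi))]
    return xi
```

With a sufficiently large inverse temperature beta, a noisy query is pulled toward the closest stored pattern in a single update, which is the associative-memory behaviour the abstract exploits for fusing incomplete RGB and event features.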

    Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking

    Full text link
    Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained by the adoption of the dual-branch pipeline. To tackle the limited effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) that bridges the gap between single-branch networks and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target-state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate locations. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptly handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results establish that AViTMP attains state-of-the-art performance, especially in long-term tracking and robustness.
    Comment: 13 pages, 8 figures, under review
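The bidirectional cycle-verification idea behind CycleTrack can be illustrated with a toy one-dimensional tracker: track forward through the frames, then track backward from the final prediction, and accept the result only if the backward pass returns near the starting position. Everything below, including the nearest-candidate "tracker", is a simplified assumption for illustration:

```python
def nearest_candidate(candidates, ref):
    """Toy 'tracker': pick the candidate position closest to the reference."""
    return min(candidates, key=lambda c: abs(c - ref))

def track(frames, start):
    """Track frame by frame; each frame is a list of candidate positions."""
    pos = start
    for candidates in frames:
        pos = nearest_candidate(candidates, pos)
    return pos

def cycle_verified(frames, start, tol=1.0):
    """Bidirectional cycle check: forward track, then track back and compare."""
    forward = track(frames, start)
    backward = track(list(reversed(frames)), forward)
    return abs(backward - start) <= tol
```

When a distractor hijacks the forward pass, the backward pass typically fails to return to the start, flagging the prediction as unreliable.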

    Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

    Full text link
    Combining color and event cameras (also called Dynamic Vision Sensors, DVS) for robust object tracking is a newly emerging research topic. Existing color-event tracking frameworks usually contain multiple scattered modules, including feature extraction, fusion, matching, and interactive learning, which may lead to low efficiency and high computational complexity. In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves these functions simultaneously. Given the event points and RGB frames, we first transform the points into voxels and crop the template and search regions for both modalities. These regions are then projected into tokens and fed in parallel into the unified Transformer backbone network, whose output features are passed to a tracking head for target object localization. Our proposed CEUTrack is simple, effective, and efficient, achieving over 75 FPS and new SOTA performance. To better validate the effectiveness of our model and address the data deficiency of this task, we also propose a generic, large-scale benchmark dataset for color-event tracking, termed COESOT, which contains 90 categories and 1354 video sequences. Additionally, a new evaluation metric named BOC is provided in our evaluation toolkit to evaluate prominence with respect to the baseline methods. We hope the newly proposed method, dataset, and evaluation metric provide a better platform for color-event-based tracking. The dataset, toolkit, and source code will be released at \url{https://github.com/Event-AHU/COESOT}.
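The first step described above, transforming event points into voxels, amounts to binning each (x, y, t) event into a spatio-temporal grid; a minimal sketch (the signature and grid layout are assumptions, not the paper's code):

```python
def voxelize(events, width, height, t_bins, t_max):
    """Accumulate (x, y, t) event points into a dense [t_bins][height][width] grid."""
    grid = [[[0 for _ in range(width)] for _ in range(height)] for _ in range(t_bins)]
    for x, y, t in events:
        # map the timestamp to a temporal bin, clamping the final instant
        b = min(int(t / t_max * t_bins), t_bins - 1)
        grid[b][y][x] += 1
    return grid
```

The resulting dense grid can then be cropped and tokenized exactly like an image tensor, which is what lets both modalities share one Transformer backbone.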

    New Method to Prepare Mitomycin C Loaded PLA-Nanoparticles with High Drug Entrapment Efficiency

    Get PDF
    The classical double-emulsion solvent-diffusion technique for encapsulating water-soluble Mitomycin C (MMC) in PLA nanoparticles suffers from low encapsulation efficiency because of the drug's rapid partitioning into the external aqueous phase. In this paper, MMC-loaded PLA nanoparticles were prepared by a new single-emulsion solvent-evaporation method, in which soybean phosphatidylcholine (SPC) was employed to improve the liposolubility of MMC through formation of an MMC–SPC complex. Four main influential factors identified in a single-factor test, namely PLA molecular weight, PLA-to-SPC ratio (wt/wt), MMC-to-SPC ratio (wt/wt), and volume ratio of oil phase to water phase, were evaluated using an orthogonal design with respect to drug entrapment efficiency. The drug release study was performed in pH 7.2 PBS at 37 °C, with drug analysis by UV/Vis spectrometry at 365 nm. MMC–PLA particles prepared by the classical method were used for comparison. The MMC–SPC–PLA nanoparticles formulated under the optimized conditions are relatively uniform in size (594 nm) with up to 94.8% drug entrapment efficiency, compared with 6.44 μm PLA–MMC microparticles at 34.5% entrapment efficiency. MMC release is biphasic, with an initial burst followed by sustained release; the cumulative drug release over 30 days is 50.17% for the PLA–MMC–SPC nanoparticles and 74.1% for the PLA–MMC particles. IR analysis of the MMC–SPC complex shows that its high liposolubility may be attributed to weak physical interactions between MMC and SPC during formation of the complex. It is concluded that the new method is advantageous in terms of smaller size, narrower size distribution, higher encapsulation yield, and longer sustained drug release in comparison to the classical method.
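Entrapment efficiency is conventionally computed from the total drug used in the formulation and the free (unencapsulated) drug measured in the supernatant; a sketch using that standard formula, with hypothetical input masses chosen only to illustrate the reported 94.8% figure (the actual assay values are not given in the abstract):

```python
def entrapment_efficiency(total_drug_mg, free_drug_mg):
    """Percent of drug entrapped: (total - free) / total * 100."""
    return (total_drug_mg - free_drug_mg) / total_drug_mg * 100.0
```

For example, 10 mg of drug with 0.52 mg recovered free in the supernatant gives an entrapment efficiency of 94.8%.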

    Atrasentan and renal events in patients with type 2 diabetes and chronic kidney disease (SONAR): a double-blind, randomised, placebo-controlled trial

    Get PDF
    Background: Short-term treatment for people with type 2 diabetes using a low dose of the selective endothelin A receptor antagonist atrasentan reduces albuminuria without causing significant sodium retention. We report the long-term effects of treatment with atrasentan on major renal outcomes. Methods: We did this double-blind, randomised, placebo-controlled trial at 689 sites in 41 countries. We enrolled adults aged 18–85 years with type 2 diabetes, an estimated glomerular filtration rate (eGFR) of 25–75 mL/min per 1·73 m² of body surface area, and a urine albumin-to-creatinine ratio (UACR) of 300–5000 mg/g who had received maximum labelled or tolerated renin–angiotensin system inhibition for at least 4 weeks. Participants were given atrasentan 0·75 mg orally daily during an enrichment period before random group assignment. Those with a UACR decrease of at least 30% with no substantial fluid retention during the enrichment period (responders) were included in the double-blind treatment period. Responders were randomly assigned to receive either atrasentan 0·75 mg orally daily or placebo. All patients and investigators were masked to treatment assignment. The primary endpoint was a composite of doubling of serum creatinine (sustained for ≥30 days) or end-stage kidney disease (eGFR <15 mL/min per 1·73 m² sustained for ≥90 days, chronic dialysis for ≥90 days, kidney transplantation, or death from kidney failure) in the intention-to-treat population of all responders. Safety was assessed in all patients who received at least one dose of their assigned study treatment. The study is registered with ClinicalTrials.gov, number NCT01858532. Findings: Between May 17, 2013, and July 13, 2017, 11 087 patients were screened; 5117 entered the enrichment period, and 4711 completed the enrichment period. Of these, 2648 patients were responders and were randomly assigned to the atrasentan group (n=1325) or placebo group (n=1323). Median follow-up was 2·2 years (IQR 1·4–2·9).
    79 (6·0%) of 1325 patients in the atrasentan group and 105 (7·9%) of 1323 in the placebo group had a primary composite renal endpoint event (hazard ratio [HR] 0·65 [95% CI 0·49–0·88]; p=0·0047). Fluid retention and anaemia adverse events, which have previously been attributed to endothelin receptor antagonists, were more frequent in the atrasentan group than in the placebo group. Hospital admission for heart failure occurred in 47 (3·5%) of 1325 patients in the atrasentan group and 34 (2·6%) of 1323 patients in the placebo group (HR 1·33 [95% CI 0·85–2·07]; p=0·208). 58 (4·4%) patients in the atrasentan group and 52 (3·9%) in the placebo group died (HR 1·09 [95% CI 0·75–1·59]; p=0·65). Interpretation: Atrasentan reduced the risk of renal events in patients with diabetes and chronic kidney disease who were selected to optimise efficacy and safety. These data support a potential role for selective endothelin receptor antagonists in protecting renal function in patients with type 2 diabetes at high risk of developing end-stage kidney disease. Funding: AbbVie
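The quoted percentages follow directly from the event counts; a quick arithmetic check (note that the trial's effect estimate is a Cox hazard ratio, which cannot be recomputed from raw counts alone, so the crude proportions below are not the reported HR of 0·65):

```python
def crude_rate(events, n):
    """Crude event proportion; not the time-to-event hazard ratio."""
    return events / n

atrasentan_rate = crude_rate(79, 1325)   # primary endpoint events, atrasentan arm
placebo_rate = crude_rate(105, 1323)     # primary endpoint events, placebo arm
```

Rounded to one decimal place these reproduce the 6·0% and 7·9% figures quoted in the abstract.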

    Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial

    Get PDF
    Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has been shown to reduce albuminuria in patients with type 2 diabetes (T2D) and chronic kidney disease (CKD), with only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials. Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared with placebo in reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate ≥25 mL/min/1.73 m² and albuminuria (urinary albumin-to-creatinine ratio ≥30 to ≤5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level alpha = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure. Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits from the addition of finerenone to their treatment regimen. Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
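The stated design, 90% power to detect a 20% risk reduction (i.e. a hazard ratio of 0.80) at two-sided alpha = 0.05, pins down the approximate number of primary-outcome events in an event-driven trial via Schoenfeld's approximation for a 1:1 log-rank comparison; a sketch (the trial's actual sample-size calculation may have differed):

```python
import math
from statistics import NormalDist

def required_events(hr, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events needed for a 1:1 log-rank comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2

events_needed = required_events(0.80)  # roughly 844 primary events
```

This is why such designs are event-driven: the trial runs until the required number of primary events has accrued, rather than for a fixed follow-up time.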

    Robust Template Adjustment Siamese Network for Object Visual Tracking

    No full text
    Most existing trackers address the visual tracking problem by extracting an appearance template from the first frame and using it to localize the target in the current frame. Unfortunately, they typically face model degeneration, which easily results in model drift and target loss. To address this issue, a novel Template Adjustment Siamese Network (TA-Siam) is proposed in this paper. TA-Siam consists of two simple subnetworks: a template adjustment subnetwork for feature extraction and a classification-regression subnetwork for bounding-box prediction. The template adjustment module adaptively uses features from subsequent frames to adjust the current template. This lets the template adapt to target appearance variation over long sequences and effectively overcomes the model drift problem of Siamese networks. To reduce classification errors, rhombus labels are proposed in TA-Siam. For more efficient learning and faster convergence, our tracker uses a more effective regression loss during training. Extensive experiments and comparisons with other trackers are conducted on challenging benchmarks including VOT2016, VOT2018, OTB50, OTB100, GOT-10K, and LaSOT. TA-Siam achieves state-of-the-art performance at 45 FPS.
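The template-adjustment idea, updating the stored appearance template with features from later frames, can be illustrated by a simple running blend. TA-Siam's adjustment module is learned end-to-end; this fixed-rate update is only a sketch of the underlying intuition:

```python
def adjust_template(template, new_feature, rate=0.1):
    """Blend the stored template toward features extracted from the latest frame."""
    return [(1 - rate) * t + rate * f for t, f in zip(template, new_feature)]
```

A small rate keeps the template anchored to the original target appearance while still absorbing gradual appearance changes, which is the trade-off any template-update scheme must strike.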

    On the Glide of the 3x+1 Problem

    No full text
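No abstract is available for this item, but in the 3x+1 (Collatz) literature the glide of n standardly denotes the number of iterations of the 3x+1 map needed before the trajectory first drops below its starting value. A minimal sketch of that definition (an assumption about the paper's topic, since its text is not shown):

```python
def glide(n):
    """Number of 3x+1 steps until the trajectory first drops below its start."""
    assert n > 1
    steps, m = 0, n
    while m >= n:
        # one step of the 3x+1 map: 3m+1 if odd, m/2 if even
        m = 3 * m + 1 if m % 2 else m // 2
        steps += 1
    return steps
```

For example, 3 → 10 → 5 → 16 → 8 → 4 → 2 first falls below 3 after six steps, so glide(3) = 6.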