181 research outputs found
PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
Collaborative inference has been a promising solution to enable
resource-constrained edge devices to perform inference using state-of-the-art
deep neural networks (DNNs). In collaborative inference, the edge device first
feeds the input to a partial DNN locally and then uploads the intermediate
result to the cloud to complete the inference. However, recent research
indicates model inversion attacks (MIAs) can reconstruct input data from
intermediate results, posing serious privacy concerns for collaborative
inference. Existing perturbation and cryptography techniques are inefficient
and unreliable in defending against MIAs while performing accurate inference.
This paper provides a viable solution, named PATROL, which develops
privacy-oriented pruning to balance privacy, efficiency, and utility of
collaborative inference. PATROL takes advantage of the fact that later layers
in a DNN can extract more task-specific features. Given limited local resources
for collaborative inference, PATROL intends to deploy more layers at the edge
based on pruning techniques to enforce task-specific features for inference and
reduce task-irrelevant but sensitive features for privacy preservation. To
achieve privacy-oriented pruning, PATROL introduces two key components:
Lipschitz regularization and adversarial reconstruction training, which
increase the reconstruction errors by reducing the stability of MIAs and
enhance the target inference model by adversarial training, respectively.
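The split described in this abstract (a partial DNN on the edge device, the remainder in the cloud) can be illustrated with a minimal sketch. The toy layers and the function names `run_layers` and `collaborative_inference` are assumptions made for illustration only; they are not PATROL's code and do not include its pruning, Lipschitz regularization, or adversarial training.

```python
def run_layers(layers, x):
    """Apply a list of layer functions to x, in order."""
    for layer in layers:
        x = layer(x)
    return x


def collaborative_inference(layers, split, x):
    """Run the first `split` layers on the edge and the rest in the cloud.

    Returns (intermediate, output). Only `intermediate` would leave the
    device, so it is the value a model inversion attack would target.
    """
    edge_layers, cloud_layers = layers[:split], layers[split:]
    intermediate = run_layers(edge_layers, x)        # computed locally
    output = run_layers(cloud_layers, intermediate)  # computed remotely
    return intermediate, output


# Hypothetical toy "layers" standing in for DNN layers.
def double(v):
    return [2.0 * a for a in v]


def relu(v):
    return [max(0.0, a) for a in v]


def total(v):
    return [sum(v)]


toy_model = [double, relu, total]
```

Moving `split` toward the end of `toy_model` places more layers on the edge, which is the direction the abstract says PATROL pursues; the final output is the same for every split point, only the uploaded intermediate changes.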
Optimized Path Planning for USVs under Ocean Currents
The proposed work focuses on the path planning for Unmanned Surface Vehicles
(USVs) in the ocean environment, taking into account various spatiotemporal
factors such as ocean currents and other energy consumption factors. The paper
proposes the use of Gaussian Process Motion Planning (GPMP2), a Bayesian
optimization method that has shown promising results in continuous and
nonlinear path planning algorithms. The proposed work improves GPMP2 by
incorporating a new spatiotemporal factor for tracking and predicting ocean
currents using a spatiotemporal Bayesian inference. The algorithm is applied to
the USV path planning and is shown to optimize for smoothness, obstacle
avoidance, and ocean currents in a challenging environment. The work is
relevant for practical applications in ocean scenarios where optimal path
planning for USVs is essential for minimizing costs and optimizing performance.
Comment: 9 pages and 7 figures, submitted to IEEE Transactions on Systems, Man,
and Cybernetics
Effects of Surface Modification of Nanotube Arrays on the Performance of CdS Quantum-Dot-Sensitized Solar Cells
CdS-sensitized TiO2 nanotube arrays were fabricated by successive ionic layer adsorption and reaction and used as photoanodes for quantum-dot-sensitized solar cells. Before being coated with CdS, the surfaces of the TiO2 nanotube arrays were treated separately with TiCl4, nitric acid (HNO3), potassium hydroxide (KOH), or methyltrimethoxysilane (MTMS) in order to reduce the interface transfer resistance of the solar cells. The modified surfaces exhibited superhydrophilic or hydrophobic characteristics, which directly affect the power conversion efficiency of the solar cells. The results showed that surface modification reduced the surface tension, which played a significant role in the connectivity between CdS and the TiO2 nanotube arrays. In addition, the solar cells based on the HNO3-treated CdS/TiO2 electrode achieved a maximum power conversion efficiency of 0.17%, which was 42% higher than that of the unmodified reference sample.
Towards Real-World Visual Tracking with Temporal Contexts
Visual tracking has made significant improvements in the past few decades.
Most existing state-of-the-art trackers 1) merely aim for performance in ideal
conditions while overlooking the real-world conditions; 2) adopt the
tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only
integrate the temporal information into the template, where temporal contexts
among consecutive frames are far from being fully utilized. To handle those
problems, we propose a two-level framework (TCTrack) that can exploit temporal
contexts efficiently. Based on it, we propose a stronger version for real-world
visual tracking, i.e., TCTrack++. It boils down to two levels: features and
similarity maps. Specifically, for feature extraction, we propose an
attention-based temporally adaptive convolution to enhance the spatial features
using temporal information, which is achieved by dynamically calibrating the
convolution weights. For similarity map refinement, we introduce an adaptive
temporal transformer to encode the temporal knowledge efficiently and decode it
for the accurate refinement of the similarity map. To further improve the
performance, we additionally introduce a curriculum learning strategy. Also, we
adopt online evaluation to measure performance in real-world conditions.
Exhaustive experiments on 8 well-known benchmarks demonstrate the superiority of
TCTrack++. Real-world tests directly verify that TCTrack++ can be readily used
in real-world applications.
Comment: Accepted by IEEE TPAMI, Code:
https://github.com/vision4robotics/TCTrac
Synthesis and Characterization of Hierarchical Structured TiO2
Hierarchical structured TiO2 nanotubes were prepared by mechanical ball milling of highly ordered TiO2 nanotube arrays grown by electrochemical anodization of titanium foil. Scanning electron microscopy, transmission electron microscopy, X-ray diffraction, specific surface area analysis, UV-visible absorption spectroscopy, photocurrent measurement, photoluminescence spectra, electrochemical impedance spectra, and photocatalytic degradation tests were used to characterize the nanocomposites. The surface area increased with milling time. After 5 h of ball milling, the TiO2 hierarchical nanotubes had a corn-like shape and exhibited enhanced photoelectrochemical activity compared with commercial P25. The superior photocatalytic activity is suggested to result from the combined advantages of the high surface area of the nanoparticles and the rapid electron transfer and collection of the nanotubes in the hierarchical structure. The hierarchical structured TiO2 nanotubes could be used in flexible applications such as solar cells, sensors, and other photoelectrochemical devices.
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
Relational Language-Image Pre-training (RLIP) aims to align vision
representations with relational texts, thereby advancing the capability of
relational reasoning in computer vision tasks. However, hindered by the slow
convergence of RLIPv1 architecture and the limited availability of existing
scene graph data, scaling RLIPv1 is challenging. In this paper, we propose
RLIPv2, a fast converging model that enables the scaling of relational
pre-training to large-scale pseudo-labelled scene graph data. To enable fast
scaling, RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism
that facilitates earlier and deeper gated cross-modal fusion with sparsified
language encoding layers. ALIF leads to comparable or better performance than
RLIPv1 in a fraction of the time for pre-training and fine-tuning. To obtain
scene graph data at scale, we extend object detection datasets with free-form
relation labels by introducing a captioner (e.g., BLIP) and a designed Relation
Tagger. The Relation Tagger assigns BLIP-generated relation texts to region
pairs, thus enabling larger-scale relational pre-training. Through extensive
experiments conducted on Human-Object Interaction Detection and Scene Graph
Generation, RLIPv2 shows state-of-the-art performance on three benchmarks under
fully fine-tuned, few-shot and zero-shot settings. Notably, the largest RLIPv2
achieves 23.29 mAP on HICO-DET without any fine-tuning, yields 32.22 mAP with
just 1% data and yields 45.09 mAP with 100% data. Code and models are publicly
available at https://github.com/JacobYuan7/RLIPv2.
Comment: Accepted to ICCV 2023. Code and models:
https://github.com/JacobYuan7/RLIPv
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Diffusion models have emerged as the de facto paradigm for video generation.
However, their reliance on web-scale data of varied quality often yields
results that are visually unappealing and misaligned with the textual prompts.
To tackle this problem, we propose InstructVideo to instruct text-to-video
diffusion models with human feedback by reward fine-tuning. InstructVideo has
two key ingredients: 1) To ameliorate the cost of reward fine-tuning induced by
generating through the full DDIM sampling chain, we recast reward fine-tuning
as editing. By leveraging the diffusion process to corrupt a sampled video,
InstructVideo requires only partial inference of the DDIM sampling chain,
reducing fine-tuning cost while improving fine-tuning efficiency. 2) To
mitigate the absence of a dedicated video reward model for human preferences,
we repurpose established image reward models, e.g., HPSv2. To this end, we
propose Segmental Video Reward, a mechanism to provide reward signals based on
segmental sparse sampling, and Temporally Attenuated Reward, a method that
mitigates temporal modeling degradation during fine-tuning. Extensive
experiments, both qualitative and quantitative, validate the practicality and
efficacy of using image reward models in InstructVideo, significantly enhancing
the visual quality of generated videos without compromising generalization
capabilities. Code and models will be made publicly available.
Comment: Project page: https://instructvideo.github.io
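The abstract does not spell out how segmental sparse sampling selects frames. One common reading, assumed here, is to split the video into equal segments and score one frame from each with the image reward model. The function name `segmental_sparse_sample` and the choice of the middle frame of each segment are illustrative assumptions, not details taken from InstructVideo.

```python
def segmental_sparse_sample(num_frames, num_segments):
    """Pick one frame index per segment (assumed: the middle frame).

    The video's frame indices 0..num_frames-1 are split into
    `num_segments` contiguous segments of near-equal length.
    """
    if not 0 < num_segments <= num_frames:
        raise ValueError("need 0 < num_segments <= num_frames")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments      # first frame of segment k
        end = (k + 1) * num_frames // num_segments  # one past the last frame
        indices.append((start + end - 1) // 2)      # middle frame of segment k
    return indices
```

For a 16-frame clip and 4 segments this selects frames 1, 5, 9, and 13, so the image reward model would be evaluated on 4 frames rather than all 16.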
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
The self-attention mechanism sets transformer-based large language models
(LLMs) apart from convolutional and recurrent neural networks. Despite the
performance improvement, achieving real-time LLM inference on silicon is
challenging due to the extensively used Softmax in self-attention. Apart from
the non-linearity, the low arithmetic intensity greatly reduces the processing
parallelism, which becomes the bottleneck especially when dealing with a longer
context. To address this challenge, we propose Constant Softmax (ConSmax), a
software-hardware co-design as an efficient Softmax alternative. ConSmax
employs differentiable normalization parameters to remove the maximum searching
and denominator summation in Softmax. It allows for massive parallelization
while performing the critical tasks of Softmax. In addition, a scalable ConSmax
hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless
non-linear operation and support mixed-precision computing. It further
facilitates efficient LLM inference. Experimental results show that ConSmax
achieves a minuscule power consumption of 0.43 mW and area of 0.001 mm² at
1-GHz working frequency and 22-nm CMOS technology. Compared to state-of-the-art
Softmax hardware, ConSmax results in 14.5x energy and 14.0x area savings with a
comparable accuracy on a GPT-2 model and the WikiText103 dataset.
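To make the idea of removing the maximum search and the denominator summation concrete, the sketch below contrasts standard Softmax with an element-wise variant that uses two fixed constants. The form exp(s - beta) / gamma and the names `consmax_like`, `beta`, and `gamma` are assumptions for illustration; the abstract says only that ConSmax uses differentiable normalization parameters, so this may differ from the paper's exact formulation.

```python
import math


def softmax(scores):
    """Standard softmax: needs a max search and a sum over all scores."""
    m = max(scores)                           # maximum search
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)                         # denominator summation
    return [e / denom for e in exps]


def consmax_like(scores, beta, gamma):
    """Element-wise normalization with constants beta and gamma.

    Each output depends only on its own score, so all elements can be
    computed in parallel without a reduction over the vector. In training,
    beta and gamma would be learned; here they are plain arguments.
    """
    return [math.exp(s - beta) / gamma for s in scores]
```

If `beta` happens to equal the maximum score and `gamma` the corresponding sum of exponentials, the two functions agree; with other constants the outputs are not guaranteed to sum to 1.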
YOLO SSPD: a small target cotton boll detection model during the boll-spitting period based on space-to-depth convolution
Introduction: Cotton yield estimation is crucial in the agricultural process, where the accuracy of boll detection during the flocculation period significantly influences yield estimations in cotton fields. Unmanned Aerial Vehicles (UAVs) are frequently employed for plant detection and counting due to their cost-effectiveness and adaptability. Methods: Addressing the challenges of small target cotton bolls and low resolution of UAVs, this paper introduces a method based on the YOLO v8 framework for transfer learning, named YOLO small-scale pyramid depth-aware detection (SSPD). The method combines space-to-depth and non-strided convolution (SPD-Conv) and a small target detector head, and also integrates a simple, parameter-free attentional mechanism (SimAM) that significantly improves target boll detection accuracy. Results: The YOLO SSPD achieved a boll detection accuracy of 0.874 on UAV-scale imagery. It also recorded a coefficient of determination (R²) of 0.86, with a root mean square error (RMSE) of 12.38 and a relative root mean square error (RRMSE) of 11.19% for boll counts. Discussion: The findings indicate that YOLO SSPD can significantly improve the accuracy of cotton boll detection on UAV imagery, thereby supporting the cotton production process. This method offers a robust solution for high-precision cotton monitoring, enhancing the reliability of cotton yield estimates.
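The space-to-depth step named in this abstract can be sketched on a single-channel image stored as nested lists. This is a generic illustration of the rearrangement, not the YOLO SSPD implementation; the function name `space_to_depth` and the channel ordering are assumptions, and real implementations operate on multi-channel tensors.

```python
def space_to_depth(image, block=2):
    """Rearrange each block x block patch into a channel vector.

    `image` is an H x W grid (list of rows). The result is an
    (H // block) x (W // block) grid whose cells hold block * block values,
    so spatial resolution drops while no pixel is discarded.
    """
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            # Flatten the patch in row-major order (assumed ordering).
            patch = [image[i + di][j + dj]
                     for di in range(block) for dj in range(block)]
            row.append(patch)
        out.append(row)
    return out
```

A 4 x 4 image becomes a 2 x 2 grid with 4 values per cell. Unlike strided convolution or pooling, every input pixel survives the downsampling, which is the usual motivation given for this operation when targets are small.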