39 research outputs found
Online Video Super-Resolution with Convolutional Kernel Bypass Graft
Deep learning-based models have achieved remarkable performance in video
super-resolution (VSR) in recent years, but most of these models are less
applicable to online video applications. These methods solely consider the
distortion quality and ignore crucial requirements for online applications,
e.g., low latency and low model complexity. In this paper, we focus on online
video transmission, in which VSR algorithms are required to generate
high-resolution video sequences frame by frame in real time. To address such
challenges, we propose an extremely low-latency VSR algorithm based on a novel
kernel knowledge transfer method, named convolutional kernel bypass graft
(CKBG). First, we design a lightweight network structure that does not require
future frames as inputs and saves extra time costs for caching these frames.
Then, our proposed CKBG method enhances this lightweight base model by
bypassing the original network with "kernel grafts", which are extra
convolutional kernels containing the prior knowledge of external pretrained
image SR models. In the testing phase, we further accelerate the grafted
multi-branch network by converting it into a simple single-path structure.
Experimental results show that our proposed method can process online video
sequences at up to 110 FPS, with very low model complexity and competitive SR
performance.
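The abstract does not say how the grafted multi-branch network is collapsed into a single path. One common way to do this relies on the linearity of convolution: parallel branches with kernels of the same size, whose outputs are summed, are equivalent to one branch with the summed kernel. The sketch below illustrates only that general property in 1D. The kernel values and the function names `conv1d` and `merge_branches` are hypothetical, and this is not claimed to be the CKBG procedure itself.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D correlation of a list with a kernel (no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def merge_branches(base_kernel, graft_kernel):
    """Fold two parallel same-size kernels into one by element-wise addition."""
    return [b + g for b, g in zip(base_kernel, graft_kernel)]


signal = [1.0, 2.0, 4.0, 8.0, 16.0]
base_kernel = [0.5, -1.0, 0.25]  # hypothetical lightweight base branch
graft_kernel = [0.0, 2.0, 1.0]   # hypothetical bypass "graft" branch

# Multi-branch view: both branches are evaluated and their outputs summed.
two_branch = [
    a + b
    for a, b in zip(conv1d(signal, base_kernel), conv1d(signal, graft_kernel))
]

# Single-path view: one merged kernel, one convolution.
single_path = conv1d(signal, merge_branches(base_kernel, graft_kernel))

print(two_branch == single_path)
```

The merged form performs one convolution instead of two, which is why such a conversion can reduce inference cost without changing the output.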
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Dynamic sparsity, where the sparsity patterns are unknown until runtime,
poses a significant challenge to deep learning. The state-of-the-art
sparsity-aware deep learning solutions are restricted to pre-defined, static
sparsity patterns due to significant overheads associated with preprocessing.
Efficient execution of dynamic sparse computation often faces the misalignment
between the GPU-friendly tile configuration for efficient execution and the
sparsity-aware tile shape that minimizes coverage wastes (non-zero values in
tensor).
In this paper, we propose PIT, a deep-learning compiler for dynamic sparsity.
PIT proposes a novel tiling mechanism that leverages Permutation Invariant
Transformation (PIT), a mathematically proven property, to transform multiple
sparsely located micro-tiles into a GPU-efficient dense tile without changing
the computation results, thus achieving both high GPU utilization and low
coverage waste. Given a model, PIT first finds feasible PIT rules for all its
operators and generates efficient GPU kernels accordingly. At runtime, with the
novel SRead and SWrite primitives, PIT rules can be executed extremely fast to
support dynamic sparsity in an online manner. Extensive evaluation on diverse
models shows that PIT can accelerate dynamic sparsity computation by up to 5.9x
(average 2.43x) over state-of-the-art compilers.
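The idea of packing sparsely located data into a dense tile without changing the result can be illustrated with a toy matrix product. In the sketch below, only the non-zero rows of a sparse left operand are gathered into a small dense tile, multiplied, and written back to their original positions. The helpers `gather_rows` and `scatter_rows` are hypothetical stand-ins that only loosely mirror the role of the SRead and SWrite primitives; the actual system operates on GPU micro-tiles, which this CPU sketch does not model.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
        for row in a
    ]


def gather_rows(matrix, row_ids):
    """Pack the selected rows into a smaller dense tile (hypothetical helper)."""
    return [matrix[r] for r in row_ids]


def scatter_rows(tile, row_ids, n_rows, n_cols):
    """Write tile rows back to their original positions; other rows stay zero."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for tile_row, r in zip(tile, row_ids):
        out[r] = list(tile_row)
    return out


sparse_a = [
    [0, 0, 0],
    [1, 2, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 3, 4],
    [0, 0, 0],
]
b = [[1, 0], [0, 1], [2, 2]]

# The sparsity pattern is discovered at run time, not fixed in advance.
active = [r for r, row in enumerate(sparse_a) if any(row)]

dense_tile = gather_rows(sparse_a, active)   # 2 x 3 instead of 6 x 3
tile_out = matmul(dense_tile, b)             # dense work on active rows only
result = scatter_rows(tile_out, active, len(sparse_a), len(b[0]))

print(result == matmul(sparse_a, b))
```

Because each output row depends only on the matching input row, reordering or compacting the rows does not change any computed value, only where it is stored.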
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Online video streaming has fundamental limitations on the transmission
bandwidth and computational capacity and super-resolution is a promising
potential solution. However, applying existing video super-resolution methods
to online streaming is non-trivial. Existing video codecs and streaming
protocols (e.g., WebRTC) dynamically change the video quality both spatially and
temporally, which leads to diverse and dynamic degradations. Furthermore,
online streaming imposes a strict latency requirement, which makes most existing
methods less applicable. This paper therefore focuses on the rarely
explored problem setting of online streaming video super-resolution. To
facilitate the research on this problem, a new benchmark dataset named
LDV-WebRTC is constructed based on a real-world online streaming system.
Leveraging the new benchmark dataset, we propose a novel method specifically
for online video streaming, which contains a convolution and Look-Up Table
(LUT) hybrid model to achieve better performance-latency trade-off. To tackle
the changing degradations, we propose a mixture-of-expert-LUT module, where a
set of LUTs specialized in different degradations is built and adaptively
combined to handle different degradations. Experiments show that our method achieves
720P video SR at around 100 FPS, while significantly outperforming existing
LUT-based methods and offering competitive performance compared to efficient
CNN-based methods.
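As a rough illustration of adaptively combining several look-up tables, the sketch below blends the outputs of two small per-pixel tables using softmax weights. Everything specific here is an assumption made for illustration: the two expert mappings, the 16-level quantization, the gating scores, and the names `build_lut`, `lookup`, and `mixture_lookup`. The abstract does not describe how the experts are indexed or how their weights are predicted.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_lut(fn, levels=16):
    """Precompute fn on a uniform grid over [0, 1]."""
    return [fn(i / (levels - 1)) for i in range(levels)]


def lookup(lut, value):
    """Quantize a value in [0, 1] to the nearest grid point and read the table."""
    value = min(max(value, 0.0), 1.0)
    return lut[round(value * (len(lut) - 1))]


def mixture_lookup(luts, weights, value):
    """Weighted sum of the per-expert table outputs."""
    return sum(w * lookup(lut, value) for lut, w in zip(luts, weights))


# Two hypothetical experts: an identity mapping and a brightening curve.
luts = [build_lut(lambda v: v), build_lut(lambda v: v ** 0.5)]

# Hypothetical gating scores, e.g. produced by a small predictor per frame.
weights = softmax([0.3, -1.2])

print(mixture_lookup(luts, weights, 0.4))
```

Each table read costs an index computation and a memory access rather than a convolution, which is the general appeal of LUT-based inference when latency matters.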
Evaluation of capacities of bucket foundations in soft-stiff-soft clays under combined loading
Bucket foundations are widely constructed for offshore wind turbines, which are subjected to combined vertical-horizontal-moment (V-H-M) loading during operation. This technical note presents an extensive investigation into the response of bucket foundations in soft clay interbedded with a stiff clay layer (or soft-stiff-soft clays) under combined loading, which supplements the existing design specification. The numerical method employed in this study is validated by comparing the bearing capacities of bucket foundations with previously published data. The numerical modeling results show that the failure mechanism of bucket foundations in soft-stiff-soft clays under combined loading is significantly different from that in the single-layer soft clay condition. A series of numerical analyses is conducted to explore the effects of geometric variation of the interbedded stiff clay, soil material properties, and combined loading on the bearing capacities of bucket foundations. Based on the parametric studies, a new failure mechanism for bucket foundations in soft-stiff-soft clays under general loading is obtained, and a corresponding design method is established, which can be used to calculate the monotonic vertical, horizontal, and moment bearing capacities, as well as the capacity envelopes under combined loading.
The Differential Influences of Parenting Styles on Children's Academic Achievement in Chinese, Mathematics and English: The Chain-Mediating Effects of Academic Motivation and Academic Engagement
This study investigated the chain-mediating roles of children’s academic motivation (intrinsic and extrinsic) and academic engagement in the relationships between parenting styles (positive and negative) and their academic achievement in Chinese, mathematics, and English. The participants were 433 elementary school children from grades 3 to 5. After controlling for the child’s sex, age, nonverbal intelligence, and family socioeconomic status, the results showed that: (1) positive parenting styles had a beneficial effect on academic achievement in Chinese, but not in mathematics or English, while negative parenting styles did not have a significant effect on academic achievement in any of the subjects examined; (2) positive parenting styles could affect children’s Chinese achievement through their academic engagement and through the chain-mediating roles of intrinsic academic motivation and academic engagement. However, these mediating and chain-mediating effects were not observed for mathematics or English. This study emphasizes the significance of positive parenting styles specifically for the Chinese subject, as opposed to mathematics and English. These findings hold important theoretical and practical implications for promoting children’s academic achievement.
Nitric oxide sensing revisited
Nitric oxide (NO) sensing is an ancient trait enabled by hemoproteins harboring a highly conserved Heme-Nitric oxide/OXygen (H-NOX) domain that operates throughout the bacterial, fungal, and animal kingdoms, including in humans, but that has long been thought to be absent in plants. Recently, H-NOX-containing plant hemoproteins mediating crucial NO-dependent responses such as stomatal closure and pollen tube guidance have been reported. There are indications that the detection method that led to these discoveries will uncover many more heme-based NO sensors that operate as regulatory sites in complex proteins. Their characterization will in turn offer a much more complete picture of plant NO responses at both the molecular and systems levels.
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Vision transformers have shown great success due to their high model
capabilities. However, their remarkable performance is accompanied by heavy
computation costs, which makes them unsuitable for real-time applications. In
this paper, we propose a family of high-speed vision transformers named
EfficientViT. We find that the speed of existing transformer models is commonly
bounded by memory inefficient operations, especially the tensor reshaping and
element-wise functions in MHSA. Therefore, we design a new building block with
a sandwich layout, i.e., using a single memory-bound MHSA between efficient FFN
layers, which improves memory efficiency while enhancing channel communication.
Moreover, we discover that the attention maps share high similarities across
heads, leading to computational redundancy. To address this, we present a
cascaded group attention module feeding attention heads with different splits
of the full feature, which not only saves computation cost but also improves
attention diversity. Comprehensive experiments demonstrate EfficientViT
outperforms existing efficient models, striking a good trade-off between speed
and accuracy. For instance, our EfficientViT-M5 surpasses MobileNetV3-Large by
1.9% in accuracy, while getting 40.4% and 45.2% higher throughput on Nvidia
V100 GPU and Intel Xeon CPU, respectively. Compared to the recent efficient
model MobileViT-XXS, EfficientViT-M2 achieves 1.8% superior accuracy, while
running 5.8x/3.7x faster on the GPU/CPU, and 7.4x faster when converted to ONNX
format. Code and models are available at
https://github.com/microsoft/Cream/tree/main/EfficientViT.
Comment: CVPR 202
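A minimal sketch of the idea of feeding each attention head a different split of the full feature is given below. It rests on several assumptions that the abstract does not state: the query, key, and value projections are replaced by the identity, the channels are split evenly across heads, and the "cascade" is modeled by adding each head's output to the next head's input split. The function names are hypothetical, and this is a toy illustration rather than the released implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head attention with identity Q/K/V projections (a simplification)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append([
            sum(w * value[c] for w, value in zip(weights, tokens))
            for c in range(dim)
        ])
    return out


def cascaded_group_attention(tokens, num_heads):
    """Each head attends over its own channel split plus the previous head's output."""
    split = len(tokens[0]) // num_heads
    previous = None
    head_outputs = []
    for h in range(num_heads):
        chunk = [tok[h * split:(h + 1) * split] for tok in tokens]
        if previous is not None:
            # Assumed cascade: refine this split with the previous head's result.
            chunk = [[a + b for a, b in zip(c, p)] for c, p in zip(chunk, previous)]
        previous = self_attention(chunk)
        head_outputs.append(previous)
    # Concatenate the per-head outputs back along the channel axis.
    return [
        sum((head[t] for head in head_outputs), [])
        for t in range(len(tokens))
    ]


tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, 1.0, 2.0], [2.0, 1.0, 0.0, 1.0]]
output = cascaded_group_attention(tokens, num_heads=2)
print(len(output), len(output[0]))
```

Each head here works on `dim / num_heads` channels instead of the full feature, so the per-head attention cost shrinks while the concatenated output keeps the original width.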
WH2D2N2: distributed AI-enabled OK-ASN service for Web of Things
Model data-driven ontology and knowledge presentation for evolving semantic Asian social networks (OK-ASN) is a critical strategy for web of things (WoT) services. Meanwhile, Deep Neural Network (DNN)-based OK-ASN service in WoT is growing rapidly. However, most DNN-based services cannot utilize the potential of WoT fully, as heterogeneity exists in WoT. Therefore, this article proposes a novel framework called Web-based Heterogeneous Hierarchical Distributed Deep Neural Network (WH2D2N2) to deploy the DNNs for OK-ASN services on WoT, overcoming the heterogeneity. The architecture of the system and the designed Edge-Cloud-Joint execute scheme utilize heterogeneous devices to make DNN inference ubiquitous and output two types of results to meet various requirements. To bring robustness to OK-ASN services, a global scheduling is designed to arrange the workflow dynamically. The results of our experiments prove the efficiency of the execute scheme and the global scheduling in the system.
An adhesive cellulose nanocrystal-reinforced nanocomposite hydrogel electrolyte for supercapacitor applications
Hydrogel electrolytes have been applied in various energy storage devices, including supercapacitors. However, they still suffer from disadvantages such as low mechanical performance and poor adhesion at the interfaces between electrolytes and electrodes. Herein, an adhesive hydrogel electrolyte with promising mechanical strength and electrochemical performance was designed by introducing hydrophobic carbon chains as long-range physical cross-linkers and cellulose nanocrystal (CNC) as biopolymer nano-reinforcement, and by soak-loading liquid electrolytes such as KOH into the hydrogel matrix. The hydrogel electrolyte loaded with 1 M KOH demonstrated the best tensile stress of 362.31 kPa and an elongation of 2479 %, and exhibited self-repairability when stimuli were applied to the cut interface. The hydrogel electrolyte showed excellent adhesion on various surfaces, including nonconductive and conductive materials such as cardboard, leather, carbon film, and carbon cloth. Regarding electrochemical properties, the hydrogel electrolyte showed the largest conductivity of 0.207 ± 0.005 S/cm when soak-loaded in 1 M KOH for 24 h. Moreover, the hydrogel electrolyte exhibited promising electrochemical performance when assembled into coin-cell supercapacitors using free-standing activated carbon sheets as electrodes. A capacitance of 67.31 F/g at 0.05 A/g and almost 100 % capacitance retention at 0.1 A/g after 2200 cycles were achieved.