244 research outputs found
Analysis of a discrete-layout bimorph disk elements piezoelectric deformable mirror
We introduce a discrete-layout bimorph disk elements piezoelectric deformable mirror (DBDEPDM), driven by circular flexural-mode piezoelectric actuators, and formulate an electromechanical model for analyzing its performance. As a numerical example, a 21-actuator DBDEPDM with an aperture of 165 mm was modeled. The results demonstrate that the DBDEPDM has a stroke larger than 10 μm and a resonance frequency of 4.456 kHz. Compared with conventional piezoelectric deformable mirrors, the DBDEPDM has a larger stroke, a higher resonance frequency, and higher spatial resolution due to the circular shape of its actuators. Moreover, numerical simulations of the influence functions of the model are provided.
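The influence-function simulations mentioned above rest on a standard deformable-mirror idea: the mirror surface is a superposition of per-actuator influence functions. A minimal sketch, assuming Gaussian influence functions and illustrative pitch and coupling values (the paper derives its functions from the electromechanical model instead):

```python
import numpy as np

def mirror_surface(coords, centers, commands, coupling=0.15, pitch=33.0):
    """Deformable-mirror surface as a superposition of per-actuator
    influence functions, modelled here as Gaussians. `coupling` is the
    normalised cross-coupling at one actuator `pitch` (both values are
    illustrative, not taken from the paper)."""
    sigma2 = -pitch ** 2 / np.log(coupling)   # width giving the stated coupling
    surface = np.zeros(len(coords))
    for (cx, cy), v in zip(centers, commands):
        d2 = (coords[:, 0] - cx) ** 2 + (coords[:, 1] - cy) ** 2
        surface += v * np.exp(-d2 / sigma2)
    return surface

# Evaluate a unit poke of one actuator on-axis and one pitch away.
pts = np.array([[0.0, 0.0], [33.0, 0.0]])
s = mirror_surface(pts, centers=[(0.0, 0.0)], commands=[1.0])
print(s)  # unit deflection on axis, 15% coupling at the neighbour
```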
Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of wearable cameras, have drawn much research attention in recent
years. Because of their lengthy content, a large number of ego-related
applications have been developed to abstract the captured videos. Since users
are accustomed to interacting with target objects using their own hands, and
their hands usually appear within their visual fields during the interaction,
an egocentric hand detection step is involved in tasks like gesture
recognition, action recognition, and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands appear and disappear irregularly in the videos.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with state-of-the-art methods, especially in
complicated scenarios.
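The seed-and-grow procedure can be sketched generically; the scoring dictionary below is a stand-in for the four egocentric cues (contrast, location, position consistency, appearance continuity), and the graph plays the role of superpixel adjacency:

```python
from collections import deque

def grow_region(adjacency, scores, seeds, threshold=0.5):
    """Generic region growing: start from seed nodes and absorb adjacent
    nodes whose cue score exceeds a threshold. `adjacency` maps a node to
    its neighbours; `scores` maps a node to a float in [0, 1]."""
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for nb in adjacency[node]:
            if nb not in region and scores[nb] >= threshold:
                region.add(nb)
                frontier.append(nb)
    return region

# Toy graph of 5 "superpixels" in a chain; node 0 is the motion-derived seed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sc = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.2, 4: 0.9}  # low score at node 3 blocks growth
print(grow_region(adj, sc, [0]))  # {0, 1, 2}
```

Node 4 is never reached even though its score is high, because growth only extends through adjacent accepted regions.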
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
We address the problem of video representation learning without
human-annotated labels. While previous efforts address the problem by designing
novel self-supervised tasks using video data, the learned features are merely
frame-based, and thus not applicable to many video analytics tasks where
spatio-temporal features prevail. In this paper we propose a
novel self-supervised approach to learn spatio-temporal features for video
representation. Inspired by the success of two-stream approaches in video
classification, we propose to learn visual features by regressing both motion
and appearance statistics along spatial and temporal dimensions, given only the
input video data. Specifically, we extract statistical concepts (fast-motion
region and the corresponding dominant direction, spatio-temporal color
diversity, dominant color, etc.) from simple patterns in both spatial and
temporal domains. Unlike prior puzzles that are even hard for humans to solve,
the proposed approach is consistent with human inherent visual habits and
therefore easy to answer. We conduct extensive experiments with C3D to validate
the effectiveness of our proposed approach. The experiments show that our
approach can significantly improve the performance of C3D when applied to video
classification tasks. Code is available at
https://github.com/laura-wang/video_repres_mas.
Comment: CVPR 201
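As an illustration of the kind of statistic being regressed, the sketch below locates the grid cell with the largest frame-to-frame change, a crude "fast-motion region"; the paper's actual motion and appearance summaries are richer:

```python
import numpy as np

def dominant_motion(frames, grid=2):
    """Illustrative statistic in the spirit of the pretext task: return the
    index of the grid cell with the largest accumulated frame-to-frame
    change. `frames` has shape (T, H, W); H and W must be divisible by
    `grid`. This is a simplified stand-in, not the paper's exact target."""
    diff = np.abs(np.diff(frames.astype(np.float32), axis=0)).sum(axis=0)
    h, w = diff.shape
    cells = diff.reshape(grid, h // grid, grid, w // grid).sum(axis=(1, 3))
    return int(np.argmax(cells))  # flattened index of the most dynamic cell

# Toy clip: 3 frames of 4x4 pixels; motion only in the top-left quadrant.
clip = np.zeros((3, 4, 4))
clip[1, 0, 0] = 1.0
clip[2, 0, 1] = 1.0
print(dominant_motion(clip))  # 0 (top-left cell)
```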
Modeling Sketching Primitives to Support Freehand Drawing Based on Context Awareness
Freehand drawing is an easy and intuitive method for capturing and expressing ideas. Sketch-based interfaces, however, lack support for natural sketching cues such as overlapping, over-looping, and hatching, which occur frequently with physical pen and paper. In this paper, we analyze the characteristics of drawing cues in sketch-based interfaces and describe the different types of sketching primitives. An improved sketch information model is given; the idea is to present and record design thinking during the freehand drawing process with individuality and diversification. A context-based interaction model is developed that can guide and support the development of new sketch-based interfaces. New applications with different context contents can easily be derived from it and developed further. Our approach supports the tasks that are common across applications, requiring the designer to provide support only for application-specific tasks. It is applicable to modeling various sketching interfaces and applications. Finally, we illustrate the general operations of the system with examples from different applications.
Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System
With the advent of 5G era, factories are transitioning towards wireless
networks to break free from the limitations of wired networks. In 5G-enabled
factories, unmanned automatic devices such as automated guided vehicles and
robotic arms complete production tasks cooperatively through the periodic
control loops. In such loops, the sensing data is generated by sensors, and
transmitted to the control center through uplink wireless communications. The
corresponding control commands are generated and sent back to the devices
through downlink wireless communications. Since wireless communication,
sensing, and control are tightly coupled, there are significant challenges in
the modeling and design of such closed-loop systems. In particular, the
existing theoretical tools for these functionalities rely on different models
and underlying assumptions, which makes it difficult to combine them.
Therefore, in this paper, an analytical closed-loop model is
proposed, where the performances and resources of communication, sensing and
control are deeply related. To achieve the optimal control performance, a
co-design of communication resource allocation and control method is proposed,
inspired by the model predictive control algorithm. Numerical results are
provided to demonstrate the relationships between the resources and control
performances.
Comment: 6 pages, 3 figures, received by GlobeCom 202
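The receding-horizon idea behind the co-design can be sketched on a toy scalar plant; the plant parameters, cost weights, and input grid below are illustrative, and the communication-resource allocation the paper couples into the loop is omitted:

```python
import numpy as np

A, B = 1.2, 0.5   # toy unstable scalar plant x[k+1] = A*x[k] + B*u[k]

def mpc_control(x, horizon=10, grid=np.linspace(-2.0, 2.0, 81)):
    """One receding-horizon step: try each candidate input now, predict
    the horizon with zero future input, and return the candidate with
    the smallest quadratic state+input cost. A minimal stand-in for the
    paper's MPC-inspired co-design."""
    def cost(u):
        xk, c = A * x + B * u, 0.1 * u ** 2
        for _ in range(horizon):
            c += xk ** 2
            xk = A * xk            # zero input for the rest of the horizon
        return c
    return min(grid, key=cost)

# Closed loop from x0 = 1.0: the open-loop-unstable state is regulated.
x = 1.0
for _ in range(20):
    x = A * x + B * mpc_control(x)
print(abs(x) < 0.05)  # True
```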
Shunted Self-Attention via Multi-Scale Token Aggregation
Recent Vision Transformer (ViT) models have demonstrated encouraging results
across various computer vision tasks, thanks to their competence in modeling
long-range dependencies of image patches or tokens via self-attention. These
models, however, usually designate similar receptive fields for each token
feature within each layer. Such a constraint inevitably limits the ability of
each self-attention layer to capture multi-scale features, thereby leading to
performance degradation in handling images with multiple objects of different
scales. To address this issue, we propose a novel and generic strategy, termed
shunted self-attention (SSA), that allows ViTs to model attentions at hybrid
scales per attention layer. The key idea of SSA is to inject
heterogeneous receptive field sizes into tokens: before computing the
self-attention matrix, it selectively merges tokens to represent larger object
features while keeping certain tokens to preserve fine-grained features. This
novel merging scheme enables the self-attention to learn relationships between
objects with different sizes and simultaneously reduces the token numbers and
the computational cost. Extensive experiments across various tasks demonstrate
the superiority of SSA. Specifically, the SSA-based transformer achieves 84.0%
Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on
ImageNet with only half of the model size and computation cost, and surpasses
Focal Transformer by 1.3 mAP on COCO and 2.9 mIoU on ADE20K under similar
parameter and computation cost. Code has been released at
https://github.com/OliverRensu/Shunted-Transformer
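The token-merging idea can be sketched with plain attention over average-pooled keys and values; the projections are identities here for brevity (real SSA heads learn Q/K/V projections and their pooling rates):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pooled_attention(x, pool):
    """Self-attention where keys/values are average-pooled by `pool`
    before the attention matrix is formed: pooling merges tokens into
    coarser ones, so each query attends over fewer, larger 'objects'."""
    n, d = x.shape
    kv = x.reshape(n // pool, pool, d).mean(axis=1)   # merge tokens
    attn = softmax(x @ kv.T / np.sqrt(d))             # (n, n // pool)
    return attn @ kv

def shunted_self_attention(x, pools=(1, 2)):
    """Split features across 'heads' with different pooling rates and
    concatenate the outputs: fine and coarse scales coexist in one layer."""
    d = x.shape[1] // len(pools)
    heads = [pooled_attention(x[:, i * d:(i + 1) * d], p)
             for i, p in enumerate(pools)]
    return np.concatenate(heads, axis=1)

x = np.random.randn(8, 16)           # 8 tokens with 16-dim features
out = shunted_self_attention(x)
print(out.shape)  # (8, 16)
```

Note how the pooled head pays roughly half the attention cost of the unpooled one, mirroring the paper's point that merging tokens reduces computation while keeping fine-grained tokens elsewhere.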
Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery
In this paper, we propose a novel iterative multi-task framework to complete
the segmentation mask of an occluded vehicle and recover the appearance of its
invisible parts. In particular, to improve the quality of the segmentation
completion, we present two coupled discriminators and introduce an auxiliary 3D
model pool for sampling authentic silhouettes as adversarial samples. In
addition, we propose a two-path structure with a shared network to enhance the
appearance recovery capability. By iteratively performing the segmentation
completion and the appearance recovery, the results will be progressively
refined. To evaluate our method, we present a dataset, the Occluded Vehicle
dataset, containing synthetic and real-world occluded vehicle images. We
conduct comparison experiments on this dataset and demonstrate that our model
outperforms the state-of-the-art in recovering the segmentation masks and
appearance of occluded vehicles. Moreover, we demonstrate that our appearance
recovery approach can benefit occluded vehicle tracking in real-world videos.
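The alternating structure of the framework can be sketched abstractly; the two functions passed in below are stand-ins for the paper's learned completion and recovery networks:

```python
def iterative_refine(image, mask, complete_mask, recover_appearance, iters=3):
    """Alternate the two coupled tasks: each pass completes the mask using
    the current appearance estimate, then recovers appearance using the
    updated mask, so the results are progressively refined."""
    appearance = image
    for _ in range(iters):
        mask = complete_mask(appearance, mask)
        appearance = recover_appearance(image, mask)
    return mask, appearance

# Toy stand-ins on a 1-D "image": completion reveals one hidden entry per
# pass; recovery copies pixel values wherever the mask is true.
full = [1, 2, 3, 4]
def complete_mask(app, mask):
    m = list(mask)
    for i, v in enumerate(m):
        if not v:
            m[i] = True
            break
    return m
def recover_appearance(img, mask):
    return [x if v else 0 for x, v in zip(img, mask)]

mask, app = iterative_refine(full, [True, False, False, True],
                             complete_mask, recover_appearance, iters=2)
print(mask, app)  # [True, True, True, True] [1, 2, 3, 4]
```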
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
This paper proposes a novel pretext task to address the self-supervised video
representation learning problem. Specifically, given an unlabeled video clip,
we compute a series of spatio-temporal statistical summaries, such as the
spatial location and dominant direction of the largest motion, the spatial
location and dominant color of the largest color diversity along the temporal
axis, etc. Then a neural network is built and trained to yield the statistical
summaries given the video frames as inputs. In order to alleviate the learning
difficulty, we employ several spatial partitioning patterns to encode rough
spatial locations instead of exact spatial Cartesian coordinates. Our approach
is inspired by the observation that the human visual system is sensitive to
rapidly changing contents in the visual field, and needs only impressions of
rough spatial locations to understand visual contents. To validate the
effectiveness of the proposed approach, we conduct extensive experiments with
four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D and S3D-G. The results
show that our approach outperforms the existing approaches across these
backbone networks on four downstream video analysis tasks including action
recognition, video retrieval, dynamic scene recognition, and action similarity
labeling. The source code is publicly available at:
https://github.com/laura-wang/video_repres_sts.
Comment: Accepted by TPAMI. An extension of our previous work at arXiv:1904.0359
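The spatial partitioning encoding can be illustrated directly: a location maps to a coarse block index under a chosen grid pattern, rather than to exact Cartesian coordinates. The grid sizes here are illustrative, not the paper's patterns:

```python
def partition_label(loc, shape, pattern):
    """Encode a (row, col) location as a coarse block index under a grid
    partitioning `pattern` = (rows, cols) of a feature map of `shape`
    = (height, width); predicting such labels is easier than regressing
    exact coordinates."""
    r, c = loc
    h, w = shape
    pr, pc = pattern
    return (r * pr // h) * pc + (c * pc // w)

# The same location under two patterns on a 16x16 map.
print(partition_label((5, 12), (16, 16), (2, 2)))  # block 1 (top-right)
print(partition_label((5, 12), (16, 16), (4, 4)))  # block 7
```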
Socio-demographic association of multiple modifiable lifestyle risk factors and their clustering in a representative urban population of adults: a cross-sectional study in Hangzhou, China
Background: To plan long-term prevention strategies and develop tailored intervention activities, it is important to understand the socio-demographic characteristics of the subpopulations at high risk of developing chronic diseases. This study aimed to examine the socio-demographic characteristics associated with multiple lifestyle risk factors and their clustering.

Methods: We conducted a simple random sampling survey to assess lifestyle risk factors in three districts of Hangzhou, China between 2008 and 2009. A two-step cluster analysis was used to identify health-related lifestyle clusters based on tobacco use, physical activity (PA), fruit and vegetable (FV) consumption, and out-of-home eating. Multinomial logistic regression was used to model the association between socio-demographic factors and lifestyle clusters.

Results: A total of 2016 eligible people (977 men and 1039 women, aged 18-64 years) completed the survey. Three distinct clusters were identified: an unhealthy (UH) group (25.7%), a moderately healthy (MH) group (31.1%), and a healthy (H) group (43.1%). The UH group was characterised by a high prevalence of current daily smoking, a moderate or low level of PA, low FV consumption in frequency or servings, and frequent eating out. The H group was characterised by no current daily smoking, a moderate level of PA, high FV consumption, and the fewest occasions of eating out. The MH group was characterised by no current daily smoking, a low or high level of PA, and intermediate FV consumption and frequency of eating out. Men were more likely than women to have unhealthy lifestyles. Adults aged 50-64 years were more likely to live healthy lifestyles, whereas adults aged 40-49 years were more likely to be in the UH group, as were adults whose highest level of education was junior high school or below. Adults with a high asset index were more likely to be in the MH group.

Conclusions: This study suggests that urban Chinese people who are middle-aged, male, and less educated are the most likely to belong to the cluster with a high-risk profile. These groups will contribute the most to the future burden of major chronic disease and should be targeted by early prevention programs.
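The clustering step can be illustrated with plain k-means on synthetic lifestyle scores; this is a simplified stand-in, since the study used a two-step cluster analysis and the four-dimensional features below are invented:

```python
import numpy as np

def kmeans(X, k=3, iters=20):
    """Plain k-means with a simple evenly-spaced initialisation; a
    simplified stand-in for the study's two-step cluster analysis."""
    centers = X[:: max(1, len(X) // k)][:k].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Synthetic respondents around three lifestyle profiles (invented scores
# for smoking, activity, diet, and eating out).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, (30, 4)) for m in (0.0, 1.0, 2.0)])
labels, centers = kmeans(X)
print(len(set(labels.tolist())))  # 3 clusters recovered
```

In the study, cluster membership then becomes the outcome of a multinomial logistic regression on the socio-demographic covariates.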