Field Assisted Material Engineering (FAME)
To further improve the energy savings of Spark Plasma Sintering, we have developed a very rapid sintering technique called Flash SPS (FSPS), with heating rates on the order of 10^4-10^5 °C/minute [1]. Unlike conventional flash sintering, which relies on high voltage (≈100 V), FSPS uses low voltage (≈10 V) and can be scaled up to sample volumes of several tens of cubic centimetres. Flash SPS allows densification of metallic conductors such as ZrB2 and HfB2 with a discharge time as short as 20-30 seconds. FSPS of semiconductors such as silicon carbide and boron carbide has also been demonstrated. Highly customized and versatile equipment with ultrafast responsive controls and programmable bipolar power supplies (up to 20 kHz, 1 MA, 500V) has been built. The methodology has been applied to produce FSPSed samples of ultra-refractory materials larger than 6 cm in diameter. Understanding the intrinsic role of the electric field in the properties-microstructure-processing triangle remains one of our primary scientific goals and the main open question. We have tried to provide some answers by approaching the problem at different length scales (see figure 1), developing dedicated equipment/controls, simulations (FEM and ab initio), thermo-kinetic analysis, in situ observations and accurate temperature measurements/calibrations.
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification
Local features at neighboring spatial positions in feature maps have high
correlation since their receptive fields are often overlapped. Self-attention
usually uses the weighted sum (or other functions) with internal elements of
each local feature to obtain its weight score, which ignores interactions among
local features. To address this, we propose an effective interaction-aware
self-attention model inspired by PCA to learn attention maps. Furthermore,
since different layers in a deep network capture feature maps of different
scales, we use these feature maps to construct a spatial pyramid and then
utilize multi-scale information to obtain more accurate attention scores, which
are used to weight the local features in all spatial positions of feature maps
to calculate attention maps. Moreover, our spatial pyramid attention is
not restricted by the number of its input feature maps, so it is easily extended
to a spatio-temporal version. Finally, our model is embedded in general CNNs to
form end-to-end attention networks for action classification. Experimental
results show that our method achieves state-of-the-art results on
UCF101, HMDB51 and untrimmed Charades. Comment: Accepted by ECCV201
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Current metrics for video captioning are mostly based on the text-level
comparison between reference and candidate captions. However, they have some
insuperable drawbacks, e.g., they cannot handle videos without references, and
they may result in biased evaluation due to the one-to-many nature of
video-to-text and the neglect of visual relevance. From the human evaluator's
viewpoint, a high-quality caption should be consistent with the provided video,
but need not be literally or semantically similar to the reference.
Inspired by human evaluation, we propose EMScore (Embedding Matching-based
score), a novel reference-free metric for video captioning, which directly
measures similarity between video and candidate captions. Benefiting from the
recent development of large-scale pre-training models, we exploit a well
pre-trained vision-language model to extract visual and linguistic embeddings
for computing EMScore. Specifically, EMScore combines matching scores of both
coarse-grained (video and caption) and fine-grained (frames and words) levels,
which takes the overall understanding and detailed characteristics of the video
into account. Furthermore, considering the potential information gain, EMScore
can be flexibly extended to the conditions where human-labeled references are
available. Last but not least, we collect VATEX-EVAL and ActivityNet-FOIL
datasets to systematically evaluate the existing metrics. VATEX-EVAL
experiments demonstrate that EMScore has higher human correlation and lower
reference dependency. The ActivityNet-FOIL experiment verifies that EMScore can
effectively identify "hallucinating" captions. The datasets will be released to
facilitate the development of video captioning metrics. The code is available
at: https://github.com/ShiYaya/emscore. Comment: cvpr202
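The combination of coarse-grained and fine-grained matching described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the mean pooling of frame embeddings into a video embedding, the greedy best-match precision/recall, and the equal-weight average of the two levels are all assumptions made for illustration. The embeddings are assumed to have already been extracted by some pre-trained vision-language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fine_grained_f1(frame_embs, word_embs):
    """Greedy matching between frames and words (assumed matching scheme)."""
    # Precision: each word is matched to its most similar frame.
    precision = sum(max(cosine(w, f) for f in frame_embs) for w in word_embs) / len(word_embs)
    # Recall: each frame is matched to its most similar word.
    recall = sum(max(cosine(f, w) for w in word_embs) for f in frame_embs) / len(frame_embs)
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def embedding_matching_score(frame_embs, word_embs, caption_emb):
    """Average of a coarse (video vs. caption) and a fine (frames vs. words) score."""
    video_emb = mean_vector(frame_embs)  # assumption: mean pooling over frames
    coarse = cosine(video_emb, caption_emb)
    fine = fine_grained_f1(frame_embs, word_embs)
    return (coarse + fine) / 2  # assumption: equal weighting of both levels

# Toy 2-D embeddings, purely illustrative.
frames = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
caption = [1.0, 1.0]
score = embedding_matching_score(frames, words, caption)
```

No reference caption appears anywhere in the computation, which is what makes a score of this kind reference-free.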
Human Action Recognition Using Pyramid Vocabulary Tree
Abstract. Bag-of-visual-words (BOVW) approaches are widely used in human action recognition. Usually, a large BOVW vocabulary is more discriminative for inter-class action classification, while a small one is more robust to noise and thus more tolerant of intra-class variance. In this paper, we propose a pyramid vocabulary tree to model local spatio-temporal features, which can characterize inter-class differences while also allowing for intra-class variance. Moreover, since BOVW is geometrically unconstrained, we further consider the spatio-temporal information of local features and propose a sparse spatio-temporal pyramid matching kernel (termed SST-PMK) to compute similarity measures between video sequences. SST-PMK satisfies Mercer's condition and is therefore readily integrated into an SVM to perform action recognition. Experimental results on the Weizmann datasets show that both the pyramid vocabulary tree and the SST-PMK lead to a significant improvement in human action recognition. Keywords: Action recognition, Bag-of-visual-words (BOVW), Pyramid matching kernel (PMK)
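The idea of comparing two videos across several vocabulary sizes can be illustrated with a generic pyramid match over a vocabulary tree. This is a minimal sketch and not the SST-PMK proposed in the paper: it omits the sparse and spatio-temporal components and any normalization. It assumes each local feature has already been quantized into a root-to-leaf path of branch indices (a hypothetical input format), and it uses the common halving weight per coarser level.

```python
from collections import Counter

def level_histograms(paths, depth):
    """Histograms of visual words at each tree level, finest level first.

    Each path is a tuple of branch indices from root to leaf (assumed input).
    """
    return [Counter(p[:level] for p in paths) for level in range(depth, 0, -1)]

def intersection(h1, h2):
    """Histogram intersection: number of features matched at one level."""
    return sum(min(count, h2[word]) for word, count in h1.items())

def pyramid_match(paths_a, paths_b, depth):
    """Sum new matches per level, halving the weight at each coarser level."""
    hists_a = level_histograms(paths_a, depth)
    hists_b = level_histograms(paths_b, depth)
    score, previous = 0.0, 0
    for i, (ha, hb) in enumerate(zip(hists_a, hists_b)):
        matches = intersection(ha, hb)
        # Only matches that first appear at this level are counted here.
        score += (matches - previous) / (2 ** i)
        previous = matches
    return score

# Toy quantized features for two videos in a depth-2 tree, purely illustrative.
video_a = [(0, 0), (0, 1), (1, 0)]
video_b = [(0, 0), (0, 0), (1, 1)]
similarity = pyramid_match(video_a, video_b, depth=2)
```

Matches found only at the coarse level contribute less than matches at the fine level, which reflects the trade-off between a discriminative large vocabulary and a noise-tolerant small one.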