Voronoi-Based Compact Image Descriptors: Efficient Region-of-Interest Retrieval With VLAD and Deep-Learning-Based Descriptors
We investigate the problem of image retrieval based on visual queries when the latter comprise arbitrary regions-of-interest (ROI) rather than entire images. Our proposal is a compact image descriptor that combines the state-of-the-art in content-based descriptor extraction with a multi-level, Voronoi-based spatial partitioning of each dataset image. The proposed multi-level Voronoi-based encoding uses a spatial hierarchical K-means over interest-point locations, and computes a content-based descriptor over each cell. In order to reduce the matching complexity with minimal or no sacrifice in retrieval performance: (i) we utilize the tree structure of the spatial hierarchical K-means to perform a top-to-bottom pruning for local similarity maxima; (ii) we propose a new image similarity score that combines relevant information from all partition levels into a single similarity measure; (iii) we combine our proposal with a novel and efficient approach for optimal bit allocation within quantized descriptor representations. By deriving both a Voronoi-based VLAD descriptor (termed Fast-VVLAD) and a Voronoi-based deep convolutional neural network (CNN) descriptor (termed Fast-VDCNN), we demonstrate that our Voronoi-based framework is agnostic to the descriptor basis and can easily be slotted into existing frameworks. Via a range of ROI queries in two standard datasets, it is shown that the Voronoi-based descriptors achieve comparable or higher mean Average Precision against conventional grid-based spatial search, while offering more than two-fold reduction in complexity. Finally, beyond ROI queries, we show that Voronoi partitioning improves the geometric invariance of compact CNN descriptors, thereby resulting in competitive performance to the current state-of-the-art on whole-image retrieval.
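The core of the abstract above is a spatial hierarchical K-means over interest-point locations, whose cells form a Voronoi partition at each level of a tree. The following is a minimal NumPy sketch of that idea; the two-level depth, function names, and parameters are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and point-to-cell labels."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid (Voronoi cell)
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centroids[c] = points[labels == c].mean(axis=0)
    return centroids, labels

def voronoi_partition(keypoints, k=3, levels=2):
    """Two-level spatial hierarchy: each level-1 Voronoi cell is split
    again into k sub-cells, mimicking a spatial hierarchical K-means tree.
    A content-based descriptor would then be computed per cell."""
    tree = {}
    centroids, labels = kmeans(keypoints, k)
    for c in range(k):
        members = keypoints[labels == c]
        if levels > 1 and len(members) >= k:
            sub_centroids, _ = kmeans(members, k, seed=c + 1)
            tree[c] = (centroids[c], sub_centroids)
        else:
            tree[c] = (centroids[c], None)
    return tree
```

The tree structure is what enables the top-to-bottom pruning mentioned in the abstract: a query descriptor is first matched against level-1 cells, and only the best-matching branches are descended.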
Deep perceptual preprocessing for video coding
We introduce the concept of rate-aware deep perceptual preprocessing (DPP) for video encoding. DPP makes a single pass over each input frame in order to enhance its visual quality when the video is to be compressed with any codec at any bitrate. The resulting bitstreams can be decoded and displayed at the client side without any post-processing component. DPP comprises a convolutional neural network that is trained via a composite set of loss functions that incorporates: (i) a perceptual loss based on a trained no-reference image quality assessment model, (ii) a reference-based fidelity loss expressing L1 and structural similarity aspects, (iii) a motion-based rate loss via block-based transform, quantization and entropy estimates that converts the essential components of standard hybrid video encoder designs into a trainable framework. Extensive testing using multiple quality metrics and AVC, AV1 and VVC encoders shows that DPP+encoder reduces, on average, the bitrate of the corresponding encoder by 11%. This marks the first time a server-side neural processing component achieves such savings over the state-of-the-art in video coding.
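The composite loss described above combines fidelity and rate terms. As a rough NumPy illustration of how a rate term can be estimated from a block transform plus quantization, consider the sketch below; the perceptual term is omitted (it needs a trained no-reference IQA model), and all function names, block size, and weights are our own assumptions.

```python
import numpy as np

def l1_loss(x, y):
    """Reference-based fidelity term (the SSIM part is omitted here)."""
    return np.abs(x - y).mean()

def dct2(block):
    """Naive 2-D DCT-II via its matrix form, standing in for the block transform."""
    N = block.shape[0]
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    return C @ block @ C.T

def rate_proxy(frame, block=8, q=16):
    """Crude rate estimate: fraction of nonzero quantized DCT coefficients,
    standing in for the transform + quantization + entropy estimates."""
    H, W = frame.shape
    nonzero = 0
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            coeffs = dct2(frame[i:i + block, j:j + block])
            nonzero += np.count_nonzero(np.round(coeffs / q))
    return nonzero / ((H // block) * (W // block) * block * block)

def composite_loss(processed, reference, w_fid=1.0, w_rate=0.1):
    """Weighted combination of fidelity and rate terms (weights hypothetical)."""
    return w_fid * l1_loss(processed, reference) + w_rate * rate_proxy(processed)
```

In an actual training loop the transform and quantization would need differentiable approximations; this sketch only shows the structure of the objective.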
Neuromorphic Vision Sensing for CNN-based Action Recognition
Neuromorphic vision sensing (NVS) hardware is now gaining traction as a low-power/high-speed visual sensing technology that circumvents the limitations of conventional active pixel sensing (APS) cameras. While object detection and tracking models have been investigated in conjunction with NVS, there is currently little work on NVS for higher-level semantic tasks, such as action recognition. Contrary to recent work that considers homogeneous transfer between flow domains (optical flow to motion vectors), we propose to embed an NVS emulator into a multi-modal transfer learning framework that carries out heterogeneous transfer from optical flow to NVS. The potential of our framework is showcased by the fact that, for the first time, our NVS-based results achieve comparable action recognition performance to motion-vector or optical-flow based methods (i.e., accuracy on UCF-101 within 8.8% of I3D with optical flow), with the NVS emulator and NVS camera hardware offering 3 to 6 orders of magnitude faster frame generation (respectively) compared to standard Brox optical flow. Beyond this significant advantage, our CNN processing is found to have the lowest total GFLOP count against all competing methods (up to 7.7 times complexity saving compared to I3D with optical flow).
Graph-Based Object Classification for Neuromorphic Vision Sensing
Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, object classification with NVS streams cannot leverage state-of-the-art convolutional neural networks (CNNs), since NVS does not produce frame representations. To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. We couple this with novel residual graph CNN architectures and show that, when trained on spatio-temporal NVS data for object classification, such residual graph CNNs preserve the spatial and temporal coherence of spike events, while requiring less computation and memory. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we present and make available a 100k dataset of NVS recordings of the American sign language letters, acquired with an iniLabs DAVIS240c device under real-world conditions.
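A common way to build a compact graph from an event stream is to treat (sub-sampled) events as nodes and connect pairs within a spatio-temporal radius. The NumPy sketch below illustrates this construction under our own assumptions (radius, time scaling, and sub-sampling strategy are hypothetical, not the paper's exact recipe).

```python
import numpy as np

def events_to_graph(events, radius=3.0, t_scale=1e-3, max_nodes=512, seed=0):
    """Build a compact graph from NVS events.

    events: array of shape (N, 4) with columns (x, y, t, polarity).
    Nodes are a uniform subsample of events; an edge links two nodes
    whose scaled spatio-temporal distance is below `radius`.
    Returns (nodes, edges) with edges as (i, j) index pairs.
    """
    rng = np.random.default_rng(seed)
    if len(events) > max_nodes:
        idx = rng.choice(len(events), max_nodes, replace=False)
        events = events[idx]
    coords = events[:, :3].astype(float).copy()
    coords[:, 2] *= t_scale  # bring time onto a spatial-like scale
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    adj = (dist < radius) & ~np.eye(len(events), dtype=bool)
    edges = np.argwhere(adj)
    return events, edges
```

The node features (e.g., polarity) and the resulting edge list would then be fed to a graph CNN; the O(N²) distance computation here is only acceptable because of the aggressive sub-sampling.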
Escaping the complexity-bitrate-quality barriers of video encoders via deep perceptual optimization
We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based prediction) and multiple loss functions representing rate, distortion and visual quality of the virtual encoder. We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1. Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner, exhibiting only 3% outliers, which correspond to content with peculiar characteristics. Thus, DPO is shown to offer complexity-bitrate-quality tradeoffs that go beyond what conventional video encoders can offer.
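The "virtualization of core encoding blocks" hinges on making quantization differentiable, since hard rounding has zero gradient almost everywhere. One common trick, shown here as our own illustration rather than the paper's exact formulation, approximates the rounding staircase by a sum of shifted sigmoids.

```python
import numpy as np

def soft_quantize(coeffs, step, temperature=10.0, levels=8):
    """Differentiable stand-in for round(coeffs / step): a sum of shifted
    sigmoids approximates the staircase while keeping non-zero gradients.
    Accurate for inputs with |coeffs / step| < levels and away from the
    half-integer step boundaries; `temperature` controls edge sharpness."""
    x = np.asarray(coeffs, dtype=float) / step
    out = -float(levels)
    for l in range(-levels, levels + 1):
        # each sigmoid contributes one unit step centred at l + 0.5
        out = out + 1.0 / (1.0 + np.exp(-temperature * (x - (l + 0.5))))
    return out
```

At inference time the virtual encoder is discarded and a real codec performs hard quantization; the soft version is only needed so that rate and distortion gradients can flow back into the precoding network during training.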
Indirect cyclopexy for treatment of a chronic traumatic cyclodialysis cleft with hypotony
Cyclodialysis cleft is a rare clinical finding and, therefore, reports on surgical repair techniques in the literature are limited. Additionally, hypotony can make repair technically challenging. We share a novel, simple surgical approach to management of a case of chronic traumatic cyclodialysis cleft with a successful outcome.
Toward Generalized Psychovisual Preprocessing For Video Encoding
Deep perceptual preprocessing has recently emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices. In this article, we lay the foundation for a generalized psychovisual preprocessing framework for video encoding and describe one of its promising instantiations that is practically deployable for video-on-demand, live, gaming, and user-generated content (UGC). Results using state-of-the-art advanced video coding (AVC), high efficiency video coding (HEVC), and versatile video coding (VVC) encoders show that average bitrate [Bjontegaard delta-rate (BD-rate)] gains of 11%-17% are obtained over three state-of-the-art reference-based quality metrics [Netflix video multi-method assessment fusion (VMAF), structural similarity index (SSIM), and Apple advanced video quality tool (AVQT)], as well as the recently proposed nonreference International Telecommunication Union-Telecommunication (ITU-T) P.1204 metric. The proposed framework on CPU is shown to be twice as fast as x264 medium-preset encoding. On GPU hardware, our approach achieves 714 frames/sec for 1080p video (below 2 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.
Evaluation of two mobile health apps in the context of smoking cessation: qualitative study of cognitive behavioral therapy (CBT) versus non-CBT-based digital solutions.
BACKGROUND: Mobile health (mHealth) apps can offer users numerous benefits, representing a feasible and acceptable means of administering health interventions such as cognitive behavioral therapy (CBT). CBT is commonly used in the treatment of mental health conditions, where it has a strong evidence base, suggesting that it represents an effective method to elicit health behavior change. More importantly, CBT has proved to be effective in smoking cessation, in the context of smoking-related costs to the National Health Service (NHS) having been estimated to be as high as £2.6bn in 2015. Although the evidence base for computerized CBT in mental health is strong, there is limited literature on its use in smoking cessation. This, combined with the cost-effectiveness of mHealth interventions, advocates a need for research into the effectiveness of CBT-based smoking cessation apps. OBJECTIVE: The objective of this study was, first, to explore participants' perceptions of 2 mHealth apps, a CBT-based app, Quit Genius, and a non-CBT-based app, NHS Smokefree, over a variety of themes. Second, the study aimed to investigate the perceptions and health behavior of users of each app with respect to smoking cessation. METHODS: A qualitative short-term longitudinal study was conducted, using a sample of 29 smokers allocated to one of the 2 apps, Quit Genius or Smokefree. Each user underwent 2 one-to-one semistructured interviews, 1 week apart. Thematic analysis was carried out, and important themes were identified. Descriptive statistics regarding participants' perceptions and health behavior in relation to smoking cessation are also provided. RESULTS: The thematic analysis resulted in five higher themes and several subthemes. Participants were generally more positive about Quit Genius's features, as well as about its design and information engagement and quality. 
Quit Genius users reported increased motivation to quit smoking, as well as greater willingness to continue using their allocated app after 1 week. Moreover, these participants demonstrated preliminary changes in their smoking behavior, although this was in the context of our limited sample, not yet allowing for the finding to be generalizable. CONCLUSIONS: Our findings underscore the use of CBT in the context of mHealth apps as a feasible and potentially effective smoking cessation tool. mHealth apps must be well developed, preferably with an underlying behavioral change mechanism, to promote positive health behavior change. Digital CBT has the potential to become a powerful tool in overcoming current health care challenges. The present results should be replicated in a wider sample using the apps for a longer period so as to allow for generalizability. Further research is also needed to focus on the effect of greater personalization on behavioral change and on understanding the psychological barriers to the adoption of new mHealth solutions.
Compressed-domain video classification with deep neural networks: “There's way too much information to decode the matrix”
We investigate video classification via a 3D deep convolutional neural network (CNN) that directly ingests compressed bitstream information. This idea is based on the observation that video macroblock (MB) motion vectors (that are very compact and directly available from the compressed bitstream) are inherently capturing local spatio-temporal changes in each video scene. Our results on two standard video datasets show that our approach outperforms pixel-based approaches and remains within 7 percentile points from the best classification results reported by highly-complex optical-flow & deep-CNN methods. At the same time, a CPU-based realization of our approach is found to be more than 2500 times faster in the motion extraction in comparison to GPU-based optical flow methods and also offers 2 to 3.4-fold reduction in the utilized deep CNN weights compared to recent architectures. This indicates that deep learning based on compressed video bitstream information may allow for advanced video classification to be deployed in very large datasets using commodity CPU hardware. Source code is available at http://www.github.com/mvcnn.
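The key input to such a network is a dense tensor of per-macroblock motion vectors. As a minimal illustration (not the released code), the sketch below packs parsed motion vectors into a (T, H/16, W/16, 2) tensor a 3-D CNN could ingest; the actual parsing of vectors from an AVC/HEVC bitstream (e.g., via ffmpeg) is outside this sketch, and the tuple layout is our own assumption.

```python
import numpy as np

def mv_tensor(mv_per_frame, frame_h, frame_w, mb=16):
    """Pack per-macroblock motion vectors into a (T, H/mb, W/mb, 2) tensor.

    mv_per_frame: one list per frame of (mb_row, mb_col, dx, dy) tuples,
    as parsed from the compressed bitstream. Macroblocks without a coded
    motion vector (skip/intra) are left as zeros.
    """
    T = len(mv_per_frame)
    grid = np.zeros((T, frame_h // mb, frame_w // mb, 2), dtype=np.float32)
    for t, mvs in enumerate(mv_per_frame):
        for r, c, dx, dy in mvs:
            grid[t, r, c] = (dx, dy)
    return grid
```

Because this tensor is 16x16 times smaller than the pixel grid per frame, the downstream 3-D CNN can be correspondingly compact, which is where the reported weight and CPU-time savings come from.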
Nucleon decay matrix elements with the Wilson quark action: an update
We present preliminary results of a new lattice computation of hadronic matrix elements of baryon number violating operators which appear in the low-energy effective Lagrangian of (SUSY-)Grand Unified Theories. The contribution of the irrelevant form factor, which has caused an underestimate of the matrix elements in previous studies, is subtracted in this calculation. Our results are 2-4 times larger than the most conservative values often employed in phenomenological analyses of nucleon decay with specific GUT models. Comment: LATTICE99 (matrix elements), 3 pages, 2 figures.