
    Voronoi-Based Compact Image Descriptors: Efficient Region-of-Interest Retrieval With VLAD and Deep-Learning-Based Descriptors

    We investigate the problem of image retrieval based on visual queries when the latter comprise arbitrary regions-of-interest (ROI) rather than entire images. Our proposal is a compact image descriptor that combines the state-of-the-art in content-based descriptor extraction with a multi-level, Voronoi-based spatial partitioning of each dataset image. The proposed multi-level Voronoi-based encoding uses a spatial hierarchical K-means over interest-point locations and computes a content-based descriptor over each cell. In order to reduce the matching complexity with minimal or no sacrifice in retrieval performance: (i) we utilize the tree structure of the spatial hierarchical K-means to perform a top-to-bottom pruning for local similarity maxima; (ii) we propose a new image similarity score that combines relevant information from all partition levels into a single measure; (iii) we combine our proposal with a novel and efficient approach for optimal bit allocation within quantized descriptor representations. By deriving both a Voronoi-based VLAD descriptor (termed Fast-VVLAD) and a Voronoi-based deep convolutional neural network (CNN) descriptor (termed Fast-VDCNN), we demonstrate that our Voronoi-based framework is agnostic to the descriptor basis and can easily be slotted into existing frameworks. Via a range of ROI queries in two standard datasets, it is shown that the Voronoi-based descriptors achieve comparable or higher mean Average Precision against conventional grid-based spatial search, while offering more than two-fold reduction in complexity. Finally, beyond ROI queries, we show that Voronoi partitioning improves the geometric invariance of compact CNN descriptors, thereby resulting in competitive performance against the current state-of-the-art on whole-image retrieval.
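    As an illustration of the spatial encoding described above (a minimal sketch, not the authors' code): a two-level hierarchical K-means over interest-point locations, with one pooled descriptor per Voronoi cell. The cell counts and the mean-pooling aggregator are stand-ins for the paper's VLAD/CNN pooling.

```python
# Minimal sketch: two-level spatial hierarchical k-means over interest-point
# locations, one aggregate descriptor per Voronoi cell. Cell counts and the
# mean-pooling aggregator are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def voronoi_descriptors(points, local_descs, k_per_level=(3, 3)):
    """points: (N, 2) interest-point locations; local_descs: (N, D)."""
    cells = []
    # Level 1: partition the image plane by k-means on the point locations.
    top = KMeans(n_clusters=k_per_level[0], n_init=10).fit(points)
    for c in range(k_per_level[0]):
        mask = top.labels_ == c
        cells.append(("L1", c, local_descs[mask].mean(axis=0)))
        if mask.sum() >= k_per_level[1]:
            # Level 2: recursively split each level-1 cell.
            sub = KMeans(n_clusters=k_per_level[1], n_init=10).fit(points[mask])
            for s in range(k_per_level[1]):
                sub_idx = np.where(mask)[0][sub.labels_ == s]
                cells.append(("L2", (c, s), local_descs[sub_idx].mean(axis=0)))
    return cells
```

    At query time, the tree structure supports the top-to-bottom pruning described above: the query descriptor is first matched against the level-1 cells, and only the locally best-matching cells are expanded into their level-2 children.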

    Deep perceptual preprocessing for video coding

    We introduce the concept of rate-aware deep perceptual preprocessing (DPP) for video encoding. DPP makes a single pass over each input frame in order to enhance its visual quality when the video is to be compressed with any codec at any bitrate. The resulting bitstreams can be decoded and displayed at the client side without any post-processing component. DPP comprises a convolutional neural network that is trained via a composite set of loss functions incorporating: (i) a perceptual loss based on a trained no-reference image quality assessment model; (ii) a reference-based fidelity loss expressing L1 and structural similarity aspects; (iii) a motion-based rate loss via block-based transform, quantization and entropy estimates, which converts the essential components of standard hybrid video encoder designs into a trainable framework. Extensive testing using multiple quality metrics and AVC, AV1 and VVC encoders shows that DPP+encoder reduces, on average, the bitrate of the corresponding encoder by 11%. This marks the first time a server-side neural processing component achieves such savings over the state-of-the-art in video coding.
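    To make the composite loss concrete, here is a hedged PyTorch sketch; nr_iqa, ssim_fn and rate_fn are assumed callables standing in for the paper's trained no-reference quality model, structural-similarity term and block-transform rate estimate, and the weights are hypothetical.

```python
# Hedged sketch of a composite DPP-style training loss (not the published one).
import torch.nn.functional as F

def dpp_loss(pre, orig, nr_iqa, ssim_fn, rate_fn,
             w_perc=1.0, w_fid=1.0, w_rate=0.1):
    # (i) perceptual loss: reward high predicted quality of the preprocessed frame
    l_perc = -nr_iqa(pre).mean()
    # (ii) reference-based fidelity: L1 plus a structural-similarity term
    l_fid = F.l1_loss(pre, orig) + (1.0 - ssim_fn(pre, orig))
    # (iii) rate loss: differentiable bit-cost estimate from virtualized
    # transform/quantization/entropy blocks
    l_rate = rate_fn(pre)
    return w_perc * l_perc + w_fid * l_fid + w_rate * l_rate
```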

    Neuromorphic Vision Sensing for CNN-based Action Recognition

    Neuromorphic vision sensing (NVS) hardware is now gaining traction as a low-power/high-speed visual sensing technology that circumvents the limitations of conventional active pixel sensing (APS) cameras. While object detection and tracking models have been investigated in conjunction with NVS, there is currently little work on NVS for higher-level semantic tasks such as action recognition. Contrary to recent work that considers homogeneous transfer between flow domains (optical flow to motion vectors), we propose to embed an NVS emulator into a multi-modal transfer learning framework that carries out heterogeneous transfer from optical flow to NVS. The potential of our framework is showcased by the fact that, for the first time, our NVS-based results achieve action recognition performance comparable to motion-vector or optical-flow based methods (i.e., accuracy on UCF-101 within 8.8% of I3D with optical flow), with the NVS emulator and NVS camera hardware offering 3 to 6 orders of magnitude faster frame generation (respectively) compared to standard Brox optical flow. Beyond this significant advantage, our CNN processing is found to have the lowest total GFLOP count among all competing methods (up to 7.7 times complexity saving compared to I3D with optical flow).
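    For intuition, a minimal sketch of what an NVS emulator does (a generic event-camera model, not the paper's trained emulator): compare log intensities against a per-pixel reference and emit ON/OFF events where the change exceeds a contrast threshold.

```python
# Generic event-camera emulation sketch; the threshold and the reference
# update rule are illustrative assumptions, not the paper's emulator.
import numpy as np

def emulate_events(frame, ref_log, threshold=0.2):
    """frame: float grayscale in [0, 1]; ref_log: per-pixel log intensity at
    which each pixel last fired (initialize with np.log(first_frame + 1e-6)).
    Returns (ys, xs, polarity) arrays for the emitted events."""
    log_i = np.log(frame + 1e-6)
    delta = log_i - ref_log
    on, off = delta > threshold, delta < -threshold
    fired = on | off
    ref_log[fired] = log_i[fired]        # reset the reference where events fired
    ys, xs = np.nonzero(fired)
    return ys, xs, np.where(on[ys, xs], 1, -1)
```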

    Graph-Based Object Classification for Neuromorphic Vision Sensing

    Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, object classification with NVS streams cannot leverage state-of-the-art convolutional neural networks (CNNs), since NVS does not produce frame representations. To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. We couple this with novel residual graph CNN architectures and show that, when trained on spatio-temporal NVS data for object classification, such residual graph CNNs preserve the spatial and temporal coherence of spike events while requiring less computation and memory. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we present and make available a 100k dataset of NVS recordings of the American Sign Language letters, acquired with an iniLabs DAVIS240c device under real-world conditions.
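    As an illustration of a compact graph representation for events (a sketch under assumed parameters, not the paper's exact construction): subsample events as nodes and connect pairs that fall within a spatio-temporal radius.

```python
# Sketch: event stream -> graph. The subsampling size, radius and time
# scaling are hypothetical choices for illustration.
import numpy as np
from scipy.spatial import cKDTree

def events_to_graph(events, max_nodes=512, radius=3.0, time_scale=1e3):
    """events: (N, 4) array of (x, y, t, polarity), t in seconds."""
    idx = np.random.choice(len(events), min(max_nodes, len(events)), replace=False)
    nodes = events[idx]
    # Rescale time so one radius spans comparable spatial and temporal extents.
    coords = np.column_stack([nodes[:, 0], nodes[:, 1], nodes[:, 2] * time_scale])
    tree = cKDTree(coords)
    edges = np.array(sorted(tree.query_pairs(radius)))   # undirected edge list
    features = nodes[:, 3:4]                             # polarity as node feature
    return coords, edges, features
```

    A residual graph CNN would then operate on (coords, edges, features), e.g. via graph convolutions with skip connections.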

    Escaping the complexity-bitrate-quality barriers of video encoders via deep perceptual optimization

    We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based prediction) and multiple loss functions representing the rate, distortion and visual quality of the virtual encoder. We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1. Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner, exhibiting only 3% outliers, which correspond to content with atypical characteristics. Thus, DPO is shown to offer complexity-bitrate-quality tradeoffs that go beyond what conventional video encoders can offer.
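    One way to picture the virtualization of encoding blocks (a hedged sketch, not the paper's virtual encoder): an 8x8 block DCT followed by a smooth stand-in for quantization and an entropy proxy, so that a rate term stays differentiable.

```python
# Differentiable rate-proxy sketch: block DCT + soft quantization + log1p
# entropy proxy. The block size, q_step and proxy form are assumptions.
import torch

def dct_matrix(n=8):
    k = torch.arange(n).float()
    basis = torch.cos((2 * k[None, :] + 1) * k[:, None] * torch.pi / (2 * n))
    basis[0] *= 1 / torch.sqrt(torch.tensor(2.0))
    return basis * torch.sqrt(torch.tensor(2.0 / n))   # orthonormal DCT-II

def rate_proxy(frame, q_step=8.0, n=8):
    """frame: (B, 1, H, W), with H and W multiples of n. Returns a scalar
    that grows with the bits a block-transform encoder would spend."""
    D = dct_matrix(n).to(frame.device)
    blocks = frame.unfold(2, n, n).unfold(3, n, n)     # (B, 1, H/n, W/n, n, n)
    coeffs = D @ blocks @ D.T                          # per-block 2D DCT
    # Soft quantization: scaled magnitudes instead of hard rounding, so
    # gradients flow; log1p approximates coded bits per coefficient.
    return torch.log1p(torch.abs(coeffs) / q_step).mean()

print(rate_proxy(torch.rand(1, 1, 64, 64)))   # e.g. usable as a loss term
```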

    Indirect cyclopexy for treatment of a chronic traumatic cyclodialysis cleft with hypotony

    Cyclodialysis cleft is a rare clinical finding; therefore, reports on surgical repair techniques in the literature are limited. Additionally, hypotony can make repair technically challenging. We share a novel, simple surgical approach to the management of a case of chronic traumatic cyclodialysis cleft with a successful outcome.

    Toward Generalized Psychovisual Preprocessing For Video Encoding

    Deep perceptual preprocessing has recently emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices. In this article, we lay the foundation for a generalized psychovisual preprocessing framework for video encoding and describe one of its promising instantiations that is practically deployable for video-on-demand, live, gaming, and user-generated content (UGC). Results using state-of-the-art advanced video coding (AVC), high efficiency video coding (HEVC), and versatile video coding (VVC) encoders show that average bitrate [Bjontegaard delta-rate (BD-rate)] gains of 11%-17% are obtained over three state-of-the-art reference-based quality metrics [Netflix video multi-method assessment fusion (VMAF), structural similarity index (SSIM), and Apple advanced video quality tool (AVQT)], as well as the recently proposed no-reference International Telecommunication Union-Telecommunication (ITU-T) P.1204 metric. On CPU, the proposed framework is shown to be twice as fast as x264 medium-preset encoding. On GPU hardware, our approach achieves 714 frames/sec for 1080p video (below 2 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.
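    The BD-rate numbers quoted above come from the standard Bjontegaard calculation, sketched below: fit cubic polynomials to the (quality, log-rate) points of the reference and test encoders, then average the horizontal gap over the overlapping quality range.

```python
# Standard BD-rate computation (cubic-fit variant); inputs are per-encoder
# lists of bitrates and matching quality scores (e.g. VMAF).
import numpy as np

def bd_rate(rates_ref, qual_ref, rates_test, qual_test):
    p_ref = np.polyfit(qual_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(qual_test, np.log(rates_test), 3)
    lo = max(min(qual_ref), min(qual_test))
    hi = min(max(qual_ref), max(qual_test))
    i_ref, i_test = np.polyint(p_ref), np.polyint(p_test)
    avg = (np.polyval(i_test, hi) - np.polyval(i_test, lo)
           - np.polyval(i_ref, hi) + np.polyval(i_ref, lo)) / (hi - lo)
    return (np.exp(avg) - 1.0) * 100.0   # negative = bitrate savings vs. reference
```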

    Evaluation of two mobile health apps in the context of smoking cessation: qualitative study of cognitive behavioral therapy (CBT) versus non-CBT-based digital solutions.

    BACKGROUND: Mobile health (mHealth) apps can offer users numerous benefits, representing a feasible and acceptable means of administering health interventions such as cognitive behavioral therapy (CBT). CBT is commonly used in the treatment of mental health conditions, where it has a strong evidence base, suggesting that it represents an effective method to elicit health behavior change. More importantly, CBT has proved effective in smoking cessation, a pressing concern given that smoking-related costs to the National Health Service (NHS) were estimated to be as high as £2.6bn in 2015. Although the evidence base for computerized CBT in mental health is strong, there is limited literature on its use in smoking cessation. This, combined with the cost-effectiveness of mHealth interventions, argues for research into the effectiveness of CBT-based smoking cessation apps. OBJECTIVE: The objective of this study was, first, to explore participants' perceptions of 2 mHealth apps, a CBT-based app, Quit Genius, and a non-CBT-based app, NHS Smokefree, over a variety of themes. Second, the study aimed to investigate the perceptions and health behavior of users of each app with respect to smoking cessation. METHODS: A qualitative short-term longitudinal study was conducted, using a sample of 29 smokers allocated to one of the 2 apps, Quit Genius or Smokefree. Each user underwent 2 one-to-one semistructured interviews, 1 week apart. Thematic analysis was carried out, and important themes were identified. Descriptive statistics regarding participants' perceptions and health behavior in relation to smoking cessation are also provided. RESULTS: The thematic analysis resulted in five higher-order themes and several subthemes. Participants were generally more positive about Quit Genius's features, as well as its design and the engagement and quality of its information. Quit Genius users reported increased motivation to quit smoking, as well as greater willingness to continue using their allocated app after 1 week. Moreover, these participants demonstrated preliminary changes in their smoking behavior, although, given our limited sample, this finding is not yet generalizable. CONCLUSIONS: Our findings support CBT-based mHealth apps as a feasible and potentially effective smoking cessation tool. mHealth apps must be well developed, preferably with an underlying behavioral change mechanism, to promote positive health behavior change. Digital CBT has the potential to become a powerful tool in overcoming current health care challenges. The present results should be replicated in a wider sample using the apps for a longer period so as to allow for generalizability. Further research is also needed to focus on the effect of greater personalization on behavioral change and on understanding the psychological barriers to the adoption of new mHealth solutions.

    Compressed-domain video classification with deep neural networks: “There's way too much information to decode the matrix”

    We investigate video classification via a 3D deep convolutional neural network (CNN) that directly ingests compressed bitstream information. This idea is based on the observation that video macroblock (MB) motion vectors (which are very compact and directly available from the compressed bitstream) inherently capture local spatio-temporal changes in each video scene. Our results on two standard video datasets show that our approach outperforms pixel-based approaches and remains within 7 percentage points of the best classification results reported by highly complex optical-flow & deep-CNN methods. At the same time, a CPU-based realization of our approach is found to be more than 2500 times faster in motion extraction than GPU-based optical flow methods, and also offers a 2 to 3.4-fold reduction in the utilized deep CNN weights compared to recent architectures. This indicates that deep learning based on compressed video bitstream information may allow advanced video classification to be deployed on very large datasets using commodity CPU hardware. Source code is available at http://www.github.com/mvcnn.
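    To illustrate the ingestion path (an architectural sketch, not the paper's network): a small 3D CNN whose input is the (dx, dy) macroblock motion-vector field of T frames rather than pixels.

```python
# Illustrative 3D CNN over a motion-vector tensor; the layer sizes and the
# 101-class head (UCF-101-style) are assumptions.
import torch
import torch.nn as nn

class MVNet(nn.Module):
    def __init__(self, num_classes=101):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, mv):            # mv: (B, 2, T, H/16, W/16) MV field
        return self.classifier(self.features(mv).flatten(1))

# e.g. 16 frames of 16x16-macroblock motion vectors from 256x256 video:
logits = MVNet()(torch.randn(1, 2, 16, 16, 16))
```

    Note that the input grid is already 16x smaller per spatial axis than the pixel grid, reflecting the compactness of MB motion vectors noted above.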

    Nucleon decay matrix elements with the Wilson quark action: an update

    We present preliminary results of a new lattice computation of hadronic matrix elements of baryon-number-violating operators which appear in the low-energy effective Lagrangian of (SUSY-)Grand Unified Theories. The contribution of an irrelevant form factor, which caused an underestimate of the matrix elements in previous studies, is subtracted in this calculation. Our results are 2-4 times larger than the most conservative values often employed in phenomenological analyses of nucleon decay with specific GUT models.

    Comment: LATTICE99 (matrix elements), 3 pages, 2 figures
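    For context, a hedged reconstruction of the form-factor decomposition typically used in such lattice studies (notation assumed, not quoted from the paper); the "irrelevant" form factor mentioned above is the one multiplying the q-slash term, whose contribution to the physical decay amplitude is suppressed by the light lepton mass.

```latex
% Hedged reconstruction (requires \usepackage{slashed}); W_0 is the relevant
% form factor, W_q the "irrelevant" one.
\langle \pi(p') \,|\, \mathcal{O} \,|\, N(p) \rangle
  = P \left[ W_0(q^2) - \frac{i\slashed{q}}{m_N}\, W_q(q^2) \right] u_N(p),
\qquad q = p - p'
```

    Here P denotes a chirality projector and u_N the nucleon spinor; mixing of the unsubtracted W_q piece at the simulated kinematics is what caused the underestimate referred to above.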