16 research outputs found

    A Web Service for Video Summarization

    This paper presents a Web service that supports the automatic generation of video summaries for user-submitted videos. The developed Web application decomposes the video into segments, evaluates the fitness of each segment for inclusion in the video summary, and selects appropriate segments until a pre-defined time budget is filled. The integrated deep-learning-based video analysis and summarization technologies exhibit state-of-the-art performance and, by exploiting the processing capabilities of modern GPUs, offer faster-than-real-time processing. The user interface provides configurations for generating video summaries that meet the posting specifications of the most common video sharing platforms and social networks, enabling the one-click generation of distribution-channel-specific summaries.
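    The segment-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the `Segment` class, the `select_segments` function, the scores and the greedy strategy are all assumptions made for the example, and the abstract does not state how the time budget is filled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    score: float  # hypothetical fitness score, higher is better

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_segments(segments, budget_seconds):
    """Greedily pick the highest-scoring segments that still fit the budget.

    Assumption: a greedy strategy; a knapsack solver would be an alternative.
    """
    chosen = []
    remaining = budget_seconds
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if seg.duration <= remaining:
            chosen.append(seg)
            remaining -= seg.duration
    # Return the summary in chronological order.
    return sorted(chosen, key=lambda s: s.start)


# Illustrative segments with made-up scores.
segments = [
    Segment(0.0, 4.0, 0.2),
    Segment(4.0, 9.0, 0.9),
    Segment(9.0, 12.0, 0.5),
    Segment(12.0, 20.0, 0.7),
]
summary = select_segments(segments, budget_seconds=10.0)
```

    A platform-specific configuration would then amount to calling `select_segments` with a different `budget_seconds` value for each distribution channel.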

    A Comparison of Embedded Deep Learning Methods for Person Detection

    Recent advancements in parallel computing, GPU technology and deep learning provide a new platform for complex image processing tasks such as person detection to flourish. Person detection is a fundamental preliminary operation for several high-level computer vision tasks. One industry that can benefit significantly from person detection is retail. In recent years, various studies have attempted to find an optimal solution for person detection using neural networks and deep learning. This study compares state-of-the-art deep-learning-based object detectors, with a focus on person detection performance in indoor environments. The performance of various implementations of YOLO, SSD, RCNN, R-FCN and SqueezeDet has been assessed using our in-house proprietary dataset, which consists of over 10 thousand indoor images captured from shopping malls, retail outlets and stores. Experimental results indicate that Tiny YOLO-416 and SSD (VGG-300) are the fastest, and Faster-RCNN (Inception ResNet-v2) and R-FCN (ResNet-101) are the most accurate, of the detectors investigated in this study. Further analysis shows that YOLO v3-416 delivers relatively accurate results in a reasonable amount of time, which makes it an ideal model for person detection on embedded platforms.

    Freshwater sediments, risks and treatment options (Sladkovodni sedimenty, rizika a moznosti zpracovani).

    The aim of this thesis is to evaluate chemical changes after aeration of a contaminated freshwater sediment (changes in oxidation-reduction potential and pH values, sulphate leaching and leaching of selected cations of potentially toxic metals), and to explore two new techniques for sulphate treatment: (i) inhibition of sulphate production via suppression of microbial oxidative processes in the sediment by application of NaOH, ...
    Available from STL Prague, CZ / NTK - National Technical Library. SIGLE, CZ, Czech Republic

    Visual memories

    Despite the rapid progress in the field of artificial intelligence, there are still important new areas to be explored and existing methods to be enhanced to make machines think like humans. This thesis conducts research in four machine learning and computer vision areas in this direction. First, we study what makes some images more memorable than others and propose a new machine learning method to learn and predict image memorability, closely matching human performance. A spatial attention function is learnt to localize the image regions responsible for the retention of the image in memory. To identify meaningful temporal segments in a video stream, we study episodic segmentation in our memory and design a novel algorithm for video summarization to mimic human capabilities. A soft self-attention method without a recurrent network is used to learn frame importance scores for the video summarization. This simple algorithm demonstrates performance superior to the current state-of-the-art methods. Inspired by our brain’s ability to project high-dimensional visual information to computationally efficient, meaningful representations, we propose a method for learning latent binary representations, together with methods for operations in this discrete latent space such as interpolation, novel image generation, and attribute modification, outperforming more complex published methods. To advance methods targeting catastrophic interference, one of the most fundamental problems of artificial neural networks, we study elementary neural mechanisms mitigating this phenomenon in our brain’s memory. Building on our insights into the function of pattern separation in the hippocampus, we propose a conceptually simple and resource-efficient method to learn high-dimensional sparse binary representations for continual learning. By performing the elementary binary operations OR and AND over a continual stream of sparse representations of novel classes, our method exhibits performance significantly exceeding the current state-of-the-art meta-learning methods on identical benchmarks.
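    One way binary OR and AND could operate over a stream of sparse binary codes is sketched below. This is only an illustration under strong assumptions and not the thesis's method: the `BinaryMemory` class, the use of OR to accumulate a per-class memory, and the use of the AND overlap count as a similarity score are all hypothetical choices, and the codes are hand-written rather than learned.

```python
def to_mask(active_bits):
    """Encode a sparse binary vector as an int with the given bits set."""
    mask = 0
    for bit in active_bits:
        mask |= 1 << bit
    return mask


class BinaryMemory:
    """Hypothetical class memory built from sparse binary codes."""

    def __init__(self):
        self.classes = {}  # label -> accumulated bit mask

    def learn(self, label, code):
        # OR merges the new code into the class memory; the memories of
        # other classes are never modified, so nothing is overwritten.
        self.classes[label] = self.classes.get(label, 0) | code

    def predict(self, code):
        # AND keeps the bits shared with each class memory; the class
        # with the largest overlap (number of shared bits) wins.
        return max(
            self.classes,
            key=lambda label: bin(self.classes[label] & code).count("1"),
        )


memory = BinaryMemory()
# Classes arrive one after another, as in a continual stream.
memory.learn("cat", to_mask([1, 5, 9]))
memory.learn("cat", to_mask([1, 5, 12]))
memory.learn("dog", to_mask([2, 7, 20]))
memory.learn("dog", to_mask([2, 8, 20]))
prediction = memory.predict(to_mask([1, 9, 12]))
```

    Because each class is stored independently and updated only by OR, adding a new class in this sketch cannot alter what was stored for earlier ones.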

    AMNet: Memorability Estimation with Attention

    In this paper we present the design and evaluation of an end-to-end trainable deep neural network with a visual attention mechanism for memorability estimation in still images. We analyze the suitability of transfer learning of deep models from image classification to the memorability task. Furthermore, we study the impact of the attention mechanism on memorability estimation and evaluate our network on the SUN Memorability and LaMem datasets. Our network outperforms the existing state-of-the-art models on both datasets in terms of Spearman's rank correlation as well as mean squared error, closely matching human consistency. Comment: To appear at CVPR 201
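    The two evaluation metrics named in this abstract, Spearman's rank correlation and mean squared error, can be computed as in the following sketch. The function names and the score values are illustrative assumptions and are not taken from the paper or its datasets.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Made-up memorability scores, for illustration only.
ground_truth = [0.91, 0.55, 0.78, 0.62, 0.84]
predicted = [0.88, 0.60, 0.70, 0.65, 0.90]
rho = spearman(predicted, ground_truth)
error = mse(predicted, ground_truth)
```

    Rank correlation measures whether images are ordered correctly by memorability, while mean squared error measures how close the predicted scores are in absolute terms, which is why the two are reported together.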

    Deep Residual Network with Subclass Discriminant Analysis for Crowd Behavior Recognition

    In this work, we extract rich representations of crowd behavior from video using a fine-tuned deep convolutional residual neural network. Using spatial partitioning trees, we create subclasses within the feature maps from each of the crowd behavior attributes (classes). Features from these subclasses are then regularized using an eigen modeling scheme. This makes it possible to model the variance arising from the intra-subclass information. Low-dimensional discriminative features are extracted using the total subclass scatter information. Dynamic time warping is applied to the cosine distance measure to compute the similarity between videos. A 1-nearest neighbor (NN) classifier is used to find the respective crowd behavior attribute classes from the normal videos. Experimental results on a large crowd behavior video database show the superior performance of our proposed framework compared to the baseline and current state-of-the-art methodologies for the crowd behavior recognition task.
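    The final matching stage of this abstract, dynamic time warping over cosine distances followed by a 1-nearest-neighbor decision, can be sketched as follows. This is a minimal illustration under assumptions: the function names, the two-dimensional feature vectors and the class labels are invented for the example, and the feature extraction and subclass discriminant analysis stages are omitted.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw(seq_a, seq_b):
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn(query, labelled_sequences):
    """Return the label of the training sequence with the lowest DTW cost."""
    best_label, _ = min(
        ((label, dtw(query, seq)) for seq, label in labelled_sequences),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical per-frame feature vectors for two labelled videos.
training = [
    ([[1.0, 0.0], [1.0, 0.1], [0.9, 0.2]], "walking"),
    ([[0.0, 1.0], [0.1, 1.0], [0.2, 0.9]], "running"),
]
# The query has a different length; DTW aligns it to each training video.
query = [[1.0, 0.05], [1.0, 0.05], [0.95, 0.15], [0.9, 0.2]]
label = classify_1nn(query, training)
```

    Dynamic time warping is useful here because videos differ in length and pacing, so a frame-by-frame comparison at fixed positions would not be meaningful.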