A Web Service for Video Summarization
This paper presents a Web service that supports the automatic generation of video summaries for user-submitted videos. The developed Web application decomposes the video into segments, evaluates the fitness of each segment for inclusion in the video summary, and selects appropriate segments until a pre-defined time budget is filled. The integrated deep-learning-based video analysis and summarization technologies exhibit state-of-the-art performance and, by exploiting the processing capabilities of modern GPUs, offer faster-than-real-time processing. Configurations for generating video summaries that fulfill the specifications for posting on the most common video sharing platforms and social networks are available in the user interface of this application, enabling the one-click generation of distribution-channel-specific summaries.
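The selection step described in this abstract can be illustrated with a minimal sketch. It assumes a simple greedy strategy (highest score first, skip anything that no longer fits); the abstract does not say which selection strategy the service actually uses, and the `Segment` class, the `select_segments` function, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    score: float  # hypothetical fitness score, higher is better

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_segments(segments, budget):
    """Greedily pick the best-scoring segments that fit the time budget.

    Assumption: a greedy pass stands in for whatever selection method
    the service really applies.
    """
    chosen = []
    remaining = budget
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if seg.duration <= remaining:
            chosen.append(seg)
            remaining -= seg.duration
    # Restore temporal order so the summary plays chronologically.
    return sorted(chosen, key=lambda s: s.start)


segments = [
    Segment(0, 10, 0.2),
    Segment(10, 25, 0.9),
    Segment(25, 30, 0.6),
    Segment(30, 50, 0.8),
]
summary = select_segments(segments, budget=25.0)
print([(s.start, s.end) for s in summary])
```

With a 25-second budget, the 20-second segment is skipped because it no longer fits after the top-scoring 15-second segment is taken. A platform-specific configuration would, under this sketch, amount to a different `budget` value.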
A Comparison of Embedded Deep Learning Methods for Person Detection
Recent advancements in parallel computing, GPU technology and deep learning
provide a new platform for complex image processing tasks such as person
detection to flourish. Person detection is a fundamental preliminary operation
for several high-level computer vision tasks. One industry that can
significantly benefit from person detection is retail. In recent years, various
studies have attempted to find an optimal solution for person detection using neural
networks and deep learning. This study compares state-of-the-art
deep-learning-based object detectors, with a focus on person detection
performance in indoor environments. The performance of various implementations of
YOLO, SSD, RCNN, R-FCN and SqueezeDet has been assessed using our in-house
proprietary dataset, which consists of over 10 thousand indoor images captured
from shopping malls, retail outlets and stores. Experimental results indicate that
Tiny YOLO-416 and SSD (VGG-300) are the fastest, and Faster-RCNN (Inception
ResNet-v2) and R-FCN (ResNet-101) are the most accurate, detectors investigated
in this study. Further analysis shows that YOLO v3-416 delivers relatively
accurate results in a reasonable amount of time, which makes it an ideal model
for person detection on embedded platforms.
Two-year recall for people with no diabetic retinopathy: a multiethnic population-based retrospective cohort study using real-world data to quantify the effect
Freshwater sediments, risks and treatment options (Sladkovodni sedimenty, rizika a moznosti zpracovani).
The aim of this thesis is to evaluate chemical changes after aeration of a contaminated freshwater sediment (changes in oxidation-reduction potential and pH values, sulphate leaching and leaching of selected cations of potentially toxic metals), and to explore two new techniques for sulphate treatment: (i) inhibition of sulphate production via suppression of microbial oxidative processes in the sediment by application of NaOH, Available from STL Prague, CZ / NTK - National Technical Library; SIGLE; CZ; Czech Republic
Visual memories
Despite the rapid progress in the field of artificial intelligence, there are still important new areas to be explored and existing methods enhanced to make machines think like humans. This thesis conducts research in four machine learning and computer vision areas in this direction.
First, we study what makes some images more memorable than others and propose a new machine learning method to learn and predict image memorability, closely matching human performance. A spatial attention function is learnt to localize the image regions responsible for image retention in memory. To identify meaningful temporal segments in a video stream, we study episodic segmentation in our memory and design a novel algorithm for video summarization to mimic human capabilities. A soft self-attention method without a recurrent network is used to learn frame importance scores for the video summarization. This simple algorithm outperforms the current state-of-the-art methods. Inspired by our brain’s ability to project high-dimensional visual information onto computationally efficient, meaningful representations, we propose a method for learning latent binary representations, together with methods for operations in this discrete latent space, such as interpolation, novel image generation, and attribute modification, that outperform more complex published methods.
To advance methods targeting catastrophic interference, one of the most fundamental problems of artificial neural networks, we study elementary neural mechanisms mitigating this phenomenon in our brain’s memory. Building on our insights into the function of pattern separation in the hippocampus, we propose a conceptually simple and resource-efficient method to learn high-dimensional sparse binary representations for continual learning. By performing the elementary binary operations OR and AND over a continual stream of sparse representations of novel classes, our method exhibits performance significantly exceeding the current state-of-the-art meta-learning methods on identical benchmarks.
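As a rough illustration of how binary OR and AND can operate over a stream of sparse codes, the sketch below stores one prototype per class by OR-ing incoming codes and classifies a query by the size of its AND overlap with each prototype. This pairing of the two operations is an assumption made for illustration, not the thesis's method; `encode`, `BinaryMemory`, and the bit positions are hypothetical.

```python
def encode(active_bits):
    """Pack a set of active bit positions into one Python int (a sparse binary code)."""
    code = 0
    for bit in active_bits:
        code |= 1 << bit
    return code


class BinaryMemory:
    """Hypothetical class memory built only from OR (write) and AND (read)."""

    def __init__(self):
        self.prototypes = {}  # label -> accumulated binary code

    def learn(self, label, code):
        # OR merges a new sample into its class prototype and leaves
        # every other class untouched.
        self.prototypes[label] = self.prototypes.get(label, 0) | code

    def classify(self, code):
        # AND keeps only the bits shared with a prototype; the class
        # with the largest overlap wins.
        def overlap(label):
            return bin(self.prototypes[label] & code).count("1")

        return max(self.prototypes, key=overlap)


memory = BinaryMemory()
memory.learn("cat", encode([1, 5, 9]))
memory.learn("cat", encode([1, 5, 12]))
memory.learn("dog", encode([3, 7, 20]))  # a class arriving later in the stream
print(memory.classify(encode([1, 9, 12])))
```

Because learning a new class only writes to that class's own prototype, earlier prototypes are not overwritten in this toy setting, which is the property continual learning aims for.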
AMNet: Memorability Estimation with Attention
In this paper we present the design and evaluation of an end-to-end
trainable, deep neural network with a visual attention mechanism for
memorability estimation in still images. We analyze the suitability of transfer
learning of deep models from image classification to the memorability task.
Furthermore, we study the impact of the attention mechanism on the memorability
estimation and evaluate our network on the SUN Memorability and the LaMem
datasets. Our network outperforms the existing state-of-the-art models on both
datasets in terms of Spearman's rank correlation as well as the mean
squared error, closely matching human consistency. Comment: To appear at CVPR 201
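The two evaluation metrics named in this abstract can be computed as in the sketch below, which uses only the standard library. The score values are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical predicted and ground-truth memorability scores.
predicted = [0.90, 0.60, 0.75, 0.50]
ground_truth = [0.85, 0.65, 0.70, 0.55]
print(spearman(predicted, ground_truth), mse(predicted, ground_truth))
```

Rank correlation rewards getting the ordering of images right even when absolute scores are off, while the mean squared error penalizes the absolute deviations, so the two metrics complement each other.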
DEEP RESIDUAL NETWORK WITH SUBCLASS DISCRIMINANT ANALYSIS FOR CROWD BEHAVIOR RECOGNITION
In this work, we extract rich representations of crowd behavior from video using a fine-tuned deep convolutional residual neural network. Using spatial partitioning trees, we create subclasses within the feature maps from each of the crowd behavior attributes (classes). Features from these subclasses are then regularized using an eigen modeling scheme, which makes it possible to model the variance arising from the intra-subclass information. Low-dimensional discriminative features are extracted using the total subclass scatter information. Dynamic time warping is applied to the cosine distance measure to compute the similarity between videos. A 1-nearest neighbor (NN) classifier is used to find the respective crowd behavior attribute classes from the normal videos. Experimental results on a large crowd behavior video database show the superior performance of our proposed framework compared to the baseline and current state-of-the-art methodologies for the crowd behavior recognition task.
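The final matching stage described here (dynamic time warping over a cosine distance, followed by a 1-nearest-neighbor decision) can be sketched as follows. The sketch assumes each video is already a sequence of non-zero per-frame feature vectors; the toy 2-D features, the labels, and the function names are hypothetical, and the feature extraction and subclass analysis steps are not reproduced.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw(seq_a, seq_b):
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbor(query, gallery):
    """1-NN: return the label of the gallery sequence with the lowest DTW cost."""
    return min(gallery, key=lambda item: dtw(query, item[0]))[1]


# Toy gallery of (feature sequence, label) pairs with hypothetical labels.
gallery = [
    ([[1, 0], [1, 0], [0, 1]], "walk"),
    ([[0, 1], [0, 1], [1, 0]], "run"),
]
query = [[1, 0], [0.9, 0.1], [0, 1], [0, 1]]
print(nearest_neighbor(query, gallery))
```

Dynamic time warping lets the query, which has four frames, align with three-frame gallery sequences, so videos of different lengths can still be compared.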