9 research outputs found
Space-Time Attention with Shifted Non-Local Search
Efficiently computing attention maps for videos is challenging due to the
motion of objects between frames. While a standard non-local search is
high-quality for a window surrounding each query point, the window's small size
cannot accommodate motion. Methods for long-range motion use an auxiliary
network to predict the most similar key coordinates as offsets from each query
location. However, accurately predicting this flow field of offsets remains
challenging, even for large-scale networks. Small spatial inaccuracies
significantly impact the attention module's quality. This paper proposes a
search strategy that combines the quality of a non-local search with the range
of predicted offsets. The method, named Shifted Non-Local Search, executes a
small grid search surrounding the predicted offsets to correct small spatial
errors. Our method's in-place computation consumes 10 times less memory and is
over 3 times faster than previous work. Experimentally, correcting the small
spatial errors improves the video frame alignment quality by over 3 dB PSNR.
Our search upgrades existing space-time attention modules, which improves video
denoising results by 0.30 dB PSNR for a 7.5% increase in overall runtime. We
integrate our space-time attention module into a UNet-like architecture to
achieve state-of-the-art results on video denoising.
Comment: 15 pages, 12 figures
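A minimal sketch of the search idea described in the abstract above, assuming hypothetical helper names and NumPy feature maps rather than the authors' released implementation: for each query pixel, a predicted flow offset shifts the search window into the key frame, and a small grid search around that shifted center picks the best-matching location, correcting small errors in the predicted offsets.

import numpy as np

def shifted_non_local_search(query, keys, flow, radius=1, patch=3):
    # query, keys: (H, W, C) feature maps from two frames.
    # flow: (H, W, 2) predicted offsets (dy, dx) from the query frame to the key frame.
    # Returns corrected (H, W, 2) offsets chosen by a (2*radius+1)^2 grid search.
    H, W, _ = query.shape
    pad = patch // 2
    qp = np.pad(query, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    kp = np.pad(keys, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    corrected = np.zeros((H, W, 2))
    for y in range(H):
        for x in range(W):
            q_patch = qp[y:y + patch, x:x + patch]
            cy = int(round(y + flow[y, x, 0]))  # shifted search center (row)
            cx = int(round(x + flow[y, x, 1]))  # shifted search center (col)
            best, best_off = -np.inf, (0.0, 0.0)
            for dy in range(-radius, radius + 1):      # small grid search
                for dx in range(-radius, radius + 1):  # around the shifted center
                    ky = min(max(cy + dy, 0), H - 1)
                    kx = min(max(cx + dx, 0), W - 1)
                    k_patch = kp[ky:ky + patch, kx:kx + patch]
                    score = float((q_patch * k_patch).sum())  # patch similarity
                    if score > best:
                        best, best_off = score, (ky - y, kx - x)
            corrected[y, x] = best_off
    return corrected

Attention weights would then be computed only over the keys selected by this corrected search, which is what keeps the window small while still covering long-range motion.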
Investigating Dataset Distinctiveness
Just as a human might struggle to interpret another human’s handwriting, a computer vision program might fail when asked to perform one task in two different domains. To be more specific, picture a self-driving car as a human driver who had only ever driven on clear, sunny days, during daylight hours. This driver – the self-driving car – would inevitably face a significant challenge when asked to drive in violent rain or fog at night, putting the safety of its passengers in danger. An extensive understanding of the data we use to teach computer vision models – such as those that will be driving our cars in the years to come – is necessary as these complex systems find their way into everyday human life. This study works to develop a quantitative definition of the style of a dataset: for image data in computer vision, the analogue of the difference between cursive and print lettering. We accomplished this by asking a machine learning model to predict which commonly used dataset a particular image belongs to, based on detailed features of the images. If the model performed well at classifying images by their source dataset, that dataset was considered distinct. We then developed a linear relationship between this distinctiveness metric and a model’s ability to learn from one dataset and test on another, so as to better understand how a computer vision system will perform in a given context before it is trained.
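A minimal sketch of the distinctiveness measurement described above, assuming scikit-learn and generic image feature vectors rather than the study's actual features or model: a classifier is trained to predict which source dataset an image came from, and its held-out accuracy serves as the distinctiveness score.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def dataset_distinctiveness(features_by_dataset, seed=0):
    # features_by_dataset: list of (n_i, d) arrays, one per source dataset.
    X = np.vstack(features_by_dataset)
    y = np.concatenate([np.full(len(f), i) for i, f in enumerate(features_by_dataset)])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    # High accuracy means the datasets are easy to tell apart, i.e. more distinct.
    return accuracy_score(y_te, clf.predict(X_te))

# Toy usage with synthetic features standing in for real image descriptors.
rng = np.random.default_rng(0)
score = dataset_distinctiveness([rng.normal(0.0, 1.0, (200, 16)),
                                 rng.normal(0.5, 1.0, (200, 16))])

The linear relationship mentioned in the abstract would then be fitted between this score and measured cross-dataset test performance.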
Comparison of Visual Datasets for Machine Learning
One of the greatest technological improvements in recent years is the rapid progress in using machine learning to process visual data. Among the factors that contribute to this development, labeled datasets play a crucial role. Several datasets are widely reused for investigating and analyzing different machine learning solutions. Many systems, such as autonomous vehicles, rely on machine learning components for recognizing objects. This paper compares different visual datasets and frameworks for machine learning. The comparison is both qualitative and quantitative and investigates object detection labels with respect to size, location, and contextual information. This paper also presents a new approach to creating datasets from real-time, geo-tagged visual data, greatly improving the contextual information of the data. The data could be automatically labeled by cross-referencing information from other sources (such as weather).
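A minimal sketch of the cross-referencing idea from the last sentence above, using an invented frame schema and a placeholder weather_lookup callable (the paper does not specify these details): each geo-tagged, timestamped frame is joined against an external record source to obtain a label without manual annotation.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Frame:
    camera_id: str
    timestamp: datetime
    lat: float
    lon: float

def auto_label(frames, weather_lookup):
    # weather_lookup(lat, lon, timestamp) -> label string such as "rain" or "clear";
    # it stands in for whatever external source (weather records, etc.) is queried.
    return [(f, weather_lookup(f.lat, f.lon, f.timestamp)) for f in frames]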
See the World through Network Cameras
Millions of network cameras have been deployed worldwide. Real-time data from many network cameras can offer instant views of multiple locations, with applications in public safety, transportation management, urban planning, agriculture, forestry, social sciences, atmospheric information, and more. First, this paper describes the real-time data available from worldwide network cameras and its potential applications. Second, it outlines the CAM2 System available to users at https://www.cam2project.net/, including strategies to discover network cameras and to create the camera database, user interface, and computing platforms. Third, it describes the many opportunities provided by data from network cameras and the challenges to be addressed.
Dynamic Sampling in Convolutional Neural Networks for Imbalanced Data Classification
Many multimedia systems stream real-time visual data continuously for a wide variety of applications. These systems can produce vast amounts of data, but few studies take advantage of such versatile, real-time data. This paper presents a novel model based on Convolutional Neural Networks (CNNs) to handle such imbalanced and heterogeneous data and to successfully identify the semantic concepts in these multimedia systems. The proposed model can discover semantic concepts from data with a skewed distribution using a dynamic sampling technique. The paper also presents a system that can retrieve real-time visual data from heterogeneous cameras, with a run-time environment that allows the analysis programs to process data from thousands of cameras simultaneously. Evaluation results in comparison with several state-of-the-art methods demonstrate the ability and effectiveness of the proposed model on visual data captured by public network cameras.
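A minimal sketch of a dynamic sampling rule in the spirit of the abstract above, with an assumed update based on per-class recall (the paper's exact rule may differ): classes the model currently handles poorly are sampled more often in the next training epoch.

import numpy as np

def update_sampling_weights(per_class_recall, smoothing=0.05):
    # per_class_recall: recall per class from the latest validation pass.
    # Classes with low recall receive proportionally larger sampling probability.
    error = 1.0 - np.asarray(per_class_recall, dtype=float) + smoothing
    return error / error.sum()

def sample_epoch_indices(labels, class_probs, epoch_size, rng=None):
    # Draw training indices so each class appears in proportion to class_probs,
    # oversampling rare or poorly learned classes with replacement.
    rng = rng or np.random.default_rng()
    labels = np.asarray(labels)
    counts = rng.multinomial(epoch_size, class_probs)
    picks = [rng.choice(np.flatnonzero(labels == cls), size=n, replace=True)
             for cls, n in enumerate(counts) if n > 0 and np.any(labels == cls)]
    return np.concatenate(picks) if picks else np.array([], dtype=int)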
Low-Power Computer Vision: Status, Challenges, Opportunities
Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems have limited energy, such as unmanned aerial vehicles (also called drones) and mobile robots. These systems rely on batteries, and energy efficiency is critical. This article serves two main purposes: (1) Examine the state of the art in low-power solutions for detecting objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes the 2018 winners' solutions. (2) Suggest directions for research as well as opportunities in low-power computer vision.