2,031 research outputs found
Activity-driven content adaptation for effective video summarisation
In this paper, we present a novel method for content adaptation and video summarization fully implemented in compressed-domain. Firstly, summarization of generic videos is modeled as the process of extracted human objects under various activities/events. Accordingly, frames are classified into five categories via fuzzy decision including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others by using two inter-frame measurements. Secondly, human objects are detected using Haar-like features. With the detected human objects and attained frame categories, activity levels for each frame are determined to adapt with video contents. Continuous frames belonging to same category are grouped to form one activity entry as content of interest (COI) which will convert the original video into a series of activities. An overall adjustable quota is used to control the size of generated summarization for efficient streaming purpose. Upon this quota, the frames selected for summarization are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations have proved the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic as domain-specific tasks such as accurate recognition of objects can be avoided
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on an
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed by many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures to
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions and full patch labeling by employing
deconvolutions. All the systems compare favorably or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
Neural network application to comprehensive engine diagnostics
We have previously reported on the use of neural networks for detection and identification of faults in complex microprocessor controlled powertrain systems. The data analyzed in those studies consisted of the full spectrum of signals passing between the engine and the real-time microprocessor controller. The specific task of the classification system was to classify system operation as nominal or abnormal and to identify the fault present. The primary concern in earlier work was the identification of faults, in sensors or actuators in the powertrain system as it was exercised over its full operating range. The use of data from a variety of sources, each contributing some potentially useful information to the classification task, is commonly referred to as sensor fusion and typifies the type of problems successfully addressed using neural networks. In this work we explore the application of neural networks to a different diagnostic problem, the diagnosis of faults in newly manufactured engines and the utility of neural networks for process control
HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
Controllable human image generation (HIG) has numerous real-life
applications. State-of-the-art solutions, such as ControlNet and T2I-Adapter,
introduce an additional learnable branch on top of the frozen pre-trained
stable diffusion (SD) model, which can enforce various conditions, including
skeleton guidance of HIG. While such a plug-and-play approach is appealing, the
inevitable and uncertain conflicts between the original images produced from
the frozen SD branch and the given condition incur significant challenges for
the learnable branch, which essentially conducts image feature editing for
condition enforcement. In this work, we propose a native skeleton-guided
diffusion model for controllable HIG called HumanSD. Instead of performing
image editing with dual-branch diffusion, we fine-tune the original SD model
using a novel heatmap-guided denoising loss. This strategy effectively and
efficiently strengthens the given skeleton condition during model training
while mitigating the catastrophic forgetting effects. HumanSD is fine-tuned on
the assembly of three large-scale human-centric datasets with text-image-pose
information, two of which are established in this work. As shown in Figure 1,
HumanSD outperforms ControlNet in terms of accurate pose control and image
quality, particularly when the given skeleton guidance is sophisticated
- …