Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to the state of the art. In particular, it is shown that binary
versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.
Comment: 29 pages, 16 figures
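To make the descriptor family concrete, here is a minimal spatial-domain sketch (a hypothetical illustration, not the authors' time-causal implementation): compute a few Gaussian-derivative receptive field responses, quantise each response, and form a normalised joint histogram over the quantised responses.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Sampled 1-D Gaussian, normalised to sum to 1.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    # Separable Gaussian smoothing: pad, then valid 1-D convolutions per axis.
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    p = np.pad(img, r, mode="reflect")
    p = np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 0, p)
    p = np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 1, p)
    return p

def joint_histogram_descriptor(img, sigma=1.5, n_bins=4):
    """Joint histogram of first-order Gaussian-derivative responses (Lx, Ly)."""
    L = smooth(img.astype(float), sigma)
    Lx = np.gradient(L, axis=1)
    Ly = np.gradient(L, axis=0)

    def quantise(R):
        # Quantise a response map into n_bins bins spanning its observed range.
        edges = np.linspace(R.min(), R.max() + 1e-9, n_bins + 1)
        return np.clip(np.digitize(R, edges) - 1, 0, n_bins - 1)

    codes = quantise(Lx) * n_bins + quantise(Ly)   # joint bin index per pixel
    hist = np.bincount(codes.ravel(), minlength=n_bins**2).astype(float)
    return hist / hist.sum()                        # normalised descriptor

rng = np.random.default_rng(0)
desc = joint_histogram_descriptor(rng.random((64, 64)))
print(desc.shape)  # (16,)
```

The spatio-temporal case extends this by adding temporal derivatives as further histogram dimensions; the time-recursive formulation then lets the smoothing stage be updated frame by frame.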
FastCLIPstyler: Optimisation-free Text-based Image Style Transfer Using Style Representations
In recent years, language-driven artistic style transfer has emerged as a new
type of style transfer technique, eliminating the need for a reference style
image by using natural language descriptions of the style. The first model to
achieve this, called CLIPstyler, has demonstrated impressive stylisation
results. However, its lengthy optimisation procedure at runtime for each query
limits its suitability for many practical applications. In this work, we
present FastCLIPstyler, a generalised text-based image style transfer model
capable of stylising images in a single forward pass for arbitrary text inputs.
Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for
compatibility with resource-constrained devices. Through quantitative and
qualitative comparisons with state-of-the-art approaches, we demonstrate that
our models achieve superior stylisation quality based on measurable metrics
while offering significantly improved runtime efficiency, particularly on edge
devices.
Comment: Accepted at the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
On-Bird Sound Recordings: Automatic Acoustic Recognition of Activities and Contexts
We introduce a novel approach to studying animal behaviour and the context in
which it occurs, through the use of microphone backpacks carried on the backs
of individual free-flying birds. These sensors are increasingly used by animal
behaviour researchers to study individual vocalisations of freely behaving
animals, even in the field. However, such devices may record more than an
animal's vocal behaviour, and have the potential to be used for investigating
specific activities (movement) and context (background) within which
vocalisations occur. To facilitate this approach, we investigate the automatic
annotation of such recordings through two different sound scene analysis
paradigms: a scene-classification method using feature learning, and an
event-detection method using probabilistic latent component analysis (PLCA). We
analyse recordings made with Eurasian jackdaws (Corvus monedula) in both
captive and field settings. Results are comparable with the state of the art in
sound scene analysis; we find that the current recognition quality level
enables scalable automatic annotation of audio logger data, given partial
annotation, but also find that individual differences between animals and/or
their backpacks limit the generalisation from one individual to another. We
consider the interrelation of 'scenes' and 'events' in this particular task,
and issues of temporal resolution.
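The event-detection paradigm can be sketched with a factorisation of the kind PLCA performs. The following is a hypothetical illustration, not the authors' model: it uses KL-divergence NMF multiplicative updates (numerically equivalent to basic PLCA up to normalisation) to factorise a toy spectrogram into spectral bases and time activations, then thresholds the activations into event segments.

```python
import numpy as np

def plca_like_nmf(V, n_components=2, n_iter=200, seed=0):
    """KL-divergence NMF: V (freq x time) ~ W (spectral bases) @ H (activations)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3
    H = rng.random((n_components, T)) + 1e-3
    for _ in range(n_iter):
        R = W @ H + 1e-9
        W *= (V / R) @ H.T / H.sum(axis=1)          # multiplicative update for W
        R = W @ H + 1e-9
        H *= W.T @ (V / R) / W.sum(axis=0)[:, None]  # multiplicative update for H
    return W, H

# Toy "spectrogram": two synthetic events with distinct spectral profiles.
rng = np.random.default_rng(1)
b1 = np.array([1.0, 0.1, 0.0, 0.0])
b2 = np.array([0.0, 0.0, 0.2, 1.0])
act = np.zeros((2, 50))
act[0, 5:15] = 1.0
act[1, 30:45] = 1.0
V = np.outer(b1, act[0]) + np.outer(b2, act[1]) + 0.01 * rng.random((4, 50))

W, H = plca_like_nmf(V)
# Threshold each component's activation curve to obtain event segments.
events = H > 0.5 * H.max(axis=1, keepdims=True)
print(events.shape)  # (2, 50)
```

In a real pipeline the bases would be learned from annotated segments of the logger recordings, and the thresholded activations would yield per-class event annotations over time.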
Efficient vanishing point detection method in unstructured road environments based on dark channel prior
Vanishing point detection is a key technique in fields such as road detection, camera calibration and visual navigation. This study presents a new vanishing point detection method, which achieves efficiency by using a dark channel prior-based segmentation method and an adaptive straight-line search mechanism in the road region. First, the dark channel prior information is used to segment the image into a series of regions. Then straight lines are extracted from the region contours, and the straight lines in the road region are estimated by a vertical envelope and a perspective quadrilateral constraint. The vertical envelope roughly divides the whole image into a sky region, a vertical region and a road region. The perspective quadrilateral constraint, as the authors define herein, eliminates the interference of vertical lines inside the road region so as to extract the approximate straight lines in the road region. Finally, the vanishing point is estimated by mean-shift clustering of candidate intersection points, which are computed based on the proposed grouping strategies and intersection principles. Experiments have been conducted with a large number of road images under different environmental conditions, and the results demonstrate that the authors' proposed algorithm can estimate the vanishing point accurately and efficiently in unstructured road scenes.
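The final stage of such a pipeline, clustering line intersections with mean-shift, can be sketched as follows. This is a simplified, hypothetical illustration using homogeneous line coordinates, not the authors' grouping strategies or constraints: it intersects all candidate line pairs and runs a Gaussian mean-shift to find the densest intersection mode.

```python
import numpy as np

def intersect(l1, l2):
    # Lines as (a, b, c) with a*x + b*y + c = 0; intersection via cross product.
    p = np.cross(l1, l2)
    return None if abs(p[2]) < 1e-9 else p[:2] / p[2]  # None for parallel lines

def mean_shift_mode(points, bandwidth=20.0, n_iter=50):
    # Gaussian mean-shift started from the centroid; converges to a dense mode.
    x = points.mean(axis=0)
    for _ in range(n_iter):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * bandwidth**2))
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

def vanishing_point(lines):
    # Pairwise intersections of candidate road lines, then mean-shift clustering.
    pts = [intersect(lines[i], lines[j])
           for i in range(len(lines)) for j in range(i + 1, len(lines))]
    pts = np.array([p for p in pts if p is not None])
    return mean_shift_mode(pts)

# Toy example: three lines through (100, 50) plus one outlier line.
lines = np.array([
    [1.0, -2.0, 0.0],     # x - 2y = 0        -> passes through (100, 50)
    [1.0,  2.0, -200.0],  # x + 2y - 200 = 0  -> passes through (100, 50)
    [0.0,  1.0, -50.0],   # y = 50            -> passes through (100, 50)
    [1.0,  0.0, -300.0],  # x = 300 (outlier)
])
vp = vanishing_point(lines)
print(vp)  # close to (100, 50), the true vanishing point
```

The kernel bandwidth plays the role of the method's clustering tolerance: a larger bandwidth merges nearby intersection clusters, while a smaller one isolates the dominant mode from outlier intersections.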