59 research outputs found
Design and implementation of a module for analyzing the publication activity of a university
The work is aimed at creating a software solution that automates the statistical analysis of the publication activity of a scientific and educational organization.
The result of the work is a module that provides information support for the management of scientific activities, including an assessment of the relevance of ongoing research projects.
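For illustration only, a minimal sketch of the kind of statistic such a module might compute (publication counts per department and year from a bibliographic CSV export). It is not the described system; the file name and column names are assumptions.

```python
# Illustrative sketch, not the module described above.
# Assumes a CSV export with "department" and "year" columns.
import csv
from collections import Counter

def publication_counts(path="publications.csv"):
    """Count publications per (department, year) pair."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[(row["department"], row["year"])] += 1
    return counts

if __name__ == "__main__":
    for (dept, year), n in sorted(publication_counts().items()):
        print(f"{dept}\t{year}\t{n}")
```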
Object-Centric Learning with Slot Attention
Learning object-centric representations of complex scenes is a promising step
towards enabling efficient abstract reasoning from low-level perceptual
features. Yet, most deep learning approaches learn distributed representations
that do not capture the compositional properties of natural scenes. In this
paper, we present the Slot Attention module, an architectural component that
interfaces with perceptual representations such as the output of a
convolutional neural network and produces a set of task-dependent abstract
representations which we call slots. These slots are exchangeable and can bind
to any object in the input by specializing through a competitive procedure over
multiple rounds of attention. We empirically demonstrate that Slot Attention
can extract object-centric representations that enable generalization to unseen
compositions when trained on unsupervised object discovery and supervised
property prediction tasks.
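For intuition, here is a minimal PyTorch sketch of the iterative, competitive attention described above. It is not the authors' reference implementation: module and parameter names are illustrative, and the residual MLP update of the original method is omitted.

```python
# Simplified sketch of the Slot Attention idea: slots compete for input
# features over several rounds of attention and are updated with a GRU.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttentionSketch(nn.Module):
    def __init__(self, num_slots=7, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.dim, self.iters = num_slots, dim, iters
        self.scale = dim ** -0.5
        # Slots are initialized by sampling from a learned Gaussian.
        self.slots_mu = nn.Parameter(torch.zeros(1, 1, dim))
        self.slots_log_sigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):
        # inputs: (B, N, dim), e.g. flattened CNN feature maps.
        B, N, _ = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_log_sigma.exp() * torch.randn(
            B, self.num_slots, self.dim, device=inputs.device)
        for _ in range(self.iters):  # multiple rounds of attention
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the *slot* axis makes slots compete for each input.
            attn = F.softmax(torch.einsum('bnd,bkd->bnk', k, q) * self.scale, dim=-1)
            attn = attn / attn.sum(dim=1, keepdim=True)  # weighted mean over inputs
            updates = torch.einsum('bnk,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, self.dim),
                             slots.reshape(-1, self.dim)).reshape(B, self.num_slots, self.dim)
        return slots  # (B, num_slots, dim): one abstract representation per slot
```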
Video OWL-ViT: Temporally-consistent open-world localization in video
We present an architecture and a training recipe that adapts pre-trained
open-world image models to localization in videos. Understanding the open
visual world (without being constrained by fixed label spaces) is crucial for
many real-world vision tasks. Contrastive pre-training on large image-text
datasets has recently led to significant improvements for image-level tasks.
For more structured tasks involving object localization applying pre-trained
models is more challenging. This is particularly true for video tasks, where
task-specific data is limited. We show successful transfer of open-world models
by building on the OWL-ViT open-vocabulary detection model and adapting it to
video by adding a transformer decoder. The decoder propagates object
representations recurrently through time by using the output tokens for one
frame as the object queries for the next. Our model is end-to-end trainable on
video data and enjoys improved temporal consistency compared to
tracking-by-detection baselines, while retaining the open-world capabilities of
the backbone detector. We evaluate our model on the challenging TAO-OW
benchmark and demonstrate that open-world capabilities, learned from
large-scale image-text pre-training, can be transferred successfully to
open-world localization across diverse videos. Comment: ICCV 2023
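As a rough illustration of the recurrence described above (the decoder's output tokens for one frame are reused as object queries for the next), here is a minimal PyTorch sketch. It is not the authors' architecture; the image backbone and the box/class prediction heads are assumed to exist elsewhere.

```python
# Sketch of recurrent object-query propagation through time.
import torch
import torch.nn as nn

class RecurrentVideoDecoderSketch(nn.Module):
    def __init__(self, dim=256, num_queries=100, num_layers=3):
        super().__init__()
        # Learned initial object queries for the first frame.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, frame_features):
        # frame_features: list of per-frame image-encoder tokens, each (B, N, dim),
        # e.g. from an open-vocabulary image backbone (assumed, not shown here).
        B = frame_features[0].shape[0]
        queries = self.queries.unsqueeze(0).expand(B, -1, -1)
        outputs = []
        for feats in frame_features:
            # Decode this frame's object tokens against the frame features ...
            tokens = self.decoder(tgt=queries, memory=feats)
            outputs.append(tokens)  # would feed downstream box/class heads
            # ... and reuse them as the object queries for the next frame.
            queries = tokens
        return outputs
```

Tying queries across frames in this way is what gives each object a persistent representation over time, rather than re-associating detections frame by frame as in tracking-by-detection.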
- β¦