Quality-Driven video analysis for the improvement of foreground segmentation
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones, on 15-06-2018. It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo).
LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
Textual noise, such as typos or abbreviations, is a well-known issue that
penalizes vanilla Transformers for most downstream tasks. We show that this is
also the case for sentence similarity, a fundamental task in multiple domains,
e.g. matching, retrieval or paraphrasing. Sentence similarity can be approached
using cross-encoders, where the two sentences are concatenated in the input
allowing the model to exploit the inter-relations between them. Previous works
addressing the noise issue mainly rely on data augmentation strategies, showing
improved robustness when dealing with corrupted samples that are similar to the
ones used for training. However, all these methods still suffer from the token
distribution shift induced by typos. In this work, we propose to tackle textual
noise by equipping cross-encoders with a novel LExical-aware Attention module
(LEA) that incorporates lexical similarities between words in both sentences.
By using raw text similarities, our approach avoids the tokenization shift
problem and thereby improves robustness. We demonstrate that the attention bias
introduced by LEA helps cross-encoders to tackle complex scenarios with textual
noise, especially in domains with short-text descriptions and limited context.
Experiments using three popular Transformer encoders on five e-commerce
datasets for product matching show that LEA consistently boosts performance
under the presence of noise, while remaining competitive on the original
(clean) splits. We also evaluate our approach on two datasets for textual
entailment and paraphrasing, showing that LEA is robust to typos in domains with
longer sentences and more natural context. Additionally, we thoroughly analyze
several design choices in our approach, providing insights about the impact of
the decisions made and fostering future research in cross-encoders dealing with
typos. Comment: KDD'23 conference (main research track). (*) These authors
contributed equally.
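The lexical attention bias described in the abstract above can be sketched with a toy example. Here a raw-text word similarity (a character n-gram Jaccard measure, an illustrative stand-in for the paper's exact lexical similarity) is turned into a bias matrix that is added to the attention logits before the softmax:

```python
import numpy as np

def char_ngram_sim(w1, w2, n=3):
    # Jaccard similarity over character n-grams; a simple stand-in for
    # the raw-text lexical similarity the abstract describes.
    g1 = {w1[i:i + n] for i in range(max(1, len(w1) - n + 1))}
    g2 = {w2[i:i + n] for i in range(max(1, len(w2) - n + 1))}
    return len(g1 & g2) / len(g1 | g2)

def lexical_bias(sent_a, sent_b):
    # Bias matrix: entry (i, j) rewards attention between lexically
    # similar words across the two sentences of the cross-encoder input.
    return np.array([[char_ngram_sim(a, b) for b in sent_b] for a in sent_a])

def biased_attention(logits, bias, scale=1.0):
    # Add the lexical bias to the raw attention logits, then softmax.
    # `scale` is a hypothetical knob, not the paper's parameterization.
    z = logits + scale * bias
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Because the similarity is computed on raw text rather than on tokens, a typo such as "lapttop" still scores high against "laptop", which is the robustness property the abstract claims.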
Long-Term Stationary Object Detection Based on Spatio-Temporal Change Detection
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. D. Ortego, J. C. SanMiguel and J. M. Martínez, "Long-Term Stationary Object Detection Based on Spatio-Temporal Change Detection," in IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2368-2372, Dec. 2015. doi: 10.1109/LSP.2015.2482598

We present a block-wise approach to detect stationary objects based on spatio-temporal change detection. First, block candidates are extracted by filtering out consecutive blocks containing moving objects. Then, an online clustering approach groups similar blocks at each spatial location over time via the statistical variation of pixel ratios. Stability changes are identified by analyzing the relationships between the most repeated clusters at regular sampling instants. Finally, stationary objects are detected as those stability changes that exceed an alarm time and have not been visualized before. Unlike previous approaches that rely on background subtraction, the proposed approach does not require foreground segmentation and provides robustness to illumination changes, crowds and intermittent object motion. Experiments over a heterogeneous dataset demonstrate the ability of the proposed approach for short- and long-term operation while overcoming challenging issues. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R) and by the TEC department (UAM).
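The per-block stability analysis described above can be sketched as follows. The similarity statistic, thresholds, and alarm length here are illustrative stand-ins for the paper's pixel-ratio statistics and sampling scheme: each block appearance is assigned to an online cluster, and an alarm fires when a non-background cluster stays dominant beyond the alarm time:

```python
import numpy as np

class BlockStability:
    # Minimal sketch of one spatial location's analysis: cluster similar
    # block appearances over time and flag a stationary-object alarm when
    # a new cluster persists beyond `alarm_frames`. The similarity measure
    # and thresholds are illustrative choices, not the paper's exact ones.
    def __init__(self, sim_thresh=0.9, alarm_frames=50):
        self.sim_thresh = sim_thresh
        self.alarm_frames = alarm_frames
        self.clusters = []        # list of (centroid, count)
        self.background = None    # index of the initial dominant cluster
        self.current = None       # cluster observed in the latest frame
        self.run = 0              # consecutive frames on `current`

    def _similarity(self, a, b):
        # Fraction of pixels that agree between two blocks.
        return np.mean(np.abs(a - b) < 0.1)

    def update(self, block):
        # Assign the block to an existing cluster or start a new one.
        for i, (c, n) in enumerate(self.clusters):
            if self._similarity(block, c) >= self.sim_thresh:
                self.clusters[i] = ((c * n + block) / (n + 1), n + 1)
                idx = i
                break
        else:
            self.clusters.append((block.copy(), 1))
            idx = len(self.clusters) - 1
        if self.background is None:
            self.background = idx
        self.run = self.run + 1 if idx == self.current else 1
        self.current = idx
        # Alarm: a non-background cluster has been stable long enough.
        return idx != self.background and self.run >= self.alarm_frames
```

Note that, as in the abstract, no foreground segmentation is involved: the decision depends only on which appearance cluster dominates a block over time.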
Rejection based multipath reconstruction for background estimation in video sequences with stationary objects
This is the author's version of a work that was accepted for publication in Computer Vision and Image Understanding. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Vision and Image Understanding, vol. 147 (2016). DOI 10.1016/j.cviu.2016.03.012

Background estimation in video consists of extracting a foreground-free image from a set of training frames. Moving and stationary objects may affect the background visibility, invalidating the assumption, common in the related literature, that the background is the temporally dominant data. In this paper, we present a temporal-spatial block-level approach for background estimation in video that copes with moving and stationary objects. First, a Temporal Analysis module obtains a compact representation of the training data by motion filtering and dimensionality reduction. Then, a threshold-free hierarchical clustering determines a set of candidates to represent the background for each spatial location (block). Second, a Spatial Analysis module iteratively reconstructs the background using these candidates. For each spatial location, multiple reconstruction hypotheses (paths) are explored from its neighboring locations, enforcing inter-block similarity and intra-block homogeneity constraints in terms of color discontinuity, color dissimilarity and variability. The experimental results show that the proposed approach outperforms the related state of the art on challenging video sequences in the presence of moving and stationary objects. This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R) and by the TEC department (Universidad Autónoma de Madrid).
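A minimal sketch of the spatial reconstruction step, assuming a single left-to-right path and mean-color dissimilarity as the inter-block constraint (both heavy simplifications of the multi-path, multi-constraint scheme described above):

```python
import numpy as np

def reconstruct_background(candidates, seed_idx=0):
    # Greedy sketch of spatial reconstruction: for each block (left to
    # right here, for simplicity), choose the candidate most similar in
    # mean color to the already-reconstructed neighbor, i.e. a crude
    # version of the inter-block similarity constraint in the abstract.
    # `candidates[b]` is a list of candidate block appearances for block b.
    chosen = [candidates[0][seed_idx]]
    for cands in candidates[1:]:
        prev = chosen[-1]
        # Color dissimilarity to the neighboring reconstructed block.
        dists = [abs(float(np.mean(c)) - float(np.mean(prev))) for c in cands]
        chosen.append(cands[int(np.argmin(dists))])
    return chosen
```

In this toy version a single seed block fixes the path; the method in the abstract instead explores several such paths and scores them with the full set of homogeneity constraints.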
Unsupervised Contrastive Learning of Sound Event Representations
Self-supervised representation learning can mitigate the limitations in
recognition tasks with few manually labeled data but abundant unlabeled
data---a common scenario in sound event research. In this work, we explore
unsupervised contrastive learning as a way to learn sound event
representations. To this end, we propose to use the pretext task of contrasting
differently augmented views of sound events. The views are computed primarily
via mixing of training examples with unrelated backgrounds, followed by other
data augmentations. We analyze the main components of our method via ablation
experiments. We evaluate the learned representations using linear evaluation,
and in two in-domain downstream sound event classification tasks, namely, using
limited manually labeled data, and using noisy labeled data. Our results
suggest that unsupervised contrastive pre-training can mitigate the impact of
data scarcity and increase robustness against noisy labels, outperforming
supervised baselines. Comment: A 4-page version is submitted to ICASSP 202
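The pretext task described above, contrasting differently augmented views where mixing with an unrelated background is the main augmentation, can be sketched as follows. The mixing weight and temperature are illustrative defaults, and the loss shown is the standard NT-Xent contrastive formulation, which may differ in detail from the paper's exact objective:

```python
import numpy as np

def mix_with_background(clip, background, weight=0.25):
    # Pretext-task augmentation: mix a training example with an unrelated
    # background clip. `weight` is an illustrative mixing coefficient.
    return (1.0 - weight) * clip + weight * background

def nt_xent(z1, z2, temperature=0.1):
    # Standard normalized-temperature cross-entropy contrastive loss over
    # two batches of embeddings whose matching rows are positive pairs.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)       # (2N, d)
    sim = z @ z.T / temperature                # pairwise similarities
    np.fill_diagonal(sim, -np.inf)             # exclude self-pairs
    n = len(z1)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    lse = (np.log(np.exp(sim - m).sum(axis=1, keepdims=True)) + m).ravel()
    return float(np.mean(lse - sim[np.arange(2 * n), targets]))
```

Correctly matched views yield a lower loss than mismatched ones, which is what drives the encoder to produce augmentation-invariant sound event representations.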
Reliable Label Bootstrapping for Semi-Supervised Learning
Reducing the amount of labels required to train convolutional neural networks
without performance degradation is key to effectively reduce human annotation
efforts. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised
preprocessing algorithm which improves the performance of semi-supervised
algorithms in extremely low supervision settings. Given a dataset with few
labeled samples, we first learn meaningful self-supervised latent features for
the data. Second, a label propagation algorithm propagates the known labels on
the unsupervised features, effectively labeling the full dataset in an
automatic fashion. Third, we select a subset of correctly labeled (reliable)
samples using a label noise detection algorithm. Finally, we train a
semi-supervised algorithm on the extended subset. We show that the selection of
the network architecture and the self-supervised algorithm are important
factors to achieve successful label propagation and demonstrate that ReLaB
substantially improves semi-supervised learning in scenarios of very limited
supervision on CIFAR-10, CIFAR-100 and mini-ImageNet. We reach low average
error rates with a single random labeled sample per class on CIFAR-10, and
lower this error further when the labeled sample in each class is highly
representative. Our work is fully reproducible:
https://github.com/PaulAlbert31/ReLaB. Comment: 10 pages, 3 figures
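The label-propagation stage described above (spreading the few known labels over self-supervised features) can be sketched as a simple graph diffusion. The k-NN graph construction and the hyperparameters below are illustrative assumptions, not ReLaB's exact configuration:

```python
import numpy as np

def propagate_labels(features, labeled_idx, labels, k=3, steps=20, alpha=0.9):
    # Sketch of label propagation: spread the few known labels over a
    # k-NN graph built on self-supervised features. `k`, `steps` and
    # `alpha` (diffusion vs. clamping trade-off) are illustrative defaults.
    n = len(features)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)             # no self-edges
    # Symmetric affinity matrix from k-nearest neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(sim[i])[-k:]
        W[i, nn] = np.maximum(sim[i, nn], 0.0)
    W = np.maximum(W, W.T)
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-normalized transitions
    n_classes = int(labels.max()) + 1
    Y = np.zeros((n, n_classes))
    Y[labeled_idx, labels] = 1.0               # one-hot seeds
    F = Y.copy()
    for _ in range(steps):
        F = alpha * (P @ F) + (1 - alpha) * Y  # diffusion with clamped seeds
    return F.argmax(axis=1)
```

In the full pipeline the propagated labels are then filtered with a label-noise detector before training the semi-supervised learner, so only the reliable subset is kept.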