Synchronization recovery and state model reduction for soft decoding of variable length codes
Variable length codes exhibit de-synchronization problems when transmitted
over noisy channels. Trellis decoding techniques based on Maximum A Posteriori
(MAP) estimators are often used to minimize the error rate on the estimated
sequence. If the number of symbols and/or bits transmitted is known by the
decoder, termination constraints can be incorporated in the decoding process.
All the paths in the trellis which do not lead to a valid sequence length are
suppressed. This paper presents an analytic method to assess the expected error
resilience of a VLC when trellis decoding with a sequence length constraint is
used. The approach is based on the computation, for a given code, of the amount
of information brought by the constraint. It is then shown that this quantity,
as well as the probability that the VLC decoder does not re-synchronize in a
strict sense, is not significantly altered by appropriate trellis state
aggregation. This proves that the performance obtained by running a
length-constrained Viterbi decoder on aggregated state models approaches the
one obtained with the bit/symbol trellis, with a significantly reduced
complexity. It is then shown that the complexity can be further decreased by
projecting the state model on two state models of reduced size.
Particular object retrieval with integral max-pooling of CNN activations
Recently, image representation built upon Convolutional Neural Network (CNN)
has been shown to provide effective descriptors for image search, outperforming
pre-CNN features as short-vector representations. Yet such models are not
compatible with geometry-aware re-ranking methods and are still outperformed, on
some particular object retrieval benchmarks, by traditional image search
systems relying on precise descriptor matching, geometric re-ranking, or query
expansion. This work revisits both retrieval stages, namely initial search and
re-ranking, by employing the same primitive information derived from the CNN.
We build compact feature vectors that encode several image regions without the
need to feed multiple inputs to the network. Furthermore, we extend integral
images to handle max-pooling on convolutional layer activations, allowing us to
efficiently localize matching objects. The resulting bounding box is finally
used for image re-ranking. As a result, this paper significantly improves
existing CNN-based recognition pipelines: we report, for the first time, results
competing with traditional methods on the challenging Oxford5k and Paris6k
datasets.
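The idea of answering max-pooling queries over arbitrary regions with an integral image can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the method of the paper: it assumes non-negative activations and approximates the regional maximum by a generalized mean of exponent `p`, so that a single integral image of the activations raised to the power `p` suffices. The function names and the value of `p` are hypothetical.

```python
def integral_image(values):
    """Return the (H+1) x (W+1) table of 2D prefix sums of an H x W grid."""
    h, w = len(values), len(values[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += values[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def region_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def build_power_table(activations, p):
    """Integral image of activations**p (assumption: activations >= 0)."""
    return integral_image([[v ** p for v in row] for row in activations])


def approx_region_max(power_table, top, left, bottom, right, p):
    """Approximate the regional max by (sum of x**p) ** (1/p)."""
    return region_sum(power_table, top, left, bottom, right) ** (1.0 / p)
```

Once the table is built for one feature channel, any candidate bounding box can be scored with four lookups, which is what makes an exhaustive or coarse-to-fine search over boxes affordable. The approximation always lies between the true maximum and the true maximum times the region area raised to `1/p`.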
Balancing clusters to reduce response time variability in large scale image search
Many algorithms for approximate nearest neighbor search in high-dimensional
spaces partition the data into clusters. At query time, in order to avoid
exhaustive search, an index selects the few (or a single) clusters nearest to
the query point. Clusters are often produced by the well-known k-means
approach since it has several desirable properties. On the downside, it tends
to produce clusters having quite different cardinalities. Imbalanced clusters
negatively impact both the variance and the expectation of query response
times. This paper proposes to modify k-means centroids to produce clusters
with more comparable sizes without sacrificing the desirable properties.
Experiments with a large scale collection of image descriptors show that our
algorithm significantly reduces the variance of response times without
seriously impacting the search quality.
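The effect of imbalance on response time can be illustrated with a small calculation. The sketch below rests on two simplifying assumptions that are not taken from the abstract: a query falls into a cluster with probability proportional to that cluster's size, and the response time is proportional to the number of vectors scanned in that single cluster. The function name is hypothetical.

```python
def response_time_stats(cluster_sizes):
    """Mean and variance of the number of vectors scanned per query.

    Assumptions (illustrative only): one cluster is visited per query,
    and cluster i is selected with probability n_i / N.
    """
    total = sum(cluster_sizes)
    probs = [n / total for n in cluster_sizes]
    mean = sum(p * n for p, n in zip(probs, cluster_sizes))
    var = sum(p * (n - mean) ** 2 for p, n in zip(probs, cluster_sizes))
    return mean, var


balanced = response_time_stats([250, 250, 250, 250])
imbalanced = response_time_stats([700, 100, 100, 100])
```

Under these assumptions the mean cost is the sum of squared cluster sizes divided by the collection size, which is smallest when all clusters have the same size; the balanced partition also has zero variance.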
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible with, and is employed with,
several popular encoding methods, in particular bag-of-words, VLAD and the
Fisher vector. Our geometry-aware aggregation strategy is effective for image
search, as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings. Comment: European Conference on Computer Vision (2014)
Low-shot learning with large-scale diffusion
This paper considers the problem of inferring image labels from images when
only a few annotated examples are available at training time. This setup is
often referred to as low-shot learning, where a standard approach is to
re-train the last few layers of a convolutional neural network learned on
separate classes for which training examples are abundant. We consider a
semi-supervised setting based on a large collection of images to support label
propagation. This is possible by leveraging the recent advances in large-scale
similarity graph construction.
We show that, despite its conceptual simplicity, scaling label propagation up
to hundreds of millions of images leads to state-of-the-art accuracy in the
low-shot learning regime.
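Label propagation over a similarity graph can be sketched on a toy example. The code below is a generic, minimal variant (weighted neighbor averaging with the labeled seeds held fixed) and is not claimed to be the exact diffusion used in the paper; the graph, the weights and the function name are hypothetical.

```python
def propagate_labels(neighbors, seeds, n_classes, n_iter=50):
    """Diffuse class scores over a weighted graph.

    neighbors: list where neighbors[i] is a list of (j, weight) pairs.
    seeds: dict mapping a labeled node index to its class index.
    Returns the predicted class index for every node.
    """
    n = len(neighbors)
    scores = [[0.0] * n_classes for _ in range(n)]
    for node, label in seeds.items():
        scores[node][label] = 1.0
    for _ in range(n_iter):
        new_scores = []
        for node in range(n):
            if node in seeds:
                # Labeled examples keep their one-hot scores.
                new_scores.append(scores[node][:])
                continue
            total_w = sum(w for _, w in neighbors[node])
            row = [0.0] * n_classes
            for other, w in neighbors[node]:
                for c in range(n_classes):
                    row[c] += w * scores[other][c] / total_w
            new_scores.append(row)
        scores = new_scores
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Toy graph: two tight groups (0-1-2 and 3-4-5) joined by one weak edge.
toy_graph = [
    [(1, 1.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 1.0), (3, 0.1)],
    [(2, 0.1), (4, 1.0)],
    [(3, 1.0), (5, 1.0)],
    [(4, 1.0)],
]
predictions = propagate_labels(toy_graph, seeds={0: 0, 5: 1}, n_classes=2)
```

With one labeled example per group, the unlabeled nodes inherit the label of the seed they are strongly connected to, which is the mechanism that lets a large unlabeled collection compensate for the scarcity of annotations.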
Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent from any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors. Comment: Accepted to IEEE Transactions on Big Data
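The memory-unit architecture can be sketched in a few lines. As a simplifying assumption, the representative of each unit is taken here to be the plain sum of its vectors, which is one simple choice and not necessarily the optimized representative described in the abstract; all function names and parameters are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def random_unit_vector(dim, rng):
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def build_memory_units(database, unit_size):
    """Split the database into units; each keeps (member ids, sum vector)."""
    dim = len(database[0])
    units = []
    for start in range(0, len(database), unit_size):
        ids = list(range(start, min(start + unit_size, len(database))))
        rep = [sum(database[i][d] for i in ids) for d in range(dim)]
        units.append((ids, rep))
    return units


def search(query, database, units, n_probe):
    """Rank units by correlation with the query, then verify the members
    of the n_probe best units exhaustively."""
    ranked = sorted(units, key=lambda u: dot(query, u[1]), reverse=True)
    candidates = [i for ids, _ in ranked[:n_probe] for i in ids]
    return max(candidates, key=lambda i: dot(query, database[i]))
```

A query that is a perturbed copy of a stored vector correlates strongly with the sum vector of the unit holding that vector, while the other members contribute only small, roughly zero-mean terms in high dimension. Only the members of the few best-scoring units are then compared with the query, rather than the whole database.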
Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This offers a
significant gain in complexity and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene.
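Comparing two videos in the frequency domain can be illustrated with a plain circular cross-correlation of frame descriptors. This is a simplified sketch, not the paper's encoding: it assumes both sequences have the same length, uses a naive DFT for clarity instead of a fast transform, and omits the compression step. Function names are hypothetical.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative only)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def circular_correlation(query, database):
    """scores[s] = sum over t of <query[t], database[(t + s) % n]>.

    query, database: lists of n frame descriptors of equal dimension.
    """
    n, dim = len(query), len(query[0])
    spectrum = [0j] * n
    for d in range(dim):
        fq = dft([frame[d] for frame in query])
        fb = dft([frame[d] for frame in database])
        for k in range(n):
            spectrum[k] += fq[k].conjugate() * fb[k]
    return [c.real for c in dft(spectrum, inverse=True)]


def best_offset(query, database):
    """Temporal shift s that best aligns the query with the database video."""
    scores = circular_correlation(query, database)
    return max(range(len(scores)), key=lambda s: scores[s])
```

The scores for all temporal shifts are obtained from one element-wise product per descriptor dimension, rather than from one full comparison per shift; the position of the peak gives the offset estimate and its height can serve as a matching confidence.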
Codes robustes et codes joints source-canal pour transmission multimédia sur canaux mobiles (Robust codes and joint source-channel codes for multimedia transmission over mobile channels)
Some new error-resilient source coding and joint source/channel coding techniques are proposed for the transmission of multimedia sources over error-prone channels. First, we introduce a class of entropy codes providing unequal error resilience, i.e., providing some protection to the most sensitive information. These codes are then extended to exploit the temporal dependencies. A new state model based on the aggregation of some states of the trellis is then proposed and analyzed for soft source decoding of variable length codes with a length constraint. It allows the trade-off between estimation accuracy and decoding complexity to be adjusted. Next, some packetization methods are proposed to reduce the error propagation phenomenon of variable length codes. Finally, some re-writing rules are proposed to extend the binary code tree representation of entropy codes. The proposed representation allows, in particular, the design of codes with improved soft decoding performance.
This thesis proposes robust codes and joint source/channel codes for transmitting multimedia signals over noisy channels. We propose entropy codes that offer intrinsic resilience for high-priority data. These codes are extended to exploit the temporal dependency of the signal. A new state model is then proposed and analyzed for soft decoding of variable length codes with a length constraint. It allows fine tuning of the trade-off between decoding performance and complexity. We also propose to separate, at the entropy coding level, the codeword production and packetization steps. Different bitstream construction strategies are then proposed. Finally, the binary tree representation of entropy codes is extended by considering re-writing rules. This makes it possible, in particular, to obtain codes that offer better soft decoding performance.