15,381 research outputs found
Adaptive multimodal continuous ant colony optimization
Seeking multiple optima simultaneously, the aim of multimodal optimization, has attracted increasing attention but remains challenging. Taking advantage of the ability of ant colony optimization algorithms to preserve high diversity, this paper extends them to multimodal optimization. First, combined with current niching methods, an adaptive multimodal continuous ant colony optimization algorithm is introduced. In this algorithm, an adaptive parameter adjustment is developed that takes the differences among niches into consideration. Second, to accelerate convergence, a differential evolution mutation operator is alternatively utilized to build base vectors for ants to construct new solutions. Then, to enhance exploitation, a local search scheme based on a Gaussian distribution is self-adaptively performed around the seeds of niches. Together, these components give the proposed algorithm a good balance between exploration and exploitation. Extensive experiments on 20 widely used benchmark multimodal functions are conducted to investigate the influence of each algorithmic component, and the results are compared with several state-of-the-art multimodal algorithms and winners of competitions on multimodal optimization. These comparisons demonstrate the competitive efficiency and effectiveness of the proposed algorithm, especially in dealing with complex problems with high numbers of local optima.
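The two operators named in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which differential evolution variant is used, so the common DE/rand/1 form is assumed here, and the function names, the scale factor F, the step size sigma and the fixed trial count are all hypothetical choices made for illustration.

```python
import random

def de_rand_1(population, F=0.5, rng=random):
    """DE/rand/1 mutation (assumed variant): a random base vector plus
    the scaled difference of two other randomly chosen vectors."""
    r1, r2, r3 = rng.sample(population, 3)
    return [a + F * (b - c) for a, b, c in zip(r1, r2, r3)]

def gaussian_local_search(seed, fitness, sigma=0.1, trials=20, rng=random):
    """Sample Gaussian perturbations around a niche seed and keep a
    candidate only when it improves the fitness (maximization)."""
    best, best_fit = list(seed), fitness(seed)
    for _ in range(trials):
        cand = [x + rng.gauss(0.0, sigma) for x in best]
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2-D population and a toy single-peak fitness function.
    pop = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(10)]
    sphere = lambda x: -sum(v * v for v in x)
    print(de_rand_1(pop, rng=rng))
    print(gaussian_local_search(pop[0], sphere, rng=rng))
```

Because the local search accepts only improvements, the returned fitness can never be worse than that of the seed it started from.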
Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis
In performing a Bayesian analysis of astronomical data, two difficult
problems often emerge. First, in estimating the parameters of some model for
the data, the resulting posterior distribution may be multimodal or exhibit
pronounced (curving) degeneracies, which can cause problems for traditional
MCMC sampling methods. Second, in selecting between a set of competing models,
calculation of the Bayesian evidence for each model is computationally
expensive. The nested sampling method introduced by Skilling (2004) has
greatly reduced the computational expense of calculating evidences and also
produces posterior inferences as a by-product. This method has been applied
successfully in cosmological applications by Mukherjee et al. (2006), but their
implementation was efficient only for unimodal distributions without pronounced
degeneracies. Shaw et al. (2007) recently introduced a clustered nested
sampling method which is significantly more efficient in sampling from
multimodal posteriors and also determines the expectation and variance of the
final evidence from a single run of the algorithm, hence providing a further
increase in efficiency. In this paper, we build on the work of Shaw et al. and
present three new methods for sampling and evidence evaluation from
distributions that may contain multiple modes and significant degeneracies; we
also present an even more efficient technique for estimating the uncertainty on
the evaluated evidence. These methods lead to a further substantial improvement
in sampling efficiency and robustness, and are applied to toy problems to
demonstrate the accuracy and economy of the evidence calculation and parameter
estimation. Finally, we discuss the use of these methods in performing Bayesian
object detection in astronomical datasets.
Comment: 14 pages, 11 figures, submitted to MNRAS, some major additions to the
previous version in response to the referee's comment
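For readers unfamiliar with the evidence calculation this abstract refers to, the following is a minimal sketch of basic nested sampling in the spirit of Skilling (2004). It is not the clustered or multimodal method of the paper: new live points are drawn by plain rejection sampling from the prior, which is only practical for toy problems, and the function names, the deterministic volume shrinkage X_i = exp(-i / n_live) and the fixed iteration count are assumptions made for illustration.

```python
import math
import random

def nested_sampling(log_like, sample_prior, n_live=100, n_iter=600, rng=None):
    """Estimate the evidence Z = integral of L(x) over the prior."""
    rng = rng or random.Random()
    live = [sample_prior(rng) for _ in range(n_live)]
    live_ll = [log_like(x) for x in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        # The live point with the lowest likelihood defines the new contour.
        worst = min(range(n_live), key=live_ll.__getitem__)
        ll_star = live_ll[worst]
        # Assumed deterministic shrinkage of the enclosed prior volume.
        x_i = math.exp(-i / n_live)
        z += math.exp(ll_star) * (x_prev - x_i)
        x_prev = x_i
        # Replace it with a prior draw inside the contour (naive rejection).
        while True:
            cand = sample_prior(rng)
            cand_ll = log_like(cand)
            if cand_ll > ll_star:
                break
        live[worst], live_ll[worst] = cand, cand_ll
    # Add the contribution of the remaining live points.
    z += x_prev * sum(math.exp(ll) for ll in live_ll) / n_live
    return z

if __name__ == "__main__":
    # Toy problem: uniform prior on [0, 1], Gaussian likelihood of width 0.1.
    sigma = 0.1
    log_like = lambda x: -((x - 0.5) ** 2) / (2 * sigma ** 2)
    z = nested_sampling(log_like, lambda r: r.random(), rng=random.Random(0))
    print(z, "analytic value is about", sigma * math.sqrt(2 * math.pi))
```

For this toy problem the analytic evidence is close to 0.2507, and the estimate should land near it up to sampling noise. The rejection step becomes very slow as the contour shrinks, which is the inefficiency that more elaborate sampling schemes aim to avoid.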
Video Captioning with Guidance of Multimodal Latent Topics
The topic diversity of open-domain videos leads to various vocabularies and
linguistic expressions in describing video contents, and therefore, makes the
video captioning task even more challenging. In this paper, we propose a
unified caption framework, M&M TGM, which mines multimodal topics in an
unsupervised fashion from data and guides the caption decoder with these
topics. Compared to pre-defined topics, the mined multimodal topics are more
semantically and visually coherent and can reflect the topic distribution of
videos better. We formulate the topic-aware caption generation as a multi-task
learning problem, in which we add a parallel task, topic prediction, in
addition to the caption task. For the topic prediction task, we use the mined
topics as the teacher to train a student topic prediction model, which learns
to predict the latent topics from multimodal contents of videos. The topic
prediction provides intermediate supervision to the learning process. As for
the caption task, we propose a novel topic-aware decoder to generate more
accurate and detailed video descriptions with the guidance from latent topics.
The entire learning procedure is end-to-end and it optimizes both tasks
simultaneously. The results from extensive experiments conducted on the MSR-VTT
and Youtube2Text datasets demonstrate the effectiveness of our proposed model.
M&M TGM not only outperforms prior state-of-the-art methods on multiple
evaluation metrics and on both benchmark datasets, but also achieves better
generalization ability.
Comment: ACM Multimedia 201
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
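The control over the switching rate described in this abstract can be illustrated with a small sketch of a sticky transition prior under a finite truncation. The abstract gives no formulas, so the form used here is an assumption based on the usual statement of the sticky HDP-HMM: row j of the transition matrix is drawn from a Dirichlet distribution with concentration alpha * beta_k, plus an extra mass kappa on the self-transition entry. The function names and parameter values are hypothetical.

```python
import random

def sticky_transition_rows(beta, alpha, kappa, rng):
    """Draw one transition matrix from the truncated sticky prior:
    row j ~ Dirichlet(alpha * beta + kappa * e_j)."""
    rows = []
    for j in range(len(beta)):
        conc = [alpha * b + (kappa if k == j else 0.0)
                for k, b in enumerate(beta)]
        # A Dirichlet draw is a set of normalized Gamma draws.
        draws = [rng.gammavariate(c, 1.0) for c in conc]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows

def expected_self_transition(beta_j, alpha, kappa):
    """Prior mean of the self-transition probability for state j."""
    return (alpha * beta_j + kappa) / (alpha + kappa)

if __name__ == "__main__":
    beta = [0.25, 0.25, 0.25, 0.25]  # hypothetical global state weights
    rng = random.Random(0)
    for row in sticky_transition_rows(beta, alpha=4.0, kappa=12.0, rng=rng):
        print([round(p, 3) for p in row])
    print(expected_self_transition(0.25, 4.0, 0.0))
    print(expected_self_transition(0.25, 4.0, 12.0))
```

With kappa set to zero the prior mean of a self-transition equals beta_j, and increasing kappa pushes it toward one, which discourages rapid switching among redundant states.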
Exploiting multimedia in creating and analysing multimedia Web archives
The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.
Quality-based Multimodal Classification Using Tree-Structured Sparsity
Recent studies have demonstrated advantages of information fusion based on
sparsity models for multimodal classification. Among several sparsity models,
tree-structured sparsity provides a flexible framework for extraction of
cross-correlated information from different sources and for enforcing group
sparsity at multiple granularities. However, the existing algorithm only solves
an approximated version of the cost functional and the resulting solution is
not necessarily sparse at group levels. This paper reformulates the
tree-structured sparse model for the multimodal classification task. An accelerated
proximal algorithm is proposed to solve the optimization problem, which is an
efficient tool for feature-level fusion among either homogeneous or
heterogeneous sources of information. In addition, a (fuzzy-set-theoretic)
possibilistic scheme is proposed to weight the available modalities, based on
their respective reliability, in a joint optimization problem for finding the
sparsity codes. This approach provides a general framework for quality-based
fusion that offers added robustness to several sparsity-based multimodal
classification algorithms. To demonstrate their efficacy, the proposed methods
are evaluated on three different applications - multiview face recognition,
multimodal face recognition, and target classification.
Comment: To appear in 2014 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2014
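As a rough illustration of what a proximal step for tree-structured group sparsity can look like, the sketch below applies group soft-thresholding to nested groups ordered from the leaves to the root, a known way to evaluate the proximal operator of a tree-structured l2 group norm. It is not taken from this paper: the abstract does not describe its accelerated proximal algorithm in detail, and the function names, the single shared weight lam and the example groups (two hypothetical modalities plus a root group) are assumptions.

```python
import math

def group_soft_threshold(v, group, lam):
    """Shrink the l2 norm of the entries of v indexed by `group` by lam,
    setting the whole group to zero when its norm is at most lam."""
    norm = math.sqrt(sum(v[i] ** 2 for i in group))
    scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    out = list(v)
    for i in group:
        out[i] = v[i] * scale
    return out

def tree_prox(v, groups_leaves_to_root, lam):
    """Compose the group operators in leaves-to-root order."""
    for group in groups_leaves_to_root:
        v = group_soft_threshold(v, group, lam)
    return v

if __name__ == "__main__":
    # Hypothetical layout: indices 0-1 are one modality, index 2 is another,
    # and the root group spans all coefficients.
    groups = [[0, 1], [2], [0, 1, 2]]
    print(tree_prox([3.0, 4.0, 0.1], groups, lam=1.0))
```

In this example the weak second modality is zeroed out as a whole group while the first is only shrunk, which is the group-level sparsity the abstract refers to.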