1,337 research outputs found
An Universal Image Attractiveness Ranking Framework
We propose a new framework to rank image attractiveness using a novel
pairwise deep network trained with a large set of side-by-side multi-labeled
image pairs from a web image index. The judges only provide relative ranking
between two images without the need to directly assign an absolute score, or
rate any predefined image attribute, thus making the rating more intuitive and
accurate. We investigate a deep attractiveness rank net (DARN), a combination
of deep convolutional neural network and rank net, to directly learn an
attractiveness score mean and variance for each image and the underlying
criteria the judges use to label each pair. The extension of this model
(DARN-V2) is able to adapt to individual judge's personal preference. We also
show the attractiveness of search results are significantly improved by using
this attractiveness information in a real commercial search engine. We evaluate
our model against other state-of-the-art models on our side-by-side web test
data and another public aesthetic data set. With much less judgments (1M vs
50M), our model outperforms on side-by-side labeled data, and is comparable on
data labeled by absolute score.Comment: Accepted by 2019 Winter Conference on Application of Computer Vision
(WACV
Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data
It is an enduring question how to combine revealed preference (RP) and stated
preference (SP) data to analyze travel behavior. This study presents a
framework of multitask learning deep neural networks (MTLDNNs) for this
question, and demonstrates that MTLDNNs are more generic than the traditional
nested logit (NL) method, due to its capacity of automatic feature learning and
soft constraints. About 1,500 MTLDNN models are designed and applied to the
survey data that was collected in Singapore and focused on the RP of four
current travel modes and the SP with autonomous vehicles (AV) as the one new
travel mode in addition to those in RP. We found that MTLDNNs consistently
outperform six benchmark models and particularly the classical NL models by
about 5% prediction accuracy in both RP and SP datasets. This performance
improvement can be mainly attributed to the soft constraints specific to
MTLDNNs, including its innovative architectural design and regularization
methods, but not much to the generic capacity of automatic feature learning
endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs
are also interpretable. The empirical results show that AV is mainly the
substitute of driving and AV alternative-specific variables are more important
than the socio-economic variables in determining AV adoption. Overall, this
study introduces a new MTLDNN framework to combine RP and SP, and demonstrates
its theoretical flexibility and empirical power for prediction and
interpretation. Future studies can design new MTLDNN architectures to reflect
the speciality of RP and SP and extend this work to other behavioral analysis
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring
relations between phonology and speech processing, and in broader terms, for
exploring relations between the abstract and physical structures of a speech
signal. Our goal is to make a step towards bridging phonology and speech
processing and to contribute to the program of Laboratory Phonology. We show
three application examples for laboratory phonology: compositional phonological
speech modelling, a comparison of phonological systems and an experimental
phonological parametric text-to-speech (TTS) system. The featural
representations of the following three phonological systems are considered in
this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English
(SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded
speech, we conclude that the latter achieves slightly better results than the
former. However, GP - the most compact phonological speech representation -
performs comparably to the systems with a higher number of phonological
features. The parametric TTS based on phonological speech representation, and
trained from an unlabelled audiobook in an unsupervised manner, achieves
intelligibility of 85% of the state-of-the-art parametric speech synthesis. We
envision that the presented approach paves the way for researchers in both
fields to form meaningful hypotheses that are explicitly testable using the
concepts developed and exemplified in this paper. On the one hand, laboratory
phonologists might test the applied concepts of their theoretical models, and
on the other hand, the speech processing community may utilize the concepts
developed for the theoretical phonological models for improvements of the
current state-of-the-art applications
- …