11,501 research outputs found
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
As the intermediate level task connecting image captioning and object
detection, visual relationship detection started to catch researchers'
attention because of its descriptive power and clear structure. It detects the
objects and captures their pair-wise interactions with a
subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each
visual relationship is considered as a phrase with three components. We
formulate the visual relationship detection as three inter-connected
recognition problems and propose a Visual Phrase guided Convolutional Neural
Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a
Phrase-guided Message Passing Structure (PMPS) to establish the connection
among relationship components and help the model consider the three problems
jointly. Corresponding non-maximum suppression method and model training
strategy are also proposed. Experimental results show that our ViP-CNN
outperforms the state-of-art method both in speed and accuracy. We further
pretrain ViP-CNN on our cleansed Visual Genome Relationship dataset, which is
found to perform better than the pretraining on the ImageNet for this task.Comment: 10 pages, 5 figures, accepted by CVPR 201
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.Comment: 12 pages, 5 tables, 4, figures, Super Computing Conference November
11-16, 2018, Dallas, TX, US
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective.
The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
- âŠ