On Low-Resolution ADCs in Practical 5G Millimeter-Wave Massive MIMO Systems

Author: Dai Linglong
Hanzo Lajos
Li Xu
Liu Ying
Zhang Jiayi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/03/2018

Nowadays, millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems are a favorable candidate for fifth generation (5G) cellular systems. However, a key challenge is the high power consumption imposed by their numerous radio frequency (RF) chains, which may be mitigated by opting for low-resolution analog-to-digital converters (ADCs), whilst tolerating a moderate performance loss. In this article, we discuss several important issues based on the most recent research on mmWave massive MIMO systems relying on low-resolution ADCs. We discuss the key transceiver design challenges, including channel estimation, signal detection, channel information feedback and transmit precoding. Furthermore, we introduce a mixed-ADC architecture as an alternative technique for improving the overall system performance. Finally, the associated challenges and potential implementations of the practical 5G mmWave massive MIMO system with ADC quantizers are discussed. Comment: to appear in IEEE Communications Magazine.

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Crossref

Deep Multimodal Speaker Naming

Author: Dai Jingwen
Hu Yongtao
Ren Jimmy
Wang Wenping
Xu Li
Yuan Chang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2015

Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV/movie/live show video. The problem is challenging mainly because of its multimodal nature: the face cue alone is insufficient to achieve good performance. Previous multimodal approaches to this problem usually process the data of different modalities individually and merge them using handcrafted heuristics. Such approaches work well for simple scenes, but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel convolutional neural network (CNN) based learning framework to automatically learn the fusion function of both face and audio cues. We show that without using face tracking, facial landmark localization or subtitle/transcript, our system with robust multimodal feature extraction is able to achieve state-of-the-art speaker naming performance evaluated on two diverse TV series. The dataset and implementation of our algorithm are publicly available online.

arXiv.org e-Print Archive

Crossref