Near-capacity dirty-paper code design : a source-channel coding approach
This paper examines near-capacity dirty-paper code designs based on source-channel coding. We first point out that the performance loss in signal-to-noise ratio (SNR) in our code designs can be broken into the sum of the packing loss from channel coding and a modulo loss, which is a function of the granular loss from source coding and the target dirty-paper coding rate (or SNR). We then examine practical designs by combining trellis-coded quantization (TCQ) with both systematic and nonsystematic irregular repeat-accumulate (IRA) codes. Like previous approaches, we exploit the extrinsic information transfer (EXIT) chart technique for capacity-approaching IRA code design; but unlike previous approaches, we emphasize the role of strong source coding to achieve as much granular gain as possible using TCQ. Instead of systematic doping, we employ two relatively shifted TCQ codebooks, where the shift is optimized (via tuning the EXIT charts) to facilitate the IRA code design. Our designs synergistically combine TCQ with IRA codes so that they work together as well as they do individually. By bringing together TCQ (the best quantizer from the source coding community) and EXIT chart-based IRA code designs (the best from the channel coding community), we are able to approach the theoretical limit of dirty-paper coding. For example, at 0.25 bit per symbol (b/s), our best code design (with 2048-state TCQ) performs only 0.630 dB away from the Shannon capacity
Deterministic Polynomial-Time Algorithms for Designing Short DNA Words
Designing short DNA words is a problem of constructing a set (i.e., code) of
n DNA strings (i.e., words) with the minimum length such that the Hamming
distance between each pair of words is at least k and the n words satisfy a set
of additional constraints. This problem has applications in, e.g., DNA
self-assembly and DNA arrays. Previous works include those that extended
results from coding theory to obtain bounds on code and word sizes for
biologically motivated constraints and those that applied heuristic local
searches, genetic algorithms, and randomized algorithms. In particular, Kao,
Sanghi, and Schweller (2009) developed polynomial-time randomized algorithms to
construct n DNA words of length within a multiplicative constant of the
smallest possible word length (e.g., 9 max{log n, k}) that satisfy various sets
of constraints with high probability. In this paper, we give deterministic
polynomial-time algorithms to construct DNA words based on derandomization
techniques. Our algorithms can construct n DNA words of shorter length (e.g.,
2.1 log n + 6.28 k) and satisfy the same sets of constraints as the words
constructed by the algorithms of Kao et al. Furthermore, we extend these new
algorithms to construct words that satisfy a larger set of constraints for
which the algorithms of Kao et al. do not work.
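
The core constraint named in this abstract, that every pair of words lies at Hamming distance at least k, can be illustrated with a short check. This is only a minimal sketch for verifying a candidate word set: it is not the derandomized construction from the paper, and the helper names `hamming` and `satisfies_min_distance` and the sample words are hypothetical, chosen here for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_min_distance(words: list[str], k: int) -> bool:
    """True if every pair of words is at Hamming distance at least k."""
    return all(hamming(a, b) >= k for a, b in combinations(words, 2))


# Hypothetical sample code of three DNA words of length 4.
sample_words = ["ACGT", "TGCA", "AATT"]

if __name__ == "__main__":
    for a, b in combinations(sample_words, 2):
        print(a, b, hamming(a, b))
    print(satisfies_min_distance(sample_words, 2))
```

The check compares all pairs, so it takes time quadratic in the number of words; it covers only the distance constraint, not the additional biologically motivated constraints the abstract mentions.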
Closed-loop control of complex networks : A trade-off between time and energy
W. L. is supported by the National Science Foundation of China (NSFC) (Grants No. 11322111 and No. 61773125). Y.-Z. S. is supported by the NSFC (Grant No. 61403393). Y.-C. L. acknowledges support from the Vannevar Bush Faculty Fellowship program sponsored by the Basic Research Office of the Assistant Secretary of Defense for Research and Engineering and funded by the Office of Naval Research through Grant No. N00014-16-1-2828. Y.-Z. S. and S.-Y. L. contributed equally to this work. Peer reviewed.
Semi-Supervised Learning for Neural Machine Translation
While end-to-end neural machine translation (NMT) has made remarkable
progress recently, NMT systems rely only on parallel corpora for parameter
estimation. Since parallel corpora are usually limited in quantity, quality,
and coverage, especially for low-resource languages, it is appealing to exploit
monolingual corpora to improve NMT. We propose a semi-supervised approach for
training NMT models on the concatenation of labeled (parallel corpora) and
unlabeled (monolingual corpora) data. The central idea is to reconstruct the
monolingual corpora using an autoencoder, in which the source-to-target and
target-to-source translation models serve as the encoder and decoder,
respectively. Our approach can exploit the monolingual corpora not only of the
target language but also of the source language. Experiments on the
Chinese-English dataset show that our approach achieves significant
improvements over state-of-the-art SMT and NMT systems.