28 research outputs found

    Mutually Uncorrelated Primers for DNA-Based Data Storage

    We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for primer design in DNA-based data storage systems are also required to be at large mutual Hamming distance from each other, have balanced compositions of symbols, and avoid primer-dimer byproducts. We derive bounds on the size of WMU and various constrained WMU codes and present a number of constructions for balanced, error-correcting, primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic codes. Comment: 14 pages, 3 figures, 1 Table. arXiv admin note: text overlap with arXiv:1601.0817
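
    The defining property of WMU sequences stated above can be checked by brute force. The following is a minimal sketch, assuming that "sufficiently long" means an overlap length of at least a threshold `kappa` and that only proper overlaps (shorter than the sequences themselves) count; the function name `is_wmu` and the example primers are hypothetical and are not taken from the paper.

```python
from itertools import product


def is_wmu(code, kappa):
    """Return True if no proper suffix of length >= kappa of any sequence
    in `code` equals a prefix of the same or another sequence."""
    if kappa < 1:
        raise ValueError("kappa must be at least 1")
    # Ordered pairs with repetition, so each sequence is also tested against itself.
    for a, b in product(code, repeat=2):
        # Proper overlaps only: strictly shorter than both sequences.
        max_len = min(len(a), len(b)) - 1
        for ell in range(kappa, max_len + 1):
            if a[-ell:] == b[:ell]:
                return False
    return True


# Hypothetical toy primers over the DNA alphabet.
primers = ["ACGTT", "GGACA"]
print(is_wmu(primers, 1))  # the final "A" of GGACA matches the first "A" of ACGTT
print(is_wmu(primers, 2))  # no overlap of length 2 or more exists
```

    The check runs in time quadratic in the number of sequences. It tests only the overlap property; the Hamming-distance, balance, and primer-dimer constraints mentioned in the abstract are not covered.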

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, summarising the LLDs over a temporal block can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
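
    The bag-of-audio-words idea described above, turning a variable-length sequence of LLD frames into a fixed-length histogram, can be sketched in a few lines. This is an illustrative sketch only, assuming a codebook that is already given and a single nearest-codeword assignment per frame; the function names and toy values are hypothetical and do not reproduce the openXBOW implementation.

```python
def nearest_codeword(frame, codebook):
    """Index of the codeword closest to `frame` in squared Euclidean distance."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def bag_of_audio_words(frames, codebook, normalise=True):
    """Histogram of codeword assignments; its length equals the codebook size."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    if normalise and frames:
        # Relative frequencies make recordings of different duration comparable.
        return [c / len(frames) for c in counts]
    return counts


# Hypothetical 2-D descriptors and a 3-word codebook (assumed, not learned here).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
print(bag_of_audio_words(frames, codebook, normalise=False))
print(bag_of_audio_words(frames, codebook))
```

    Whatever the number of frames, the output vector has one entry per codeword, which is what allows it to be passed to a classifier that expects a static input.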

    Code design for multiple-input multiple-output broadcast channels

    Recent information theoretical results indicate that dirty-paper coding (DPC) achieves the entire capacity region of the Gaussian multiple-input multiple-output (MIMO) broadcast channel (BC). This thesis presents practical code designs for Gaussian BCs based on DPC. To simplify our designs, we assume constraints on the individual rates for each user instead of the customary constraint on transmitter power. The objective therefore is to minimize the transmitter power such that the practical decoders of all users are able to operate at the given rate constraints. The enabling element of our code designs is a practical DPC scheme based on nested turbo codes. We start with Cover's simplest two-user Gaussian BC as a toy example and present a code design that operates 1.44 dB away from the capacity region boundary at the transmission rate of 1 bit per sample per dimension for each user. Then we consider the case of the multiple-input multiple-output BC and develop a practical limit-approaching code design under the assumption that the channel state information is available perfectly at the receivers as well as at the transmitter. The optimal precoding strategy in this case can be derived by invoking duality between the MIMO BC and MIMO multiple access channel (MAC). However, this approach requires transformation of the optimal MAC covariances to their corresponding counterparts in the BC domain. To avoid these computationally complex transformations, we derive a closed-form expression for the optimal precoding matrix for the two-user case and use it to determine the optimal precoding strategy. For more than two users we propose a low-complexity suboptimal strategy, which, for three transmit antennas at the base station and three users (each with a single receive antenna), performs only 0.2 dB worse than the optimal scheme. Our obtained results are only 1.5 dB away from the capacity limit. 
Moreover, simulations indicate that our practical DPC-based scheme significantly outperforms the prevalent suboptimal strategies such as time-division multiplexing and zero-forcing beamforming. The drawback of DPC-based designs is that they require channel state information at the transmitter. However, if the channel state information can be communicated back to the transmitter effectively, DPC does indeed have a promising future in code designs for MIMO BCs.

    UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language

    We introduce UbiPhysio, a milestone framework that delivers fine-grained action description and feedback in natural language to support people's daily functioning, fitness, and rehabilitation activities. This expert-like capability assists users in properly executing actions and maintaining engagement in remote fitness and rehabilitation programs. Specifically, the proposed UbiPhysio framework comprises a fine-grained action descriptor and a knowledge retrieval-enhanced feedback module. The action descriptor translates action data, represented by a set of biomechanical movement features we designed based on clinical priors, into textual descriptions of action types and potential movement patterns. Building on physiotherapeutic domain knowledge, the feedback module provides clear and engaging expert feedback. We evaluated UbiPhysio's performance through extensive experiments with data from 104 diverse participants, collected in a home-like setting during 25 types of everyday activities and exercises. We assessed the quality of the language output under different tuning strategies using standard benchmarks. We conducted a user study to gather insights from clinical experts and potential users on our framework. Our initial tests show promise for deploying UbiPhysio in real-life settings without specialized devices. Comment: 27 pages, 14 figures, 5 tables

    Wyner-Ziv coding based on TCQ and LDPC codes and extensions to multiterminal source coding

    Driven by a host of emerging applications (e.g., sensor networks and wireless video), distributed source coding (i.e., Slepian-Wolf coding, Wyner-Ziv coding and various other forms of multiterminal source coding) has recently become a very active research area. In this thesis, we first design a practical coding scheme for the quadratic Gaussian Wyner-Ziv problem, because in this special case no rate loss is suffered due to the unavailability of the side information at the encoder. In order to approach the Wyner-Ziv distortion limit $D^*_{WZ}(R)$, the trellis coded quantization (TCQ) technique is employed to quantize the source X, and an irregular LDPC code is used to implement Slepian-Wolf coding of the quantized source input Q(X) given the side information Y at the decoder. An optimal non-linear estimator is devised at the joint decoder to compute the conditional mean of the source X given the dequantized version of Q(X) and the side information Y. Assuming ideal Slepian-Wolf coding, our scheme performs only 0.2 dB away from the Wyner-Ziv limit $D^*_{WZ}(R)$ at high rate, which mirrors the performance of entropy-coded TCQ in classic source coding. Practical designs perform 0.83 dB away from $D^*_{WZ}(R)$ at medium rates. With 2-D trellis-coded vector quantization, the performance gap to $D^*_{WZ}(R)$ is only 0.66 dB at 1.0 b/s and 0.47 dB at 3.3 b/s. We then extend the proposed Wyner-Ziv coding scheme to the quadratic Gaussian multiterminal source coding problem with two encoders. Both direct and indirect settings of multiterminal source coding are considered. An asymmetric code design containing one classical source coding component and one Wyner-Ziv coding component is first introduced and shown to be able to approach the corner points on the theoretically achievable limits in both settings. To approach any point on the theoretically achievable limits, a second approach based on source splitting is then described. One classical source coding component, two Wyner-Ziv coding components, and a linear estimator are employed in this design. Proofs are provided to show the achievability of any point on the theoretical limits in both settings by assuming that both the source coding and the Wyner-Ziv coding components are optimal. The performance of practical schemes is only 0.15 b/s away from the theoretical limits for the asymmetric approach, and up to 0.30 b/s away from the limits for the source splitting approach.

    Systematic hybrid analog/digital signal coding

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 201-206). This thesis develops low-latency, low-complexity signal processing solutions for systematic source coding, or source coding with side information at the decoder. We consider an analog source signal transmitted through a hybrid channel that is the composition of two channels: a noisy analog channel through which the source is sent unprocessed and a secondary rate-constrained digital channel; the source is processed prior to transmission through the digital channel. The challenge is to design a digital encoder and decoder that provide a minimum-distortion reconstruction of the source at the decoder, which has observations of analog and digital channel outputs. The methods described in this thesis are important to a wide array of applications. For example, in the case of in-band on-channel (IBOC) digital audio broadcast (DAB), an existing noisy analog communications infrastructure may be augmented by a low-bandwidth digital side channel for improved fidelity, while compatibility with existing analog receivers is preserved. Another application is a source coding scheme which devotes a fraction of available bandwidth to the analog source and the rest of the bandwidth to a digital representation. This scheme is applicable in a wireless communications environment (or any environment with unknown SNR), where analog transmission has the advantage of a gentle roll-off of fidelity with SNR. A very general paradigm for low-latency, low-complexity source coding is composed of three basic cascaded elements: 1) a space rotation, or transformation, 2) quantization, and 3) lossless bitstream coding. The paradigm has been applied with great success to conventional source coding, and it applies equally well to systematic source coding. Focusing on the case involving a Gaussian source, Gaussian channel and mean-squared distortion, we determine optimal or near-optimal components for each of the three elements, each of which has analogous components in conventional source coding. The space rotation can take many forms such as linear block transforms, lapped transforms, or subband decomposition, for all of which we derive conditions of optimality. For a very general case we develop algorithms for the design of locally optimal quantizers. For the Gaussian case, we describe a low-complexity scalar quantizer, the nested lattice scalar quantizer, that has performance very near that of the optimal systematic scalar quantizer. Analogous to entropy coding for conventional source coding, Slepian-Wolf coding is shown to be an effective lossless bitstream coding stage for systematic source coding. By Richard J. Barron. Ph.D.
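
    The idea behind a nested scalar quantizer for coding with side information at the decoder can be illustrated in one dimension. The following is a minimal sketch, assuming a uniform fine quantizer with spacing `step` nested in a coarse lattice of spacing `n_cosets * step`; the function names and parameters are hypothetical, and the decoder simply returns the nearest coset point rather than the estimator developed in the thesis.

```python
def encode(x, step, n_cosets):
    """Quantize x on the fine grid and transmit only the coset index,
    i.e. the fine-cell index modulo the nesting ratio."""
    q = round(x / step)
    return q % n_cosets  # Python's % is non-negative for a positive modulus


def decode(index, y, step, n_cosets):
    """Pick the point of coset `index` that lies closest to the side information y."""
    coarse = step * n_cosets
    # Coset points are (index + k * n_cosets) * step for integer k.
    k = round((y - index * step) / coarse)
    return (index + k * n_cosets) * step


step, n_cosets = 0.5, 4          # assumed toy parameters
x, y = 3.14, 3.4                 # source sample and correlated side information
idx = encode(x, step, n_cosets)  # costs log2(n_cosets) bits instead of a full index
x_hat = decode(idx, y, step, n_cosets)
print(idx, x_hat)
```

    Decoding recovers the fine-grid point only when the side information lies within half a coarse cell of it. With `y = 4.5` in the example above, the decoder selects a different coset point, which is the error event that the nesting ratio has to be chosen to make rare.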

    Multiterminal source coding: sum-rate loss, code designs, and applications to video sensor networks

    Driven by a host of emerging applications (e.g., sensor networks and wireless video), distributed source coding (i.e., Slepian-Wolf coding, Wyner-Ziv coding and various other forms of multiterminal source coding) has recently become a very active research area. This dissertation focuses on the multiterminal (MT) source coding problem and consists of three parts. The first part studies the sum-rate loss of an important special case of quadratic Gaussian multiterminal source coding, where all sources are positively symmetric and all target distortions are equal. We first give the minimum sum-rate for joint encoding of Gaussian sources in the symmetric case, and then show that the supremum of the sum-rate loss due to distributed encoding in this case is $\frac{1}{2}\log_2\frac{5}{4} = 0.161$ b/s when $L = 2$ and increases in the order of $\frac{\sqrt{L}}{2}\log_2 e$ b/s as the number of terminals $L$ goes to infinity. The supremum sum-rate loss of 0.161 b/s in the symmetric case equals that in general quadratic Gaussian two-terminal source coding without the symmetric assumption. It is conjectured that this equality holds for any number of terminals. In the second part, we present two practical MT coding schemes under the framework of Slepian-Wolf coded quantization (SWCQ) for both direct and indirect MT problems. The first, asymmetric SWCQ scheme relies on quantization and Wyner-Ziv coding, and it is implemented via source splitting to achieve any point on the sum-rate bound. In the second, conceptually simpler scheme, symmetric SWCQ, the two quantized sources are compressed using symmetric Slepian-Wolf coding via a channel code partitioning technique that is capable of achieving any point on the Slepian-Wolf sum-rate bound. Our practical designs employ trellis-coded quantization and turbo/LDPC codes for both asymmetric and symmetric Slepian-Wolf coding. Simulation results show a gap of only 0.139-0.194 bit per sample away from the sum-rate bound for both direct and indirect MT coding problems. The third part applies the two MT coding schemes above to two practical sources, namely stereo video sequences, to reduce the sum rate relative to independent coding of both sequences. Experiments with both schemes on stereo video sequences using H.264, LDPC codes for Slepian-Wolf coding of the motion vectors, and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients give a slightly smaller sum rate than separate H.264 coding of both sequences at the same video quality.
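
    The 0.161 b/s figure quoted in the abstract above for two terminals can be checked numerically. This is only an arithmetic check of the stated closed form, half the base-2 logarithm of 5/4; the variable name is hypothetical.

```python
import math

# Supremum sum-rate loss for two terminals, as stated in the abstract:
# one half of log2(5/4), in bits per sample.
loss_two_terminals = 0.5 * math.log2(5 / 4)
print(round(loss_two_terminals, 3))
```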

    Distributed signal processing using nested lattice codes

    Multi-Terminal Source Coding (MTSC) addresses the problem of compressing correlated sources without communication links among them. In this thesis, a constructive approach to this problem is considered in an algebraic framework, and a system design is provided that is applicable in a variety of settings. The Wyner-Ziv problem is investigated first: coding of an independent and identically distributed (i.i.d.) Gaussian source with side information available only at the decoder in the form of a noisy version of the source to be encoded. Theoretical models are first established and used to calculate distortion-rate functions. Then a few novel practical code implementations are proposed using the strategy of multi-dimensional nested lattice/trellis coding. By investigating various lattices in the dimensions considered, an analysis is given of how lattice properties affect performance. Methods for choosing good sublattices in multiple dimensions are also proposed. By introducing scaling factors, the relationship between distortion and scaling factor is examined for various rates. The best high-dimensional lattice using our scale-rotate method achieves a gap of less than 1 dB from the Wyner-Ziv limit at low rates, and random nested ensembles achieve a 1.87 dB gap from the limit. Moreover, the code design is extended to incorporate distributed compressive sensing (DCS). A theoretical framework is proposed and a practical design using nested lattice/trellis codes is presented for various scenarios. Using a nested trellis, the simulation shows a 3.42 dB gap from our derived bound for the DCS plus Wyner-Ziv framework.
