
    Self-Supervised Intrinsic Image Decomposition

    Intrinsic decomposition from a single image is a highly challenging task due to its inherent ambiguity and the scarcity of training data. In contrast to traditional fully supervised learning approaches, in this paper we propose learning intrinsic image decomposition by explaining the input image. Our model, the Rendered Intrinsics Network (RIN), joins together an image decomposition pipeline, which predicts reflectance, shape, and lighting conditions given a single image, with a recombination function, a learned shading model used to recompose the original input based on the intrinsic image predictions. Our network can then use unsupervised reconstruction error as an additional signal to improve its intermediate representations. This allows large-scale unlabeled data to be useful during training, and also enables transferring learned knowledge to images of unseen object categories, lighting conditions, and shapes. Extensive experiments demonstrate that our method performs well on both intrinsic image decomposition and knowledge transfer.
    Comment: NIPS 2017 camera-ready version, project page: http://rin.csail.mit.edu
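
    As a rough illustration of the self-supervised signal described above, here is a minimal sketch (not the authors' code; the network shapes and the Lambertian-style recombination cue are assumptions) of decomposing an image into intrinsic components, re-rendering it with a learned shading model, and using reconstruction error as the loss:

        # Minimal sketch of a RIN-style self-supervised loop: decompose,
        # re-render with a learned shader, penalize reconstruction error.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TinyRIN(nn.Module):
            def __init__(self):
                super().__init__()
                # One shared encoder, three heads: reflectance, normals, lighting.
                self.encoder = nn.Conv2d(3, 16, 3, padding=1)
                self.reflectance = nn.Conv2d(16, 3, 3, padding=1)
                self.normals = nn.Conv2d(16, 3, 3, padding=1)
                self.lighting = nn.Linear(16, 3)             # one directional light
                self.shader = nn.Conv2d(4, 1, 3, padding=1)  # learned shading model

            def forward(self, img):
                h = F.relu(self.encoder(img))
                refl = torch.sigmoid(self.reflectance(h))
                nrm = F.normalize(self.normals(h), dim=1)
                light = self.lighting(h.mean(dim=(2, 3)))    # (B, 3)
                light_map = light[:, :, None, None].expand_as(nrm)
                # Feed the shader the normals plus a Lambertian-style dot-product cue.
                lambert = (nrm * light_map).sum(1, keepdim=True)
                shading = torch.sigmoid(self.shader(torch.cat([nrm, lambert], dim=1)))
                return refl * shading                        # re-rendered image

        model = TinyRIN()
        img = torch.rand(8, 3, 64, 64)                       # unlabeled batch
        recon = model(img)
        loss = F.mse_loss(recon, img)                        # unsupervised signal
        loss.backward()

    Because the loss needs only the input image itself, any unlabeled image can contribute a gradient, which is what makes large-scale unlabeled data useful in this setup.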

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint)? For what destination and what use: to be performed by humans (a musical score) or by a machine (an audio file)?
    - Representation: What concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, beat)? What format is to be used (e.g., MIDI, piano roll, or text)? How will the representation be encoded (e.g., scalar, one-hot, or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity, and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section includes some discussion and prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
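
    As a toy illustration of the representation dimension (not an example from the survey; the pitch range and time step are arbitrary assumptions), here is a short monophonic melody encoded as a one-hot piano roll:

        # Encode a melody as a one-hot piano roll: one row per time step,
        # one column per pitch, exactly one active pitch per row.
        import numpy as np

        PITCHES = list(range(60, 72))            # MIDI notes C4..B4
        melody = [60, 62, 64, 65, 67, 67]        # one note per time step

        roll = np.zeros((len(melody), len(PITCHES)), dtype=np.int8)
        for t, note in enumerate(melody):
            roll[t, PITCHES.index(note)] = 1

        print(roll)   # rows = time steps, columns = pitches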

    A Hardware Implementation of a Coherent SOQPSK-TG Demodulator for FEC Applications

    This thesis presents a hardware design of a coherent demodulator for shaped offset quadrature phase shift keying, telemetry group version (SOQPSK-TG), for use in forward error correction (FEC) applications. Implementation details for data sequence detection, symbol timing synchronization, carrier phase synchronization, and block recovery are described. This decision-directed demodulator is based on maximum-likelihood principles and is efficiently implemented by the soft-output Viterbi algorithm (SOVA). The design is intended for use in a field-programmable gate array (FPGA). Simulation results of the demodulator's performance in the additive white Gaussian noise channel are compared against a MATLAB reference model known to be correct. In addition, hardware-specific parameters are presented. Finally, suggestions for future work and improvements are discussed.
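
    To make the detection principle concrete, the sketch below implements a plain Viterbi decoder for a standard rate-1/2, 4-state convolutional code (generators 7,5 octal) with soft Euclidean branch metrics over BPSK/AWGN. This is an illustrative stand-in, not the thesis' SOQPSK-TG trellis or FPGA design; a full SOVA would additionally track path-metric differences at each merge to produce the bit reliabilities needed by an FEC decoder.

        # Soft-metric Viterbi decoding of a toy rate-1/2 convolutional code.
        import numpy as np

        def encode(bits):
            s, out = 0, []
            for b in bits:
                s1, s0 = (s >> 1) & 1, s & 1
                out += [b ^ s1 ^ s0, b ^ s0]      # generators 111, 101 (octal 7, 5)
                s = ((s << 1) | b) & 3
            return np.array(out)

        def viterbi(rx):
            n = len(rx) // 2
            INF = 1e9
            pm = np.full(4, INF); pm[0] = 0.0     # path metrics, start in state 0
            paths = [[[] for _ in range(4)]]
            for t in range(n):
                r = rx[2*t:2*t+2]
                new_pm = np.full(4, INF); new_paths = [None] * 4
                for s in range(4):
                    if pm[s] >= INF:
                        continue
                    s1, s0 = (s >> 1) & 1, s & 1
                    for b in (0, 1):
                        sym = np.array([1 - 2*(b ^ s1 ^ s0), 1 - 2*(b ^ s0)])
                        m = pm[s] + np.sum((r - sym) ** 2)   # soft Euclidean metric
                        ns = ((s << 1) | b) & 3
                        if m < new_pm[ns]:
                            new_pm[ns] = m
                            new_paths[ns] = paths[-1][s] + [b]
                pm = new_pm; paths.append(new_paths)
            return paths[-1][int(np.argmin(pm))]

        bits = [1, 0, 1, 1, 0, 0]
        tx = 1 - 2 * encode(bits)                 # BPSK mapping 0 -> +1, 1 -> -1
        rx = tx + 0.4 * np.random.randn(len(tx))  # AWGN
        print(viterbi(rx), "decoded vs sent", bits)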

    Self-concatenated coding for wireless communication systems

    In this thesis we explore self-concatenated coding schemes designed for transmission over Additive White Gaussian Noise (AWGN) and uncorrelated Rayleigh fading channels. We design both symbol-based Self-Concatenated Codes using Trellis Coded Modulation (SECTCM) and bit-based Self-Concatenated Convolutional Codes (SECCC), each employing a Recursive Systematic Convolutional (RSC) encoder as the constituent code. The design of these codes is carried out with the aid of Extrinsic Information Transfer (EXIT) charts, which prove an efficient tool for finding the decoding convergence threshold of the constituent codes. Additionally, in order to recover the information loss imposed by employing binary rather than non-binary schemes, a soft-decision demapper is introduced to exchange extrinsic information with the SECCC decoder. To analyse this information exchange, 3D EXIT chart analysis is invoked for visualizing the extrinsic information exchange between the proposed iteratively decoded SECCC and the soft-decision demapper (SECCC-ID). Some of the proposed SECTCM, SECCC and SECCC-ID schemes perform within about 1 dB of the AWGN and Rayleigh fading channels' capacity. A union bound analysis of SECCC codes is carried out to find the corresponding Bit Error Ratio (BER) floors; the union bound of SECCCs is derived for communications over both AWGN and uncorrelated Rayleigh fading channels, based on a novel interleaver concept. Application of SECCCs in both UltraWideBand (UWB) and state-of-the-art video-telephone schemes demonstrates their practical benefits.
    In order to further exploit the low-complexity design offered by SECCCs, we explore their application in a distributed coding scheme designed for cooperative communications, where iterative detection is employed by exchanging extrinsic information between the SECCC and RSC decoders at the destination. In the first transmission period of cooperation, the relay receives the potentially erroneous data and attempts to recover the information. The recovered information is re-encoded at the relay using an RSC encoder and retransmitted to the destination in the second transmission period. The resultant symbols transmitted from the source and relay nodes can be viewed as the coded symbols of a three-component parallel-concatenated encoder. At the destination, a Distributed Binary Self-Concatenated Coding scheme using Iterative Decoding (DSECCC-ID) is employed, where the two decoders (SECCC and RSC) exchange their extrinsic information. It is shown that DSECCC-ID is a low-complexity scheme, yet capable of approaching the Discrete-input Continuous-output Memoryless Channel's (DCMC) capacity.
    Finally, we consider coding schemes designed for two nodes communicating with each other with the aid of a relay node, where the relay receives information from both nodes in the first transmission period. At the relay node we combine a powerful Superposition Coding (SPC) scheme with SECCC; it is assumed that decoding errors may be encountered at the relay. The relay then broadcasts this information in the second transmission period after re-encoding it, again using a SECCC encoder. At the destination, the amalgamated block of a Successive Interference Cancellation (SIC) scheme combined with SECCC detects and decodes the signal, either with or without the aid of a priori information. Our simulation results demonstrate that the proposed scheme is capable of reliably operating at a low BER for transmission over both AWGN and uncorrelated Rayleigh fading channels. We compare the proposed scheme's performance to that of a direct transmission link between the two sources having the same throughput.
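
    For reference, the generic form of such a union bound for a binary convolutional code with BPSK signalling over AWGN (the textbook bound, not the thesis' SECCC-specific derivation, which additionally accounts for the interleaver) is

        P_b \le \sum_{d = d_{\mathrm{free}}}^{\infty} c_d \, Q\!\left( \sqrt{ \frac{2 d R E_b}{N_0} } \right),

    where c_d is the total information weight of error events at Hamming distance d, R is the code rate, and Q(\cdot) is the Gaussian tail function. At high SNR the d_{free} term dominates, which is what produces the BER floor the analysis is used to locate.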

    Detection and Combining Techniques for Asynchronous Random Access with Time Diversity

    Asynchronous random access (RA) protocols are particularly attractive for their simplicity and avoidance of tight synchronization requirements. Recent enhancements have shown that the use of successive interference cancellation (SIC) can largely boost the performance of these schemes. A further step forward in performance can be attained when diversity combining techniques are applied. In order to enable combining, the detection of packets and their association to transmitters has to be done prior to decoding. We present a solution to this problem that is articulated in two phases. Non-coherent soft-correlation as well as interference-aware soft-correlation are used for packet detection. We evaluate the detection capabilities of both solutions via numerical simulations. We also evaluate numerically the spectral efficiency achieved by the proposed approach, highlighting its benefits.
    Comment: 6 pages, 7 figures. Work has been submitted to the 11th International ITG Conference on Systems, Communications and Coding 201
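
    A minimal sketch of the non-coherent detection idea (an assumption for illustration, not the paper's exact detector or preamble): correlate the received samples with a known preamble and threshold the correlation magnitude, so an unknown carrier phase rotation does not suppress the peak.

        # Non-coherent soft-correlation packet detection.
        import numpy as np

        def noncoherent_detect(rx, preamble, threshold):
            """Return sample indices where a packet start is declared."""
            L = len(preamble)
            energy = np.sum(np.abs(preamble) ** 2)
            hits = []
            for k in range(len(rx) - L + 1):
                corr = np.vdot(preamble, rx[k:k + L])     # conj(preamble) . rx
                if np.abs(corr) / energy > threshold:     # phase-invariant statistic
                    hits.append(k)
            return hits

        preamble = np.exp(1j * np.pi / 2 * np.array([0, 1, 3, 2, 0, 2, 1, 3]))
        phase = np.exp(1j * 1.1)                          # unknown carrier phase
        rx = np.concatenate([np.zeros(5), phase * preamble, np.zeros(5)])
        rx += 0.1 * (np.random.randn(len(rx)) + 1j * np.random.randn(len(rx)))
        print(noncoherent_detect(rx, preamble, threshold=0.7))   # expect [5]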

    LATR: 3D Lane Detection from Monocular Images with Transformer

    3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's-eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address this issue, we present LATR, a novel end-to-end 3D lane detector that uses 3D-aware front-view features without a transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated from 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively updated 3D ground plane. LATR outperforms previous state-of-the-art methods on the synthetic Apollo benchmark and the real-world OpenLane and ONCE-3DLanes benchmarks by large margins (e.g., an 11.4 gain in F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR .
    Comment: Accepted by ICCV 2023 (Oral)
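
    The cross-attention pattern described above can be sketched in a few lines (all shapes and sizes here are illustrative assumptions, not LATR's actual configuration): lane queries attend over front-view image features whose keys carry a 3D ground positional embedding.

        # Single-head cross-attention with position-augmented keys.
        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        d = 32
        n_queries, n_tokens = 12, 400                  # lane queries, image tokens

        lane_queries = np.random.randn(n_queries, d)   # from a lane-aware generator
        img_feats = np.random.randn(n_tokens, d)       # front-view image features
        ground_pe = np.random.randn(n_tokens, d)       # 3D ground positional embedding

        q = lane_queries
        k = img_feats + ground_pe                      # inject 3D position into keys
        v = img_feats

        attn = softmax(q @ k.T / np.sqrt(d))           # (12, 400) attention weights
        updated_queries = attn @ v                     # queries enriched with 3D cues
        print(updated_queries.shape)                   # (12, 32)

    Because the positional embedding enters through the keys rather than through a warped feature map, the image features themselves never need to be resampled into a surrogate view.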

    Cryptanalysis of the Fuzzy Vault for Fingerprints: Vulnerabilities and Countermeasures

    The fuzzy fingerprint vault is a popular approach to protecting a fingerprint's minutiae as a building block of a security application. In this thesis, several attack scenarios are simulated against implementations of the fuzzy fingerprint vault from the literature. Our investigations clearly confirm that the weakest link in the fuzzy fingerprint vault is its high vulnerability to false-accept attacks. Therefore, multi-finger or even multi-biometric cryptosystems should be conceived. But a risk remains that cannot be resolved by using more biometric information of an individual if the features are protected with a traditional fuzzy vault construction: the correlation attack. It is known that quantizing minutiae to a rigid system while filling the whole space with chaff renders correlation obsolete. Based on this approach, we propose an implementation. If parameters were adopted from a traditional fuzzy fingerprint vault implementation, we would experience a significant loss in authentication performance; therefore, we perform a training step to determine reasonable parameters for our implementation. Furthermore, to make authentication practical, the decoding procedure is randomized. By running a performance evaluation on a commonly used dataset, we find that achieving resistance against the correlation attack does not have to come at the cost of authentication performance. Finally, we conclude that the fuzzy vault remains a possible construction for helping to solve the challenging task of implementing a cryptographically secure multi-biometric cryptosystem in the future.
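
    For readers unfamiliar with the underlying scheme, here is a toy sketch of the basic fuzzy vault mechanism (Juels-Sudan style), not the thesis' implementation: it assumes minutiae quantized to single integers, a query that recovers enough genuine points, and omits the error-correcting decoding that real systems need.

        # Toy fuzzy vault over GF(p): genuine points lie on a secret
        # polynomial, chaff points do not.
        import random

        P = 2**13 - 1          # small prime field modulus
        K = 3                  # secret polynomial degree + 1

        def poly_eval(coeffs, x):
            y = 0
            for c in reversed(coeffs):     # coeffs stored low-to-high degree
                y = (y * x + c) % P
            return y

        def lock(secret, minutiae, n_chaff=40):
            vault = [(x, poly_eval(secret, x)) for x in minutiae]
            used = set(minutiae)
            while len(vault) < len(minutiae) + n_chaff:
                x = random.randrange(1, P)
                if x in used:
                    continue
                used.add(x)
                vault.append((x, random.randrange(P)))   # chaff: random y
            random.shuffle(vault)
            return vault

        def lagrange_interpolate(points):
            # Recover polynomial coefficients from K points over GF(p).
            coeffs = [0] * K
            for i, (xi, yi) in enumerate(points):
                num, den = [1], 1
                for j, (xj, _) in enumerate(points):
                    if i == j:
                        continue
                    num = [(a - xj * b) % P
                           for a, b in zip([0] + num, num + [0])]
                    den = den * (xi - xj) % P
                inv = pow(den, P - 2, P)                 # modular inverse
                for d in range(K):
                    coeffs[d] = (coeffs[d] + yi * inv * num[d]) % P
            return coeffs

        secret = [17, 42, 7]                       # polynomial coefficients = key
        genuine = [101, 2020, 333, 4444, 5555]     # enrolled minutiae (as integers)
        vault = lock(secret, genuine)

        query = [2020, 333, 5555]                  # freshly measured minutiae
        matched = [(x, y) for (x, y) in vault if x in query][:K]
        print(lagrange_interpolate(matched) == secret)   # True for genuine matches

    The correlation attack mentioned above exploits the fact that, across two vaults of the same finger, the genuine points repeat while freely placed chaff does not; quantizing minutiae to a rigid grid and filling every remaining cell with chaff removes that statistical fingerprint.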

    A vector quantization approach to universal noiseless coding and quantization

    A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2) n^{-1} log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ϵ}) when the universe of sources is infinite-dimensional, under appropriate conditions.
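
    The two-stage idea is easy to see in code. The sketch below (illustrative, not the paper's codec) performs the first-stage "quantization" by picking, per input block, the codebook with the smallest induced distortion; the second stage then encodes the block with the chosen codebook:

        # Two-stage coding: stage 1 selects a codebook, stage 2 quantizes.
        import numpy as np

        def encode_block(block, codebooks):
            best = None
            for cb_idx, cb in enumerate(codebooks):
                # Nearest codeword per sample; total squared error is the
                # induced distortion for this codebook.
                idx = np.argmin((block[:, None] - cb[None, :]) ** 2, axis=1)
                dist = np.sum((block - cb[idx]) ** 2)
                if best is None or dist < best[0]:
                    best = (dist, cb_idx, idx)
            dist, cb_idx, idx = best
            return cb_idx, idx, dist   # stage-1 index, stage-2 indices

        # Two scalar codebooks specialized for different block statistics.
        codebooks = [np.array([-1.0, 0.0, 1.0]),        # low-amplitude blocks
                     np.array([-4.0, -2.0, 2.0, 4.0])]  # high-amplitude blocks

        rng = np.random.default_rng(0)
        low = 0.5 * rng.standard_normal(16)
        high = 3.0 * rng.standard_normal(16)
        for block in (low, high):
            cb_idx, idx, dist = encode_block(block, codebooks)
            print("chose codebook", cb_idx, "distortion", round(float(dist), 2))

    A generalized Lloyd iteration would alternate this assignment step with re-optimizing each codebook on the blocks assigned to it, converging to a locally optimal two-stage code.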

    Hardness of decoding quantum stabilizer codes

    In this article we address the computational hardness of optimally decoding a quantum stabilizer code. Much like classical linear codes, errors are detected by measuring certain check operators which yield an error syndrome, and the decoding problem consists of determining the most likely recovery given the syndrome. The corresponding classical problem is known to be NP-complete, and a similar decoding problem for quantum codes is also known to be NP-complete. However, this decoding strategy is not optimal in the quantum setting, as it does not take into account error degeneracy, which causes distinct errors to have the same effect on the code. Here, we show that optimal decoding of stabilizer codes is computationally much harder than optimal decoding of classical linear codes: it is #P-complete.
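
    The source of the extra hardness can be stated compactly (a standard formulation, not quoted from the article). Classical-style decoding picks the single most likely error E consistent with the syndrome s,

        \hat{E} = \arg\max_{E \,:\, \sigma(E) = s} P(E),

    whereas optimal (degenerate) decoding picks the most likely logical coset, summing over the whole stabilizer group \mathcal{S}:

        \hat{L} = \arg\max_{L} \sum_{S \in \mathcal{S}} P(E_s L S),

    where E_s is any fixed error with syndrome s and \sigma(\cdot) denotes the syndrome map. The exponentially large sum over \mathcal{S} is what moves the problem from the NP world into #P.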