RawNet: Fast End-to-End Neural Vocoder
Neural network-based vocoders have recently demonstrated a powerful
ability to synthesize high-quality speech. These models usually generate
samples by conditioning on spectral features, such as the Mel-spectrum.
However, these features are extracted by a speech analysis module that includes
processing based on human knowledge. In this work, we propose RawNet,
a truly end-to-end neural vocoder, which uses a coder network to learn a
higher-level representation of the signal and an autoregressive voder network to
generate speech sample by sample. The coder and voder together act like an
auto-encoder network and can be jointly trained directly on raw waveforms
without any human-designed features. Experiments on copy-synthesis
tasks show that RawNet achieves synthesized speech quality comparable
to that of LPCNet, with a smaller model architecture and faster speech generation at
the inference step.
Comment: Submitted to Interspeech 2019, Graz, Austria
Attention-Based End-to-End Speech Recognition on Voice Search
Recently, there has been a growing interest in end-to-end speech recognition
that directly transcribes speech to text without any predefined alignments. In
this paper, we explore the use of an attention-based encoder-decoder model for
Mandarin speech recognition on a voice search task. Previous attempts have
shown that applying attention-based encoder-decoder models to Mandarin speech
recognition is quite difficult due to the logographic orthography of Mandarin,
the large vocabulary and the conditional dependency of the attention model. In
this paper, we use character embedding to deal with the large vocabulary.
Several tricks are used for effective model training, including L2
regularization, Gaussian weight noise and frame skipping. We compare two
attention mechanisms and use attention smoothing to cover long context in the
attention model. Taken together, these tricks allow us to finally achieve a
character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on
the MiTV voice search dataset. When combined with a trigram language model,
the CER and SER reach 2.81% and 5.77%, respectively.
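The two metrics reported above can be illustrated with a minimal sketch. It assumes the usual definitions: CER as the character-level edit distance divided by the total number of reference characters, and SER as the fraction of utterances that are not transcribed exactly. The abstract does not spell out its scoring procedure, so the function names `edit_distance`, `cer`, and `ser` are hypothetical and the details may differ from the authors' evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate over a corpus (assumed definition, see lead-in)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def ser(refs, hyps):
    """Sentence error rate: share of utterances with at least one error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)
```

Because each character of a Mandarin transcript is one symbol, passing plain Python strings scores at the character level without any tokenization step.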
Clothing Retrieval with Visual Attention Model
Clothing retrieval is a challenging problem in computer vision. With the
advance of Convolutional Neural Networks (CNNs), the accuracy of clothing
retrieval has been significantly improved. FashionNet [1], a recent study,
proposes to employ a set of artificial features in the form of landmarks for
clothing retrieval, which are shown to be helpful for retrieval. However, the
landmark detection module is trained with strong supervision, which requires
considerable effort to obtain. In this paper, we propose a self-learning
Visual Attention Model (VAM) to extract attention maps from clothing images.
The VAM is further connected to a global network to form an end-to-end network
structure through an Impdrop connection, which randomly applies dropout to the
feature maps with probabilities given by the attention map. Extensive experiments on
several widely used benchmark clothing retrieval data sets have demonstrated
the promise of the proposed method. We also show that, compared to the trivial
Product connection, the Impdrop connection makes the network structure more
robust when training sets of limited size are used.
Comment: 4 pages, to be presented at IEEE VCIP 201
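The contrast between the two connections can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the attention map holds values in [0, 1], that the Product connection is an element-wise multiplication, and that Impdrop keeps each feature with probability equal to its attention value. The abstract does not say whether the attention value acts as a keep or a drop probability, so that choice and the function names are hypothetical.

```python
import random


def product_connection(features, attention):
    """Assumed Product connection: scale each feature by its attention value."""
    return [[f * a for f, a in zip(frow, arow)]
            for frow, arow in zip(features, attention)]


def impdrop_connection(features, attention, rng=None):
    """Assumed Impdrop connection: keep each feature with probability equal
    to its attention value, otherwise zero it (attention-guided dropout)."""
    rng = rng or random  # any object exposing random() -> float in [0, 1)
    return [[f if rng.random() < a else 0.0 for f, a in zip(frow, arow)]
            for frow, arow in zip(features, attention)]
```

Under these assumptions the expected output of `impdrop_connection` equals the output of `product_connection`, but each training pass sees a different random mask, which is one plausible reading of why the abstract reports better robustness on small training sets.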
Gas pressure sintering of BN/Si3N4 wave-transparent material with Y2O3–MgO nanopowders addition
BN/Si3N4 ceramics for use as wave-transparent materials in spacecraft were fabricated from boron nitride powders, silicon nitride powders and Y2O3–MgO nanopowders by gas pressure sintering at 1700 °C under 6 MPa in an N2 atmosphere. The effects of the Y2O3–MgO nanopowders on the densification, phase evolution, microstructure and mechanical properties of the BN/Si3N4 material were investigated. The addition of Y2O3–MgO nanopowders was found to be beneficial to the mechanical properties of the BN/Si3N4 composites. The BN/Si3N4 ceramics with 8 wt% Y2O3–MgO nanopowders showed a relative density of 80.2%, combining a fracture toughness of 4.6 MPa·m^1/2 with an acceptable flexural strength of 396.5 MPa.