16 research outputs found

    MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

    This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges faced. To demonstrate the reliability of our dataset, we built a powerful non-autoregressive baseline system based on the FastSpeech2 model and HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real time factor (RTF) metrics. Evaluation results show that the baseline system trained on our dataset achieves MOS above 4 and RTF about 3.30×10⁻¹, which makes it applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available at https://github.com/walker-hyf/MnTTS. Comment: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022).

    Contour detection network for zero-shot sketch-based image retrieval

    Zero-shot sketch-based image retrieval (ZS-SBIR) is a challenging task that involves searching for natural images related to a given hand-drawn sketch in the zero-shot setting. Previous approaches projected image and sketch features into a low-dimensional common space for retrieval, and used semantic features to transfer knowledge from seen to unseen classes. However, projecting multimodal features into a common space does not align them effectively enough, since sketches and natural images differ in style and content and are not in one-to-one correspondence. To solve this problem, we propose a novel three-branch joint training network with a contour detection network (called CDNNet) for the ZS-SBIR task, which uses contour maps as a bridge to align sketches and natural images and alleviate the domain gap. Specifically, we use semantic metrics to constrain the relationship between contour images and natural images and between contour images and sketches, so that natural image and sketch features can be aligned in the common space. Meanwhile, we further employ second-order attention to capture target subject information and improve the performance of retrieval descriptors. In addition, we use a teacher model and a word embedding method to transfer knowledge from the seen to the unseen classes. Extensive experiments on two large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art CNN-based models: it improves mAP by 2.6% on the Sketchy dataset and 1.2% on the TU-Berlin dataset.

    Prevention of Bone Cement Displacement in Kümmell Disease without Neurological Deficits through Treatment with a Novel Hollow Pedicle Screw Combined with Kyphoplasty

    Objective: Displacement of bone cement following percutaneous vertebral augmentation for Kümmell disease (KD) presents a significant concern, resulting in increasing back pain and compromising daily activities. Unfortunately, current literature does not yet establish a validated and minimally invasive surgical intervention for this issue. This study aims to investigate the effects of a novel hollow pedicle screw combined with kyphoplasty (HPS‐KP) in preventing the bone cement displacement that can follow simple percutaneous kyphoplasty in the management of KD. Methods: A total of 22 patients (six males, 16 females; mean age 77.18 ± 7.63 years) with KD without neurological deficits, treated by HPS‐KP at the hospital between March 2021 and June 2022, were selected. According to Li's classification, there were three stage I, 12 stage II, and seven stage III KD cases. Bone mineral density (BMD), spinal X‐ray, computed tomography (CT), and magnetic resonance imaging (MRI) were examined before the operation. The operation time, intraoperative blood loss, and postoperative complications were all recorded. The follow‐up focused on visual analog scale (VAS) score, Oswestry dysfunction index (ODI), anterior vertebral height (AVH), middle vertebral height (MVH), posterior vertebral height (PVH), wedge‐shape affected vertebral Cobb angle (WCA), and bisegmental Cobb angle (BCA). One‐way analysis of variance (ANOVA) followed by the Bonferroni post‐hoc test was used for multiple comparisons. Results: All patients underwent the operation successfully and were followed up for more than 8 months (range, 8 to 18 months). The operation time, intraoperative blood loss, and BMD (T‐score) were 39.09 ± 5.64 min, 14.09 ± 3.98 ml, and −3.30 ± 0.90 g/cm³, respectively. Statistically significant differences were observed in the VAS score, ODI, AVH, MVH, and WCA (all p < 0.05). During follow‐up, five patients suffered from bone cement leakage and one presented an adjacent vertebral fracture; no bone cement displacement was observed. Conclusion: HPS‐KP could be safe and effective in the treatment of KD without neurological deficits, effectively relieving the symptoms of patients, restoring partial vertebral height, and preventing the occurrence of bone cement displacement.

    Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS


    Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus

    Low-resource text-to-speech synthesis is a very promising research direction. Mongolian is the official language of the Inner Mongolia Autonomous Region and is spoken by more than 10 million people worldwide. Mongolian, as a representative low-resource language, has a relative lack of open-source datasets for TTS. Therefore, we make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for related researchers. In this work, we invited three Mongolian announcers to record topic-rich speech. Each announcer recorded 10 h of Mongolian speech, for a total of 30 h across the whole dataset. In addition, we built two baseline systems based on state-of-the-art neural architectures: a multi-speaker FastSpeech2 model with a HiFi-GAN vocoder and a fully end-to-end multi-speaker VITS model. With the FastSpeech2+HiFi-GAN system, the three speakers scored 4.0 or higher on both naturalness and speaker similarity. With the VITS model, the three speakers scored 4.5 or higher on both naturalness and speaker similarity. The experimental results show that the published MnTTS2 dataset can be used to build robust Mongolian multi-speaker TTS models.

    Prediction of the Ibuprofen Loading Capacity of MOFs by Machine Learning

    Metal-organic frameworks (MOFs) have been widely researched as drug delivery systems due to their intrinsic porous structures. Herein, machine learning (ML) technologies were applied for the screening of MOFs with high drug loading capacity. To achieve this, first, a comprehensive dataset was gathered, including 40 data points from more than 100 different publications. The organic linkers, metal ions, and the functional groups, as well as the surface area and the pore volume of the investigated MOFs, were chosen as the model’s inputs, and the output was the ibuprofen (IBU) loading capacity. Thereafter, various machine learning algorithms, such as support vector regression (SVR), random forest (RF), adaptive boosting (AdaBoost), and categorical boosting (CatBoost), were employed to predict the ibuprofen loading capacity of MOFs. Coefficients of determination (R²) of 0.70, 0.72, 0.66, and 0.76 were obtained for the SVR, RF, AdaBoost, and CatBoost approaches, respectively. Among all the algorithms, CatBoost was the most reliable, exhibiting superior performance regarding the sparse matrices and categorical features. Shapley additive explanations (SHAP) analysis was employed to explore the impact of the eigenvalues of the model’s outputs. Our initial results indicate that this methodology is a well-generalized, straightforward, and cost-effective method that can be applied not only for the prediction of IBU loading capacity, but also in many other biomaterials projects.

    How Early-Life Gut Microbiota Alteration Sets Trajectories for Health and Inflammatory Bowel Disease?

    Inflammatory bowel disease (IBD) is a recurrent chronic inflammatory condition of the intestine without any efficient therapeutic regimens. Gut microbiota, which plays an instrumental role in the development and maturation of the immune system, has been implicated in the pathogenesis of IBD. Emerging evidence has established that early-life events, particularly maternal influences and antibiotic treatment, are strongly correlated with the health or susceptibility to disease of an individual in later life. Thus, it is proposed that there is a critical period in infancy, during which environmental exposures bestow a long-term pathophysiological imprint. This notion sheds new light on the development of novel approaches for the treatment, i.e., early interventions, or more precisely, the prevention of many incurable chronic inflammatory diseases like IBD. In this review, we have integrated current evidence to describe the feasibility of the "able-to-be-regulated microbiota," summarized the underlying mechanisms of the "microbiota-driven immune system education," explored the optimal intervention time window, and discussed the potential of designing early-probiotic treatment as a new prevention strategy for IBD. Feilong Guo and Demin Cai share first authorship. Correction in: Frontiers in Nutrition, Volume 8, Article Number 760443, DOI 10.3389/fnut.2021.760443

    Polychromatic full-polarization control in mid-infrared light

    Objects with different shapes, materials and temperatures can emit distinct polarization and spectral information in the mid-infrared band, which provides a unique signature in the transparent window for object identification. However, the crosstalk among various polarization and wavelength channels prevents accurate mid-infrared detection at high signal-to-noise ratio. Here, we report full-polarization metasurfaces that break the inherent eigen-polarization constraint over the wavelengths in the mid-infrared. This recipe enables the selection of an arbitrary orthogonal polarization basis at each wavelength independently, thereby alleviating the crosstalk and efficiency degradation. A six-channel all-silicon metasurface is specifically presented to project focused mid-infrared light to distinct positions at three wavelengths, each with a pair of arbitrarily chosen orthogonal polarizations. An isolation ratio of 117 between neighboring polarization channels is experimentally recorded, exhibiting detection sensitivity one order of magnitude higher than existing infrared detectors. Remarkably, the high aspect ratio (~30) of our meta-structures, manufactured by deep silicon etching technology at a temperature of −150 °C, guarantees large and precise phase dispersion control over a broad band from 3 to 4.5 μm. We believe our results would benefit noise-immune mid-infrared detection in remote sensing and space-to-ground communications.