
    The Royalflush System for VoxCeleb Speaker Recognition Challenge 2022

    In this technical report, we describe the Royalflush submissions for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our submissions cover track 1, supervised speaker verification, and track 3, semi-supervised speaker verification. For track 1, we develop a powerful U-Net-based speaker embedding extractor with a symmetric architecture. The proposed system achieves 2.06% EER and 0.1293 MinDCF on the validation set. Compared with the state-of-the-art ECAPA-TDNN, it obtains a relative improvement of 20.7% in EER and 22.70% in MinDCF. For track 3, we jointly train with source-domain supervision and target-domain self-supervision to obtain a speaker embedding extractor. A subsequent clustering step produces pseudo-speaker labels for the target domain. We then adapt the speaker embedding extractor on all source and target domain data in a supervised manner, so that it can fully leverage information from both domains. Clustering and supervised domain adaptation can be repeated until performance converges on the validation set. Our final submission is a fusion of 10 models and achieves 7.75% EER and 0.3517 MinDCF on the validation set.
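The equal error rate (EER) quoted in the abstract above is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such a figure could be computed from verification trial scores. It is not the Royalflush scoring code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy trial list are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Hypothetical helper for illustration only.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list (made-up scores, not challenge data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))
```

Sweeping only the observed scores keeps the sketch short. Evaluation toolkits typically interpolate along the ROC curve instead, which can give slightly different values on small trial lists.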

    Multi-frame image restoration method for novel rotating synthetic aperture imaging system

    The novel rotating synthetic aperture (RSA) optical imaging system is an important development direction for future high-resolution optical remote sensing satellites in geostationary orbit. However, owing to the rotating rectangular pupil, the point spread function of the RSA system has an asymmetric spatial distribution, and the images obtained using the primary mirror at different rotation angles have nonuniform blur degradation. Moreover, platform vibration and pupil rotation have coupled effects on RSA imaging, resulting in further radiometric and geometric quality degradation. To address these problems, the image degradation characteristics are first analyzed according to the imaging mechanism. Then, combined with the theory of mutual information, an image registration method is proposed that introduces orientation gradient information. On this basis, a multi-frame image restoration model is proposed based on the directional gradient prior of the RSA system image. From the perspective of interpretation and application, when the aspect ratio is less than 3, the proposed inversion restoration method achieves satisfactory processing performance. This work can provide an engineering reference for future space applications of RSA imaging technology.

    LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement

    Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation of speech synthesis systems. This prediction is a challenging task, especially on out-of-domain test sets. In this paper, we propose a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. On the supervised side, we develop an SSL-based predictor called LE-SSL-MOS. LE-SSL-MOS utilizes pre-trained self-supervised learning models and further improves prediction accuracy by utilizing the opinion scores of each utterance in the listener enhancement branch. The unsupervised side has two parts: first, we fine-tune the unit language model (ULM) using highly intelligible domain data to improve the correlation of an unsupervised metric, SpeechLMScore; second, we use ASR confidence as a new metric with the help of ensemble learning. To our knowledge, this is the first architecture that fuses supervised and unsupervised methods for MOS prediction. With these approaches, our experimental results on the VoiceMOS Challenge 2023 show that LE-SSL-MOS performs better than the baseline. Our fusion system achieved an absolute improvement of 13% over LE-SSL-MOS on the noisy and enhanced speech track. Our system ranked 1st and 2nd, respectively, in the French speech synthesis track and the challenge's noisy and enhanced speech track. Comment: accepted in IEEE-ASRU202

    Strong enhancement of photoresponsivity with shrinking the electrodes spacing in few layer GaSe photodetectors

    A critical challenge for the integration of optoelectronics is that photodetectors have relatively poor sensitivities at the nanometer scale. It is generally believed that a large electrode spacing is required in photodetectors to absorb sufficient light, maintain high photoresponsivity, and reduce the dark current. However, this limits the optoelectronic integration density. Through spatially resolved photocurrent investigation, we find that the photocurrent in metal-semiconductor-metal (MSM) photodetectors based on layered GaSe is mainly generated by photoexcited carriers close to the metal-GaSe interface, and that the photocurrent active region is always close to the Schottky barrier with the higher electrical potential. The photoresponsivity increases monotonically as the spacing shrinks until direct tunneling occurs, reaching up to 5,000 A W⁻¹ for the bottom-contacted device at a bias voltage of 8 V and a wavelength of 410 nm. This is a more than 1,700-fold improvement over previously reported results. Besides a systematic experimental investigation of the dependence of the photoresponsivity on the spacing for both bottom- and top-contacted MSM photodetectors, a theoretical model has also been developed that explains the photoresponsivity for these two device configurations. Our findings show that the spacing can be shrunk while simultaneously improving the performance of 2D-semiconductor-based MSM photodetectors, which could pave the way for future high-density integration of high-performance 2D semiconductor optoelectronics. Comment: 25 pages, 4 figures

    A Risk Prediction Model Based on Machine Learning for Cognitive Impairment Among Chinese Community-Dwelling Elderly People With Normal Cognition: Development and Validation Study

    Background: Identifying cognitive impairment early enough could support timely intervention that may hinder or delay the trajectory of cognitive impairment, thus increasing the chances for successful cognitive aging. Objective: We aimed to build a prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition. Methods: A prospective cohort of 6718 older people from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) register, followed between 2008 and 2011, was used to develop and validate the prediction model. Participants were included if they were aged 60 years or above, were community-dwelling elderly people, and had a cognitive Mini-Mental State Examination (MMSE) score >= 18. They were excluded if they were diagnosed with a severe disease (eg, cancer and dementia) or were living in institutions. Cognitive impairment was identified using the Chinese version of the MMSE. Several machine learning algorithms (random forest, XGBoost, naive Bayes, and logistic regression) were used to assess the 3-year risk of developing cognitive impairment. Optimal cutoffs and adjusted parameters were explored in validation data, and the model was further evaluated in test data. A nomogram was established to vividly present the prediction model. Results: The mean age of the participants was 80.4 years (SD 10.3 years), and 50.85% (3416/6718) were female. During a 3-year follow-up, 991 (14.8%) participants were identified with cognitive impairment. Among 45 features, the following four features were finally selected to develop the model: age, instrumental activities of daily living, marital status, and baseline cognitive function. The concordance index of the model constructed by logistic regression was 0.814 (95% CI 0.781-0.846). Older people with normal cognitive functioning having a nomogram score of less than 170 were considered to have a low 3-year risk of cognitive impairment, and those with a score of 170 or greater were considered to have a high 3-year risk of cognitive impairment. Conclusions: This simple and feasible cognitive impairment prediction model could identify community-dwelling elderly people at the greatest 3-year risk for cognitive impairment, which could help community nurses in the early identification of dementia.
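The decision rule and the cohort percentages reported in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `risk_group` and `percent` are hypothetical, and it does not compute a nomogram score from the four predictors, because the abstract does not give the point assignments.

```python
def risk_group(nomogram_score, cutoff=170):
    """Apply the reported cutoff: below 170 is low risk, 170 or greater is high risk."""
    return "high" if nomogram_score >= cutoff else "low"


def percent(part, whole, digits):
    """Share of `part` in `whole`, as a percentage rounded to `digits` places."""
    return round(100 * part / whole, digits)


# Counts taken from the abstract's Results section.
cohort_size = 6718
female = 3416
impaired_at_follow_up = 991

print(percent(female, cohort_size, 2))                 # share of female participants
print(percent(impaired_at_follow_up, cohort_size, 1))  # 3-year incidence
print(risk_group(169), risk_group(170))                # scores on either side of the cutoff
```

Placing the boundary case (a score of exactly 170) in the high-risk group follows the abstract's wording, "170 or greater".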

    A chromosome-level genome assembly of the Asian arowana, Scleropages formosus

    Asian arowana (Scleropages formosus), an ancient teleost belonging to the Order Osteoglossomorpha, is a valuable ornamental fish with several varieties. However, biological studies and germplasm breeding have been severely limited by the lack of a reference genome. To solve these problems, we report high-quality genome sequences of three common varieties of Asian arowana (the golden, red and green arowana). We first generated a chromosome-level genome assembly of the golden arowana, on the basis of a genetic linkage map constructed with restriction site-associated DNA sequencing (RAD-seq). In addition, we obtained draft genome assemblies of the red and green varieties. Finally, we annotated 22,016, 21,256 and 21,524 protein-coding genes in the genome assemblies of the golden, red and green varieties, respectively. Our data were deposited in publicly accessible repositories to promote biological research and molecular breeding of Asian arowana.

    Beam test of a 180 nm CMOS Pixel Sensor for the CEPC vertex detector

    The proposed Circular Electron Positron Collider (CEPC) imposes new challenges for the vertex detector in terms of pixel size and material budget. A Monolithic Active Pixel Sensor (MAPS) prototype called TaichuPix, based on a column drain readout architecture, has been developed to address the need for high spatial resolution. In order to evaluate the performance of the TaichuPix-3 chips, a beam test was carried out at DESY II TB21 in December 2022. The Data Acquisition (DAQ) system for a multi-plane configuration was also tested during the beam test. This work presents the characterization of TaichuPix-3 chips fabricated with two different processes, including cluster size, spatial resolution, and detection efficiency. The analysis results indicate that the spatial resolution is better than 5 μm and the detection efficiency exceeds 99.5% for both TaichuPix-3 chips with the two different processes.