302 research outputs found

    Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model

    This paper explores the potential of constructing an AI spoken dialogue system that "thinks how to respond" and "thinks how to speak" simultaneously, which aligns more closely with the human speech production process than the current cascade pipeline of independent chatbot and Text-to-Speech (TTS) modules. We hypothesize that Large Language Models (LLMs) with billions of parameters possess significant speech understanding capabilities and can jointly model dialogue responses and linguistic features. We conduct two sets of experiments: 1) prosodic structure prediction, a typical front-end task in TTS, which demonstrates the speech understanding ability of LLMs, and 2) further integration of dialogue response and a wide array of linguistic features using a unified encoding format. Our results indicate that the LLM-based approach is a promising direction for building unified spoken dialogue systems.

    Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework

    Example weighting is an effective solution to the training bias problem; however, most previous methods are limited by human knowledge and require laborious tuning of hyperparameters. In this paper, we propose a novel example weighting framework called Learning to Auto Weight (LAW). The proposed framework finds step-dependent weighting policies adaptively and can be jointly trained with target networks without any assumptions or prior knowledge about the dataset. It consists of three key components: Stage-based Searching Strategy (3SM) shrinks the huge search space of a complete training process; Duplicate Network Reward (DNR) gives more accurate supervision by removing randomness during the search; and Full Data Update (FDU) further improves the updating efficiency. Experimental results demonstrate the superiority of the weighting policy explored by LAW over the standard training pipeline. Compared with baselines, LAW finds a better weighting schedule that achieves much higher accuracy on both biased CIFAR and ImageNet. Comment: Accepted by AAAI 202
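
To make the idea of example weighting concrete, here is a minimal sketch of a per-example weighted cross-entropy loss. It is not the LAW framework: in LAW the weights come from a learned, step-dependent policy, whereas here the function name `weighted_cross_entropy` and the weight values are hypothetical and supplied by hand.

```python
import math


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-example cross-entropy losses.

    probs:   list of per-class probability lists, one per example
    labels:  list of integer class indices, one per example
    weights: list of non-negative per-example weights (hand-picked here;
             an assumption, not the policy learned by LAW)
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Per-example loss is the negative log-probability of the true class.
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    # Examples with larger weights contribute more to the training signal.
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.75]]
    labels = [0, 0]
    print(weighted_cross_entropy(probs, labels, [1.0, 1.0]))  # uniform weights
    print(weighted_cross_entropy(probs, labels, [1.0, 0.0]))  # second example ignored
```

Setting a weight to zero removes that example from the loss entirely, which is how a weighting policy can suppress suspected noisy or over-represented samples.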

    Simulation of ultrasonic vibration in a liquid aluminum bath for sapphire surface modification

    Ultrasonic vibration has been found to play a significant role in promoting surface nano-crystallization of sapphire in a liquid aluminum bath, and the distribution of the vibration field is critical in controlling the modification procedure. Here, the distribution of ultrasonic vibration in a liquid aluminum bath was investigated by the finite element method (FEM). The effects of the shape of the ultrasonic horn and of the distance between the horn and the sapphire plates were investigated. It was found that the ultrasonic vibration density is high in the area adjacent to the ultrasonic horn. The distance between the horn and the plates significantly influences the vibration distribution. The vibration density decreased significantly at the liquid/solid interface, indicating obvious energy absorption there. Vibration energy gradients can be formed on the sapphire surface, and this phenomenon can be used to achieve different aims.

    A new study of multi-phase mass and heat transfer in natural gas hydrate reservoir with an embedded discrete fracture model

    Acknowledgments: The authors are grateful to the National Natural Science Foundation of China (51991365), China Geological Survey Project (No. DD20211350), and Guangdong Major Project of Basic and Applied Basic Research (No. 2020B0301030003). Peer reviewed.

    Association of Lumican Gene with Susceptibility to Pathological Myopia in the Northern Han Ethnic Chinese

    Pathological myopia (PM) is a severe hereditary ocular disease leading to blindness, and there is an urgent need to understand its pathogenesis and find a therapy. The purpose of this study is to analyze the sequences of the lumican and decorin genes in subjects with PM and in controls, to verify the relationship between these genes and PM in Northern Han Chinese. We collected and analyzed blood samples from 94 adults with PM (12 pedigree cases and 82 sporadic cases) and 90 controls of Northern Han Chinese ethnicity. Genotyping was performed by direct sequencing after polymerase chain reaction (PCR) amplification, and allele frequencies were tested for Hardy-Weinberg equilibrium. Univariate analysis revealed significant differences between the two groups in genotype distribution and allele frequency (P < .05) for three SNPs: rs3759223 (C → T) and rs17853500 (T → C) of the lumican gene and rs74419 (T → C) of the decorin gene. There was no significant difference in the incidence of these mutations between the pedigree and sporadic groups (P > .05). The results suggest that sequence variants in the 5′-regulatory region of the lumican gene and the 3′UTR of the decorin gene are significantly associated with PM in Northern Han Chinese. Further studies are needed to confirm whether the two genes are causative genes of PM.
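
The abstract above mentions a Hardy-Weinberg equilibrium check without saying which test was used. As a hedged illustration, the sketch below applies the common chi-square goodness-of-fit test (1 degree of freedom) to genotype counts for one biallelic SNP. The function name `hardy_weinberg_chi2` and all counts are hypothetical; none of the numbers come from the study.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    n_aa, n_ab, n_bb: observed counts of the two homozygous genotypes and
    the heterozygous genotype (hypothetical inputs for illustration).
    Returns (chi2 statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under equilibrium: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hardy_weinberg_chi2(25, 50, 25))  # counts exactly at equilibrium
    print(hardy_weinberg_chi2(50, 0, 50))   # no heterozygotes at all
```

Counts of 25/50/25 match the equilibrium expectation exactly and give a statistic of 0, while a sample with no heterozygotes deviates strongly. For small samples an exact test is often preferred over this chi-square approximation.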

    Asymmetric Transfer Hashing with Adaptive Bipartite Graph Learning

    Thanks to its efficient retrieval speed and low storage consumption, learning to hash has been widely used in visual retrieval tasks. However, existing hashing methods assume that the query and retrieval samples lie in a homogeneous feature space within the same domain. As a result, they cannot be directly applied to heterogeneous cross-domain retrieval. In this paper, we propose a Generalized Image Transfer Retrieval (GITR) problem, which encounters two crucial bottlenecks: 1) the query and retrieval samples may come from different domains, leading to an inevitable domain distribution gap; 2) the features of the two domains may be heterogeneous or misaligned, bringing up an additional feature gap. To address the GITR problem, we propose an Asymmetric Transfer Hashing (ATH) framework with its unsupervised/semi-supervised/supervised realizations. Specifically, ATH characterizes the domain distribution gap by the discrepancy between two asymmetric hash functions, and minimizes the feature gap with the help of a novel adaptive bipartite graph constructed on cross-domain data. By jointly optimizing the asymmetric hash functions and the bipartite graph, not only can knowledge transfer be achieved but information loss caused by feature alignment can also be avoided. Meanwhile, to alleviate negative transfer, the intrinsic geometrical structure of single-domain data is preserved by involving a domain affinity graph. Extensive experiments on both single-domain and cross-domain benchmarks under different GITR subtasks indicate the superiority of our ATH method in comparison with state-of-the-art hashing methods.
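
To illustrate what retrieval with two asymmetric hash functions looks like at query time, here is a minimal sketch. It assumes each hash function is a linear projection followed by a sign threshold, and it takes the projection matrices as given. The names `hash_code`, `hamming`, and `retrieve` are hypothetical, and the sketch does not reproduce how ATH learns the projections or the bipartite graph.

```python
def hash_code(vector, projection):
    """Binary code: one bit per projection row, set when the dot product is >= 0."""
    return tuple(
        1 if sum(w * x for w, x in zip(row, vector)) >= 0 else 0
        for row in projection
    )


def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query, database, query_proj, db_proj, top_k=1):
    """Rank database items by Hamming distance to the query code.

    query_proj and db_proj are separate (asymmetric) projections, so the
    query and database features may have different dimensionalities.
    Both matrices are assumed inputs, not learned here.
    """
    q_code = hash_code(query, query_proj)
    db_codes = [hash_code(v, db_proj) for v in database]
    ranked = sorted(range(len(database)), key=lambda i: hamming(q_code, db_codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    query_proj = [[1, 0], [0, 1]]        # 2-D query features -> 2 bits
    db_proj = [[1, 0, 0], [0, 1, 0]]     # 3-D database features -> 2 bits
    database = [[-1, 1, 0], [2, -3, 5]]
    print(retrieve([1, -1], database, query_proj, db_proj))
```

Because both domains map into the same binary code space, comparison reduces to cheap Hamming distances even though the raw features are heterogeneous.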

    Deep learning in crowd counting: A survey

    Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT). The taxonomy divides datasets into small-scale, large-scale and hyper-scale, according to different application scenarios. This theory can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: average pixel occupied by each object (APO). This index is more suitable than image resolution for evaluating the clarity of a dataset in the object counting task. Moreover, the authors classified crowd counting methods from a data-driven perspective into multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly supervised networks, and introduced the classic crowd counting methods of each class. The authors classified the 36 existing datasets according to the three-tier standardised dataset taxonomy, and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods from the past five years on popular datasets at different levels. Recently, progress in research on small-scale datasets has slowed down: there are few new datasets and algorithms for small-scale datasets. Studies focused on large-scale or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources.
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.
Funding: BHF, AA/18/3/34220; Hope Foundation for Cancer Research, RM60G0680; GCRF, P202PF11; Sino‐UK Industrial Fund, RP202G0289; LIAS, P202ED10, P202RE969; Data Science Enhancement Fund, P202RE237; Sino‐UK Education Fund, OP202006; Fight for Sight, 24NN201; Royal Society International Exchanges Cost Share Award, RP202G0230; MRC, MC_PC_17171; BBSRC, RM32G0178B
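
The crowd-counting abstract above names the APO index but does not give its formula. The sketch below is one plausible reading, offered only as an assumption: total image pixels divided by the total number of annotated objects across a dataset. The function name `average_pixels_per_object` and the sample values are hypothetical, and the survey's exact definition may differ.

```python
def average_pixels_per_object(images):
    """Assumed APO: total pixels divided by total annotated objects.

    images: iterable of (width, height, object_count) tuples, one per image.
    This dataset-level ratio is an interpretation of the abstract, not a
    formula quoted from the survey.
    """
    images = list(images)
    total_pixels = sum(w * h for w, h, _ in images)
    total_objects = sum(count for _, _, count in images)
    if total_objects == 0:
        raise ValueError("dataset contains no annotated objects")
    return total_pixels / total_objects


if __name__ == "__main__":
    # Two hypothetical images: a sparse scene and a dense one.
    dataset = [(100, 100, 10), (200, 100, 90)]
    print(average_pixels_per_object(dataset))
```

Under this reading, a lower value means each object covers fewer pixels, so two datasets with the same image resolution can differ greatly in how clearly individual objects appear.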