
    Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation

    Singing voice detection is the task of identifying which frames of a recording contain the singer's voice. It is one of the main components of music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and complete system is still desired to improve detection performance. In this paper, our motivation is to provide an extensive comparison across the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a singing voice separation pre-process that extracts the vocal from the accompaniment; the improvements from several singing voice separation methods were compared to decide which one to integrate into the detection system. The second is a deep neural network based classifier that labels the given frames; different deep models for classification were also compared. The last is a post-process that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared as post-processes. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach, based on the Long-term Recurrent Convolutional Network (LRCN) model, is a promising alternative. Comment: 15 pages
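The median-filter post-process described above can be sketched as a sliding median over the frame-level vocal/non-vocal predictions; isolated anomalous frames inside an otherwise uniform run are voted away by their neighbours. This is a minimal illustration of the idea, not the paper's exact configuration (the window width here is an assumption):

```python
from statistics import median

def median_smooth(frames, width=5):
    """Smooth binary frame-level vocal/non-vocal predictions with a
    sliding median filter so that isolated anomalous frames are
    removed. `width` should be odd; the window is clamped at the edges."""
    half = width // 2
    out = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        out.append(int(median(frames[lo:hi])))
    return out

# A single spurious "vocal" frame inside a non-vocal run is removed,
# while a sustained vocal run is preserved:
preds = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
print(median_smooth(preds))
```

An HMM-based filter plays the same role but models transition probabilities between the vocal and non-vocal states instead of using a fixed window.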

    Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data

    Models for music artist classification usually operate in the frequency domain, where the input audio samples are processed by a spectral transformation. The WaveNet architecture, by contrast, was originally designed for speech and music generation. In this paper, we propose an end-to-end architecture in the time domain for this task: a WaveNet classifier that models features directly from the raw audio waveform. The WaveNet takes the waveform as input, and several subsequent downsampling layers discriminate which artist the input belongs to. In addition, the proposed method is applied to singer identification. The best-performing model obtains an average F1 score of 0.854 on the Artist20 benchmark dataset, a significant improvement over related works. To show the effectiveness of the proposed method's feature learning, the bottleneck layer of the model is visualized. Comment: 12 pages
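A key property that lets a WaveNet-style stack model raw waveforms is its exponentially growing receptive field: each dilated convolution layer widens the span of input samples a single output sees. The helper below computes that span for a stack of dilated 1-D convolutions; the kernel size and dilation schedule are the standard WaveNet defaults, assumed here for illustration since the abstract does not state the exact architecture:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of a stack of dilated 1-D
    convolutions: each layer adds (kernel_size - 1) * dilation samples."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# One WaveNet-style block of 10 layers with doubling dilations
# (1, 2, 4, ..., 512) and kernel size 2:
dilations = [2 ** i for i in range(10)]
print(receptive_field(2, dilations))  # covers 1024 raw samples
```

Doubling the dilation per layer is why only tens of layers suffice to cover tens of thousands of raw audio samples.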

    Analysis of HER2 Gene Amplification and Certain Prognostic Factors in Breast Cancer

    Objective: To analyze HER2 gene amplification and certain prognostic factors in breast cancer. Method: HER2 gene amplification and protein expression of the human epidermal growth factor receptor were detected by FISH and IHC in 100 breast cancer tissue samples collected at the hospital from January 2020 to December 2021, and selected prognostic factors for breast cancer were analyzed. Result: HER-2 protein expression was scored 0 in 8 cases, (1+) in 11 cases, (2+) in 49 cases, and (3+) in 32 cases. The HER2 gene was amplified in 49 cases, of which 23 showed clustered red signals and 26 showed dotted red signals; in the remaining 51 cases the HER2 gene was not amplified. The difference between the FISH and IHC detection results was not statistically significant (P>0.05). ER, PR, and polysomy of chromosome 17 are prognostic factors associated with HER2 gene amplification in certain breast cancers (P<0.05). Conclusion: Analyzing HER2 gene amplification in breast cancer and selecting the FISH and IHC detection methods in a targeted manner can improve the therapeutic effect and prognosis, which deserves clinical attention.

    Mechanical deformation mechanism and verification of sections at junctions of light and dark tunnel in a mountain area

    Projects involving junctions of light and dark tunnel sections in mountainous areas are complex engineering problems that combine tunnel structures, slope rock-soil masses, and protection works. Such junctions are subject to complex and changeable loads, and the stress and deformation of the junction vary under different conditions, which makes construction and monitoring operations difficult. In this paper, according to the load conditions at a junction of light and dark tunnel, we classify junction openings into thrust, compression, and combined thrust-compression types. The three structural types were simulated by numerical analysis, and we explored the structural deformation and stress of each type of tunnel under different conditions, so that the mechanical deformation mechanism and the weak points of the structure can be worked out for any construction process. Monitoring points were installed at these weak parts, and four field sites were chosen for monitoring. The monitoring results show that the actual deformation, stress, and structural failure locations are basically consistent with the numerical simulation results. The deformation mechanism of the light and dark tunnel junction obtained here can provide a basis for selecting treatment measures and controlling structural deformation. Furthermore, the results can also serve as a reference for similar engineering design, construction, and site monitoring projects.

    YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

    The rapid development and wide utilization of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using a large model at the cost of high latency, or speed-oriented, using a lightweight model at the cost of accuracy. In this work, we propose the YOLObile framework, which achieves real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that works for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14× compression rate on YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, outperforming the original YOLOv4 with a 5× speedup. Source code is at: https://github.com/nightsnack/YOLObile
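The core idea behind block-based pruning schemes like the one above is to zero weights in fixed-size blocks so that every block keeps the same number of nonzeros, giving the compiler a regular sparsity pattern to exploit. The sketch below is an illustrative simplification under that assumption, not the paper's exact block-punched algorithm:

```python
def block_prune(weights, block_size, keep_ratio):
    """Illustrative block-wise magnitude pruning: split a flat weight
    list into fixed-size blocks and, inside each block, zero the
    smallest-magnitude entries so every block keeps the same number
    of weights (a hardware/compiler-friendly regular pattern)."""
    pruned = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        keep = max(1, int(len(block) * keep_ratio))
        # indices of the `keep` largest-magnitude weights in this block
        top = set(sorted(range(len(block)), key=lambda j: -abs(block[j]))[:keep])
        pruned.extend(w if j in top else 0.0 for j, w in enumerate(block))
    return pruned

w = [0.9, -0.1, 0.05, -0.8, 0.2, 0.02, -0.6, 0.3]
# keep_ratio=0.5 zeroes half the weights in each 4-wide block:
print(block_prune(w, block_size=4, keep_ratio=0.5))
```

With `keep_ratio` near 1/14 this kind of scheme yields the sort of 14× compression rate reported in the abstract, while the uniform per-block nonzero count is what the compiler-assisted optimizations rely on.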

    Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

    Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, poses a great challenge due to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are restricted in the following two ways: 1) Most researchers utilize contrastive learning to construct a common subspace in which similarities among data can be measured. However, they consider only cross-modal transformation, neglecting intra-modal separability. Besides, the temperature parameter is not adaptively adjusted with semantic guidance, which degrades performance. 2) These methods do not take latent representation reconstruction into account, which is essential for semantic alignment. This paper introduces a novel audio-text oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, latent representation reconstruction modules are embedded into the CMR framework, which improves modal interaction. Experiments comparing CLSR with several state-of-the-art methods on two audio-text datasets validate its superiority. Comment: Accepted by the 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023).
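The contrastive objective and the temperature parameter discussed above can be made concrete with a plain InfoNCE loss over an audio-text similarity matrix whose diagonal holds the matched pairs. This is a minimal baseline sketch; CLSR's adaptive temperature control, intra-modal terms, and reconstruction modules are deliberately omitted:

```python
import math

def info_nce(sim, temperature=0.07):
    """Plain InfoNCE contrastive loss over a square similarity matrix
    `sim` whose diagonal entries are the matched audio-text pairs.
    The temperature scales the logits before the softmax; here it is
    a fixed hyperparameter, unlike CLSR's adaptively adjusted one."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the matched pair
    return total / n

# A similarity matrix where matched pairs dominate gives a low loss:
good = [[1.0, 0.1], [0.2, 1.0]]
print(info_nce(good))
```

A lower temperature sharpens the softmax and penalizes hard negatives more strongly, which is exactly why fixing it without semantic guidance can degrade performance, as the abstract argues.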

    Multimodal Wearable Intelligence for Dementia Care in Healthcare 4.0: A Survey

    As a new revolution in ubiquitous computing and the Internet of Things, multimodal wearable intelligence is rapidly becoming a new research topic in both academic and industrial fields. Owing to the rapid spread of wearable and mobile devices, this technique is evolving healthcare from traditional hub-based systems to more personalised healthcare systems. This trend is well aligned with the recent Healthcare 4.0, a continuous process of transforming the entire healthcare value chain to be preventive, precise, predictive, and personalised, with significant benefits for elder care. But applying multimodal wearable intelligence to elderly care, such as for people with dementia, is significantly challenging owing to many issues, such as the shortage of cost-effective wearable sensors, the heterogeneity of connected wearable devices, and the high demand for interoperability. Focusing on these challenges, this paper gives a systematic review of advanced multimodal wearable intelligence technologies for dementia care in Healthcare 4.0. A framework is proposed for reviewing current research on wearable intelligence, covering key enabling technologies, major applications, and successful case studies in dementia care, and the paper finally points out future research trends and challenges in Healthcare 4.0.