Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames of a recording contain the singer's voice. It is one of the main components of music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and complete system is still needed to improve detection performance. In this paper, our motivation is to provide an extensive comparison of the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a pre-processing stage of singing voice separation that extracts the vocal from the accompaniment; several singing voice separation methods were compared to select the best one for integration into the detection system. The second is a deep-neural-network-based classifier that labels the given frames; different deep models for classification were also compared. The last is a post-processing stage that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared as post-processing. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach based on the Long-term Recurrent Convolutional Networks (LRCN) model is a promising alternative.
Comment: 15 pages
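The post-processing stage described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes binary per-frame vocal/non-vocal predictions and shows only the median-filter variant, with a hypothetical helper name `median_filter_postprocess`:

```python
import numpy as np

def median_filter_postprocess(frame_preds, kernel_size=5):
    """Smooth binary per-frame vocal/non-vocal predictions by replacing
    each label with the median of a sliding window, which removes
    isolated anomaly frames in the classifier's output."""
    preds = np.asarray(frame_preds)
    half = kernel_size // 2
    # Pad with edge values so the window is full at both boundaries.
    padded = np.pad(preds, half, mode="edge")
    smoothed = np.empty_like(preds)
    for i in range(len(preds)):
        window = padded[i:i + kernel_size]
        smoothed[i] = int(np.median(window))
    return smoothed

# A single spurious "vocal" frame inside a non-vocal run is filtered out.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
print(median_filter_postprocess(raw, kernel_size=3))
```

The HMM-based alternative compared in the paper would instead decode the most likely label sequence given transition probabilities, trading the fixed window for learned temporal dynamics.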
Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data
Models for music artist classification have usually operated in the frequency domain, where the input audio samples are first processed by a spectral transformation. The WaveNet architecture was originally designed for speech and music generation. In this paper, we propose an end-to-end architecture in the time domain for artist classification. A WaveNet classifier is introduced that models features directly from the raw audio waveform: the WaveNet takes the waveform as input, and several subsequent downsampling layers discriminate which artist the input belongs to. In addition, the proposed method is applied to singer identification. The best-performing model obtains an average F1 score of 0.854 on the Artist20 benchmark dataset, a significant improvement over related work. To show the effectiveness of the feature learning in the proposed method, the bottleneck layer of the model is visualized.
Comment: 12 pages
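The core WaveNet building block the abstract refers to is the dilated causal convolution, which lets a stack of a few layers see a long stretch of raw waveform. A minimal sketch, not the paper's model (the function names and the toy dilation schedule are illustrative assumptions):

```python
import numpy as np

def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution with the given dilation: each output
    sample depends only on current and past input samples, as in
    WaveNet. `weights` holds the kernel taps, newest sample first."""
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(weights[j] * xp[i + pad - j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Doubling dilations give exponential receptive-field growth:
# five kernel-2 layers already cover 32 raw-audio samples.
print(receptive_field(2, [1, 2, 4, 8, 16]))
```

In a classifier such as the one described, downsampling layers after this stack would reduce the time axis until a final layer predicts the artist label.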
Analysis of HER2 Gene Amplification and Certain Prognostic Factors in Breast Cancer
Objective: To analyze HER2 gene amplification and certain prognostic factors in breast cancer. Method: HER2 gene amplification and protein expression of the human epidermal growth factor receptor were analyzed in 100 breast cancer tissues collected in the hospital from January 2020 to December 2021, detected by the FISH and IHC methods, and selected prognostic factors for breast cancer were analyzed. Result: HER-2 protein expression was 0 in 8 cases of breast cancer, (1+) in 11 cases, (2+) in 49 cases, and (3+) in 32 cases. The HER2 gene was amplified in 49 cases, of which 23 showed clustered red signals and 26 showed dotted red signals; the HER-2 gene was not amplified in 51 cases. The detection results of the FISH and IHC methods did not differ significantly (P>0.05). ER, PR, and polysomy of chromosome 17 are prognostic factors associated with HER2 gene amplification in certain breast cancers (P<0.05). Conclusion: Analyzing HER2 gene amplification in breast cancer and selecting the FISH and IHC detection methods in a targeted manner can improve the therapeutic effect and prognosis, which deserves clinical attention.
Mechanical deformation mechanism and verification of sections at junctions of light and dark tunnel in a mountain area
Projects involving junctions of light and dark tunnel sections in mountainous areas are complex engineering problems that combine the tunnel structure, the slope rock-soil mass, and protection works. Such junctions are subject to complex and changeable loads, and the stress and deformation of the junction vary under different conditions, which is a major source of inconvenience for construction and monitoring operations. In this paper, according to the load conditions at a junction of light and dark tunnel sections, we divide the junction hole into thrust, compression, and combined thrust-compression types. The three types of structure were simulated by numerical analysis, and we explored the structural deformation and stress of these tunnel types under different conditions. Thus, for any construction process, the mechanical deformation mechanism and the weak points in the structure should be worked out. Based on the weak parts, monitoring points were installed and four field sites were chosen for monitoring. The monitoring results show that the actual deformation, stress, and structural failure locations are basically consistent with the numerical simulation results. The deformation mechanism obtained for light and dark tunnel junctions can provide a basis for selecting treatment measures and controlling structural deformation. Furthermore, the results can also serve as a reference for the design, construction, and site monitoring of similar engineering projects.
YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design
The rapid development and wide use of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using a large model at the cost of high latency, or speed-oriented, using a lightweight model but sacrificing accuracy. In this work, we propose the YOLObile framework, real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that applies to any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14x compression rate on YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, outperforming the original YOLOv4 by a 5x speedup. Source code is at:
\url{https://github.com/nightsnack/YOLObile}
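The block-punched pruning idea can be sketched in a few lines. This is a simplified illustration, not the YOLObile implementation: it assumes a 2-D weight matrix, chooses the punched positions by average magnitude across blocks, and the function name `block_punched_prune` is hypothetical:

```python
import numpy as np

def block_punched_prune(weight, block_shape=(4, 4), prune_ratio=0.5):
    """Block-punched pruning sketch: divide the weight matrix into
    blocks and zero the SAME set of intra-block positions in every
    block (chosen by average magnitude), so all blocks share one
    sparsity pattern and stay friendly to parallel mobile kernels."""
    bh, bw = block_shape
    h, w = weight.shape
    assert h % bh == 0 and w % bw == 0
    # Shape: (n_block_rows, n_block_cols, bh, bw)
    blocks = weight.reshape(h // bh, bh, w // bw, bw).transpose(0, 2, 1, 3)
    # Importance of each intra-block position, averaged over all blocks.
    importance = np.abs(blocks).mean(axis=(0, 1))
    n_prune = int(prune_ratio * bh * bw)
    flat_idx = np.argsort(importance, axis=None)[:n_prune]
    mask = np.ones(bh * bw)
    mask[flat_idx] = 0.0
    mask = mask.reshape(bh, bw)
    pruned = blocks * mask  # identical punched positions in every block
    return pruned.transpose(0, 2, 1, 3).reshape(h, w)

w = np.arange(64, dtype=float).reshape(8, 8)
pw = block_punched_prune(w, (4, 4), 0.5)
print(np.count_nonzero(pw))  # half of the 64 weights remain
```

The shared per-block pattern is what distinguishes this from unstructured pruning: the compiler can generate one dense inner loop per block instead of handling irregular sparsity.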
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Cross-modal retrieval (CMR) has been widely applied in domains such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, poses a great challenge due to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are restricted in two ways: 1) Most researchers utilize contrastive learning to construct a common subspace where similarities among data can be measured; however, they consider only the cross-modal transformation, neglecting intra-modal separability, and the temperature parameter is not adaptively adjusted with semantic guidance, which degrades performance. 2) These methods do not take latent representation reconstruction into account, which is essential for semantic alignment. This paper introduces a novel audio-text oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, latent representation reconstruction modules are embedded into the CMR framework, which improves modal interaction. Experiments comparing CLSR with several state-of-the-art methods on two audio-text datasets validate its superiority.
Comment: Accepted by the 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023)
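The contrastive objective that CLSR builds on can be sketched as a symmetric InfoNCE loss over paired audio/text embeddings. This is a baseline illustration only, not CLSR itself (it omits the intra-modal term, the adaptive temperature, and the reconstruction modules the abstract describes; the fixed temperature value is an assumption):

```python
import numpy as np

def info_nce(audio, text, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings: matching audio/text
    pairs sit on the diagonal of the cosine-similarity matrix and are
    pulled together; mismatched (off-diagonal) pairs are pushed apart."""
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = a @ t.T / temperature
    n = len(a)

    def ce(m):
        # Cross-entropy with the diagonal as the target class.
        m = m - m.max(axis=1, keepdims=True)
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (ce(logits) + ce(logits.T))

# Correctly aligned pairs should give a lower loss than shuffled pairs.
rng = np.random.default_rng(0)
e = rng.normal(size=(8, 16))
aligned = info_nce(e, e)
shuffled = info_nce(e, e[::-1])
```

CLSR's adaptive temperature would replace the fixed `temperature` with a value adjusted by semantic guidance, and its intra-modal term would add analogous contrastive losses within each modality.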
Multimodal Wearable Intelligence for Dementia Care in Healthcare 4.0: A Survey
As a new revolution in ubiquitous computing and the Internet of Things, multimodal wearable intelligence is rapidly becoming a new research topic in both academic and industrial fields. Owing to the rapid spread of wearable and mobile devices, this technique is evolving healthcare from traditional hub-based systems to more personalised healthcare systems. This trend is well aligned with the recent Healthcare 4.0, a continuous process of transforming the entire healthcare value chain to be preventive, precise, predictive, and personalised, with significant benefits for elder care. However, applying multimodal wearable intelligence to elderly care, such as for people with dementia, is significantly challenging given many issues, such as the shortage of cost-effective wearable sensors, the heterogeneity of connected wearable devices, and the high demand for interoperability. Focusing on these challenges, this paper gives a systematic review of advanced multimodal wearable intelligence technologies for dementia care in Healthcare 4.0. A framework is proposed for reviewing the current research on wearable intelligence, covering key enabling technologies, major applications, and successful case studies in dementia care, and the paper finally points out future research trends and challenges in Healthcare 4.0.